Thursday

Dealing with the uncertainty of similarity in cognitive maps and cause maps

In earlier posts [1][2] I outlined how difficult it is to find similar statements within a cause map that could be merged or linked. Such mergers or links indicate potential common ground among interviewees.

In this post I will share the approach I used in my first medium-sized map. I was dealing with 7 cognitive maps woven together into a cause map that eventually amounted to about 350 statements and 450 links. That makes roughly 60,000 statement pairs to compare (350 × 349 / 2 = 61,075) in order to make sure that I have checked every combination for similarity. Not even close to manageable. So I went with computer-supported strategies.
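To make the arithmetic concrete, here is a minimal Python sketch of the pair count. The function name pair_count is my own and only for illustration; it is not part of any mapping software.

```python
from math import comb


def pair_count(n_statements: int) -> int:
    """Number of unordered statement pairs among n statements: n * (n - 1) / 2."""
    return comb(n_statements, 2)


print(pair_count(350))  # 61075 pairs for the map described above
print(pair_count(15))   # 105 pairs for a list of 15 statements
```

The count grows quadratically, which is why a map that is only a few times larger quickly becomes unmanageable to check by hand.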




Simple listing

The most basic comparison is to look for shared words between statements, as in figure 1.



Figure 1 Statement comparison by similar words

One of the most important steps here is to filter out stop words. That means you do not want to flag statements merely because they share words such as "a", "the", "he" or "I". These words usually say nothing significant about overall statement similarity. A major drawback can be the number of results, e.g. over 700 in figure 1. Although I found a simple listing with colour highlighting the most efficient format, the sheer mass of statement pairs makes this comparison method ineffective. Moreover, there are quite a few false positives. That is to be expected, though, since the computer just looks blindly for individual word matches.
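The simple listing can be sketched roughly as follows. This is not the code behind figure 1. The stop word list, the function names and the example statements are assumptions made up for illustration, and a real stop word list would be much longer.

```python
import re
from itertools import combinations

# Illustrative stop word list only (assumption); use a fuller list in practice.
STOP_WORDS = {"a", "the", "he", "i", "of", "and", "to", "in", "is"}


def content_words(statement):
    """Lower-case a statement, split it into words and drop stop words."""
    words = re.findall(r"[a-z0-9']+", statement.lower())
    return {w for w in words if w not in STOP_WORDS}


def shared_word_pairs(statements):
    """Return (id_a, id_b, shared_words) for every pair sharing a content word.

    `statements` maps a statement number to its text.
    """
    words = {sid: content_words(text) for sid, text in statements.items()}
    result = []
    for a, b in combinations(sorted(words), 2):
        shared = words[a] & words[b]
        if shared:
            result.append((a, b, sorted(shared)))
    return result


# Hypothetical example statements
example = {
    1: "The budget is too small",
    2: "A small team slows the project",
    3: "He trusts the manager",
}
for a, b, shared in shared_word_pairs(example):
    print(a, b, shared)
```

Statements 1 and 2 are flagged because they share "small", even though they mean different things. That is exactly the kind of false positive described above.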

Jaccard index

This blindness can be counteracted by looking at every word of each statement. To do so, I cleaned the statements (normalising as mentioned in [2]) and calculated the Jaccard index. This is essentially a measure of how many words two statements have in common: the number of distinct words they share divided by the number of distinct words that occur in either of them, expressed in a range of 0 to 100%. I picked a percentage value that gave me a number of statement pairs I could feasibly compare manually. That means I asked the computer to show me all statement pairs that are x% similar. Figure 2 gives you an example.


Figure 2 Jaccard index example

The table in figure 2 is filtered for stop words and does not show pairs that are already linked. As before, there are false positives, but that is bound to happen sometimes. For instance, when you have an enumeration such as "we relate the association of x, y and z" and you need to separate it into three statements, you have to repeat "relate the association" two additional times. You necessarily create false positives.

Nonetheless, I have found the Jaccard method the most efficient way of comparing. It consistently gives a manageable number of results while giving you the flexibility to switch between similarity values. So if you end up with too many results, raise the percentage.
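A minimal sketch of the Jaccard comparison described above could look like this. It is not the code behind figure 2. The stop word list, the function names, the example statements and the choice to report pairs at or above the threshold are my assumptions for illustration.

```python
import re
from itertools import combinations

# Illustrative stop word list only (assumption).
STOP_WORDS = {"a", "the", "he", "i", "of", "and", "to", "in", "is", "we"}


def word_set(statement):
    """Normalise a statement into its set of distinct content words."""
    words = re.findall(r"[a-z0-9']+", statement.lower())
    return {w for w in words if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard index of two word sets: |intersection| / |union|, from 0.0 to 1.0."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_pairs(statements, threshold, linked=frozenset()):
    """List unlinked statement pairs with a Jaccard index >= threshold.

    `statements` maps statement numbers to text; `linked` holds pairs of
    statement numbers that are already linked and should be skipped.
    """
    sets = {sid: word_set(text) for sid, text in statements.items()}
    hits = []
    for a, b in combinations(sorted(sets), 2):
        if (a, b) in linked or (b, a) in linked:
            continue
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            hits.append((a, b, score))
    # Most similar pairs first
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical example: an enumeration split into separate statements
example = {
    1: "relate the association of x",
    2: "relate the association of y",
    3: "the budget is small",
}
print(similar_pairs(example, threshold=0.5))
```

Statements 1 and 2 share two of four distinct content words, so they score 50%. Raising the threshold above 0.5 removes the pair from the list, which is the "up the percentage" knob mentioned above.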

Keyword density

In another strategy I followed a theme-based approach. I took the entire map and calculated the keyword "density". This is straightforward: it involves ranking all words by their number of occurrences. I then took each keyword and listed all statements that contain it (figure 3).


Figure 3: Statement comparisons by keyword ranks

Almost half of the results fell into the last rank, where only two statements shared the same keyword. If the number of results is too high, discarding the lower ranks seems to be a good trade-off. A possible drawback is keywords with high counts. If your keyword "x" occurs 15 times in total in different statements, you will get a list of 15 statements (figure 3 bottom). Scanning even a 15-statement list may already be quite tedious, so you would probably prefer to compare each statement with every other match pairwise. But 15 statements already require 105 pairwise comparisons (15 × 14 / 2; figure 3 top; see also [1]). Not ideal either.

Despite all the drawbacks, I was ultimately quite satisfied with my results. I was able to find about 50 new links in addition to the 400 existing ones.
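The keyword ranking from this section can be sketched as below. Again, this is not the code behind figure 3. The names and example statements are made up, and as a simplifying assumption I count the number of statements containing a keyword rather than every single occurrence.

```python
import re
from collections import defaultdict

# Illustrative stop word list only (assumption).
STOP_WORDS = {"a", "the", "he", "i", "of", "and", "to", "in", "is"}


def keyword_ranking(statements):
    """Rank keywords by how many statements contain them.

    Returns a list of (keyword, [statement numbers]) with the most widely
    shared keywords first. Keywords found in a single statement are dropped,
    since they cannot point to a possible merger or link.
    """
    index = defaultdict(list)
    for sid in sorted(statements):
        words = set(re.findall(r"[a-z0-9']+", statements[sid].lower()))
        for word in words - STOP_WORDS:
            index[word].append(sid)
    shared = {w: ids for w, ids in index.items() if len(ids) >= 2}
    # Sort by descending statement count, then alphabetically for stable output
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical example statements
example = {
    1: "Budget cuts delay the project",
    2: "The project needs a larger budget",
    3: "Staff turnover delays the project",
}
for keyword, ids in keyword_ranking(example):
    print(keyword, ids)
```

Note that "delay" and "delays" are treated as different keywords here because the sketch does no stemming. Discarding the lower ranks then simply means cutting the list off after the first few entries.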

Not the end of the line

Be aware though, as pointed out in [2], that the kinds of calculations described above belong to natural language processing, i.e. "language science". There are people who do this for a living. My approach should be considered crude, yet it was sufficiently effective for my needs. Additionally, one of the most important mergers I found came from recalling a statement from memory and was entirely unrelated to consciously comparing. Remember what I said in [2]: similarity is one way to look for statements that can be linked or merged. I doubt that there is a method that gets you around familiarising yourself with your maps. I often just browse randomly over the map and see what I can see.

In summary

To sum this post up pragmatically: in my next mapping project I would start with the Jaccard index. I would first list all statement pairs that are 100% similar and then decrease the value incrementally.

I would then turn to a keyword analysis and look at the top ranks. I would focus slightly more on statement numbers that are far apart (e.g. 134 and 254) than on direct neighbours (92, 93, 94). This way I know that I am more likely dealing with cross links, i.e. links between two different people's original cognitive maps.

As a last step I would use the simple word similarity listing as a screening tool. At that point I would expect to be decently familiar with my map. So while browsing (or skimming) through the listing I might pick up a couple of comparisons that slipped through previously.

Finally, before, during and after my next mapping project I would keep searching for other similarity measures and maybe play around with some more elaborate approaches (see e.g. [3]).

__
As a disclaimer: When I wrote this post I did have a practical use case in mind. If Causal Cognitive Mapping is a method in, say, your PhD dissertation, you probably will be better off working through all your statements in a manual comparison.

(C) CC BY (https://creativecommons.org/licenses/by/4.0/), Jo. Richter, http://causal-cognitive-mapping.blogspot.de/2017/01/dealing-with-uncertainty-of-similarity.html

[1] http://causal-cognitive-mapping.blogspot.com/2017/01/surprising-finding-similarity-in.html 
[2] http://causal-cognitive-mapping.blogspot.de/2017/01/the-tragedy-of-similarity-in-cognitive.html
[3] http://www-nlp.stanford.edu/IR-book/
