Thursday

The tragedy of similarity in cognitive maps and cause maps

In the last post I outlined that even a relatively small map of 50 statements already requires 1,225 individual comparisons (50 × 49 / 2). That is the minimum number needed to check every statement against every other statement for similarity. This in turn is important for deciding whether statements can be merged into one or linked together. This, finally, matters for instance when you want to weave together cognitive maps and find where interviewees share common ground on an issue.
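
For n statements, comparing each statement with every other statement exactly once gives n(n − 1)/2 unordered pairs, which for 50 statements is 1,225. A minimal Python sketch of that arithmetic, using made-up placeholder statements and a hypothetical helper name, cross-checked by actually enumerating the pairs:

```python
from itertools import combinations


def comparisons_required(n: int) -> int:
    """Number of unordered statement pairs, i.e. each pair is compared once."""
    return n * (n - 1) // 2


# Placeholder statements; a real map would hold the interview statements.
statements = [f"statement {i}" for i in range(1, 51)]

pairs = list(combinations(statements, 2))
print(comparisons_required(len(statements)))  # 1225
print(len(pairs))  # 1225, the same count obtained by enumerating the pairs
```

Because the count grows with the square of n, doubling the map to 100 statements already yields 4,950 comparisons.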

To get an idea of the sizes involved, have a look at Figure 1.



Figure 1 Statements and comparisons required




The bad news is that these numbers alone make it impossible to find similar statements by hand. Computer support is required. But how should a computer know which statements are similar?

The straightforward approach is to look for identical words: if two statements contain the same words, they are equal. Easy. But what if only some of the words are the same? Difficult. As a first step you can attempt to normalise [1] the text of the statements. The basic idea is that people may talk about the same thing but use different words or wording to do so.
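
The naive "same words" test and the harder "only some words" case can be sketched in a few lines of Python. This assumes the simplest possible tokenisation (lower-casing and splitting on whitespace); the function names and example statements are hypothetical:

```python
def words(statement: str) -> set[str]:
    """Lower-case the statement and split it on whitespace into a set of words."""
    return set(statement.lower().split())


def same_words(a: str, b: str) -> bool:
    """The naive test: two statements count as equal if they use the same words."""
    return words(a) == words(b)


def shared_words(a: str, b: str) -> set[str]:
    """Words two statements have in common (the 'only some words' case)."""
    return words(a) & words(b)


print(same_words("Staff morale is low", "staff morale is low"))  # True
print(same_words("Staff morale was low", "Staff morale is low"))  # False
print(sorted(shared_words("Staff morale was low", "Staff morale is low")))
# ['low', 'morale', 'staff']
```

A single changed word ("was" vs "is") is enough to make the exact test fail, even though three of four words match.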

Sometimes, for instance, we tell the same story but in a different tense. Person A might have one particular situation in mind and tell, in the past tense, how it happened to him. Person B, however, might have several situations in mind and speak from a more general point of view, hence in the present tense. Both may be talking about the same thing, but the difference in tense blurs the similarity on a per-word basis. For the purpose of comparison, cutting off the “-ed”s and “-ing”s is one strategy to counteract that. This process of reducing words to their root (stem) forms is called stemming [2].
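
To make the idea concrete, here is a deliberately crude suffix-stripping sketch in Python. It is not a real stemming algorithm: the suffix list and the minimum stem length are assumptions chosen for illustration, and the two example statements are hypothetical:

```python
# Hypothetical, deliberately crude suffix list (an assumption for illustration).
# Real stemmers such as Porter's apply ordered rules with conditions on the stem.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Person A (past tense) and Person B (present tense) on the same topic.
person_a = "budget cuts delayed the project".split()
person_b = "budget cuts are delaying projects".split()
print([crude_stem(w) for w in person_a])  # ['budget', 'cut', 'delay', 'the', 'project']
print([crude_stem(w) for w in person_b])  # ['budget', 'cut', 'are', 'delay', 'project']
```

After stemming, "cut", "delay" and "project" line up across both statements. In practice a library stemmer such as NLTK's `PorterStemmer` would be the more robust choice, since naive stripping also mangles words like "news".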

A different technique is to remove words that add no meaning to the overall statement, such as articles. For the purpose of comparison, “The dog barks” means the same as “dog barks”. This process is called stop-word filtering [3]. Once your statements are normalised, you might be lucky and gain a few extra exact matches.
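
Stop-word filtering and crude stemming can be chained into a single normalisation step, after which two differently worded statements may become an exact match. A minimal Python sketch; the tiny stop-word list and the suffix list are hypothetical stand-ins for real resources such as NLTK's stop-word corpus:

```python
# Hypothetical miniature stop-word list; real English lists contain well over
# a hundred entries.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "to"}
SUFFIXES = ("ing", "ed", "s")  # hypothetical, crude suffix list


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def normalise(statement: str) -> tuple[str, ...]:
    """Lower-case, drop stop words, then crudely stem the remaining words."""
    return tuple(
        crude_stem(w) for w in statement.lower().split() if w not in STOP_WORDS
    )


print(normalise("The dog barks"))  # ('dog', 'bark')
print(normalise("The dog barked") == normalise("dog barks"))  # True
```

"The dog barked" and "dog barks" differ in an article and a tense, yet normalise to the same tuple: one of the "few extra exact matches".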

Unfortunately, though, it is more likely that all you have accomplished is a reduction in noise: you have removed some differences that are negligible for the purpose of comparison. The comparison itself still needs to be done.

So you have to turn to natural language processing [4] (NLP, not to be confused with “neuro-linguistic programming”) and are then knee-deep in “language science”. One similarity measure, for instance, is the Jaccard index [5], which gives you a similarity value between 0 and 1 that can be read as a percentage. It is calculated by dividing the number of words two statements have in common by the number of unique words in both statements combined. You could even do this on a character level. Clearly, this means that good mapping practice depends on computer support, since such analyses cannot feasibly be conducted by hand.
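
For two sets A and B, the Jaccard index is |A ∩ B| / |A ∪ B|. A minimal Python sketch, assuming whitespace tokenisation for the word level and, as one possible choice for the character level, overlapping two-character sequences (bigrams). The function names, the example statements, and the convention that two empty sets score 1.0 are assumptions:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; treated as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def word_jaccard(s1: str, s2: str) -> float:
    """Word level: compare the sets of lower-cased, whitespace-separated words."""
    return jaccard(set(s1.lower().split()), set(s2.lower().split()))


def char_bigrams(s: str) -> set:
    """Set of overlapping two-character sequences of the lower-cased string."""
    s = s.lower()
    return {s[i : i + 2] for i in range(len(s) - 1)}


def char_jaccard(s1: str, s2: str) -> float:
    """Character level: compare the sets of character bigrams."""
    return jaccard(char_bigrams(s1), char_bigrams(s2))


print(f"{word_jaccard('staff morale is low', 'staff morale was low'):.0%}")  # 60%
print(word_jaccard("organisation", "organization"))  # 0.0
print(round(char_jaccard("organisation", "organization"), 3))  # 0.692
```

The first pair shares 3 of 5 unique words. The spelling variants share no whole word, but 9 of their 13 distinct bigrams, which is why a character-level comparison can catch what a word-level one misses.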

However, there is one big caveat. Whether two statements can be linked with one another, or even merged, may depend on things that lie entirely beyond their linguistic characteristics. After all, Causal Cognitive Mapping is about causally linking statements. Consider:



Figure 2 Causality matters - not necessarily similarity

Figure 2 shows two linked statements with zero word overlap and hence zero similarity. Now imagine that statement 2 were a possible action of somebody else, in another cognitive map in the opposite corner of your cause map, and not yet linked to statement 1. A similarity analysis would not flag the two as related.
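
The same blind spot shows up when a similarity threshold is run over all statement pairs. The Python sketch below uses hypothetical statements (not those of Figure 2) and a hypothetical cut-off of 0.5 for flagging a pair:

```python
from itertools import combinations

# Hypothetical statements (not the ones in Figure 2). The first two are causally
# linked, since hiring nurses would reduce the workload, yet they share no words.
statements = [
    "hire additional nurses",
    "workload on the ward decreases",
    "workload on the ward increases",
]

THRESHOLD = 0.5  # hypothetical cut-off above which a pair is flagged as similar


def jaccard(s1: str, s2: str) -> float:
    """Word-level Jaccard index of two statements."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


flagged = [
    (s1, s2)
    for s1, s2 in combinations(statements, 2)
    if jaccard(s1, s2) >= THRESHOLD
]
print(jaccard(statements[0], statements[1]))  # 0.0
print(flagged)  # [('workload on the ward decreases', 'workload on the ward increases')]
```

The only flagged pair shares most of its words yet describes opposite outcomes, while the causally linked pair scores 0.0 and is never flagged.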

To sum up this post: statement similarity is insufficient as a means of finding mergers and links. Moreover, even for a relatively small map, finding all similar statements with absolute certainty is impossible. Keep that in mind when weaving individual cognitive maps together into a cause map.

Since this post and the last one were a bit doom and gloom, in the next post I am going to share a practical example of how a solid comparison may be accomplished.


(C) CC BY (https://creativecommons.org/licenses/by/4.0/), Jo. Richter, http://causal-cognitive-mapping.blogspot.de/2017/01/the-tragedy-of-similarity-in-cognitive.html

[1] https://en.wikipedia.org/wiki/Text_normalization
[2] https://en.wikipedia.org/wiki/Stemming
[3] https://en.wikipedia.org/wiki/Stop_words
[4] http://www-nlp.stanford.edu/IR-book/
[5] https://en.wikipedia.org/wiki/Jaccard_index
