I'm working on a research project for school. I've written some text mining software that analyzes a collection of legal texts and spits out a score indicating how similar two texts are. I ran the program to compare each text with every other text, and the output looks like this (the real data has many more rows):
codeofhammurabi.txt crete.txt 0.570737
codeofhammurabi.txt iraqi.txt 1.13475
codeofhammurabi.txt magnacarta.txt 0.945746
codeofhammurabi.txt us.txt 1.25546
crete.txt iraqi.txt 0.329545
crete.txt magnacarta.txt 0.589786
crete.txt us.txt 0.491903
iraqi.txt magnacarta.txt 0.834488
iraqi.txt us.txt 1.37718
magnacarta.txt us.txt 1.09582
Now I need to plot them on a graph. I can easily invert the scores (take the reciprocal) so that a small value indicates similar texts and a large value indicates dissimilar ones. That value can then serve as the distance between the points representing the texts on a graph:
codeofhammurabi.txt crete.txt 1.75212
codeofhammurabi.txt iraqi.txt 0.8812
codeofhammurabi.txt magnacarta.txt 1.0573
codeofhammurabi.txt us.txt 0.7965
crete.txt iraqi.txt 3.0344
crete.txt magnacarta.txt 1.6955
crete.txt us.txt 2.0329
iraqi.txt magnacarta.txt 1.1983
iraqi.txt us.txt 0.7261
magnacarta.txt us.txt 0.9125
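The inverted values in the second table match 1/score from the first table to about four decimal places. A minimal sketch of that step, assuming the reciprocal is the intended inversion (`invert_scores` is a hypothetical helper name):

```python
# Similarity scores from the first table (higher = more similar).
scores = [
    ("codeofhammurabi.txt", "crete.txt", 0.570737),
    ("codeofhammurabi.txt", "iraqi.txt", 1.13475),
    ("codeofhammurabi.txt", "magnacarta.txt", 0.945746),
    ("codeofhammurabi.txt", "us.txt", 1.25546),
    ("crete.txt", "iraqi.txt", 0.329545),
    ("crete.txt", "magnacarta.txt", 0.589786),
    ("crete.txt", "us.txt", 0.491903),
    ("iraqi.txt", "magnacarta.txt", 0.834488),
    ("iraqi.txt", "us.txt", 1.37718),
    ("magnacarta.txt", "us.txt", 1.09582),
]


def invert_scores(scores):
    """Turn similarity scores into distances by taking the reciprocal,
    so similar texts end up close together (small distance)."""
    return [(a, b, 1.0 / s) for a, b, s in scores]


for a, b, d in invert_scores(scores):
    print(f"{a} {b} {d:.5g}")
```

Any strictly decreasing transform (for example `max_score - s`) would also put similar texts close together; the reciprocal is simply what reproduces the second table.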
SHORT VERSION: The values directly above are distances between points on a scatter plot (1.75212 is the distance between the codeofhammurabi point and the crete point). I can picture this as a big system of equations, with circles whose radii are the distances between points. What's the best way to make this graph? I have MATLAB, R, Excel, and access to pretty much any software I might need.
If you can even point me in a direction, I'll be infinitely grateful.
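Finding point positions from a table of pairwise distances is what multidimensional scaling (MDS) does. The sketch below uses classical (Torgerson) MDS, one standard variant; R's `cmdscale()` and MATLAB's `cmdscale` (Statistics Toolbox) implement the same method. It is a best fit, not an exact solution: these particular distances can't be reproduced exactly in any number of dimensions, because crete–iraqi (3.0344) is larger than crete–us plus us–iraqi (2.0329 + 0.7261 = 2.759), which breaks the triangle inequality. The helper name `classical_mds` is hypothetical, and the sketch assumes NumPy is available:

```python
import numpy as np

# Inverted scores (distances) from the second table in the question.
pairs = [
    ("codeofhammurabi.txt", "crete.txt", 1.75212),
    ("codeofhammurabi.txt", "iraqi.txt", 0.8812),
    ("codeofhammurabi.txt", "magnacarta.txt", 1.0573),
    ("codeofhammurabi.txt", "us.txt", 0.7965),
    ("crete.txt", "iraqi.txt", 3.0344),
    ("crete.txt", "magnacarta.txt", 1.6955),
    ("crete.txt", "us.txt", 2.0329),
    ("iraqi.txt", "magnacarta.txt", 1.1983),
    ("iraqi.txt", "us.txt", 0.7261),
    ("magnacarta.txt", "us.txt", 0.9125),
]


def classical_mds(pairs, dims=2):
    """Classical (Torgerson) MDS: place points so that their Euclidean
    distances approximate the given pairwise distances."""
    names = sorted({a for a, _, _ in pairs} | {b for _, b, _ in pairs})
    index = {name: i for i, name in enumerate(names)}
    n = len(names)

    # Build the symmetric distance matrix (zeros on the diagonal).
    D = np.zeros((n, n))
    for a, b, d in pairs:
        D[index[a], index[b]] = D[index[b], index[a]] = d

    # Double-centre the element-wise squared distances: B = -1/2 * J D^2 J.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J

    # eigh returns eigenvalues of a symmetric matrix in ascending order;
    # keep the `dims` largest ones and their eigenvectors.
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dims]
    vals, vecs = vals[top], vecs[:, top]

    # Negative eigenvalues mean the distances are not exactly Euclidean;
    # clip them to zero so the square root is defined.
    coords = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return names, coords


names, coords = classical_mds(pairs)
for name, (x, y) in zip(names, coords):
    print(f"{name:20s} {x:8.4f} {y:8.4f}")
```

The coordinates are only defined up to rotation and reflection, so only the relative positions of the points carry meaning. To draw the scatter plot, pass `coords[:, 0]` and `coords[:, 1]` to any plotting tool (for example matplotlib's `scatter`, with `annotate` for the labels).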
If you want circles representing the distances between points, this would work in R (I used the first table in your example):