Wikipedia Link Networks
2009
For a final project in a class on networks, I worked with students Devin Gaffney and Max Darham to map and analyze the topology of internal links between articles in several small Wikipedias (fewer than 5,000 articles each).
We wrote a Ruby program that parses Wikipedia pages, extracts all relevant internal links, and stores them. The data can then be printed as Sage (Python) code and pasted into the Sage environment, where we can visualize and analyze the network and compute metrics such as the average clustering coefficient, the average degree, the number of unconnected nodes, and the largest cliques. We can also look at which types of topics have the highest degree or are the most linked to.
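To give a rough idea of the pipeline described above, here is a minimal sketch in plain Python. It is not our original Ruby or Sage code. It assumes that internal links appear as href="/wiki/Title" anchors, that titles containing a colon are namespaced pages to skip, and that the network is treated as undirected. All names here (InternalLinkParser, build_graph, the toy pages dictionary) are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from itertools import combinations


class InternalLinkParser(HTMLParser):
    """Collect targets of href="/wiki/Title" links, skipping namespaced pages."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith("/wiki/"):
            return  # external link or other URL scheme
        title = href[len("/wiki/"):].split("#")[0]
        # Assumption: a colon marks a namespace (File:, Special:, ...).
        if title and ":" not in title:
            self.targets.add(title)


def internal_links(html):
    parser = InternalLinkParser()
    parser.feed(html)
    return parser.targets


def build_graph(pages):
    """pages maps title -> HTML; returns an undirected adjacency dict."""
    graph = {title: set() for title in pages}
    for title, html in pages.items():
        for target in internal_links(html):
            # Keep only links to other articles inside the crawled set.
            if target in graph and target != title:
                graph[title].add(target)
                graph[target].add(title)
    return graph


def average_degree(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def isolated_nodes(graph):
    return sorted(n for n, nbrs in graph.items() if not nbrs)


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


# Toy data standing in for crawled pages (made up for this example).
pages = {
    "A": '<a href="/wiki/B">B</a> <a href="/wiki/C#History">C</a>',
    "B": '<a href="/wiki/C">C</a>',
    "C": '<a href="/wiki/File:X.png">img</a> <a href="http://example.org">ext</a>',
    "D": "<p>No links here.</p>",
}
graph = build_graph(pages)
print(average_degree(graph))      # 1.5
print(isolated_nodes(graph))      # ['D']
print(average_clustering(graph))  # 0.75
```

In the toy data, A, B, and C form a triangle and D is unconnected, which is why the clustering coefficient is high and one isolated node is reported. Finding which articles are the most linked to would need a directed version of the graph that tracks incoming links separately.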
The Wikipedias we looked at included the Novial, Scottish, Anglo-Saxon, Emilian-Romagnol, and Kongo Wikipedias.
You can download our code here.
One of our graphs was included in a talk, Microplexes, by Anil Bawa-Cavia (Urbagram).