Wikipedia Link Networks

2009

For a final project in a class on networks, I worked with two students, Devin Gaffney and Max Darham, to map and analyze the topology of internal links between articles in several small Wikipedias (fewer than 5,000 articles each).

We wrote a program in Ruby that parses Wikipedia pages, determines all relevant internal links, and stores them. The data can then be printed as Sage (Python) code and pasted into the Sage environment. There, we can visualize and analyze the network, computing metrics such as the average clustering coefficient, the average degree, the number of unconnected nodes, and the largest cliques. We can also look at which types of topics have the highest degree or are linked to most often.
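As a rough illustration of the kind of metrics described above, here is a minimal sketch in plain Python rather than Sage. It is not our original code: the toy `links` data and all function names are hypothetical, links are treated as undirected for the degree and clustering calculations, and nodes with fewer than two neighbors are assumed to have a clustering coefficient of zero.

```python
from itertools import combinations

# Hypothetical toy data: article title -> titles of articles it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
    "E": [],
}


def build_undirected(links):
    """Collapse directed links into an undirected adjacency map of sets."""
    adj = {title: set() for title in links}
    for source, targets in links.items():
        for target in targets:
            if target == source:
                continue  # ignore self-links
            adj.setdefault(target, set())
            adj[source].add(target)
            adj[target].add(source)
    return adj


def average_degree(adj):
    """Mean number of neighbors per article."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: undefined cases count as zero
    linked_pairs = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * linked_pairs / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all articles."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


def isolated_nodes(adj):
    """Articles with no links in either direction."""
    return sorted(node for node, neighbors in adj.items() if not neighbors)


def in_degree(links):
    """Number of distinct articles linking to each article."""
    counts = {title: 0 for title in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


adj = build_undirected(links)
print("average degree:", average_degree(adj))
print("average clustering:", average_clustering(adj))
print("unconnected articles:", isolated_nodes(adj))
print("most linked to:", max(in_degree(links).items(), key=lambda kv: kv[1]))
```

In Sage itself, the built-in graph methods would do most of this work; the sketch only spells out what the numbers mean.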

The Wikipedias we looked at included the Novial, Scottish, Anglo-Saxon, Emilian-Romagnol, and Kongo Wikipedias.

You can download our code here.

One of our graphs was included in a talk, Microplexes, by Anil Bawa-Cavia (Urbagram).
