2 Degrees of Academic Separation using Google Scholar v1

Another post, another neat force-directed graph. This one illustrates the interconnections between professors and students who have been co-authors on some of my papers and presentations, as scraped from Google Scholar citations. It could be described as a rough first version of my 2 degrees of separation in academia.


The dark orange circle in the center is me, light blue circles are papers/presentations, light orange circles are co-authors, and dark blue circles are co-authors of my co-authors (i.e., people who have not necessarily worked with me directly on a project).

Unfortunately, as of today, not all of my co-authors have Google Scholar pages, so a number of co-authors have connections and branches that are under-represented. In addition, Google Scholar does not necessarily collect all of a given author’s papers/presentations and often attributes papers to the wrong profiles. So, the information represented here should be taken with a grain of salt unless I find a better service for generating these networks.


As with the VSS DNA graph I made before the Vision Sciences Society Annual Meeting this past May, I used Python, NetworkX, and D3.js. In addition, I took advantage of another Python module, GoogleScholar, to screen-scrape information from the Google Scholar profiles.

Starting with my Google Scholar citation profile, I looped through the individual entries and extracted the titles and co-authors of each entry. The names and titles were connected as nodes using NetworkX. This gave me a list of co-authors.

To create the connections, I searched for the co-authors' names on Google Scholar (the profiles that were used are linked above) and did the same thing, extracting the titles and (co-)co-authors' names. This allowed me to produce a network diagram illustrating individuals who have been my co-authors, along with co-authors of those co-authors. Many of my co-authors did not have profiles when I generated this first version, and a few profiles had technical problems. For example, one profile was populated with a large number of papers from a different person who shares my co-author's name, and pruning these problematic entries would have been labor intensive. Still, it is a neat illustration worth sharing.
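The original script is not reproduced here, but the two-pass approach described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used: it assumes the scraping step has already produced `(title, author list)` pairs, and the function names (`add_profile`, `build_network`), the `kind` attribute, and the sample data are all hypothetical.

```python
import networkx as nx


def add_profile(graph, author, entries, new_person_kind):
    """Add one scraped profile: its papers and everyone listed on them."""
    for title, authors in entries:
        graph.add_node(title, kind="paper")
        graph.add_edge(author, title)
        for name in authors:
            # Only label people we have not seen yet, so a direct
            # co-author is never downgraded to a co-co-author.
            if name not in graph:
                graph.add_node(name, kind=new_person_kind)
            graph.add_edge(name, title)


def build_network(me, my_entries, coauthor_profiles):
    """Build the 2-degree graph: me, papers, co-authors, co-co-authors."""
    graph = nx.Graph()
    graph.add_node(me, kind="me")
    # First pass: my own profile yields papers and direct co-authors.
    add_profile(graph, me, my_entries, "coauthor")
    # Second pass: each co-author's profile yields their papers and
    # co-authors. Names must match exactly across profiles.
    for name, entries in coauthor_profiles.items():
        if name in graph:
            add_profile(graph, name, entries, "co-coauthor")
    return graph


# Hypothetical sample data standing in for scraped profile entries.
my_entries = [
    ("Paper A", ["Me", "Alice", "Bob"]),
    ("Talk B", ["Me", "Alice"]),
]
coauthor_profiles = {
    "Alice": [
        ("Paper A", ["Me", "Alice", "Bob"]),
        ("Paper C", ["Alice", "Carol"]),
    ],
}

graph = build_network("Me", my_entries, coauthor_profiles)
```

Because nodes are keyed by plain strings, a paper that appears on two profiles collapses into a single node only if the titles match exactly, and two different people with the same name are merged, which is the same kind of problem noted above for one of the profiles.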

I am not currently including the code on this page because it is quite messy and “non-pythonic”, but I’m happy to share it if there is interest. In addition, since this image was produced with D3.js, there is an interactive version of the graph available. I chose not to embed it on the blog because, with this many nodes and connections, it can be quite computationally taxing to render.
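As a rough illustration of how a graph built in Python can be handed to D3.js, the sketch below writes nodes and links in the `{"nodes": [...], "links": [...]}` layout commonly used by D3 force-layout examples, with links referring to nodes by array index. This is an assumption about the hand-off, not the code used for the image above; `to_d3_json` and the sample data are hypothetical. With NetworkX, the two inputs could come from `graph.nodes(data=True)` and `graph.edges()`.

```python
import json


def to_d3_json(nodes, edges):
    """Serialize (name, attrs) nodes and (source, target) edges for D3."""
    nodes = list(nodes)
    # Map each node name to its position in the "nodes" array.
    index = {name: i for i, (name, _) in enumerate(nodes)}
    payload = {
        "nodes": [{"name": name, **attrs} for name, attrs in nodes],
        "links": [
            {"source": index[source], "target": index[target]}
            for source, target in edges
        ],
    }
    return json.dumps(payload, indent=2)


# Hypothetical sample data.
sample_nodes = [
    ("Me", {"kind": "me"}),
    ("Paper A", {"kind": "paper"}),
    ("Alice", {"kind": "coauthor"}),
]
sample_edges = [("Me", "Paper A"), ("Alice", "Paper A")]

d3_json = to_d3_json(sample_nodes, sample_edges)
```

The `kind` attribute travels with each node, so the D3 side can use it to pick the fill color for each circle.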

UPDATE June 20, 2014: I removed the co-author labels from the lead image because I don’t want to give the false impression that specific co-authors are better connected than others. Since this visualization depends on a third-party scraping service, it is problematic to draw any conclusions about “connectedness” from this representation.

2 thoughts to “2 Degrees of Academic Separation using Google Scholar v1”

  1. Hi, Steve! How have you been? Nice post! It’s interesting that I just thought about starting to learn D3 yesterday. Do you have some good resources you can recommend?

    1. I usually jump from resource to resource and like to use examples to fuel my understanding, so unfortunately all I can suggest is to look at the documentation provided and find some examples that might interest and guide you. For example, mbostock has a great set of examples.
