Background
The idea for this entry came from the amazing Japanese earthquake map written by Paul Nicholls, in which the earthquakes are shown on a timeline on a Google map.
I thought it would be interesting to do something with the Biodiversity Heritage Library data for the Life and Literature Code Challenge. I figured that plotting the place of publication over time might show some interesting patterns.
Approach
First I had to get the place of publication out of the BHL data. The Google Geocode API did a fair job of this, although it did get a bit messed up with some of the older titles. I pre-processed the publication field with a few Perl scripts to make life a little easier for the geocoder. I also ran up against the geocoder’s API limits, so I don’t have results for all the titles in the BHL.
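As a rough illustration of the kind of clean-up that can help a geocoder, here is a minimal sketch in JavaScript (not the original Perl scripts, which are not shown here). The function name `cleanPlace` and the specific rules are assumptions made for the example: it strips square brackets and stray trailing punctuation, and skips the "S.l." (place unknown) marker that often appears in catalogue records.

```javascript
// Hypothetical helper: tidy a raw publication-place string before geocoding.
// The rules are illustrative guesses, not the rules used in the original scripts.
function cleanPlace(raw) {
  const place = raw
    .replace(/[\[\]]/g, "")               // drop square brackets, e.g. "[Paris]"
    .replace(/\s+/g, " ")                 // collapse runs of whitespace
    .replace(/^[\s,;:]+|[\s,;:]+$/g, ""); // trim spaces, commas, semicolons, colons at the ends

  // Nothing left, or "S.l." (sine loco, place unknown): nothing to geocode.
  if (place === "" || /^s\.\s?l\.?$/i.test(place)) {
    return null;
  }
  return place;
}
```

Anything that comes back as `null` can simply be left out of the geocoding run, which also saves a few requests against the API limit.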
Once I had some geocodes, I loaded the data into MySQL, where I could run some queries to check how well the geocoding went. I then exported this to a large JSON object that gets loaded by the web page. Making use of a jQuery UI button and slider, I was able to animate the slider programmatically and draw circles on the Google map.
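The data side of that animation can be sketched roughly as follows. This is only an illustration under assumed names: the record shape `{ year, lat, lng }` and the functions `groupByYear` and `play` are hypothetical, and the jQuery UI slider and map drawing are left out so the example stays self-contained. In a real page each step would be scheduled with a timer (for example `setInterval`) rather than run in a plain loop.

```javascript
// Hypothetical record shape: { year: 1850, lat: 51.5, lng: -0.13 }.
// Group the geocoded records by year so each slider step is a cheap lookup.
function groupByYear(records) {
  const byYear = {};
  for (const r of records) {
    if (!Number.isInteger(r.year)) continue; // skip records with no usable year
    if (!byYear[r.year]) byYear[r.year] = [];
    byYear[r.year].push({ lat: r.lat, lng: r.lng });
  }
  return byYear;
}

// Step through the years in order, handing each year's points to a callback.
// In the browser the callback might move the slider and draw circles on the map.
function play(byYear, startYear, endYear, onYear) {
  for (let year = startYear; year <= endYear; year++) {
    onYear(year, byYear[year] || []);
  }
}
```

Grouping once up front means the page never has to scan the whole JSON object while the slider is moving.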
You can see it in action here:
http://peterneish.github.io/bhl-mapper/
There are no doubt plenty of geocoding errors still in there (I edited out the main ones that I could see), but the overall patterns are still evident. It slows down a bit on Internet Explorer, but the results are OK when I test on Firefox, Chrome and Safari.
The results are quite interesting and show how publications start off in Europe and then progress to the rest of the world during the 19th century. The late 19th and early 20th centuries are the boom time and the results taper off during the 20th century (I guess as we enter the time period covered by copyright). Not many dots popping up over Australia – hopefully that will change as the BHL – Australia digitising gets underway.
Next…
This example makes use of only the bare minimum of data from the title table in the BHL (and I didn’t even use every record due to performance issues when too many dots were being created at once). It would be possible to get a lot more dates by linking up the item and title data and running the analysis off the items rather than the titles.
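Linking items to titles could look something like the sketch below. The field names (`titleId`, `itemId`, `year`, `place`) and the function `itemsWithPlace` are assumptions for illustration and may not match the actual BHL export columns. The idea is that each item borrows the place of publication from its parent title while keeping its own year.

```javascript
// Assumed shapes (hypothetical field names):
//   titles = [{ titleId, place }]
//   items  = [{ itemId, titleId, year }]
function itemsWithPlace(titles, items) {
  // Index titles by id so each item lookup is constant time.
  const placeByTitle = new Map(titles.map(t => [t.titleId, t.place]));

  return items
    // Keep only items that have a year and a known parent title.
    .filter(i => i.year != null && placeByTitle.has(i.titleId))
    .map(i => ({
      itemId: i.itemId,
      year: i.year,
      place: placeByTitle.get(i.titleId)
    }));
}
```

Since many items share a title, the places would only need geocoding once per title, not once per item.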
What would be really interesting is if we could link in the page and taxon name data and dynamically generate the data so that you could look at the publications for a particular taxon over time – it might be interesting to see where items about ants are being published or eucalypts for example.
This is a work in progress. Comments welcome.
EDIT: Code is now available on GitHub.