September 21st, 2011

Life and Literature Code Challenge Entry

Background

The idea for this entry came from the amazing Japanese earthquake map written by Paul Nicholls, which shows Japanese earthquakes on a timeline over a Google map.

I thought it would be interesting to do something with the Biodiversity Heritage Library data for the Life and Literature Code Challenge. I figured that plotting the place of publication over time might show some interesting patterns.

Approach

First I had to get the place of publication out of the BHL data and geocode it. The Google Geocoding API did a fair job of this, although it got a bit confused by some of the older titles. I pre-processed the publication field with a few Perl scripts to make life a little easier for the geocoder. I also ran up against the geocoder’s API limits, so I don’t have results for all the titles in the BHL.
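A rough sketch of the kind of clean-up that helps a geocoder, written in Python rather than the original Perl. The function name `clean_place` and the sample strings are hypothetical; the actual scripts and the exact rules they applied are not shown here.

```python
import re


def clean_place(raw):
    """Normalise a place-of-publication string before geocoding (illustrative rules only)."""
    text = raw.strip()
    # Drop the square brackets cataloguers put around inferred places.
    text = re.sub(r"[\[\]]", "", text)
    # Keep only the first place when several are listed, e.g. "Paris ; London".
    text = text.split(";")[0]
    # Remove trailing cataloguing punctuation such as " :" or ",".
    text = re.sub(r"[\s:;,.]+$", "", text)
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


if __name__ == "__main__":
    for sample in ["[London] :", "Paris ; London :", "  Berlin,  "]:
        print(repr(sample), "->", repr(clean_place(sample)))
```

Strings that come out empty after cleaning are best skipped rather than sent to the geocoder, which also saves API quota.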

Once I had some geocodes, I loaded the data into MySQL, where I could run some queries to check how well the geocoding went. I then exported this to a large JSON object that gets loaded by the web page. Using a jQuery UI button and slider, I was able to animate the slider programmatically and draw circles on the Google map.
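One way the export step could look, as a minimal Python sketch. It assumes the geocoded rows arrive as (year, latitude, longitude) tuples and that the page wants them grouped by year so each slider step can draw that year's circles; both the row layout and the JSON shape are assumptions, not the format the site actually uses.

```python
import json
from collections import defaultdict


def build_timeline(rows):
    """Group (year, lat, lon) rows by year and return them as a JSON string."""
    by_year = defaultdict(list)
    for year, lat, lon in rows:
        # Three decimal places is plenty for city-level dots and keeps the file small.
        by_year[year].append([round(lat, 3), round(lon, 3)])
    # Sort the years so the slider can step through the keys in order.
    return json.dumps({str(year): by_year[year] for year in sorted(by_year)})


if __name__ == "__main__":
    sample_rows = [
        (1850, 51.5074, -0.1278),   # hypothetical sample data
        (1800, 48.8566, 2.3522),
        (1850, 40.7128, -74.006),
    ]
    print(build_timeline(sample_rows))
```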

You can see it in action here:

bhl.neish.net

There are no doubt plenty of geocoding errors still in there (I edited out the main ones that I could see), but the overall patterns are still evident. It slows down a bit on Internet Explorer, but the results are OK when I test on Firefox, Chrome and Safari.

The results are quite interesting and show how publications start off in Europe and then spread to the rest of the world during the 19th century. The late 19th and early 20th centuries are the boom time, and the results taper off during the 20th century (I guess as we enter the time period covered by copyright). Not many dots are popping up over Australia; hopefully that will change as the BHL – Australia digitising gets underway.

Next…

This example makes use of only the bare minimum of data from the title table in the BHL (and I didn’t even use every record due to performance issues when too many dots were being created at once). It would be possible to get a lot more dates by linking up the item and title data and running the analysis off the items rather than the titles.

What would be really interesting is if we could link in the page and taxon name data and generate the data dynamically, so that you could look at the publications for a particular taxon over time. It might be interesting to see where items about ants or eucalypts, for example, are being published.

This is a work in progress. Comments welcome.

August 6th, 2011

Building a mobile app backend using MongoDB and Slim – a PHP REST framework

I’ve been toying around in my spare time with HTML5 and building a geographically enabled web app (possibly making it into a full-blown mobile app down the track using PhoneGap or Appcelerator). Anyway, I started off with the back end.

I chose MongoDB as the data store (a NoSQL database with really simple out-of-the-box geographic indexing). There are a few threads about MongoDB’s geohashing algorithms not coping at very fine scales, but for my purposes it has all the accuracy I need and the geohashing mechanism is really fast.

I needed a REST interface that my mobile app could use to retrieve nearest locations and to add new locations – both fairly simple requirements as the bulk of the computational work is elegantly handled by the backend database. All I needed was something to build the routes and add in my own validation – this is where Slim comes in.
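To make the "nearest locations" requirement concrete, here is a small Python sketch of what such a query returns: the stored points ordered by great-circle distance from the requested position. This is a brute-force illustration only. MongoDB answers the same question through its geospatial index rather than by sorting every record, and the field names `lat` and `lon` and the sample places are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest(locations, lat, lon, limit=5):
    """Return up to `limit` locations, closest first (linear scan, no index)."""
    ranked = sorted(locations, key=lambda loc: haversine_km(lat, lon, loc["lat"], loc["lon"]))
    return ranked[:limit]


if __name__ == "__main__":
    places = [  # hypothetical sample data
        {"name": "Perth", "lat": -31.95, "lon": 115.86},
        {"name": "Sydney", "lat": -33.87, "lon": 151.21},
        {"name": "Melbourne", "lat": -37.81, "lon": 144.96},
    ]
    for place in nearest(places, -37.8, 143.2, limit=2):
        print(place["name"])
```

A linear scan like this is fine for a handful of points; the value of the database's geographic index is that it avoids touching every record.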

Slim is a micro framework for building REST services and it does this one thing very well. It allows me to build routes for GET, POST, PUT and DELETE requests and hand them off to appropriate functions in my data model. A minimal example:

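<?php

// Load the framework and my own data model.
require 'Slim/Slim.php';
require 'models/LocationStore.php';

$app = new Slim();
$ls  = new LocationStore();

// GET /near/<lat>,<lon> returns the nearest stored locations as JSON.
// The closure must import $ls with "use", otherwise it is undefined inside.
$app->get('/near/:lat,:lon', function ($lat, $lon) use ($ls) {
    header("Content-Type: application/json");
    echo json_encode($ls->getNear($lat, $lon));
});

$app->run();

?>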

So a GET request to http://myserver/near/-37.8,143.2 would be routed to a function that queries my database for locations near the latitude and longitude passed in. I can also use some neat features built into Slim to validate the passed-in values against a regular expression.
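The idea behind that validation, sketched in Python rather than with Slim's own mechanism (which is not shown here). The pattern, the function name `parse_near_path` and the range check are assumptions about what sensible validation for this route might look like.

```python
import re

# A signed decimal number with up to three integer digits, e.g. -37.8 or 143.
NUMBER = r"-?\d{1,3}(?:\.\d+)?"
NEAR_ROUTE = re.compile(r"/near/(" + NUMBER + r"),(" + NUMBER + r")")


def parse_near_path(path):
    """Return (lat, lon) for a valid /near/<lat>,<lon> path, otherwise None."""
    match = NEAR_ROUTE.fullmatch(path)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # The regular expression only checks the shape; the range check catches values like 95 degrees latitude.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


if __name__ == "__main__":
    for path in ["/near/-37.8,143.2", "/near/abc,143.2", "/near/95,10"]:
        print(path, "->", parse_near_path(path))
```

Rejecting malformed values at the routing layer means the data model never sees anything that is not a plausible coordinate.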

There is more to it of course and a number of templating tools can be plugged in to make it into a more fully featured web framework.

Next up is the front end HTML5 code, but that’s a subject for another post.

Slim framework website: http://www.slimframework.com/

July 1st, 2011

jQuery and DB/Textworks

Below are my slides from a talk I gave to the Melbourne Inmagic User Group on jQuery and DB/Textworks. Unfortunately all the web stuff is behind a firewall, so I can’t link to it. However, I’ll put the plugins I developed up on Google Code.

February 9th, 2011

Upgrading Debian 5 to 6

It’s never much fun upgrading major versions of operating systems and I always get slightly uncomfortable during the process. Knowing you have a good backup is always handy (thanks Linode). In this case I was going from Debian 5 to 6 and, as usual, the Debian folks have made it smooth sailing.

This guide on the Linode pages was very useful:

http://library.linode.com/troubleshooting/upgrade-to-debian-6-squeeze

All went smoothly when I upgraded my Linode from Lenny to Squeeze, except that MySQL would not start.

This post http://www.robtucker.co.uk/2009/05/16/upgrading-mysql-50-to-51-on-debian-50 had the answer I needed.

In short, I had to comment out the ‘skip-bdb’ entry in /etc/mysql/my.cnf and then issue:

apt-get -f install mysql-server
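If the same fix has to be applied on more than one machine, the config edit can be scripted. A minimal Python sketch that works on the file's text; the function name `comment_out_option` is hypothetical, and reading and writing /etc/mysql/my.cnf (with a backup taken first) is left to the caller.

```python
import re


def comment_out_option(config_text, option):
    """Prefix every uncommented line that starts with `option` with '#'."""
    # Match the option at the start of a line, allowing leading spaces or tabs,
    # but not when it is only the prefix of a longer option name.
    pattern = re.compile(r"^([ \t]*)(" + re.escape(option) + r")(?![\w-])", re.MULTILINE)
    return pattern.sub(r"\1#\2", config_text)


if __name__ == "__main__":
    sample = "[mysqld]\nskip-bdb\nport = 3306\n"   # hypothetical excerpt
    print(comment_out_option(sample, "skip-bdb"))
```

Lines that are already commented out are left alone, so running it twice is harmless.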

February 5th, 2011

Creek near my house flooding in Melbourne

Sorry about the low light – it was pretty dark at the time.

December 4th, 2009

Hosting at Linode

I’ve started hosting all my personal sites at Linode. I have root access to my own virtual machine and I have installed Debian 5.0, lighttpd, MySQL, and a bunch of Drupal and WordPress sites. I have found an incredible performance boost compared to shared hosting and it only costs slightly more. I’d highly recommend this for anyone with some knowledge of Linux administration. The only downside is backups, but I solved that using backup-manager and creating an Amazon S3 account where the backups get stored (all very easy).

Check out the details at linode.com.

EDIT: Linode now has a backup solution that is simple and automatic – just what you want!

September 18th, 2009

Attention span of a developer

I play around with a lot of technology and like installing and testing things, but I wonder if I am missing any great systems because of install fatigue. How long is too long to get a system installed, run through the configuration, and get some test data in there and running? There is nothing like a good ‘quick start tutorial’ or a ‘build a blog in five minutes’ to get you going.

A great example of this was when I recently installed GeoServer. Now this is not a simple piece of software, but I was able to follow the documentation to install it and get it up and running in about 20 minutes. If more developers took the time to produce this kind of documentation with step-by-step instructions, it would really help with adoption of their technology.

September 7th, 2009

nswsphere streamed

There’s a lot of talk about the carbon footprint of attending conferences these days and on Friday I attended my first virtual conference. NSWsphere provided a live stream of the conference. The job they did was excellent (numerous cameras, direct mic etc.) and I was able to watch easily without getting frustrated.

The second important factor was the live Twitter stream. This allowed me to tap into some of the intangibles that you get from going to a conference – the important chit-chat on what everyone thought of the presentations. The advantage of Twitter was that it was happening as the presenters were talking, so I didn’t have to wait until the session ended to get people’s views.

The last advantage was that I could tune in to just the presenters I was interested in. I simply printed out the agenda and switched over when they were on.

So despite being in another state, I was still able to get something out of this conference without traveling, without spewing out tonnes of carbon and while keeping up with most of my real job. Obviously face-to-face meetings with colleagues are better and I would choose that if I could, but especially for conferences where you might just have a peripheral interest and can’t justify the cost of attending in person, it might be worth giving this a try.