Information and data are seen as the new oil of the digital economy, but without the necessary tools and related infrastructure, that oil is going to stay in the ground. The recent cuts to the National Library of Australia will have a significant effect on one of these pieces of national research infrastructure – Trove.
As well as being the gateway to the collections of the National Library, Trove aggregates content from other Australian libraries and cultural institutions. By providing a single point of access to the information, Trove can make the best use of the nation’s intellectual capital – researchers can be more efficient and duplication is reduced. This is a smart way to do things.
In a previous role I was involved in a similar national project that aimed to bring together data on Australia’s biodiversity – the Atlas of Living Australia (ALA). The ALA combined data from museums, herbaria, citizen scientists and other sources in a single online resource. In contrast to Trove (which was funded from internal resources), the ALA was funded through the National Collaborative Research Infrastructure Strategy (NCRIS). By providing a single resource for all Australian biodiversity information, the ALA was seen as an important piece of research infrastructure. And while NCRIS has not been without controversy over funding, the fact that the government made a long-term commitment to support a national biodiversity platform has led to the success of the ALA.
In addition to helping people find information, both Trove and the ALA provide platforms that encourage new ways of interacting with the data. The government needs to properly fund digital infrastructure across all disciplines wherever it exists so that Australia’s talented researchers can get on with their work.
We had some good comments from the judges and were keen to expand the functionality of the application. However, we soon ran up against some limitations in the GBIF API. The main issue was not being able to generate a species list for a user-defined geographic area (we could get a whole country at a time, but that didn’t really help this use case). This is something that has been raised before, so we’ll await developments.
The main changes to our updated entry have been at the backend. The application was almost completely re-written using backbone.js as the data framework. This should allow any future functionality to be easily integrated. Users can now see all sounds for an area (before we had to limit it) and they can add or remove species easily. We’ve also added better filtering for season and taxon type (we can add environmental and time of day filters when we get the data).
The six finalists can be viewed on the Challenge Round 2 website. All the projects look really interesting and are quite diverse in the problems they are tackling.
I’ve been working with Ben Raymond of Grevillea colour distribution fame on an entry to the GBIF Ebbe Nielsen Challenge, which aims to inspire scientists, informaticians, data modelers, cartographers and other experts to create innovative applications of open-access biodiversity data.
You can see a sneak peek at our entry at peterneish.github.io/gbif-soundscape/. It’s a system that pulls together sound recordings of animals to build up a soundscape from various localities.
Here’s another visualisation of some data from the 2013 Australian Federal Election.
I wanted to see how consistent voting patterns were across booths within an electorate. From handing out how to vote cards at my local polling booth I had the feeling that not every booth is the same.
Fortunately the Australian Electoral Commission publishes a live feed of all data by booth. It’s the same data that the news outlets use, so it’s pretty good. It follows the Election Markup Language standard, so extracting the data was not too hard. The AEC had added in their own elements, but they used namespaces, which made it fairly simple to process using a fairly basic Perl script.
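As a rough illustration of why the namespaces help, here is a minimal Python sketch (not the original Perl script) that pulls per-booth vote counts out of a namespaced XML document. The namespace URIs, element names and attributes below are all made up for the example; the real feed's schema will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the namespace URIs and element names are
# placeholders, not the real EML or AEC schema.
SAMPLE = """<feed xmlns:eml="urn:example:eml" xmlns:aec="urn:example:aec">
  <aec:PollingPlace name="Town Hall">
    <eml:Candidate party="A"><aec:Votes>120</aec:Votes></eml:Candidate>
    <eml:Candidate party="B"><aec:Votes>80</aec:Votes></eml:Candidate>
  </aec:PollingPlace>
  <aec:PollingPlace name="Primary School">
    <eml:Candidate party="A"><aec:Votes>45</aec:Votes></eml:Candidate>
    <eml:Candidate party="B"><aec:Votes>60</aec:Votes></eml:Candidate>
  </aec:PollingPlace>
</feed>"""

# Prefix-to-URI map so queries can be written as "prefix:Element".
NS = {"eml": "urn:example:eml", "aec": "urn:example:aec"}


def votes_by_booth(xml_text):
    """Return {booth name: {party: first-preference votes}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for booth in root.findall("aec:PollingPlace", NS):
        counts = {}
        for cand in booth.findall("eml:Candidate", NS):
            votes = cand.findtext("aec:Votes", namespaces=NS)
            counts[cand.get("party")] = int(votes)
        result[booth.get("name")] = counts
    return result


if __name__ == "__main__":
    for booth, counts in votes_by_booth(SAMPLE).items():
        print(booth, counts)
```

Because each query names its namespace explicitly, the standard elements and the locally added ones can be picked out separately without clashing.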
Data is based on first preferences only. Because of the way heatmaps work, if booths are closer together the intensity increases, so to some extent the heat map is determined by the layout of the booths. However, distinct patterns are discernible if you explore the data. Only parties that registered votes in at least 200 booths have been included.
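The 200-booth cut-off can be sketched in a few lines of Python. This is only an illustration: the input shape (a dictionary of booths, each mapping party to first-preference votes) and the function name are assumptions, not the structure used for the actual maps.

```python
from collections import Counter


def parties_with_coverage(booth_votes, min_booths=200):
    """Return the set of parties that received votes in at least
    `min_booths` booths.

    `booth_votes` is assumed to look like
    {booth name: {party: first-preference votes}}.
    """
    booths_per_party = Counter()
    for counts in booth_votes.values():
        for party, votes in counts.items():
            if votes > 0:
                booths_per_party[party] += 1
    return {p for p, n in booths_per_party.items() if n >= min_booths}


if __name__ == "__main__":
    demo = {
        "b1": {"A": 10, "B": 0},
        "b2": {"A": 5, "B": 3},
        "b3": {"A": 1},
    }
    # With a threshold of 2 booths, only party A qualifies here.
    print(parties_with_coverage(demo, min_booths=2))
```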
Absentee votes cast at capital city booths are mapped at the booth rather than in the electorate that the vote was for, so capital cities tend to show high results for every party.
Using a Perl script I was able to sort the ticket preferences into groups and then create a matrix in a format suitable for input to a d3js chord diagram.
Getting useful data is always a compromise. Some parties have multiple tickets (i.e. they split their preference flows across two or more tickets). Independents do not have parties, but work in a group. I grouped independents by using the highest preferenced candidate for that ticket. The Coalition is made up of more than one party, so I had to combine these under a single Coalition group.
To measure how highly a party was preferenced, I took the average position on the ballot for each member of the party and averaged these if there was more than one ticket. For some parties with split tickets this meant that they might end up without preferencing any party particularly highly.
Next was how to visualise the data. A chord diagram seemed the natural way to show how parties preference each other. The problem is that there is too much data to show every preference allocation (by definition every party preferences every other candidate). So I needed to draw the line on how much data to show. I arbitrarily decided that any party that averaged a position in the top 25% of the ballot order was highly preferenced by the other party.
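The scoring and cut-off described above can be sketched in Python as follows. This is not the original Perl script, and it rests on several assumptions: tickets are lists of candidates in preference order, "top 25%" is read as an average position no greater than a quarter of the ballot length, and each link in the matrix is simply 1 or 0. All names are hypothetical.

```python
def average_positions(ticket, party_of):
    """Mean 1-based ballot position of each party's candidates on one ticket."""
    positions = {}
    for pos, candidate in enumerate(ticket, start=1):
        positions.setdefault(party_of[candidate], []).append(pos)
    return {party: sum(p) / len(p) for party, p in positions.items()}


def chord_matrix(tickets_by_party, party_of, parties, cutoff=0.25):
    """Build a square matrix: matrix[i][j] is 1 when party i highly
    preferences party j, otherwise 0."""
    index = {party: i for i, party in enumerate(parties)}
    matrix = [[0] * len(parties) for _ in parties]
    for source, tickets in tickets_by_party.items():
        per_ticket = [average_positions(t, party_of) for t in tickets]
        threshold = cutoff * len(tickets[0])
        for target in parties:
            if target == source:
                continue
            scores = [pt[target] for pt in per_ticket if target in pt]
            if not scores:
                continue
            # Average across tickets when the source party split its ticket.
            if sum(scores) / len(scores) <= threshold:
                matrix[index[source]][index[target]] = 1
    return matrix


if __name__ == "__main__":
    party_of = {c: c.upper() for c in "abcdefgh"}  # one candidate per party
    tickets = {
        "A": [list("abcdefgh"), list("acbdefgh")],  # split ticket
        "B": [list("bacdefgh")],
    }
    for row in chord_matrix(tickets, party_of, ["A", "B", "C"]):
        print(row)
```

In the toy data, party A's split ticket averages B and C to position 2.5 each, just outside the cut-off of 2, so A ends up with no strong link at all. That is the split-ticket effect mentioned earlier. The resulting list of rows is the kind of square matrix a chord layout takes as input.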
The visualisation shows some interesting things. Whether there is a symmetrical or asymmetrical relationship between parties can easily be seen. Also, the wider the party is around the circle, the more other parties preference it. The ALP and Coalition have fairly narrow widths while the Bullet Train party and Family First are relatively wide.
Of course preferences are a lot more complex than shown here. The order of preferences and how they flow once quotas are allocated can have a subtle and profound effect on the election outcome. If you are considering voting below the line, Antony Green has some good advice.
At the moment it is fairly basic and is not entirely intuitive (no back-button or bookmarking support, inconsistent actions), but these things are fairly easy to add. D3js takes a while to get your head around, but the combination of a well thought out API from the Atlas of Living Australia with d3js and Twitter Bootstrap has allowed me to produce this application relatively easily.
A quick note on installing Linux Mint on an Acer eMachines netbook (em350) for anyone going down the same path.
This netbook is a few years old and the installed OS (Windows 7 Starter) had slowed almost to a crawl. After some painful sessions trying to work out what was causing excessive load under Windows, I decided that installing Linux was the best option.
The Horizon LMS from SirsiDynix provides a utility to export MARC records from the system. This is quite robust, except if your bib records are quite large and have a lot of subject terms. The following VBScript allows you to export smaller batches of MARC records and skip over known bad records.
Yesterday I spent a really interesting morning at Museum Victoria attending and presenting at a workshop on Linked Open Data in Libraries, Archives and Museums (LOD-LAM) organised by Culture Victoria. As usual with these things, there were varying levels of technical and background knowledge on the topic. However, I think the level of the presentations was spot on and, judging by the number of questions and discussions happening, there was a lot of interest in this topic.
Mia Ridge (@mia_out) gave an excellent introduction to Linked Data and how it is used in libraries, galleries and museums. She’s put together a wiki with links to all sorts of useful stuff.
Below are the slides from my talk on activities at the Victorian Parliamentary Library (astute viewers will notice they are mostly lifted straight out of my VALA talk).