In this TED talk from earlier this year, Tim Berners-Lee, inventor of the World Wide Web, outlines what he calls the ‘next web’: a web underpinned by structured, machine-readable data (e.g. XML, JSON, CSV) with hackable, consistent URLs and well-described relationships.
That’s all a bit of a mouthful, but what it means in essence is that the next big phase of web development will be about making data available to machines, whereas the first phase of the Internet’s development was more about making documents accessible to humans.
This has of course happened in many commercial enterprises where data and data management is a key component of their business. However, it hasn’t really happened to any great extent on the public Internet, and most obviously it hasn’t happened in Government (though as we’ll see there are some significant steps in this direction).
The transparency movement, as it has come to be known, believes that Government should make the data it collects from departments available in machine-readable form to its citizens, who can then take that information, analyse it with various tools to extract and present meaning, and use what they find to lobby and change Government. The transparency movement believes that this makes Government more accountable, citizens more engaged and the whole system more efficient, scrutinised by what some have come to see as a perpetual, ‘crowdsourced’ audit.
Politically speaking, data transparency has ‘traditionally’ been the domain of the progressive left. However, recently the conservative right has come to see transparency as the means by which consumers can hold ‘big Government’ to account. In fact the Conservative party in the UK is so serious about this that it recently hired Tom Steinberg, the man behind mySociety and a long-standing proponent of transparent Government, to consult on the matter.
So whatever the make-up of the next UK Government, what is clear is that data transparency will be high on the agenda, as it is on the agenda of the Obama administration. And as I mentioned earlier, there’s already been considerable work done on this: Tom Steinberg’s (him again) Power of Information report makes the case for machine-readable data, and the Government hired Tim Berners-Lee earlier in the year to consult on data transparency. In addition, data.gov.uk has recently launched; though the site is still behind a password, it brings together data from various Government departments in machine-readable form, in a similar manner to the US equivalent linked above.
O.K., so it seems clear that much of the UK Government’s data is going to be available in machine-readable format in the next couple of years or so, which is great, but what use is that? Can I determine anything from Government data? And can I, as the transparency movement suggests, hold Government to account by doing so?
The next section of this post examines these questions in the context of a real-world example taken from recently released Home Office crime statistics.
Finding and interrogating the data
At the moment, as we’ve noted, most Government data is not available in machine-readable form, so you have to search around for it, usually in a bunch of PDFs (a non-machine-readable format) tucked away on obscure Government websites. Fortunately the nice people at the Guardian Data Store have been doing just this. In this case we’re going to use data on violent crime in London, taken by the Guardian Data Store from official Home Office figures.
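Once you do have the data as a CSV rather than a PDF, even a few lines of code can interrogate it. Here’s a minimal sketch in Python; the borough rows and column names below are made-up stand-ins for the Guardian Data Store spreadsheet, not real Home Office numbers:

```python
import csv
import io

# Made-up sample rows standing in for the real Home Office figures
# (violent crime incidents per 1,000 residents, by London borough).
sample = """borough,rate_2005,rate_2008
Lewisham,22,32
Camden,30,27
Bromley,18,16
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Rank boroughs by the most recent rate, highest first.
ranked = sorted(rows, key=lambda r: float(r["rate_2008"]), reverse=True)
for r in ranked:
    print(r["borough"], r["rate_2008"])
```

With a real file you’d point `csv.DictReader` at the downloaded spreadsheet instead of an inline string; the point is that machine-readable data makes this kind of sorting and filtering trivial, where a PDF makes it impossible.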
Presenting the data
O.K. We’ve talked a lot about data so far, but what about the other side of the coin? How can non-programmers interrogate this data? What’s the best way to present findings to extract meaning? And what do you do with that meaning once you’ve extracted it? How do you actually make those pretty visualizations?
There is a wide range of data visualization techniques, from the commonly used tables, histograms, pie charts and bar graphs to newer forms like scatterplots, sparklines and mind maps. Which you choose depends on your dataset, and many of the tools discussed below allow you to import data and then experiment with different types of visualization.
There are lots of different tools you can use to turn data into interesting visualisations, including Flash/ActionScript and Java libraries like Flare and its precursor Prefuse, and the increasingly powerful Google Visualization API, a tool for publishing and presenting data. However, these require some coding chops, so they’re not exactly ideal for the ‘average citizen’ looking to understand Government data. Fortunately, there is also a range of web-based tools which require a lot less technical expertise (though you do need to know your way around a spreadsheet and have a basic understanding of statistics). Perhaps the two best known of these are Many Eyes Wikified and Swivel.
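To give a flavour of what those coding-oriented tools automate, here’s a toy text-mode bar chart in plain Python. The figures are illustrative placeholders, not real crime statistics:

```python
# Toy text-mode bar chart: scale each value to a fixed character width
# so the bars are comparable at a glance. Figures are made up.
data = {"Lewisham": 32, "Camden": 27, "Bromley": 16}

width = 40  # widest bar, in characters
scale = width / max(data.values())

chart = []
for borough, rate in sorted(data.items(), key=lambda kv: -kv[1]):
    chart.append(f"{borough:<10}{'#' * round(rate * scale)} {rate}")

print("\n".join(chart))
```

Libraries like Flare or the Google Visualization API do essentially the same mapping from values to visual lengths, colours and positions, just with far richer output.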
In this example I’m going to use Many Eyes Wikified, mainly to save time and because I know a little bit about it (though at the moment I think it’s a bit flaky and limited in certain areas).
Having found the machine-readable source above, all you have to do is import that data into Many Eyes Wikified. As MEW is a wiki, this is achieved by editing the page. Once the data has been imported, MEW allows us to view it in a number of different types of visualisation. I haven’t got time to go into them here, but it’s easy and important to experiment, as different types of visualization can reveal different trends. For the purpose of this exercise I’ve plumped for a relatively unsexy bar chart of the violent crime figures broken down by London borough.
Interpreting the data
When I looked at the bar chart of violent crime by borough over time, something stood out: there is a big spike in violent crime in Lewisham over the reported period. It’s worth noting that the scales on the axes change between Many Eyes Wikified charts; in my opinion this is a serious deficiency which makes comparison difficult. However, despite this weakness in the MEW software, a quick check of the data revealed that the trend was real. Violent crime in Lewisham had risen from 22 to 32 incidents per thousand at a time when it had been dropping in most other boroughs.
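That quick check is just arithmetic, and worth doing whenever a chart surprises you (especially when the tool rescales its axes):

```python
# Sanity-check the Lewisham figure: a rise from 22 to 32 incidents
# per 1,000 residents, taken from the Home Office spreadsheet.
before, after = 22, 32
pct_change = (after - before) / before * 100
print(f"{pct_change:.1f}% rise")
```

That works out at roughly a 45% rise, which is a striking jump in a period when most boroughs were falling.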
So what’s going on here? Has the data actually revealed something? Has Lewisham become a notably more violent place in the last three years, or is it down to ‘better reporting’ or some similar artefact?
Well, there’s only one way to find out: to ask the Government. There are a number of ways to do this, but to begin with I’m going to ask the Home Office, who produced the figures.
This is a copy of the e-mail I just sent to the Home Office.
I’ve been looking at your own violent crime figures for London over the last few years, as detailed in this spreadsheet.
And while I’ve noticed that the overall trend is down, I’ve also noticed that there has been a marked rise in violent incidents in Lewisham, from 22 to 32 per 1,000 (a roughly 45% rise over four years). I wonder if you could explain this trend, and what measures, if any, you are taking to mitigate it?
Thanks for your time,
I’ll keep you updated with my progress.
So, as you can see, it’s not exactly an easy task finding, interrogating and understanding the data, let alone holding Government to account, and it seems unlikely to me that we’ll see armies of ‘citizens’ sitting hunched over copies of Excel, analysing data late into the night. I think this function is more likely to fall to researchers, journalists and academics, many of whom will have different agendas to the ‘average’ citizen. (In an information-based society, knowledge really is power.)
There are also questions of data governance: can we trust the data Government passes to us, or, perhaps more importantly, the data that individual organs of the state like the police or health service pass on to central Government? Who audits the auditors?
And of course Government deals with many issues that aren’t better understood by analysing data; Hans Rosling, a great champion of data transparency, makes this point in the context of human rights data.
Indeed, it has been argued by several people, including former transparency advocate Lawrence Lessig, that this constant scrutiny could well serve to undermine legitimacy in Government: constant fault-finding leads to a perception of ineffectiveness, as well as a culture of fear in which it is impossible to take difficult but ultimately good decisions. It seems to me this was prefigured by the recent MPs’ expenses debacle in the UK, and though that was a self-inflicted wound, it doesn’t take too much imagination to see repeats of the same exercise, with newspapers and the vested interests behind them attacking expenditure on Europe or rail or whatever else they take issue with.
On balance, I still believe that data transparency is a good thing; however, we should be aware of who is pushing these agendas and why. In addition, I’d argue that it’s not enough for Government to just dump raw data out: it needs to help with the understanding of that data, perhaps providing better, more accessible tools, a place where data can be discussed, and a mechanism that allows discoveries to be acted upon.
Because while the release of raw data may well lead to more transparent Government, there’s no guarantee that it will lead to better Government.
(Finally, I’ll post updates about violent crime in Lewisham when I get them.)