“London calling to the faraway towns
Now war is declared – and battle come down
London calling to the underworld
Come out of the cupboard, you boys and girls”
London was the centre of gravity for the global data Web on Wednesday 24th February 2010. Such was the sense of anticipation in the air at the ‘Linked Data Meetup 2’, it almost felt like we were waiting for the start of a rock concert instead of a series of presentations, especially given the event was being held in the concert hall of the University of London Student’s Union, the same place I’ve seen one or two bands over the years. Over two hundred people had signed up to attend, and many had to be turned away.
Tom Heath from Talis was first up, and although he didn’t come on stage with a guitar or start singing, he did do a great job of rising to the moment, as did all the speakers on this really inspiring day. Tom was talking about the implications of Linked Data for application development, and he focussed on what he sees as the need for computing to move on from the all-pervasive metaphor of the document, if the true possibilities of the Web are to be realised. We tend to see the Web as a library of documents, but it has the power to be so much more. He thinks we should see the Web as something more like an ‘exploratorium’ of real things in the world. These things should all be given http URIs, and be connected by typed links. The power this brings is that applications are then able to grasp the meaning of what things are, and how they relate to each other. Applications can probe this global Web of Data, performing calculations, identifying correlations, and making decisions. The machine processing power we can apply behind this idea supercharges the Web way beyond what is currently possible with a library of documents. Yes, of course we are already doing this to an extent with data in conventional databases, but the Linked Data Web gives us the possibility to connect all data together and wrap it with meaning.
Tom gave some examples of what becomes possible, one of which he based around the concept of pebbledash. Some things we might associate with pebbledash include getting it repaired, weatherproofing, its cost, its removal, its colour, or whatever. These concepts essentially exist in isolation on the library document Web as it is now, and we as humans have to explore by hand to make the connections ourselves. This is a time-consuming, messy and error-prone process. If the concepts existed in Linked Data form, the Web Exploratorium could bring these concepts together for us, promising faster, more complete and more accurate calculations. This means we can make better evaluations and more informed decisions. Tom left us with a provocative challenge – “if you don’t feel constrained by the document, then you’re not thinking hard enough”.
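To make the idea of things with http URIs connected by typed links a little more concrete, here is a minimal Python sketch built around the pebbledash example. It is only an illustration: the example.org URIs, the link types (hasService, hasProperty, hasCost) and the helper functions are all made up for this post, and real Linked Data would normally be published as RDF rather than held in a Python list.

```python
from collections import defaultdict

# Hypothetical namespace: example.org is a placeholder, not a real dataset.
EX = "http://example.org/"

# Each entry is one typed link: (subject URI, link type URI, object URI).
TRIPLES = [
    (EX + "thing/pebbledash", EX + "rel/hasService", EX + "thing/pebbledash-repair"),
    (EX + "thing/pebbledash", EX + "rel/hasService", EX + "thing/pebbledash-removal"),
    (EX + "thing/pebbledash", EX + "rel/hasProperty", EX + "thing/weatherproofing"),
    (EX + "thing/pebbledash-repair", EX + "rel/hasCost", EX + "thing/repair-cost"),
]


def linked(subject, link_type, triples=TRIPLES):
    """Return the things reachable from `subject` via one type of link."""
    return [o for s, p, o in triples if s == subject and p == link_type]


def neighbours(subject, triples=TRIPLES):
    """Group everything linked from `subject` by the type of link."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            grouped[p].append(o)
    return dict(grouped)


if __name__ == "__main__":
    pebbledash = EX + "thing/pebbledash"
    # Because the links are typed, a program can ask for services only.
    for uri in linked(pebbledash, EX + "rel/hasService"):
        print(uri)
```

Because every link carries a type, an application can ask specifically for the services related to pebbledash rather than trawling through documents that merely mention the word.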
I caught up with Tom after his talk for a short interview:
Tom Scott from the BBC gave us a look at their Linked Data based wildlife finder. You can browse the BBC archives by meaning and concepts to build your own stories, and create journeys through the things that you care about. Wildlife programs tend to be based around particular animal behaviours, and they’ve been able to build pages around these concepts. All the resources have URLs, so if you want to find out about the sounds that lions make, there’s an address for it. Some of the data is pulled in from other parts of the BBC, but much of it comes from outside sources, especially DBpedia. If the data needs editing, this is done on Wikipedia so the whole community benefits. I think Tom glossed over the problem of what happens when data gets vandalised: basically, they just go back and correct it. There are real issues for Linked Data here. When someone goes to Wikipedia directly, they can make a judgement call on whether to trust the data. When it’s surfaced on the BBC website, it’s not clear that the data has been pulled in from external sources, but people will place trust in it because it’s on the website of a trusted organisation.
John Sheridan and Jeni Tennison, representing data.gov.uk, gave a focussed and passionate view on ‘How the Web of Data Will Be Won’. John believes the possibilities that come from making governmental data available as Linked Data are extraordinary, and you could tell he meant it. The answer to ‘Why Linked Data?’ is simple: it’s the most Web-centric approach to using data. It’s “data you can click on”. It’s also, by its very nature, distributed. This is really important, because one of the disadvantages of a centralised approach is that it takes a lot of planning, and this really slows things down. The Linked Data distributed model of data publishing, where others do the combining, makes the whole thing much quicker and easier.
Jeni emphasised that we’ve all got to be brutally practical to get things done. Publishing research papers is all well and good, but it’s time to get on with it and find ways to get the data out there. Doing stuff is what matters. For her, much of this is about working out successful and simple patterns for publishing data in RDF that others can use. This is the pragmatic way to get as much Linked Data on the Web as quickly and as cheaply as possible. It’s about laying some tracks down so that others can follow with confidence. The patterns cover everything from a common way to structure consistent http URIs, to policy patterns that ensure persistence, and patterns for workflows and version control. This all sounds spot on to me. We’ve seen many examples of poor practice with URIs, so it’s really encouraging to hear that there’s work going on to get things right from the start. This gives Linked Data a good chance of staying the course. The government have lots of useful stats on all kinds of things, but patterns are needed to get this data out of its Excel-based bunker. I sensed many in the room were reassured to hear there’s work going on to provide APIs so that developers won’t necessarily have to work directly with the RDF and SPARQL endpoints. The fact that Jeni’s dev8d session on this topic was so packed I couldn’t even get near the door, never mind get in, indicates there’s a real need and demand for these APIs.
Lin Clark from DERI Galway gave us a quick look at the Drupal content management system. Drupal is used to power data.gov.uk and comes with RDF pretty much out of the box. I got a better look in Lin’s dev8d Drupal 7 workshop. I have to say it does look a doddle to install, and mapping the content to RDF for the SPARQL endpoint seems to be a cinch too.
Silver Oliver took us back to the document metaphor with a talk on journalism and Linked Data. Not surprisingly, journalism is especially attached to the document metaphor, with newspapers treating the Web as just another distribution channel. Martin Belam from The Guardian was in the audience and responded, indicating that they are looking at trying to do things better. Silver said a challenge the BBC have is to get the news and sports sections to become more like the Wildlife Finder, but the information models aren’t in place yet. There has been some progress with sport events and actors, but it appears that news vocabularies are complex.
Georgi Kobilarov gave us a quick preview of uberblic.org, a single point of access to a Web of integrated data. I have become somewhat unconvinced by attempts to provide single points of access, mainly because of the largely failed notion of portals, and also the fact that lots of people seem to want to provide them, so they’re not very ‘single’. I think in this case though, it’s more that Georgi is demonstrating the sort of thing that can be done, and isn’t trying to claim territory. Uberblic.org takes data sources including Wikipedia, Geonames, and Musicbrainz, and provides a single API for them.
The meet-up concluded with a really engaging panel session hosted by Paul Miller. On the panel were Ian Davis from Talis, Jeni Tennison, Tom Scott and Timo Hannay from Nature Publishing Group. There was some discussion over who has responsibility for minting URIs. It might be clear when it’s government departments minting URIs for schools, but what about when it’s less obvious? Who do we trust? Do we trust DBpedia URIs, for example? A judgement has to be made. The risk of having data monopolies such as Wikipedia was raised. It’s in the nature of the Web for these to emerge, but can we rely on these long-term, and what happens if they disappear?
There was quite an interesting discussion on the tension between the usability and the persistence of http URIs. Musicbrainz provides opaque URIs that work well for the BBC, and these are re-used in their URIs, but Wikipedia URIs are problematic. They may be human readable, but they change when the title of a page is changed. Tom Scott said he would “trade readability for persistence any day”.
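The trade-off Tom describes can be sketched in a few lines of Python. This is just an illustration under my own assumptions: the example.org base address, the record identifier and both helper functions are invented, and neither Wikipedia nor Musicbrainz necessarily builds its URIs this way.

```python
from urllib.parse import quote

# Hypothetical base address, not a real BBC, Wikipedia or Musicbrainz URI.
BASE = "http://example.org/artist/"


def readable_uri(title):
    """Human-readable URI derived from the page title (changes on rename)."""
    return BASE + quote(title.replace(" ", "_"))


def opaque_uri(record_id):
    """Opaque URI derived from an identifier that never changes."""
    return BASE + record_id


# A made-up record with a made-up identifier.
record = {"id": "a74b1b7f", "title": "The Clash"}

before = (readable_uri(record["title"]), opaque_uri(record["id"]))

# Someone retitles the page, as can happen on a wiki.
record["title"] = "The Clash (band)"

after = (readable_uri(record["title"]), opaque_uri(record["id"]))

if __name__ == "__main__":
    print("readable URI still valid:", before[0] == after[0])
    print("opaque URI still valid:", before[1] == after[1])
```

Anything that linked to the readable URI before the rename now points at an address that is no longer generated from the record, while links to the opaque URI carry on working.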
Ian Davis talked about some of the technical barriers when building Linked Data applications. The approach is very different to building applications on known databases that are under your control. Programming across the whole Web is a new thing, and it’s the challenge for our generation of developers. Timo Hannay ended the meeting on a fittingly profound note, referring to Chris Anderson’s ‘End of Theory’. He feels that embracing Linked Data is essential, and to not do so will mean that science “simply cannot progress”.
Linked Data may not quite have declared war, but many think it’s time for our data to come out of the cupboard and rise from the underworld.