Archive for February, 2010

London Calling

Friday, February 26th, 2010 by Adrian Stevenson

“London calling to the faraway towns
Now war is declared – and battle come down
London calling to the underworld
Come out of the cupboard, you boys and girls”

‘London Calling’ by The Clash [rdf]. Lyrics by Joe Strummer [rdf], 1979

London was the centre of gravity for the global data Web on Wednesday 24th February 2010. Such was the sense of anticipation in the air at the ‘Linked Data Meetup 2’ that it almost felt like we were waiting for the start of a rock concert instead of a series of presentations, especially given the event was being held in the concert hall of the University of London Students’ Union, the same place I’ve seen one or two bands over the years. Over two hundred people had signed up to attend, and many had to be turned away.

Tom Heath from Talis was first up, and although he didn’t come on stage with a guitar or start singing, he did a great job of rising to the moment, as did all the speakers on this really inspiring day. Tom was talking about the implications of Linked Data for application development, and he focussed on what he sees as the need for computing to move on from the all-pervasive metaphor of the document, if the true possibilities of the Web are to be realised. We tend to see the Web as a library of documents, but it has the power to be so much more. He thinks we should see the Web as something more like an ‘exploratorium’ of real things in the world. These things should all be given http URIs, and be connected by typed links. The power this brings is that applications are then able to grasp the meaning of what things are, and how they relate to each other. Applications can probe this global Web of Data, performing calculations, identifying correlations, and making decisions. The machine processing power we can apply behind this idea supercharges the Web way beyond what is currently possible with a library of documents. Yes, of course we are already doing this to an extent with data in conventional databases, but the Linked Data Web gives us the possibility to connect all data together and wrap it with meaning.
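To make the idea of typed links a little more concrete, here’s a minimal sketch of my own – not from Tom’s talk, and with URIs and vocabulary invented purely for illustration – of how a program can work with things and their relationships rather than with documents:

```python
from rdflib import Graph, Namespace, URIRef

# An invented vocabulary namespace, purely for illustration.
EX = Namespace("http://example.org/vocab/")

g = Graph()
band = URIRef("http://example.org/thing/the-clash")
song = URIRef("http://example.org/thing/london-calling")

# A typed link: not just 'these two pages are related', but a
# machine-readable statement of *how* they relate.
g.add((band, EX.recorded, song))

# An application can now ask a meaningful question of the data,
# rather than string-matching across a library of documents.
for s, p, o in g.triples((None, EX.recorded, None)):
    print(f"{s} recorded {o}")
```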

Tom gave some examples of what becomes possible, one of which he based around the concept of pebbledash. Some things we might associate with pebbledash include getting it repaired, weatherproofing, its cost, its removal, its colour, or whatever. These concepts essentially exist in isolation on the library document Web as it is now, and we as humans have to explore by hand to make the connections ourselves. This is a time-consuming, messy and error-prone process. If the concepts existed in Linked Data form, the Web Exploratorium could bring them together for us, promising faster, more complete and more accurate calculations. This means we can make better evaluations and more informed decisions. Tom left us with a provocative challenge – “if you don’t feel constrained by the document, then you’re not thinking hard enough”.
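Here’s a hedged sketch of the kind of query such an exploratorium might answer about pebbledash. The endpoint, vocabulary and URIs are all made up – no such service exists – but the shape of the query is the point: the connections a human would make by hand are made by the query engine instead:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# A hypothetical endpoint and vocabulary -- invented for illustration.
sparql = SPARQLWrapper("http://example.org/sparql")
sparql.setQuery("""
    PREFIX ex: <http://example.org/vocab/>
    SELECT ?topic ?resource WHERE {
        # Everything related to the concept of pebbledash:
        # repairers, weatherproofing, cost data, removal, colour...
        ?resource ex:concerns <http://example.org/concept/pebbledash> ;
                  ex:topic ?topic .
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["topic"]["value"], "->", row["resource"]["value"])
```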

I caught up with Tom after his talk for a short interview.

Tom Scott from the BBC gave us a look at their Linked Data based Wildlife Finder. You can browse the BBC archives by meaning and concepts to build your own stories, and create journeys through the things that you care about. Wildlife programmes tend to be based around particular animal behaviours, and they’ve been able to build pages around these concepts. All the resources have URLs, so if you want to find out about the sounds that lions make, there’s an address for it. Some of the data is pulled in from other parts of the BBC, but much of it comes from outside sources, especially DBpedia. If the data needs editing, this is done on Wikipedia so the whole community benefits. I think Tom glossed over the problem of what happens when data gets vandalised: basically, they just go back and correct it. There are real issues for Linked Data here. When someone goes to Wikipedia directly, they can make a judgement call on whether to trust the data. When it’s surfaced on the BBC website, it’s not clear that the data has been pulled in from external sources, but people will place trust in it because it’s on the website of a trusted organisation.
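This ‘follow your nose’ style of access is easy to demonstrate: dereference a concept’s URI and machine-readable data comes back. A rough sketch using Python’s rdflib – assuming DBpedia’s content negotiation serves up RDF, and noting that the exact triples returned will vary:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

# Dereference the DBpedia URI for 'Lion'; content negotiation
# should return RDF rather than the human-readable page.
lion = URIRef("http://dbpedia.org/resource/Lion")
g = Graph()
g.parse(lion)

# Pull out the labels -- the kind of community-maintained data
# the BBC surfaces alongside its own editorial content.
for label in g.objects(lion, RDFS.label):
    print(label)
```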

John Sheridan and Jeni Tennison, representing data.gov.uk, gave a focussed and passionate view on ‘How the Web of Data Will Be Won’. John believes the possibilities that come from making governmental data available as Linked Data are extraordinary, and you could tell he meant it. Why Linked Data? The answer is simple: it’s the most Web-centric approach to using data. It’s “data you can click on”. It’s also, by its very nature, distributed. This is really important, because one of the disadvantages of a centralised approach is that it takes a lot of planning, and this really slows things down. The Linked Data distributed model of data publishing, where others do the combining, makes the whole thing much quicker and easier.

Jeni emphasised that we’ve all got to be brutally practical to get things done. Publishing research papers is all well and good, but it’s time to get on with it and find ways to get the data out there. Doing stuff is what matters. For her, much of this is about working out successful and simple patterns for publishing data in RDF that others can use. This is the pragmatic way to get as much Linked Data on the Web as quickly and as cheaply as possible. It’s about laying some tracks down so that others can follow with confidence. The patterns cover everything from a common way to structure consistent http URIs, to policy patterns that ensure persistence, and patterns for workflows and version control. This all sounds spot on to me. We’ve seen many examples of poor practice with URIs, so it’s really encouraging to hear that there’s work going on to get things right from the start. This gives Linked Data a good chance of staying the course. The government have lots of useful stats on all kinds of things, but patterns are needed to get this data out of its Excel-based bunker. I sensed many in the room were reassured to hear there’s work going on to provide APIs so that developers won’t necessarily have to work directly with the RDF and SPARQL endpoints. The fact that Jeni’s dev8d session on this topic was so packed I couldn’t even get near the door, never mind get in, indicates there’s a real need and demand for these APIs.
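The published UK guidance distinguishes URIs for things from URIs for documents about those things, following a pattern along the lines of http://{sector}.data.gov.uk/id/{concept}/{reference}. As a small illustration of why a consistent pattern matters to developers – the sector and reference number here are examples, and may not actually resolve:

```python
def school_id_uri(reference: str) -> str:
    """URI for the school itself -- the real-world thing."""
    return f"http://education.data.gov.uk/id/school/{reference}"

def school_doc_uri(reference: str, fmt: str = "rdf") -> str:
    """URI for a document *about* the school, in a given format."""
    return f"http://education.data.gov.uk/doc/school/{reference}.{fmt}"

# The /id/ URI is meant to 303-redirect to a /doc/ URI, keeping the
# thing itself distinct from descriptions of it. Because the pattern
# is consistent and persistent, developers can construct, cache and
# link to identifiers with confidence.
print(school_id_uri("520965"))
print(school_doc_uri("520965"))
```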

Lin Clark from DERI Galway gave us a quick look at the Drupal content management system. Drupal is used to power data.gov.uk and comes with RDF support pretty much out of the box. I got a better look in Lin’s dev8d Drupal 7 workshop. I have to say it does look a doddle to install, and mapping the content to RDF for the SPARQL endpoint seems to be a cinch too.
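Drupal itself is PHP, but the shape of its mapping is easy to sketch in Python: each content field of a node becomes a property of the node’s URI. This is a rough approximation – the vocabularies match what I understand Drupal 7 to use by default, but treat the details as illustrative rather than definitive:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

SIOC = Namespace("http://rdfs.org/sioc/ns#")

def node_to_rdf(node_url: str, title: str, body: str) -> Graph:
    """Approximate the kind of mapping Drupal applies to a node:
    each content field becomes a property of the node's URI."""
    g = Graph()
    node = URIRef(node_url)
    g.add((node, RDF.type, SIOC.Item))       # nodes typed as sioc:Item
    g.add((node, DC.title, Literal(title)))  # title field -> dc:title
    g.add((node, SIOC.content, Literal(body)))
    return g

g = node_to_rdf("http://example.org/node/1", "Hello", "First post!")
print(g.serialize(format="turtle"))
```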

Silver Oliver took us back to the document metaphor with a talk on journalism and Linked Data. Unsurprisingly, journalism is especially attached to the document metaphor, with newspapers treating the Web as just another distribution channel. Martin Belam from The Guardian was in the audience and responded, indicating that they are looking at trying to do things better. Silver said a challenge the BBC have is to get the news and sports sections to become more like the Wildlife Finder, but the information models aren’t yet in place. There has been some progress with sports events and actors, but it appears that news vocabularies are complex.

Georgi Kobilarov gave us a quick preview of uberblic.org, a single point of access to a Web of integrated data. I have become somewhat unconvinced by attempts to provide single points of access, mainly because of the largely failed notion of portals, and also because lots of people seem to want to provide them, so they’re not very ‘single’. In this case though, I think it’s more that Georgi is demonstrating the sort of thing that can be done, rather than trying to claim territory. Uberblic.org takes data sources including Wikipedia, Geonames, and MusicBrainz, and provides a single API for them.

The meet-up concluded with a really engaging panel session hosted by Paul Miller. On the panel were Ian Davis from Talis, Jeni Tennison, Tom Scott and Timo Hannay from Nature Publishing Group. There was some discussion over who has responsibility for minting URIs. It might be clear when it’s government departments minting URIs for schools, but what about when it’s less obvious? Who do we trust? Do we trust DBpedia URIs, for example? A judgement has to be made. The risk of having data monopolies such as Wikipedia was raised. It’s in the nature of the Web for these to emerge, but can we rely on them long-term, and what happens if they disappear?

There was quite an interesting discussion on the tension between the usability and the persistence of http URIs. MusicBrainz provides opaque URIs that work well for the BBC, which re-uses them in its own URIs, but Wikipedia URIs are problematic. They may be human-readable, but they change when the title of a page changes. Tom Scott said he would “trade readability for persistence any day”.

Ian Davis talked about some of the technical barriers when building Linked Data applications. The approach is very different to building applications on known databases that are under your control. Programming across the whole Web is a new thing, and it’s the challenge for our generation of developers. Timo Hannay ended the meeting on a fittingly profound note, referring to Chris Anderson’s ‘End of Theory’. He feels that embracing Linked Data is essential, and to not do so will mean that science “simply cannot progress”.

Linked Data may not quite have declared war, but many think it’s time for our data to come out of the cupboard and rise from the underworld.


The Case for Manchester Open Data City

Monday, February 8th, 2010 by Adrian Stevenson

As part of what might be considered my extra-curricular activities, I’ve been attending Manchester’s thriving Social Media Cafe since it began back in November 2008. I initially got involved with this group more from the perspective of being a director of the Manchester Jazz Festival and a Manchester music blogger in the guise of The Ring Modulator. The interesting thing is that it often turns out to be relevant to my UKOLN ‘day’ job, as was the case when Julian Tait, one of the cafe’s founders, asked me to give a talk on Linked Data, which I duly did last year.

The crossover is even more apparent now that Julian, as part of his role in Future Everything, has become involved in a project to make Manchester the UK’s first Open Data City. He spoke about this at the most recent of these excellent cafe meetings, and did a great job of helping amplify some thoughts I’ve been having on this.

Julian Tait speaking at the Manchester Social Media Cafe, BBC Manchester, 2nd February 2010

Julian posed the question of what an open data city might mean, suggesting that in a very real sense, data is the lifeblood of a city. It allows cities to function, to be dynamic and to evolve. If the data is more open and more available, then perhaps this data/blood can flow more freely. The whole city organism could then be a stronger, fitter and healthier place for us to live.

Open data has the potential to make our cities fairer and more democratic places. It is well known that information is a valuable and precious commodity, with many attempting to control and restrict access to it. Julian touched on this, inviting us to think how different Manchester could be if everyone had access to more data.

He also mentioned the idea of hyper-local data specific to a postcode that could allow information to be made available to people on a street-by-street scale. This sounds very much like the Postcode Paper mentioned by Zach Beauvais from Talis at a recent CETIS meeting. There was mention of the UK government’s commitment to open data via the data.gov.uk initiative, though no specific mention was made of Linked Data. In the context of the Manchester project, I think the ‘linked’ part may be some way down the road, and we’re really just talking about the open bit here. Linked Data and open data do often get conflated in an unhelpful way. Paul Walk, a colleague of mine at UKOLN, recently wrote a blog post, ‘Linked, Open, Semantic?’, that helps to clarify the confusion.

Julian pointed us to two interesting examples, ‘They Work For You’ and ‘MySociety’, where open data is being absorbed into the democratic process, thereby helping citizens hold government to account. There’s also the US innovation competition, ‘Apps for Democracy’, for which Julian quoted an ear-catching statistic: an investment of 50,000 dollars is estimated to have generated a stunning return of 2.3 million dollars. Clearly an exemplar case study for open data.

4IP’s forthcoming Mapumental looks to be a visually engaging use of open data, providing intuitive visualisations of such things as house price indexes and public transport data mappings. Defra’s Noise Mapping England was also mentioned as the kind of thing that could be done, but which demonstrates the constraints of not being open: its noise data can’t actually be combined with other data. One can imagine the benefits of being able to combine this noise pollution data with house prices or data about road or air traffic.

Another quirky example mentioned was the UK-developed SF Trees iPhone app, which uses San Francisco Department of Public Works data to let users identify trees in the city.

So open data is all about people becoming engaged, empowered, and informed. Julian also drew our attention to some of the potential risks and fears associated with this mass liberation of data. Will complex issues be oversimplified? Will open, transparent information cause people to make simplistic inferences and come to invalid conclusions? Subtle complexities may be missed, resulting in misinformation. But surely we’re better off with the information than without? There are always risks.

Open data should also be able to provide opportunities for saving money, with Julian noting that this is indeed one of the major incentives behind the UK’s ‘smarter government’ programme, as well as US and Canadian government initiatives.

After the talk there was some lively debate, though I have to say I was somewhat disappointed by the largely suspicious and negative reaction. Perhaps this is an inevitable and healthy wariness of any government-sanctioned initiative, but it appears that people fear the openness of our data could have some undesirable consequences. There was a suggestion, for example, that data about poor bin collection in an area could adversely affect house prices, or that hyper-local geographical data about traffic to heart disease information websites could be used by life insurance companies. Perhaps hyper-local data risks ghettoising people even more? Clearly the careful anonymisation of data is very important. Nevertheless, it was useful to be able to gauge people’s reactions to the idea of an open data city, as any initiative like this clearly needs people on board if it is to be a success.