Closure of this blog

February 4th, 2013 by Brian Kelly

This blog was closed after Adrian Stevenson left UKOLN to take up a new post at Mimas.

There’s No Business Like No Business Case

March 18th, 2011 by Adrian Stevenson

‘What is the Business Administrative Case for Linked Data?’ parallel session, JISC Conference 2011, BT Convention Centre, Liverpool, UK. 15th March 2011

BT Convention Centre, JISC Conference 2011

One of the parallel sessions at this year’s JISC Conference in Liverpool promised to address the “business value to the institution” of linked data, being aimed at “anyone who wants a clear explanation in terms of how it applies to your institutional business strategy for saving money…”. I was one of a number of people invited to be on the panel by the session host, David Flanders, but such was the enthusiasm that I was beaten to a place by the other panellists, despite replying within a few hours.

The session kicked off with a five-minute soapbox from each of the panellists before opening up to a wider discussion. First up was Hugh Glaser from Seme4. He suggested that universities have known for a long time that they need to improve data integration and fusion, but have found this a difficult problem to solve. You can get consultants in to do this, but it’s expensive and the IT solutions often end up driving the business process instead of the other way round. Every single modification has to be paid for, such that your business process often ends up being frozen. However, Linked Data offers the possibility of solving these problems at low risk and low cost, being more of an evolutionary than revolutionary solution. The success of data.gov.uk was cited, it having taken only seven months to release a whole series of datasets. Hugh emphasised that not only has the linked data approach been implemented quickly and cheaply here, but also it hasn’t directly impinged upon or skewed the business process.

He also talked about his work with the British Museum, the problem there being that data has been held separately in different parts of the organisation resulting in seven different databases.  These have been converted into linked data form and published openly, now allowing the datasets to be integrated. Hugh mentioned that another bonus of this approach is that you don’t necessarily have to write all your applications yourself. The finance section of data.gov.uk lists five applications contributed by developers not involved with the government.

Linked Data Panel Session at JISC Conference 2011 (L-R: David Flanders, Hugh Glaser, Wilbert Kraan, Bijan Parsia, Graham Klyne)

Wilbert Kraan from CETIS described an example where linked data makes it possible to do things existing technologies simply can’t. The example was based on PROD, a database of JISC projects provided as linked data. Wilbert explained that they are now able to ask questions of the dataset not possible before. They can now put information on where projects have taken place on a map, also detailing the type of institution, and its rate of uptake. The neat trick is that CETIS don’t have to collect any data themselves, as many other people are gathering data and making it available openly. As the PROD data is linked data, it can be linked in to this other data. Wilbert suggested that it’s hard to say if money is saved, because in many cases, this sort of information wouldn’t be available at all without the application of linked data principles.

Lecturer in Computer Science at the University of Manchester, Bijan Parsia talked about the notion of data disintermediation, which is the idea that linked data cuts out intermediaries in the data handling process, thereby eliminating points of friction. Applications such as visualisations can be built upon linked data without the need to clear the technical and administrative hurdles around a proprietary dataset. Many opportunities then exist to build added value over time.

The business case favoured by Graham Klyne was captured by the idea that it enables “uncoordinated reuse of information” as espoused by Clark &amp; Parsia, an example being the simplicity with which it’s possible to overlay faceted browse functionality on a dataset without needing to ask permission. Graham addressed the question of why there are still few compelling linked data apps. He believes this comes down to the disconnect between who pays and who benefits. It is all too often not the publishers themselves who benefit, so we need to do everything possible to remove the barriers to data publishing. One solution may be to find ways to give credit for dataset publication, in the same way we do for publishing papers in the academic sector.

David then asked the panel for some one-liners on the current barriers and pain points. For Wilbert, it’s simply down to lack of knowledge and understanding in the University enterprise sector, where the data publisher is also the beneficiary. Hugh felt it’s about the difficulty of extracting the essence of the benefit of linked data.  Bijan suggested that linked data infrastructure is still relatively immature, and Graham felt that the realisation of benefits is too separate from the costs of publication, although he acknowledged that it is getting better and cheaper to do.

We then moved on to questions and discussion. The issue of data quality was raised. Hugh suggested that linked data doesn’t solve the problem of quality, but it can help expose quality issues, and therefore aid their correction. He pointed out that there may be trust and therefore quality around a domain name, such as http://www.bl.uk/ for data from the British Library. Bijan noted that data quality is really no more of an issue than it is for the wider Web, but that it would help to have mechanisms for reporting issues back to the publisher. Hugh believes linked data can in principle help with sustainability, in that people can fairly straightforwardly pick up and re-host linked data. Wilbert noted one advantage of linked data is that you can do things iteratively, and build things up over time without having to make a significant upfront commitment. Hugh also reminded us of the value of linked data to the intranet. Much of the British Library data is closed off, but has considerable value internally. Linked data doesn’t have to be open to be useful.

The session was very energetic, being somewhat frantic and freewheeling at times. I was a little frustrated that some interesting discussion points didn’t have the opportunity to develop, but overall the session managed to cover a lot of ground for a one-hour slot. Were any IT managers convinced enough to look at linked data further? For that I think we’ll have to wait and see. For now, as Ethel Merman would say, “let’s go on with the show”.

Same Old Same Old?

January 19th, 2011 by Adrian Stevenson

Cultural Heritage and the Semantic Web British Museum and UCL Study Day, British Museum, London,  January 13th 2011

“The ability of the semantic web to cheaply but effectively integrate data and breakdown data silos provides museums with a long awaited opportunity to present a richer, more informative and interesting picture. For scholars, it promises the ability to uncover relationships and knowledge that would otherwise be difficult, if not impossible, to discover otherwise.”

Such was the promise of the ‘Cultural Heritage and the Semantic Web Study Day’ held in the hallowed halls of the Museum last week. Dame Wendy Hall from the University of Southampton opened the case for the defence, citing some anecdotes and giving us an outline of ‘Microcosm’, a cataloguing system she helped develop that was used for the Mountbatten archive back in 1987. Microcosm employed the semantic concept of entity-based reasoning using object-concept-context triples. Hall talked through some of the lessons learnt from her involvement in the early days of the Web. She believes that big is beautiful, i.e. the network is everything, and that scruffy works, meaning that link failure is ok, as it’s part of what makes the Web scale. Open and free standards are also very important. Hall mentioned a number of times that people said the Web wouldn’t scale, whether it was the network itself, or the ability to search it, but time had proved them wrong. Although Hall didn’t make the point explicitly, the implication was that the same thing would prove to be the case for the semantic web. As to why it didn’t take off ten years ago, she believes it’s because the Artificial Intelligence community became very interested, and took it down “an AI rat hole” that it’s only now managing to re-emerge from. She does acknowledge that publishing RDF is harder than publishing web pages, but believes that it is doable, and that we are now past the tipping point, partly due to the helping push from data.gov.uk.

Dame Wendy Hall speaking at the British Museum on January 13th 2011

Ken Hamma spoke about ‘The Wrong Containers’. I found the gist of the talk somewhat elusive, but I think the general message was the now fairly familiar argument that we, in this case the cultural heritage sector (read ‘museums’ here), are too wedded to our long cherished notions of the book and the catalogue, and we’ve applied these concepts inappropriately as metaphors to the Web environment. He extended this reasoning to the way in which museums have similarly attempted to apply their practices and policies to the Web, having historically acted as gatekeepers and mediators. Getting institutions to be open and free with their data is a challenge, many asking why they should share. Hamma believes the museum needs to break free from the constraints of the catalogue, and needs to rethink its containers.

John Sheridan from The National Archives framed his talk around the Coalition Agreement, which provides the guiding principles for the publication of public sector information, or as he put it, the “ten commandments for the civil service”. The Agreement mandates what is in fact a very liberal licensing regime with a commitment to publishing in open standards, and the National Archives have taken this opportunity to publish data in Linked Data form and make it available via the data.gov.uk website. John acknowledged that not all data consumers will want data in RDF form from a SPARQL endpoint, so they’ve also developed Linked Data APIs with the facility to deliver data in other formats, with the software code for this being made available as open source. John also mentioned that the National Archives have generated a vocabulary for government organisational structure called the ‘Central Government Ontology’, and they’ve also been using a datacube to aid the creation of lightweight vocabularies for specific purposes. John believes that it is easier to publish Linked Data now than it was just a year ago, and ‘light years easier’ than five years ago.

Data provenance is a current important area for the National Archives, and they now have some ‘patterns’ for providing provenance information.  He also mentioned that they’ve found the data cleansing tools available from Google Refine to be very useful. It has extensions for reconciling data that they’ve used against the government data sets, as well as extensions for creating URIs and mapping data to RDF. This all sounded very interesting, with John indicating that they are now managing to enable non-technical people to publish RDF simply by clicking, and without having to go anywhere near code.

John certainly painted a rosy picture of how easy it is to do things, one I have to say I don’t find resonates that closely with my own experience on the Locah project, where we’re publishing Linked Data for the Archives Hub and Copac services. I had a list of questions for John that I didn’t get to ask on the day. I’ll be sure to point John to these:

  • What are the processes for publication of Linked Data, and how are these embedded to enable non-technical people to publish RDF?
  • Are these processes documented and available openly, such as in step-by-step guides?
  • Do you have generic tools available for publishing Linked Data that could be used by others?
  • How did you deal with modelling existing data into RDF? Are there tools to help do this?
  • Does the RDF data published have links to other data sets, i.e. is it Linked Data in this sense?
  • Would you consider running or being involved in hands-on Linked Data publishing workshops?

Hugh Glaser from Seme4 outlined a common problem existing at the British Museum and many other places: that many separate research silos exist within organisations. The conservation data will be in one place, the acquisition data in another place, and the cataloguing data in yet another. Fusing this data together for the museum website by traditional means is very expensive, but the use of Linked Data based on the CIDOC Conceptual Reference Model ontology for the catalogue, and the sameAs.org service to tie things together, can make things more cost effective. He then gave a quick demo of RKBExplorer, a service that displays digests of semantic relationships. Despite Hugh’s engaging manner, I’m not sure the demonstrations would have been enough to persuade people of the benefits of Linked Data to the cultural heritage sector.

In the short panel session that followed, John Sheridan noted that the National Archives are using named graphs to provide machine-readable provenance trails for legislation.data.gov.uk, employing the Open Provenance Model Vocabulary in combination with Google Refine processing. Hugh made the interesting point that he thinks we can get too hung up on the modelling of data, and the publication of RDF. As a result, the data published ends up being too complex and not fit for purpose. For example, when we’re including provenance data, we might want to ask why we are doing this. Is it for the user, or really just for ourselves, serving no real purpose? Big heavyweight models can be problematic in this respect.

The problem of having contradictory assertions about the same thing also came up. In Linked Data, all voices can be equal, so attribution may be important. However, even with the data the British Museum creates, there will be some contradictory assertions. John Sheridan pointed out that data.gov.uk has aided the correction of data. The publication of data about bus stops revealed that 20,000 specified locations weren’t in the right place, these then being corrected by members of the public. Hugh reminded us that the domain of a Web URI, such as http://www.britishmuseum.org/ does itself provide a degree of attribution and trust.

Atanas Kiryakov from Ontotext was the first speaker to sound a warning note or two, with what I thought was an admirably honest talk. Ontotext provide a service called FactForge, and to explain this, Atanas talked a little about how we can make inferences using RDF, for example, the statement ‘x is a parent of y’ entails the inverse statement ‘y is a child of x’. He noted that the BBC were probably the first to use concept extraction, RDF and triple stores on a large-scale site, the solution having been chosen over a traditional database solution, with the semantic web delivering a cheaper product.
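As a minimal sketch of that kind of inference (the example.org URIs and property names below are invented for illustration, not anything Ontotext showed; a real reasoner, such as the owlrl package for Python or FactForge’s own inference layer, would materialise such statements automatically):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")

g = Graph()
g.parse(format="turtle", data="""
    @prefix ex:  <http://example.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    ex:isParentOf owl:inverseOf ex:isChildOf .   # schema-level statement
    ex:anne ex:isParentOf ex:bob .               # asserted fact
""")

# Materialise the inverse: for every owl:inverseOf pair (p, q),
# each asserted triple (s, p, o) entails (o, q, s).
for p, q in list(g.subject_objects(OWL.inverseOf)):
    for s, o in list(g.subject_objects(p)):
        g.add((o, q, s))

# The inferred triple is now queryable alongside the asserted one.
for s, o in g.subject_objects(EX.isChildOf):
    print(s, "is a child of", o)
    # -> http://example.org/bob is a child of http://example.org/anne
```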

So why is the semantic web still not used more? Atanas believes it’s because there are still no well-established Linked Data ‘buys’ to convince business enterprise. Linked Data, he suggests, is like teenage sex – many talk about it, but not many do it. Nevertheless, he does believe that Linked Data facilitates better data integration, and adds value to proprietary data through better description whilst being able to make data more open. However, Linked Data is hard for people to comprehend, and its sheer diversity comes at a price. Getting specific information out of DBpedia is tough. The Linked Data Web is also unreliable, exhibiting high downtimes. One point that really struck me was how slow he says the distributed data web is, with a SPARQL query over just two or three servers being unacceptably slow.

Overcoming these limitations of the Linked Data Web forms the basis of Ontotext’s ‘reason-able’ approach, which is to group selected datasets (DBpedia, Freebase, Geonames, UMBEL, Wordnet, CIA World Factbook, Lingvoj, MusicBrainz) and ontologies (Dublin Core, SKOS, RSS, FOAF) into a compound set which is then cleaned up and post-processed. It does strike me that this re-centralising, dare I say it, portal approach seems to defeat much of the point of the Linked Data Web, with inherent issues arising from a bounded data set and out-of-sync data, albeit I realise it aims to provide an optimised and pragmatic solution. Atanas suggests that many real-time queries would be impossible without services like FactForge.

Atanas then explained how the wide diversity of the Linked Data Web often leads to surprising and erratic results, the example given being that the most popular entertainer in Germany according to a SPARQL query is the philosopher Nietzsche, as demonstrated using the FactForge query interface. This arises from what Atanas calls the ‘honey and sting’ of owl:sameAs, the semantic web concept that allows for assertions to be made that two given names or identifiers refer to the same individual or entity. This can generate a great multiplicity of statements, and give rise to many different versions of the same result.
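To see how that multiplicity bites, here is a small sketch (hypothetical example.org URIs, not FactForge data): a query that follows owl:sameAs links returns one row per alias, so the ‘same’ answer comes back several times.

```python
from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
    @prefix ex:   <http://example.org/> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # Three datasets each mint their own identifier (and label) for the same person.
    ex:nietzsche_a rdfs:label "Friedrich Nietzsche" .
    ex:nietzsche_b rdfs:label "Friedrich Wilhelm Nietzsche" .
    ex:nietzsche_c rdfs:label "Nietzsche, Friedrich" .

    ex:nietzsche_a owl:sameAs ex:nietzsche_b .
    ex:nietzsche_b owl:sameAs ex:nietzsche_c .
""")

# Following sameAs links (here via a SPARQL 1.1 property path) yields one row
# per alias: several versions of what is really the same answer.
q = """
    PREFIX ex:   <http://example.org/>
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?label WHERE {
        ex:nietzsche_a (owl:sameAs|^owl:sameAs)* ?alias .
        ?alias rdfs:label ?label .
    }
"""
for (label,) in g.query(q):
    print(label)   # three different labels for one individual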

Dominic Oldman from the British Museum’s Information Services Development section concluded the day talking about the ResearchSpace project based at the British Museum. The project aims to create a research collaboration and digital publication environment, the idea being that the data becomes a part of the space, along with the tools and collaboration. It consists of things like blogging tools, forums, and wikis in an environment alongside the data, which is imported in RDF using the CIDOC Conceptual Reference Model. An example was shown of a comparison of a drawing and a painting of the same thing by an artist, and the ability to bring these together.

Ground Control to Major Tom

March 17th, 2010 by Adrian Stevenson

Terra Future Seminar, Ordnance Survey, Southampton, 10th March 2010

The Linked Data movement was once again in rude health at last week’s ‘Terra Future’ seminar where it joined forces with the UK GIS crowd at the Ordnance Survey HQ on a sun soaked Southampton morning.

Peter ter Haar from Ordnance Survey opened the day by raising the question, “why bring geo-spatial data and Linked Data together?”. He suggested it is important because the word “where” exists in 80% of our questions. Where things happen matters greatly to us.

It was down to the ever-polished ‘Major’ Tom Heath from Talis to set the Linked Data scene. He talked through a clever analogy between virtual data links and physical transport links. Linked Data promises to improve the speed and efficiency of virtual data networks in the same way the development of road and rail networks improved physical travel beyond the era of canal networks. Tom’s journey to work from Bristol to Birmingham is only made possible by an interlinking network of roads, cycle routes and railways built using agreed standards such as rail track gauges. Similarly, many data applications will only be made possible by a network of standardised Linked Data. As building physical networks has added value to the places connected, so will building virtual networks add value to things that are connected.

Tom Heath speaking at Terra Future 2010

Liz Ratcliffe from Ordnance Survey gave us a brief history of geography, complete with some great slides, explaining how much of the subject is about linking different aspects of geography together, whether it be information about physical geography, climatology, coastal data, environmental data, glaciology and so on. Liz was the first to mention the geographical concept of topographical identifiers (TOIDs). Geography uses TOIDs to connect information, with every geographical feature being described by its TOID. It’s also possible to ‘hang’ your own information on TOIDs. The difference between the concepts of location and place were explained, location being a point or position in physical space, and place being a portion of space regarded as distinct and measured off. Liz concluded with the seemingly obvious, but perhaps taken for granted observation that “everything happens somewhere”. Tom Heath made the point on twitter that the next step here is to make the links between the topographic IDs explicit, and then expose them to the Web as http URIs. John Goodwin reported that he’s “working on it”, so it sounds like we can look forward to some progress here.

Silver Oliver from the BBC stood in for a mysteriously absent Tom Scott. It was more or less a re-run of his Linked Data London talk on what the BBC are doing in the area of news and journalism, and is covered below.

Next up was an unscheduled surprise guest, none other than ‘inventor of the Web’, Sir Tim Berners-Lee himself for a quick pep talk, expressing how crucially important geo-spatial data is for Linked Data. He observed that one of the first things people do is to map things. This is very valuable information, so we need to surface this geospatial data in ways that can be linked. Surprise guest star number two, Nigel Shadbolt from the University of Southampton and ‘Information Advisor’ to the Prime Minister, then took to the podium for a lightning talk on data.gov.uk. He gave part of the credit for the fact that data.gov.uk was launched on time to agile programming methodologies, and suggested this was worth considering when thinking about procurements. He then gave us a quick tour of some of the interesting work going on, including the ASBOrometer iPhone app that measures the level of anti-social behaviour at your current geo-location based on data from data.gov.uk.

Tim Berners-Lee speaking at Terra Future 2010

John Sheridan from the UK Government’s Office of Public Sector Information (OPSI), then talked some more about data.gov.uk. He mentioned that an important part of the data.gov.uk exercise was to re-use existing identifiers by co-opting them for things when moving them into the Linked Data space. The point is to not start anew. They’re trying to find ways to make publishing data in RDF form as easy as possible by finding patterns for doing things based on Christopher Alexander’s design patterns ideas. He also asked “how the hell would we meet the INSPIRE directives without Linked Data?”. These directives require the assignment of identifiers for spatial objects and the publishing of geospatial information. Now that data.gov.uk has been launched, the next step is to build capability and tools around the design patterns they’ve been constructing.

Brian Higgs from Dudley Metropolitan Borough Council talked a little about how location is important for local government in delivering its services, for which geo-information services are business critical. Ian Holt, Senior Technical Product Manager for Ordnance Survey Web Services, then spoke about some of the interesting map things that have been happening as a result of Web 2.0. He illustrated how it’s possible to create your own maps with a GPS unit using the OpenStreetMap service. He mentioned how the world community had filled in the data on the Haiti open map following the recent earthquake, greatly helping the recovery effort. Tim Berners-Lee also referred to this in a recent TED talk.

Hugh Glaser from the University of Southampton closed the presentations with some technical demonstrations of sameAs.org, a service that helps you to find co-references between different data sets, and rkbexplorer.com, a ‘human interface to the ReSIST Knowledge Base’. A sameAs.org search on, for example, John Coltrane will give you a set of URIs that all refer to the same concept of John Coltrane. Hugh admitted that gathering all the links is a challenge, and that some RDF is not reliable. All that’s needed to have a ‘same as’ data problem is for one person to state that the band ‘Metallica’ is the same as the album ‘Metallica’. Hugh also alluded to some of the issues around trusting data, leading Chris Gutteridge to make a mischievous suggestion, “Hughs examples of sameAs for metallica gives me an evil idea: sameAs from bbc & imdb to torrents on thepiratebay”. I raised this as an issue for Linked Data on twitter, leading to a number of responses, Ben O’Steen suggesting that “If people are out to deceive you, or your system, they will regardless of tech. Needing risk management is not new!”. I think this is a fair point. Many of these same issues have been tackled in the past, for example in the area of relational databases. The adoption of Linked Data is still fairly new, and it seems perfectly plausible that Linked Data will be able to address and resolve these same issues in time.
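A toy version of the Metallica problem Hugh described might look like the following sketch (hypothetical example.org URIs, nothing taken from sameAs.org itself): a single careless owl:sameAs statement is enough to let an album’s properties leak onto the band.

```python
from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
    @prefix ex:  <http://example.org/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    ex:metallicaBand  a ex:Band ;  ex:name "Metallica" .
    ex:metallicaAlbum a ex:Album ; ex:releaseYear 1991 .

    # The bad link: someone asserts the band and the album are the same thing.
    ex:metallicaBand owl:sameAs ex:metallicaAlbum .
""")

# A consumer that trusts sameAs now concludes that a *band* has a release year.
q = """
    PREFIX ex:  <http://example.org/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    SELECT ?name ?year WHERE {
        ?band a ex:Band ; ex:name ?name ; owl:sameAs ?other .
        ?other ex:releaseYear ?year .
    }
"""
for name, year in g.query(q):
    print(name, year)   # Metallica 1991 - an album property attached to the band
```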

All in all, it was a great day, with the strong sense that real things are happening, and a great sense of excitement and optimism for the future of open, linked and geo-located data.

'Open Your Data' Sticker given out by Tim Berners-Lee

The full archive of #terrafuture tweets is available via twapper.

Collective Intelligence Amplification

March 15th, 2010 by Adrian Stevenson

JISC Developer Days, University of London Union, London, 24th-27th February 2010

Following straight on from the ‘Linked Data Meet-up 2’, I was immediately into the JISC UKOLN Dev8d Developer Days (http://dev8d.org/) held at the same location. Although I may be considered to be a little biased given I work for UKOLN, I have to say I was mightily impressed by this fantastic event. The detail that went into the organisation, as well as the multitude of original ideas to enhance the event, was well beyond anything I’ve seen before.

I was mainly there to get a few video interviews, and I’ve included these below. It was great to chat to Ed Summers from the Library of Congress who passed on his usual code4lib conference to attend dev8d, and gave us a few comments on how the events compare. It was also exciting to hear that Chuck Severance is intending to enhance the degree course he teaches on back in the US, using things he’s learnt at dev8d. All the interviewees clearly found the event to be really useful for creating and collaborating on new ideas in a way that just isn’t possible to the same degree as part of the usual working week. Just walking around the event listening in to some of the conversations, I could tell some great developer brains were working optimally. The workshops, expert sessions and project zones all added to the overall effect of raising the collective intelligence a good few notches. I’m sure we’ll hear about some great projects arising directly from these intense hot-housing days.

You can get more reflections via the dev8d and JISC Information Environment Team blogs.

Interviews: Ed Summers, Chuck Severance, Tim Donahue, John O’Brien, Steve Coppin, Chris Keene, Marcus Ramsden, Lin Clark, Tom Heath

As if by Magic …

March 15th, 2010 by Adrian Stevenson

‘Repositories and the Cloud’ event, The Magic Circle, London, 23rd February 2010

Last year I was asked to join the organising committee for the Eduserv JISC ‘Repositories and the Cloud’ event that was held at the fantastic Magic Circle venue near Euston Station. The sell-out day was a great success, the speakers giving an excellent overview of the current state of the art for cloud computing applications in the area of repository storage in particular. ‘Compute’ in the cloud was also discussed as one of the main benefits of cloud technology in helping to reduce bandwidth by placing the compute next to the storage.

It was clear from the event that it’s still quite early days in the use of cloud technologies for repositories, and many have the usual concerns around the security, control and licensing of data that you often hear about cloud storage in general. The idea of going for a hybrid approach sounded like a sensible option, where you may keep critical and important data on your own servers, and use the cloud for less critical data, or perhaps use it more as a backup service.

Video recordings of the presentations by Michele Kimpton, CBO DuraSpace, ‘DuraCloud – Open technologies and services for managing durable data in the cloud’, Alex Wade, Director Scholarly Communication, Microsoft Research, ‘Cloud Services for Repositories’, and Les Carr, EPrints, University of Southampton, ‘EPrints Cloud Visions’ are now available on the Eduserv event page.

I caught up with most of the speakers, and a number of the attendees, for some quick video reactions, thoughts and commentary. These are available on the event page and I’ve included them here.

London Calling

February 26th, 2010 by Adrian Stevenson

“London calling to the faraway towns
Now war is declared – and battle come down
London calling to the underworld
Come out of the cupboard, you boys and girls”

‘London Calling’ by The Clash [rdf]. Lyrics by Joe Strummer [rdf], 1979

London was the centre of gravity for the global data Web on Wednesday 24th February 2010. Such was the sense of anticipation in the air at the ‘Linked Data Meetup 2’, it almost felt like we were waiting for the start of a rock concert instead of a series of presentations, especially given the event was being held in the concert hall of the University of London Union, the same place I’ve seen one or two bands over the years. Over two hundred people had signed up to attend, and many had to be turned away.

Tom Heath from Talis was first up, and although he didn’t come on stage with a guitar or start singing, he did do a great job of rising to the moment, as did all the speakers on this really inspiring day. Tom was talking about the implications of Linked Data for application development, and he focussed on what he sees as the need for computing to move on from the all-pervasive metaphor of the document, if the true possibilities of the Web are to be realised. We tend to see the Web as a library of documents, but it has the power to be so much more. He thinks we should see the Web as something more like an ‘exploratorium’ of real things in the world. These things should all be given http URIs, and be connected by typed links. The power this brings is that applications are then able to grasp the meaning of what things are, and how they relate to each other. Applications can probe this global Web of Data, performing calculations, identifying correlations, and making decisions. The machine processing power we can apply behind this idea supercharges the Web way beyond what is possible currently with a library of documents. Yes, of course we are already doing this to an extent with data in conventional databases, but the Linked Data Web gives us the possibility to connect all data together and wrap it with meaning.

Tom gave some examples of what becomes possible, one of which he based around the concept of pebbledash. Some things we might associate with pebbledash include getting it repaired, weatherproofing, its cost, its removal, its colour, or whatever. These concepts essentially exist in isolation on the library document Web as it is now, and we as humans have to explore by hand to make the connections ourselves. This is a time consuming, messy and error prone process. If the concepts existed in Linked Data form, the Web Exploratorium can bring these concepts together for us, promising faster, more complete and more accurate calculations. This means we can make better evaluations and more informed decisions. Tom left us with a provocative challenge – “if you don’t feel constrained by the document, then you’re not thinking hard enough”.
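As a rough sketch of what those typed links might look like at the data level (the example.org URIs and link types are invented for illustration, not anything Tom showed), the pebbledash concepts become resources an application can traverse in a single query:

```python
from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
    @prefix ex:   <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:pebbledash rdfs:label "Pebbledash" ;
        ex:repairedBy       ex:renderingContractor ;
        ex:weatherproofedBy ex:siliconeSealant ;
        ex:typicalCost      ex:costPerSquareMetre ;
        ex:removedBy        ex:renderRemovalService .

    ex:renderingContractor   rdfs:label "Rendering contractor" .
    ex:siliconeSealant       rdfs:label "Silicone masonry sealant" .
    ex:costPerSquareMetre    rdfs:label "Typical cost per square metre" .
    ex:renderRemovalService  rdfs:label "Render removal service" .
""")

# Because the links are typed, an application can pull everything related to
# pebbledash, and know *how* each thing is related, in one query.
q = """
    PREFIX ex:   <http://example.org/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?relation ?label WHERE {
        ex:pebbledash ?relation ?thing .
        ?thing rdfs:label ?label .
    }
"""
for relation, label in g.query(q):
    print(relation, "->", label)
```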

I caught up with Tom after his talk for a short interview:

Tom Scott from the BBC gave us a look at their Linked Data based Wildlife Finder. You can browse the BBC archives by meaning and concepts to build your own stories, and create journeys through the things that you care about. Wildlife programmes tend to be based around particular animal behaviours, and they’ve been able to build pages around these concepts. All the resources have URLs, so if you want to find out about the sounds that lions make, there’s an address for it. Some of the data is pulled in from other parts of the BBC, but much of it comes from outside sources, especially DBpedia. If the data needs editing, this is done on Wikipedia so the whole community benefits. I think Tom glossed over the problem of what happens when data gets vandalised, where basically, they just go back and correct it. There are real issues for Linked Data here. When someone goes to Wikipedia directly, they can make a judgement call on whether to trust the data. When it’s surfaced on the BBC website, it’s not clear that the data has been pulled in from external sources, but people will place trust in it because it’s on the website of a trusted organisation.
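To illustrate the ‘pulled in from DBpedia’ part, here is a minimal sketch using Python’s rdflib, assuming DBpedia still serves RDF for its resource URIs via content negotiation; the particular resource and property are only illustrative:

```python
from rdflib import Graph, URIRef, Namespace

DBO  = Namespace("http://dbpedia.org/ontology/")
lion = URIRef("http://dbpedia.org/resource/Lion")

g = Graph()
# Dereference the resource URI; DBpedia should redirect to an RDF
# representation that rdflib can parse (network access required).
g.parse(str(lion))

# Pick out the English-language abstract for display on a species page.
for abstract in g.objects(lion, DBO.abstract):
    if abstract.language == "en":
        print(str(abstract)[:200], "...")
        break
```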

John Sheridan and Jeni Tennison representing data.gov.uk gave a focussed and passionate view on ‘How the Web of Data Will Be Won’. John believes the possibilities that come from making governmental data available as Linked Data are extraordinary, and you could tell he meant it. The answer to ‘Why Linked Data?’ is simple: it’s the most Web-centric approach to using data. It’s “data you can click on”. It’s also, by its very nature, distributed. This is really important, because one of the disadvantages of a centralised approach is that it takes a lot of planning, and this really slows things down. The Linked Data distributed model of data publishing, where others do the combining, makes the whole thing much quicker and easier.

Jeni emphasised that we’ve all got to be brutally practical to get things done. Publishing research papers is all well and good, but it’s time to get on with it and find ways to get the data out there. Doing stuff is what matters. For her, much of this is about working out successful and simple patterns for publishing data in RDF that others can use. This is the pragmatic way to get as much Linked Data on the Web as quickly and as cheaply as possible. It’s about laying some tracks down so that others can follow with confidence.  The patterns cover everything from a common way to structure consistent http URIs, to policy patterns that ensure persistence, and patterns for workflows and version control. This all sounds spot on to me. We’ve seen many examples of poor practice with URIs, so it’s really encouraging to hear that there’s work going on to get things right from the start. This gives Linked Data a good chance of staying the course. The government have lots of useful stats on all kinds of things, but patterns are needed to get this data out of its Excel based bunker. I sensed many in the room were reassured to hear there’s work going on to provide APIs so that developers won’t necessarily have to work directly with the RDF and SPARQL endpoints. The fact that Jeni’s dev8d session on this topic was so packed I couldn’t even get near the door, never mind get in, indicates there’s a real need and demand for these APIs.

Lin Clark from DERI Galway gave us a quick look at the Drupal content management system. Drupal is used to power data.gov.uk and comes with RDF pretty much out of the box. I got a better look in Lin’s dev8d Drupal 7 workshop. I have to say it does look a doddle to install, and mapping the content to RDF for the SPARQL endpoint seems to be a cinch too.

Silver Oliver took us back to the document metaphor with a talk on journalism and Linked Data. Unsurprisingly, journalism is especially attached to the document metaphor, with newspapers treating the Web as just another distribution channel. Martin Belam from The Guardian was in the audience and responded, indicating that they are looking at trying to do things better. Silver said a challenge the BBC have is to get the news and sports sections to become more like the Wildlife Finder, but the information models aren’t yet in place. There has been some progress with sport events and actors, but it appears that news vocabularies are complex.

Georgi Kobilarov gave us a quick preview of uberblic.org, a single point of access to a Web of integrated data. I have become somewhat unconvinced by attempts to provide single points of access, mainly because of the largely failed notion of portals, and also the fact that lots of people seem to want to provide them, so they’re not very ‘single’. I think in this case though, it’s more that Georgi is demonstrating the sort of thing that can be done, and isn’t trying to claim territory. Uberblic.org takes data sources including Wikipedia, Geonames, and MusicBrainz, and provides a single API for them.

The meet-up concluded with a really engaging panel session hosted by Paul Miller. On the panel were Ian Davis from Talis, Jeni Tennison, Tom Scott and Timo Hannay from Nature Publishing Group. There was some discussion over who has responsibility for minting URIs. It might be clear when it’s government departments minting URIs for schools, but what about when it’s less obvious? Who do we trust? Do we trust DBpedia URIs for example? A judgement has to be made. The risk of having data monopolies such as Wikipedia was raised. It’s in the nature of the Web for these to emerge, but can we rely on these long-term, and what happens if they disappear?

There was quite an interesting discussion on the tension between the usability and the persistence of http URIs. Musicbrainz provides opaque URIs that work well for the BBC, and these are re-used in their URIs, but Wikipedia URIs are problematic. They may be human readable, but they change when the title of a page is changed. Tom Scott said he would “trade readability for persistence any day”.

Ian Davis talked about some of the technical barriers when building Linked Data applications. The approach is very different to building applications on known databases that are under your control. Programming across the whole Web is a new thing, and it’s the challenge for our generation of developers. Timo Hannay ended the meeting on a fittingly profound note, referring to Chris Anderson’s ‘End of Theory’. He feels that embracing Linked Data is essential, and to not do so will mean that science “simply cannot progress”.

Linked Data may not quite have declared war, but many think it’s time for our data to come out of the cupboard and rise from the underworld.


The Case for Manchester Open Data City

February 8th, 2010 by Adrian Stevenson

As part of what might be considered my extra-curricular activities, I’ve been attending Manchester’s thriving Social Media Cafe from when it began back in November 2008. I initially got involved with this group more from the perspective of being a director of the Manchester Jazz Festival and a Manchester music blogger in the guise of The Ring Modulator. The interesting thing is that it usually turns out to be more relevant to my UKOLN ‘day’ job, this being the case when Julian Tait, one of the media cafe’s founders, asked me to give a talk on Linked Data, which I duly did last year.

The crossover is even more apparent now that Julian, as part of his role in Future Everything, has become involved in a project to make Manchester the UK’s first Open Data City. He spoke about this at the last excellent cafe meeting, and did a great job helping amplify some thoughts I’ve been having on this.

Julian Tait speaking at the Manchester Social Media Cafe, BBC Manchester, 2nd February 2010

Julian posed the question of what an open data city might mean, suggesting the notion that in a very real sense, data is the lifeblood of a city. It allows cities to function, to be dynamic and to evolve. If the data is more open and more available, then perhaps this data/blood can flow more freely. The whole city organism could then be a stronger, fitter and a healthier place for us to live.

Open data has the potential to make our cities fairer and more democratic places. It is well known that information is a valuable and precious commodity, with many attempting to control and restrict access to it. Julian touched on this, inviting us to think how different Manchester could be if everyone had access to more data.

He also mentioned the idea of hyper-local data specific to a postcode that could allow information to be made available to people on a street by street scale. This sounds very like the Postcode Paper mentioned by Zach Beauvais from Talis at a recent CETIS meeting. There was mention of the UK government’s commitment to open data via the data.gov.uk initiative, though no specific mention was made of linked data. In the context of the Manchester project, I think the ‘linked’ part may be some way down the road, and we’re really just talking about the open bit here. Linked Data and open data do often get conflated in an unhelpful way. Paul Walk, a colleague of mine at UKOLN, recently wrote a blog post, ‘Linked, Open, Semantic?‘ that helps to clarify the confusion.

Julian pointed us to two interesting examples, ‘They Work For You‘ and ‘MySociety‘, where open data is being absorbed into the democratic process thereby helping citizens hold government to account. There’s also the US innovation competition, ‘Apps for Democracy‘, Julian quoting an ear-catching statistic that an investment of 50,000 dollars is estimated to have generated a stunning return of 2.3 million dollars. Clearly an exemplar case study for open data there.

4IP‘s forthcoming Mapumental looks to be a visually engaging use of open data, providing intuitive visualisations of such things as house price indexes and public transport data mappings. Defra Noise Mapping England was also mentioned as the kind of thing that could be done, but which demonstrates the constraints of not being open. Its noise data can’t actually be combined with other data. One can imagine the benefits of being able to put this noise pollution data with house prices or data about road or air traffic.

Another quirky example mentioned was the UK developed SF Trees for iPhone app that uses San Francisco Department of Public Works data to allow users to identify trees in the city.

So open data is all about people becoming engaged, empowered, and informed. Julian also drew our attention to some of the potential risks and fears associated with this mass liberation of data. Will complex issues be oversimplified? Will open, transparent information cause people to make simplistic inferences and come to invalid conclusions? Subtle complexities may be missed, with resulting misinformation. But surely we’re better off with the information than without? There are always risks.

Open data should also be able to provide opportunities for saving money, Julian noting that this is indeed one of the major incentives behind the UK’s ‘smarter government‘ as well as US and Canadian government initiatives.

After the talk there was some lively debate, though I have to say I was somewhat disappointed by the largely suspicious and negative reaction. Perhaps this is an inevitable and healthy wariness of any government-sanctioned initiative, but it appears that people fear that the openness of our data could result in some undesirable consequences. There was a suggestion, for example, that data about poor bin collection in an area could adversely affect house prices, or that hyper-local geographical data about traffic to heart disease information websites could be used by life insurance companies. Perhaps hyper-local data risks ghettoising people even more? Clearly the careful anonymisation of data is very important. Nevertheless, it was useful to be able to gauge people’s reactions to the idea of an open data city, as any initiative like this clearly needs people on board if it is to be a success.

Standards, Profiles, Interoperability – Some Notes from the Front

January 11th, 2010 by Adrian Stevenson

Tomorrow I’ll be attending the CETIS ‘Future of Interoperability Standards Meeting’ in Bolton on behalf of the SWORD project that I manage as part of my role at UKOLN. Invitees have been asked to provide a position paper that “… should focus on thoughts or opinions on the experience of developing both formal and informal specifications and standards, working with standards bodies and potential ways forward to achieve interoperability”. This is quite a tricky one for me, as most of the work on the SWORD profile was done by the previous project manager, Julie Allinson, in SWORD phase one, and our SWORD partner from the University of Cambridge, Jim Downing, for phase two. I had hoped to get Jim along to this meeting, but it wasn’t possible, so I’ll be the main SWORD representative. Consequently, rather than go into the specifics, I thought I’d give a few observations from my experience managing the project, these necessarily being more general. Whether this constitutes a position paper, I’m not sure.

  • Go with the Web
    Mainstream is good. Even if it doesn’t seem to fit, it’s probably a good idea. SWORD went with the Atom Publishing Protocol. It was developed for blogs. (A minimal AtomPub deposit sketch follows this list.)
  • Do you really need to standardise?
    Maybe de-facto is good enough? Why waste time and effort? The DSpace, Fedora, EPrints, Microsoft Zentity and Intralibrary repositories all ship with SWORD in their current releases. Microsoft have adopted SWORD as their de-facto standard for deposit and have implemented it in their ‘Article Authoring Add-in for Word‘. ‘Nuff said.
  • Prove a point
    Develop some test implementations and demo clients. Show the thing works.
  • Be agile
    … and don’t be too prescriptive.
  • You can never be too simple
    Do one simple thing well. Don’t try to do everything. It’s got to be clear.
  • Get the message out
    Some say that the marketing is the most important thing.
  • Don’t just say it, do it
    Practice what you preach. You know the quote: there are too many standards and specifications. Re-cycle. Be strong, don’t re-invent.
  • Allow for serendipity
    … and embrace it when it happens.
  • Don’t go it alone
    Get people on board. Bring people with you.
  • Having great developers makes life easier
    It means you can get things done.
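
On the ‘Go with the Web’ point above, here is a minimal sketch of the kind of Atom Publishing Protocol deposit that SWORD builds on: an Atom entry POSTed to a collection URI. The endpoint and credentials are placeholders, and a real SWORD deposit adds SWORD-specific headers and usually a packaged file rather than a metadata-only entry.

```python
import requests

COLLECTION = "https://repository.example.org/sword/collection"  # hypothetical endpoint

ENTRY = """<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>A test deposit</title>
  <author><name>A. Depositor</name></author>
  <summary>Metadata-only deposit created via AtomPub.</summary>
</entry>"""

# POST the entry to the collection, as per the Atom Publishing Protocol.
resp = requests.post(
    COLLECTION,
    data=ENTRY.encode("utf-8"),
    headers={"Content-Type": "application/atom+xml;type=entry"},
    auth=("depositor", "secret"),  # placeholder credentials
)
print(resp.status_code)               # 201 Created on success
print(resp.headers.get("Location"))   # URI of the newly created resource
```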

Semantic Technologies: Which Way Now? – A UKOLN Response

January 11th, 2010 by Adrian Stevenson

Last December Paul Walk and I were invited to give a UKOLN response to the presentations at the CETIS Semantic Technologies Working Group meeting in Glasgow. Paul couldn’t make it, so it was left to me to come up with a response on the spot. Zach Beauvais from Talis gave an introductory talk on Talis’ activities and Adam Cooper followed with a summary of the ‘Giant Global Graph’ session from the recent CETIS conference.

I mentioned a new UKOLN project called ‘RepUK’ that will be providing a Linked Data interface to an aggregation of scholarly materials from a number of UK repositories using the Talis platform. I then outlined a few issues around Linked Data, as well as mentioning the new Manchester OpenData project. Following on from discussions at the CETIS conference, I highlighted the difficulty of convincing IT managers and VCs that providing Linked Data interfaces to institutional systems is a worthwhile venture. The slides below provide a pointer to the full range of issues I raised.



After lunch Thanassis Tiropanis gave us an overview of the SemTech Project roadmap and recommendations (pdf). Following this there was a general discussion about the way ahead for the project, but I’m not sure there were any clear decisions from the day. Nevertheless, it was a useful day for me and hopefully a productive one for CETIS in determining where to go next.