eFragments » jisc

There’s No Business Like No Business Case

Adrian Stevenson — Fri, 18 Mar 2011 12:53:37 +0000

‘What is the Business Administrative Case for Linked Data?’ parallel session, JISC Conference 2011, BT Convention Centre, Liverpool, UK. 15th March 2011

One of the parallel sessions at this year’s JISC Conference in Liverpool promised to address the “business value to the institution” of linked data, being aimed at “anyone who wants a clear explanation in terms of how it applies to your institutional business strategy for saving money…”. I was one of a number of people invited to be on the panel by the session host, David Flanders, but such was the enthusiasm that I was beaten to it by the other panellists, despite replying within a few hours.

The session kicked off with a five-minute soapbox from each of the panellists before opening up to a wider discussion. First up was Hugh Glaser from Seme4. He suggested that universities have known for a long time that they need to improve data integration and fusion, but have found this a difficult problem to solve. You can get consultants in to do this, but it’s expensive and the IT solutions often end up driving the business process instead of the other way round. Every single modification has to be paid for, so your business process often ends up being frozen. However, Linked Data offers the possibility of solving these problems at low risk and low cost, being more of an evolutionary than revolutionary solution. The success of data.gov.uk was cited, it having taken only seven months to release a whole series of datasets. Hugh emphasised that not only has the linked data approach been implemented quickly and cheaply here, but it also hasn’t directly impinged upon or skewed the business process.

He also talked about his work with the British Museum, the problem there being that data has been held separately in different parts of the organisation resulting in seven different databases. These have been converted into linked data form and published openly, now allowing the datasets to be integrated. Hugh mentioned that another bonus of this approach is that you don’t necessarily have to write all your applications yourself. The finance section of data.gov.uk lists five applications contributed by developers not involved with the government.

Linked Data Panel Session at JISC Conference 2011 (L-R: David Flanders, Hugh Glaser, Wilbert Kraan, Bijan Parsia, Graham Klyne)

Wilbert Kraan from CETIS described an example where linked data makes it possible to do things existing technologies simply can’t. The example was based on PROD, a database of JISC projects provided as linked data. Wilbert explained that they are now able to ask questions of the dataset not possible before. They can now put information on where projects have taken place on a map, also detailing the type of institution, and its rate of uptake. The neat trick is that CETIS don’t have to collect any data themselves, as many other people are gathering data and making it available openly. As the PROD data is linked data, it can be linked in to this other data. Wilbert suggested that it’s hard to say if money is saved, because in many cases, this sort of information wouldn’t be available at all without the application of linked data principles.

Bijan Parsia, Lecturer in Computer Science at the University of Manchester, talked about the notion of data disintermediation: the idea that linked data cuts out intermediaries in the data handling process, thereby eliminating points of friction. Applications such as visualisations can be built upon linked data without the need to climb the technical and administrative hurdles around a proprietary dataset. Many opportunities then exist to build added value over time.

The business case favoured by Graham Klyne was captured by the idea that it enables “uncoordinated reuse of information” as espoused by Clark Parsia, an example being the simplicity with which it’s possible to overlay faceted browse functionality on a dataset without needing to ask permission. Graham addressed the question of why there are still few compelling linked data apps. He believes this comes down to the disconnect between who pays and who benefits. It is all too often not the publishers themselves who benefit, so we need to do everything possible to remove the barriers to data publishing. One solution may be to find ways to give credit for dataset publication, in the same way we do for publishing papers in the academic sector.

David then asked the panel for some one-liners on the current barriers and pain points. For Wilbert, it’s simply down to lack of knowledge and understanding in the University enterprise sector, where the data publisher is also the beneficiary. Hugh felt it’s about the difficulty of extracting the essence of the benefit of linked data. Bijan suggested that linked data infrastructure is still relatively immature, and Graham felt that the realisation of benefits is too separate from the costs of publication, although he acknowledged that it is getting better and cheaper to do.

We then moved on to questions and discussion. The issue of data quality was raised. Hugh suggested that linked data doesn’t solve the problem of quality, but it can help expose quality issues and so help get them corrected. He pointed out that trust, and therefore an expectation of quality, can attach to a domain name, such as http://www.bl.uk/ for data from the British Library. Bijan noted that data quality is really no more of an issue than it is for the wider Web, but that it would help to have mechanisms for reporting issues back to the publisher. Hugh believes linked data can in principle help with sustainability, in that people can fairly straightforwardly pick up and re-host linked data. Wilbert noted that one advantage of linked data is that you can work iteratively, building things up over time without having to make a significant upfront commitment. Hugh also reminded us of the value of linked data on the intranet. Much of the British Library data is closed off, but has considerable value internally. Linked data doesn’t have to be open to be useful.

The session was very energetic, being somewhat frantic and freewheeling at times. I was a little frustrated that some interesting discussion points didn’t have the opportunity to develop, but overall the session managed to cover a lot of ground for a one-hour slot. Were any IT managers convinced enough to look at linked data further? For that I think we’ll have to wait and see. For now, as Ethel Merman would say, “let’s go on with the show”.

Collective Intelligence Amplification

Adrian Stevenson — Mon, 15 Mar 2010 11:51:24 +0000

JISC Developer Days, University of London Union, London, 24th-27th February 2010

Following straight on from the ‘Linked Data Meet-up 2’, I was immediately into the JISC UKOLN Dev8d Developer Days (http://dev8d.org/) held at the same location. Although I may be considered a little biased given I work for UKOLN, I have to say I was mightily impressed by this fantastic event. The attention to detail in the organisation, as well as the multitude of original ideas to enhance the event, went well beyond anything I’ve seen before.

I was mainly there to get a few video interviews, and I’ve included these below. It was great to chat to Ed Summers from the Library of Congress, who passed up his usual code4lib conference to attend dev8d and gave us a few comments on how the events compare. It was also exciting to hear that Chuck Severance is intending to enhance the degree course he teaches back in the US, using things he’s learnt at dev8d. All the interviewees clearly found the event really useful for creating and collaborating on new ideas in a way that just isn’t possible to the same degree as part of the usual working week. Just walking around the event listening in to some of the conversations, I could tell some great developer brains were working optimally. The workshops, expert sessions and project zones all added to the overall effect of raising the collective intelligence a good few notches. I’m sure we’ll hear about some great projects arising directly from these intense hothousing days.

You can get more reflections via the dev8d and JISC Information Environment Team blogs.

Ed Summers	Chuck Severance	Tim Donahue
John O’Brien	Steve Coppin	Chris Keene
Marcus Ramsden	Lin Clark	Tom Heath

As if by Magic …

Adrian Stevenson — Mon, 15 Mar 2010 09:12:28 +0000

‘Repositories and the Cloud’ event, The Magic Circle, London, 23rd February 2010

Last year I was asked to join the organising committee for the Eduserv JISC ‘Repositories in the Cloud’ event that was held at the fantastic Magic Circle venue near Euston Station. The sell-out day was a great success, with the speakers giving an excellent overview of the current state of the art for cloud computing applications, in the area of repository storage in particular. ‘Compute’ in the cloud was also discussed as one of the main benefits of cloud technology, helping to reduce bandwidth by placing the compute next to the storage.

It was clear from the event that it’s still quite early days in the use of cloud technologies for repositories, and many people have the same concerns about the security, control and licensing of data that you often hear about cloud storage in general. A hybrid approach sounded like a sensible option: you might keep critical and important data on your own servers, and use the cloud for less critical data, or perhaps more as a backup service.

Video recordings of the presentations by Michele Kimpton, CBO DuraSpace, ‘DuraCloud – Open technologies and services for managing durable data in the cloud’, Alex Wade, Director Scholarly Communication, Microsoft Research, ‘Cloud Services for Repositories’, and Les Carr, EPrints, University of Southampton, ‘EPrints Cloud Visions’ are now available on the Eduserv event page.

I caught up with most of the speakers, and a number of the attendees for some quick video reactions, thoughts and commentary. These are available on the event page and I’ve included them here:

Michele Kimpton	Alex Wade	Les Carr	Brad McLean
Ross MacIntyre	Michael Guthrie	John Salter	Rob Sanderson
Simeon Warner	Kevin Ashley	Jane Stevenson	Dave Tarrant

London Calling

Adrian Stevenson — Fri, 26 Feb 2010 13:48:13 +0000

“London calling to the faraway towns
Now war is declared – and battle come down
London calling to the underworld
Come out of the cupboard, you boys and girls”

‘London Calling’ by The Clash [rdf]. Lyrics by Joe Strummer [rdf], 1979

London was the centre of gravity for the global data Web on Wednesday 24th February 2010. Such was the sense of anticipation in the air at the ‘Linked Data Meetup 2’ that it almost felt like we were waiting for the start of a rock concert instead of a series of presentations, especially given the event was being held in the concert hall of the University of London Student’s Union, the same place I’ve seen one or two bands over the years. Over two hundred people had signed up to attend, and many had to be turned away.

Tom Heath from Talis was first up, and although he didn’t come on stage with a guitar or start singing, he did a great job of living up to the moment, as did all the speakers on this really inspiring day. Tom was talking about the implications of Linked Data for application development, and he focussed on what he sees as the need for computing to move on from the all-pervasive metaphor of the document if the true possibilities of the Web are to be realised. We tend to see the Web as a library of documents, but it has the power to be so much more. He thinks we should see the Web as something more like an ‘exploratorium’ of real things in the world. These things should all be given http URIs, and be connected by typed links. The power this brings is that applications are then able to grasp the meaning of what things are, and how they relate to each other. Applications can probe this global Web of Data, performing calculations, identifying correlations, and making decisions. The machine processing power we can apply behind this idea supercharges the Web way beyond what is currently possible with a library of documents. Yes, of course we are already doing this to an extent with data in conventional databases, but the Linked Data Web gives us the possibility to connect all data together and wrap it with meaning.

Tom gave some examples of what becomes possible, one of which he based around the concept of pebbledash. Some things we might associate with pebbledash include getting it repaired, weatherproofing, its cost, its removal, its colour, or whatever. These concepts essentially exist in isolation on the library document Web as it is now, and we as humans have to explore by hand to make the connections ourselves. This is a time-consuming, messy and error-prone process. If the concepts existed in Linked Data form, the Web Exploratorium could bring these concepts together for us, promising faster, more complete and more accurate calculations. This means we can make better evaluations and more informed decisions. Tom left us with a provocative challenge: “if you don’t feel constrained by the document, then you’re not thinking hard enough”.
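Tom’s picture of things named by http URIs and joined by typed links can be illustrated with a toy graph. This is a minimal sketch, not Talis’s or anyone else’s actual system: every URI on example.org and every link type (weatherproofs, repairedBy and so on) is hypothetical, and a real application would fetch its triples over the Web rather than from an in-memory set. It shows how, once the pebbledash concepts are linked, a program can follow the connections itself instead of a person piecing them together by hand.

```python
# Toy "Web of Data": things AND link types are named by HTTP URIs.
# All URIs here are hypothetical examples on example.org, not a real vocabulary.
EX = "http://example.org/"
PEBBLEDASH = EX + "thing/pebbledash"

# Each triple is (subject, typed link, object).
TRIPLES = {
    (PEBBLEDASH, EX + "rel/weatherproofs", EX + "thing/exterior-wall"),
    (PEBBLEDASH, EX + "rel/repairedBy", EX + "service/plasterer"),
    (PEBBLEDASH, EX + "rel/removedBy", EX + "service/render-removal"),
    (EX + "thing/exterior-wall", EX + "rel/partOf", EX + "thing/house"),
}


def links_from(subject):
    """Return the sorted (link type, target) pairs leaving `subject`."""
    return sorted((p, o) for s, p, o in TRIPLES if s == subject)


def reachable(start):
    """Collect every resource reachable from `start` by following typed links."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, target in links_from(node):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


# Print each link type (last URI segment) and where it points.
for link_type, target in links_from(PEBBLEDASH):
    print(link_type.rsplit("/", 1)[-1], "->", target)
```

Because the link types are themselves URIs, a program can tell a repair relationship from a part-of relationship without any human reading a page.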

I caught up with Tom after his talk for a short interview:

[Embedded video: interview with Tom Heath]

Tom Scott from the BBC gave us a look at their Linked Data based Wildlife Finder. You can browse the BBC archives by meaning and concepts to build your own stories, and create journeys through the things that you care about. Wildlife programs tend to be based around particular animal behaviours, and they’ve been able to build pages around these concepts. All the resources have URLs, so if you want to find out about the sounds that lions make, there’s an address for it. Some of the data is pulled in from other parts of the BBC, but much of it comes from outside sources, especially DBpedia. If the data needs editing, this is done on Wikipedia so the whole community benefits. I think Tom glossed over the problem of data being vandalised; basically, they just go back and correct it. There are real issues for Linked Data here. When someone goes to Wikipedia directly, they can make a judgement call on whether to trust the data. When it’s surfaced on the BBC website, it isn’t clear that the data has been pulled in from external sources, yet people will trust it because it’s on the website of a trusted organisation.

John Sheridan and Jeni Tennison, representing data.gov.uk, gave a focussed and passionate view on ‘How the Web of Data Will Be Won’. John believes the possibilities that come from making governmental data available as Linked Data are extraordinary, and you could tell he meant it. The answer to ‘Why Linked Data?’ is simple: it’s the most Web-centric approach to using data. It’s “data you can click on”. It’s also, by its very nature, distributed. This is really important, because one of the disadvantages of a centralised approach is that it takes a lot of planning, and this really slows things down. The Linked Data model of distributed publishing, where others do the combining, makes the whole thing much quicker and easier.

Jeni emphasised that we’ve all got to be brutally practical to get things done. Publishing research papers is all well and good, but it’s time to get on with it and find ways to get the data out there. Doing stuff is what matters. For her, much of this is about working out successful and simple patterns for publishing data in RDF that others can use. This is the pragmatic way to get as much Linked Data on the Web as quickly and as cheaply as possible. It’s about laying some tracks down so that others can follow with confidence. The patterns cover everything from a common way to structure consistent http URIs, to policy patterns that ensure persistence, and patterns for workflows and version control. This all sounds spot on to me. We’ve seen many examples of poor practice with URIs, so it’s really encouraging to hear that there’s work going on to get things right from the start. This gives Linked Data a good chance of staying the course. The government have lots of useful stats on all kinds of things, but patterns are needed to get this data out of its Excel-based bunker. I sensed many in the room were reassured to hear there’s work going on to provide APIs so that developers won’t necessarily have to work directly with the RDF and SPARQL endpoints. The fact that Jeni’s dev8d session on this topic was so packed I couldn’t even get near the door, never mind get in, indicates there’s a real need and demand for these APIs.
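As a rough illustration of what a consistent URI pattern can look like, the sketch below separates the URI that names a thing (`/id/...`) from the URI of the document describing it (`/doc/...`), with the identifier URI answering 303 See Other and pointing at the document. This id/doc split is a common Linked Data convention; the domain, concept names and helper functions are hypothetical, and this is not the data.gov.uk implementation.

```python
from urllib.parse import quote

BASE = "http://education.example.org"  # hypothetical domain


def identifier_uri(concept, reference):
    """URI naming the real-world thing (e.g. a school), not a web page."""
    return f"{BASE}/id/{quote(concept)}/{quote(reference)}"


def document_uri(concept, reference):
    """URI of the document that describes that thing."""
    return f"{BASE}/doc/{quote(concept)}/{quote(reference)}"


def resolve(uri):
    """Sketch of server behaviour: an /id/ URI redirects (303) to its /doc/ URI."""
    prefix = BASE + "/id/"
    if uri.startswith(prefix):
        return 303, BASE + "/doc/" + uri[len(prefix):]
    return 200, uri
```

Keeping the two URI sets apart lets the identifier stay fixed while the descriptive documents are restyled, versioned or moved behind the redirect.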

Lin Clark from DERI Galway gave us a quick look at the Drupal content management system. Drupal is used to power data.gov.uk and comes with RDF pretty much out of the box. I got a better look in Lin’s dev8d Drupal 7 workshop. I have to say it does look a doddle to install, and mapping the content to RDF for the SPARQL endpoint seems to be a cinch too.

Silver Oliver took us back to the document metaphor with a talk on journalism and Linked Data. Unsurprisingly, journalism is especially attached to the document metaphor, with newspapers treating the Web as just another distribution channel. Martin Belam from The Guardian was in the audience and responded, indicating that they are looking at ways to do things better. Silver said a challenge for the BBC is to get the news and sport sections to become more like the Wildlife Finder, but the information models aren’t in place yet. There has been some progress with sport events and actors, but it appears that news vocabularies are complex.

Georgi Kobilarov gave us a quick preview of uberblic.org, a single point of access to a Web of integrated data. I have become somewhat unconvinced by attempts to provide single points of access, mainly because of the largely failed notion of portals, and also because lots of people seem to want to provide them, so they’re not very ‘single’. I think in this case, though, Georgi is demonstrating the sort of thing that can be done rather than trying to claim territory. Uberblic.org takes data sources including Wikipedia, Geonames, and Musicbrainz, and provides a single API for them.

The meet-up concluded with a really engaging panel session hosted by Paul Miller. On the panel were Ian Davis from Talis, Jeni Tennison, Tom Scott and Timo Hannay from Nature Publishing Group. There was some discussion over who has responsibility for minting URIs. It might be clear when it’s government departments minting URIs for schools, but what about when it’s less obvious? Who do we trust? Do we trust DBpedia URIs, for example? A judgement has to be made. The risk of having data monopolies such as Wikipedia was raised. It’s in the nature of the Web for these to emerge, but can we rely on them long-term, and what happens if they disappear?

There was quite an interesting discussion on the tension between the usability and the persistence of http URIs. Musicbrainz provides opaque URIs that work well for the BBC, and these are re-used in their URIs, but Wikipedia URIs are problematic. They may be human readable, but they change when the title of a page is changed. Tom Scott said he would “trade readability for persistence any day”.
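The trade-off Tom described can be shown in a few lines. In this sketch (hypothetical example.org URIs and a made-up `Record` type), one URI is derived from a page title, Wikipedia style, and the other from an opaque identifier minted once, here a UUID. Renaming the page breaks the first URI but leaves the second untouched.

```python
import re
import uuid
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record: an editable title plus an identifier minted once."""
    title: str
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def readable_uri(rec):
    """Human-readable URI built from the title (spaces become underscores)."""
    return "http://example.org/wiki/" + re.sub(r"\s+", "_", rec.title.strip())


def opaque_uri(rec):
    """Opaque URI built from the identifier, which ignores title changes."""
    return "http://example.org/artist/" + rec.record_id


rec = Record("The Clash")
before = (readable_uri(rec), opaque_uri(rec))
rec.title = "The Clash (band)"  # an editor renames the page
after = (readable_uri(rec), opaque_uri(rec))
```

The opaque URI gives up readability but, as long as the identifier is never reused or changed, it stays put however the record is edited.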

Ian Davis talked about some of the technical barriers when building Linked Data applications. The approach is very different to building applications on known databases that are under your control. Programming across the whole Web is a new thing, and it’s the challenge for our generation of developers. Timo Hannay ended the meeting on a fittingly profound note, referring to Chris Anderson’s ‘End of Theory’. He feels that embracing Linked Data is essential, and to not do so will mean that science “simply cannot progress”.

Linked Data may not quite have declared war, but many think it’s time for our data to come out of the cupboard and rise from the underworld.


Standards, Profiles, Interoperability – Some Notes from the Front

Adrian Stevenson — Mon, 11 Jan 2010 18:05:01 +0000

Tomorrow I’ll be attending the CETIS ‘Future of Interoperability Standards Meeting’ in Bolton on behalf of the SWORD project that I manage as part of my role at UKOLN. Invitees have been asked to provide a position paper that “… should focus on thoughts or opinions on the experience of developing both formal and informal specifications and standards, working with standards bodies and potential ways forward to achieve interoperability”. This is quite a tricky one for me, as most of the work on the SWORD profile was done by the previous project manager, Julie Allinson, in SWORD phase one, and by our SWORD partner from the University of Cambridge, Jim Downing, in phase two. I had hoped to get Jim along to this meeting, but it wasn’t possible, so I’ll be the main SWORD representative. Consequently, rather than go into the specifics, I thought I’d give a few observations from my experience managing the project, these necessarily being more general. Whether this constitutes a position paper, I’m not sure.

Go with the Web
Mainstream is good. Even if it doesn’t seem to fit, it’s probably a good idea. SWORD went with the Atom Publishing Protocol. It was developed for blogs.
Do you really need to standardise?
Maybe de-facto is good enough? Why waste time and effort? The DSpace, Fedora, EPrints, Microsoft Zentity and Intralibrary repositories all ship with SWORD in their current releases. Microsoft have adopted SWORD as their de-facto standard for deposit and have implemented it in their ‘Article Authoring Add-in for Word’. ‘Nuff said.
Prove a point
Develop some test implementations and demo clients. Show the thing works.
Be agile
… and don’t be too prescriptive.
You can never be too simple
Do one simple thing well. Don’t try to do everything. It’s got to be clear.
Get the message out
Some say that the marketing is the most important thing.
Don’t just say it, do it
Practice what you preach. You know the quote: there are too many standards and specifications. Re-cycle. Be strong, don’t re-invent.
Allow for serendipity
… and embrace it when it happens.
Don’t go it alone
Get people on board. Bring people with you.
Having great developers makes life easier
It means you can get things done.
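To make the “go with the Web” point concrete: the Atom Publishing Protocol, which SWORD builds on, works by POSTing Atom entry documents (or media resources) to a collection URI over plain HTTP and getting Atom entries back. The sketch below only builds a minimal Atom entry with Python’s standard library; the function name and sample values are hypothetical, and SWORD’s own extensions, headers and packaging conventions are not shown.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def atom_entry(title, author, summary, entry_id):
    """Build a minimal Atom entry document and return it as UTF-8 bytes."""
    def el(parent, name, text=None):
        child = ET.SubElement(parent, f"{{{ATOM_NS}}}{name}")
        if text is not None:
            child.text = text
        return child

    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    el(entry, "title", title)
    el(entry, "id", entry_id)
    # Atom timestamps are RFC 3339 date-times; use the current UTC time.
    el(entry, "updated", datetime.now(timezone.utc).isoformat(timespec="seconds"))
    el(el(entry, "author"), "name", author)
    el(entry, "summary", summary)
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)
```

A client would send these bytes in an HTTP POST to a collection URI with an Atom content type; reusing a format built for blogs meant existing Atom tooling could be picked up rather than written from scratch.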

Semantic Technologies: Which Way Now? – A UKOLN Response

Adrian Stevenson — Mon, 11 Jan 2010 15:50:46 +0000

Last December Paul Walk and I were invited to give a UKOLN response to the presentations at the CETIS Semantic Technologies Working Group meeting in Glasgow. Paul couldn’t make it, so it was left to me to come up with a response on the spot. Zach Beauvais from Talis gave an introductory talk on Talis’ activities and Adam Cooper followed with a summary of the ‘Giant Global Graph’ session from the recent CETIS conference.

I mentioned a new UKOLN project called ‘RepUK’ that will be providing a Linked Data interface to an aggregation of scholarly materials from a number of UK repositories using the Talis platform. I then outlined a few issues around Linked Data, as well as mentioning the new Manchester OpenData project. Following on from discussions at the CETIS conference, I highlighted the difficulty of convincing IT managers and VCs that providing Linked Data interfaces to institutional systems is a worthwhile venture. The slides below provide a pointer to the full range of issues I raised.

After lunch Thanassis Tiropanis gave us an overview of the SemTech Project roadmap and recommendations (pdf). Following this there was a general discussion about the way ahead for the project, but I’m not sure there were any clear decisions from the day. Nevertheless, it was a useful day for me and hopefully a productive one for CETIS in determining where to go next.

Linked Data and the Semantic Web: What Are They and Should I Care?

Adrian Stevenson — Mon, 11 Jan 2010 10:28:55 +0000

I gave this presentation on Linked Data and the Semantic Web at one of our UKOLN staff seminars last year on the 5th of November. It was well received, so I thought it was worth including here. I will be giving an updated version at a MIMAS developer meeting in Manchester on the 17th of February. The meeting is mainly for MIMAS staff, but it may be possible for other people to attend. Let me know if you’d like to come along.