More reflections from the coal face

September 22nd, 2010 by Jane

This post has permanently moved to http://archiveshub.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/

Please update any links and bookmarks.

We apologise for any inconvenience.

9 Responses to “More reflections from the coal face”

  1. Having found inconsistencies in data even within a very limited collection from a single institution, I sympathise with you having to deal with a much more varied set of records!

    I guess in a project like this, data quality is always going to come up, and it is sometimes necessary to walk a line between working with what you have and improving the data. One route might be to have some elements that simply reproduce data from the original record, and others that represent the data (where possible) in an ‘improved’ way.

    For example, in the British Library RDF representation of BNB records, I note that they seem just to be dumping the contents of the 250$a field (edition) as “isbd:hasEditionStatement”. While this gets the data out there in RDF, it seems (to me) a bit of a missed opportunity to express the edition as a pure integer (‘1’, ‘2’) rather than as the textual content of 250$a (‘2nd’, ‘3rd’).

    However, the 250$a can actually hold different kinds of edition statement, and these won’t always translate to an integer (e.g. it could be something like ‘Special education ed.’).

    In this case I wonder if there is an opportunity to do both where it is easy and leave it where it is hard: so if 250$a is simply ‘2nd ed.’, grab the number and put it into a new data element, but where it isn’t so obvious, just leave it as the ‘edition statement’ they are already producing.

    It feels like the translation to RDF gives this opportunity, as you can do this easily without a huge overhead?

  2. Jane says:

    Yes, this is an interesting area to explore. It’s funny really. We have had quite a shift in emphasis with the Hub data over the last few years, and we are now very firmly in the camp of ‘we don’t change the data because it is the contributors’ data’. This makes me think that it might be worth consulting contributors on the idea of making some changes: not to the essence of the content, but just to help with consistency and meaning. Some things would, I hope, be uncontroversial, such as changing references slightly to make them all consistent (this is to do with what goes into attributes and what is content), or adding the language, although of course we can’t assume that it is English.

    If I could make one change to the data, what would it be? That’s an interesting question, and I think it would be to add geographic names as index terms. That’s because I can see so much potential in geodata in terms of use cases and visualisations. Some descriptions that have come to us from exports just have the geographic location as ‘UK’. Sigh. But many don’t use this field at all, even when it would be very relevant to do so.

  3. Jakob says:

    There are very good tutorials on data modeling. Just because they mainly focus on relational databases does not mean they are unhelpful! For instance, Silverston’s “Data Model Resource Book” series and Simsion’s “Data Modeling Essentials”. My favorite is “Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design” by Terry Halpin; it gives a good overview of different data modeling techniques and, in my view, provides a better alternative to UML and ERM. By the way, a major drawback of RDFS and OWL is their lack of a *visual* modeling language.

    @Owen Stephens: As you already pointed out, “edition” does not simply map to integers. Maybe it is better to model editions via “nextEdition” and “prevEdition” relationships? Integers imply intervals, but editions are ordinal only. I bet there are even cases where editions form a directed acyclic graph instead of a simple order.

  4. Another thought on the back of your comment: I think another opportunity is to get others to contribute this information. So, perhaps suggest a way of enabling ‘the public’ (basically, anyone) to make statements about location, which you capture as linked data statements. You could both provide an easy-to-use interface that you support (perhaps they could pin stuff on a map, which produces some triples?), and simply offer advice to linked data experts/tinkerers: “this is how we’d like you to make ‘location’ statements about our stuff”.

    Where you get reliable information (e.g. confirmed by a number of people, or by specific people), you could then offer this back to the source archives… But even if they didn’t take it, the ability to make this kind of statement is, of course, built into the linked data model, so it may not matter so much whether the source archives want to integrate it back into their source data.

  5. Jane Stevenson says:

    Yes, that’s got potential. We have had the idea of user contributions to data for a while. I think LD only strengthens the case for doing this, with the principle of data enrichment through different sources.

  6. Lukas Koster says:

    Good to read that we’re not the only ones struggling with practical implementation issues of linked data. I will publish a blog post about our project this week (URL will be http://commonplace.net/2010/10/dutch-culture-link/)

  7. [...] blog. Pete Johnston has also posted about our approach to URI patterns, and our blog post on the challenges of exposing linked data has been well [...]

  8. [...] have also been finding numerous examples of inconsistencies, such as where the ‘creator’ is ‘Joe Bloggs and others’ rather than just a name for [...]
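A minimal sketch of the first comment’s suggestion (always keep the original 250$a text, and add a numeric edition only where the statement is simple) might look like the following Python. Only `isbd:hasEditionStatement` comes from the discussion above; the `ex:` prefix, the `ex:editionNumber` property, the record URI and the regular expression defining a ‘simple’ statement are all assumptions made for illustration, and triples are shown as plain tuples rather than through any particular RDF library.

```python
import re

# Property names: isbd:hasEditionStatement is the one mentioned in the comments;
# ex:editionNumber is a HYPOTHETICAL property used only for this sketch.
STATEMENT = "isbd:hasEditionStatement"
NUMBER = "ex:editionNumber"

# Assumed definition of an "easy" case: a leading ordinal followed only by
# "ed", "ed." or "edition", e.g. "2nd ed." or "3rd edition".
SIMPLE_EDITION = re.compile(
    r"^\s*(\d+)(?:st|nd|rd|th)\s+ed(?:ition|\.)?\s*$", re.IGNORECASE
)


def edition_triples(record_uri, statement_250a):
    """Return (subject, predicate, object) tuples for one 250$a value.

    The original text is always kept; a number is added only when the
    statement matches the simple pattern, otherwise it is left alone.
    """
    triples = [(record_uri, STATEMENT, statement_250a)]
    match = SIMPLE_EDITION.match(statement_250a)
    if match:
        triples.append((record_uri, NUMBER, int(match.group(1))))
    return triples


if __name__ == "__main__":
    # "ex:record1" is a made-up record URI for demonstration.
    for text in ["2nd ed.", "3rd edition", "Special education ed."]:
        print(edition_triples("ex:record1", text))
```

Because, as Jakob points out, editions are ordinal rather than interval values, a consumer should treat such a number only as an ordering hint; explicit next/previous edition relationships between records would be the alternative he suggests.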