Archives Hub Linked Data Release

May 9th, 2011 by Adrian Stevenson

This post has permanently moved to http://archiveshub.ac.uk/locah/2011/05/09/archives-hub-linked-data-release/

Please update any links and bookmarks.

We apologise for any inconvenience.

14 Responses to “Archives Hub Linked Data Release”

  1. Richard Light says:

    Hi,

    The extraction of some fields includes the element name and attributes. See for example the Beverley Skinner entry:

    Processing: <p xmlns="">Description by Althea Greenan, MAKE 2002. Submitted to the Archives Hub as part of Genesis 2009 Project.</p>

    Otherwise … good work!!

  2. [...] the Locah announcement. Tags: academia, archives hub, jisc, locah Comment (RSS) [...]

  3. Hi Richard,

    The intent was, for several of the EAD elements, just to pass the content through as an XML Literal, but I’m not sure it’s being handled correctly, i.e. the XML markup is being escaped.

    I’m not sure it’s terribly useful to use the XML Literals anyway, so it might be better just to “dumb it down” to a plain literal. I’ll have a look at it….
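
    As a rough illustration of what "dumbing down" to a plain literal could look like, here is a minimal sketch using Python's standard library. The function name to_plain_literal is hypothetical, and this is not the LOCAH transform itself, which is not shown in this thread; it only demonstrates stripping the markup from an EAD fragment and keeping the text.

```python
import xml.etree.ElementTree as ET


def to_plain_literal(fragment: str) -> str:
    """Return the text content of an XML fragment with all markup removed.

    Hypothetical helper for illustration; not part of the LOCAH code.
    """
    # Wrap the fragment in a dummy root so that fragments with several
    # top-level elements (or leading text) still parse as one document.
    root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    # itertext() yields every text node in document order.
    text = "".join(root.itertext())
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(text.split())


if __name__ == "__main__":
    example = '<p xmlns="">Description by Althea Greenan, MAKE 2002.</p>'
    print(to_plain_literal(example))
```

    The trade-off is that any inline structure (emphasis, lists and so on) is lost, which is the price of a value that every RDF serialisation handles the same way.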

  4. Richard Light says:

    I didn’t even know that RDF allowed XML literals as a “value”. The Turtle and JSON formats will presumably not support that?

  5. Hi, this looks promising.

    Couple of minor practical suggestions:
    I think it might be useful to add a rollover underline behaviour, to make it clear where one link ends and another begins, when mousing over sequences of links.

    Also, how about including a search within the page? People could use CTRL-F if they know about that, but if they don’t, and just want to scan down to see if a particular string is mentioned, this could be helpful.

    Best wishes
    Martin

  6. JasonZ says:

    Hi Peter,

    Good work!

    I have two questions.
    1. Concepts such as fonds, series, …, file and the concept Level are not defined in the same file. Is there a specific reason for this?
    2. I noticed in one sample that you use the UNESCO thesaurus, but only part of the whole thesaurus is loaded. Why?

  7. Hi Richard,

    XML Literals for RDF are defined here

    http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral

    I think they are supported in Turtle and the Talis RDF/JSON format

    http://docs.api.talis.com/platform-api/output-types/rdf-json

    as just another typed literal.

    But it seems my misunderstanding of some of the subtleties of XML Namespaces has slightly scuppered my attempt to use them! In the end, I don’t think using the XML Literal adds much so I think a short term fix is to tweak the transform process to replace them with plain literals.
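
    To make "just another typed literal" concrete, the sketch below builds the same value as a Turtle literal and as an object in the Talis RDF/JSON layout (subject, then predicate, then a list of objects with "value", "type" and an optional "datatype"). The subject and predicate URIs are made-up placeholders, and the helper names are hypothetical; only the rdf:XMLLiteral datatype URI comes from the RDF specification.

```python
import json

# Datatype URI defined by the RDF Concepts specification.
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def turtle_literal(value, datatype=None):
    """Format a value as a Turtle literal, typed if a datatype is given."""
    # Escape backslashes first, then quotes and newlines.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    literal = f'"{escaped}"'
    return f"{literal}^^<{datatype}>" if datatype else literal


def rdf_json_literal(value, datatype=None):
    """Build one object entry in the Talis RDF/JSON layout."""
    obj = {"value": value, "type": "literal"}
    if datatype:
        obj["datatype"] = datatype
    return obj


def rdf_json(subject, predicate, obj):
    """Nest a single triple as {subject: {predicate: [object]}}."""
    return {subject: {predicate: [obj]}}


if __name__ == "__main__":
    # Placeholder URIs, assumed purely for illustration.
    s = "http://example.org/id/archivalresource/demo"
    p = "http://example.org/def/note"
    markup = "<p>Description by Althea Greenan, MAKE 2002.</p>"

    print(turtle_literal(markup, XML_LITERAL))
    print(json.dumps(rdf_json(s, p, rdf_json_literal(markup, XML_LITERAL)), indent=2))
    # The plain-literal alternative simply omits the datatype.
    print(json.dumps(rdf_json(s, p, rdf_json_literal("Description by Althea Greenan, MAKE 2002."))))
```

    In both formats the only difference between the XML Literal and the plain literal is the presence of the datatype, which is why switching to plain literals should be a small change to the transform.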

  8. Hi Jason,

    Your questions raise some interesting issues, I think. I’m not saying what we have at the moment is “right” but I’ll try to explain how it comes about!

    In separating the URI of the class “Level” from the URIs of the individual levels (instances of that class), we’re essentially partitioning our “URI-space” between:

    - the “ontology”/“vocabulary”: the set of classes and properties, which we hope will remain relatively stable, at least once we get through a bit more testing, and which may be referenced by many datasets (I’m working on another project with another university to create a dataset which will reference the LOCAH classes and properties). Currently we just maintain that data as a “hand-edited” XML document.

    - the “instance data”: the descriptions of archival resources (and people, places etc) derived from EAD docs, which is likely to be more “dynamic”, in the sense that we might tweak the current descriptions (e.g. to fix the XML Literal problem) or add more data over time. This data is derived from the EAD XML docs and stored in a triple store, with the “linked data” pages generated by queries against the store.

    (I think this is what is sometimes referred to as the T-Box/A-Box distinction?)

    Having said all that, the case of the “levels” is arguably a case where there is a set of conceptualisations which is common to multiple datasets, and maybe they should be in our /def/ URI-space.

    And in fact we do supplement the data from the EAD docs (where the info “about” the level is just a minimal mention in an attribute value) with some additional data providing textual definitions, which we load to the triple store alongside the data derived from the EAD docs.

    Which sort of leads into the territory of your second question….

    With the exception of the “levels” case, we aren’t “loading any thesauri” as such. The descriptions of concepts and concept schemes are all generated from the EAD XML data.

    i.e. we coin URIs for concepts only for those concepts that are “mentioned” in the EAD docs (in the controlaccess element). For each of those mentioned concepts, we’re also generating a triple to “say” that it is a member of the named “concept scheme” like

    http://data.archiveshub.ac.uk/id/conceptscheme/unesco

    So when the data is merged in the triple store, we have triples relating that concept scheme to each of the member concepts that were mentioned in the EAD docs. And that is what provides the list on the “description” of the scheme:

    http://data.archiveshub.ac.uk/id/conceptscheme/unesco

    i.e. it’s “saying”:

    “there’s a thing of type skos:ConceptScheme called UNESCO and here’s a list of member concepts from that thesaurus – but there may be other member concepts we don’t know about”

    And if in the future we extend the input dataset and process more EAD docs, then we may find additional UNESCO concepts mentioned, and the list on that page would grow.

    (In an ideal world, I guess we’d just be citing URIs for the UNESCO concepts provided by its maintenance agency, as we do for the language URIs, and we wouldn’t bother providing data.archiveshub.ac.uk URIs for them.)
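
    The way the member list grows as more EAD docs are processed can be sketched as follows. This is only an illustration under stated assumptions: the concept URI pattern, the slug rule and the use of skos:inScheme as the linking property are guesses made for the example (only the concept scheme URI pattern appears above), and a Python set stands in for the triple store.

```python
import re

SKOS_IN_SCHEME = "http://www.w3.org/2004/02/skos/core#inScheme"
SCHEME_BASE = "http://data.archiveshub.ac.uk/id/conceptscheme/"
# Assumed pattern for concept URIs; the real pattern is not given here.
CONCEPT_BASE = "http://data.archiveshub.ac.uk/id/concept/"


def slug(label):
    """Hypothetical slug rule: lower-case, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", label.lower())


def triples_for_mentions(mentions):
    """Turn (scheme, label) pairs found in controlaccess into triples.

    Only concepts actually mentioned in the input produce triples.
    """
    triples = set()
    for scheme, label in mentions:
        concept_uri = f"{CONCEPT_BASE}{scheme}/{slug(label)}"
        triples.add((concept_uri, SKOS_IN_SCHEME, SCHEME_BASE + scheme))
    return triples


def members(store, scheme):
    """List the concepts the store currently knows to be in a scheme."""
    scheme_uri = SCHEME_BASE + scheme
    return sorted(s for (s, p, o) in store if p == SKOS_IN_SCHEME and o == scheme_uri)


if __name__ == "__main__":
    store = set()
    # First batch of EAD docs mentions one UNESCO concept.
    store |= triples_for_mentions([("unesco", "Art history")])
    print(len(members(store, "unesco")))
    # A later batch repeats that concept and mentions a new one;
    # merging into the set removes the duplicate triple.
    store |= triples_for_mentions([("unesco", "Art history"), ("unesco", "Women artists")])
    print(members(store, "unesco"))
```

    Because the list is derived from whatever triples are in the store, it reflects only the concepts seen so far and makes no claim to cover the whole thesaurus.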

  9. Hi Martin

    Thanks for your comments. We’re aware that some of the styling and layout elements are not as user-friendly as they could be at the moment. We’ll be collating the feedback on usability, amongst other things, and we’ll see what we can do, either in the short term or, if it’s more substantial work, for a second release.

    Cheers, Adrian

  10. Wow.

    The data looks very different to what we archivists are used to when inputting data or viewing data on the web.
    I think it’s going to take us a while to get our heads around this!

    I know we’ve been talking about it for a while, but this is the first time I’ve seen it for archive data. And the main thing that struck me is that the data is very much for someone else (like a developer) rather than for an archivist.
    It both is ‘our data’ and not our data at the same time… if any of that makes sense.

    Looking forward to seeing more! Brave new worlds and all that
    Teresa

  11. eFoundations says:

    LOCAH releases Linked Archives Hub dataset…

    The LOCAH project, one of the two JISC-funded projects to which I’ve been contributing, this week announced the availability of an initial batch of data derived from a small subset of the Archives Hub EAD data as linked data. The……

  12. [...] comment on the blog post announcing the release of the Hub Linked Data maybe sums up what many archivists will think: “the main thing that struck me is that the [...]

  13. [...] Archives Hub Linked Data Release « LOCAH Project (tags: rdf linkeddata catalogue archives opac library uk bibliothèques metadonnees) [...]