Application profiles and metadata for repositories
  • Drupal, RDFa and the “fauxpository”

    Posted on May 19th, 2010 Talat Chaudhri 2 comments

    Drupal 7 is likely to be released soon, and will include native support for RDFa. The RDF module for Drupal 6 already allows this functionality. Why is this important? Because it makes relationships between resources much easier to describe through Drupal’s user-friendly interface and, in the process, would allow documents to be available as linked data.

    In Drupal terminology, a “node” is effectively a metadata record, and various Drupal modules enable the easy customisation of metadata. In effect, you could build a repository on the basis of Drupal, bypassing the need for platform-specific knowledge tied to open source software that has increasingly moved towards the “enterprise solution” space, along with all of the technical tie-in that this usually entails. For the service provider, this is not dissimilar to the tie-in experienced with commercial software, especially in the case of information librarians or other professionals who are not developers, or even developers who are not part of that particular open source development team.

    Application Profiles are essentially structured metadata comprising elements and (usually) relationships, and are therefore inherently linked data solutions. They vary in complexity according to their particular functional requirements: for instance, in the world of scholarly publications, there is a spectrum between the straightforward, unstructured way that DSpace implements Dublin Core (which should perhaps be called the DSpace Application Profile), the simplified FRBR structure of the Scholarly Works Application Profile (SWAP) and the complex entity-relationship model of CERIF, the standard developed for Current Research Information Systems (CRISs). This latter is a de facto application profile, even if it is not normally referred to as such.
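
    To make the contrast between a flat record and a structured profile concrete, here is a minimal sketch in Python. It assumes the entity names usually given for the SWAP model (ScholarlyWork, Expression, Manifestation, Copy and Agent); the field names, the helper `copy_urls` and all values are invented for illustration and are not taken from the SWAP specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str


@dataclass
class Copy:
    # A concrete, retrievable file location (the "bitstream").
    url: str


@dataclass
class Manifestation:
    # One format of an expression, e.g. a PDF.
    media_type: str
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Expression:
    # One version of the work, e.g. the accepted manuscript.
    version: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class ScholarlyWork:
    title: str
    creators: List[Agent] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)


def copy_urls(work: ScholarlyWork) -> List[str]:
    """Walk work -> expression -> manifestation -> copy and collect the URLs."""
    return [
        copy.url
        for expression in work.expressions
        for manifestation in expression.manifestations
        for copy in manifestation.copies
    ]


# A flat, unstructured record: every property hangs off a single object.
flat_record = {
    "title": "An example paper",
    "creator": "A. Author",
    "identifier": "https://repository.example.org/1/paper.pdf",
}

# The same information expressed as linked entities (hypothetical values).
work = ScholarlyWork(
    title="An example paper",
    creators=[Agent("A. Author")],
    expressions=[
        Expression(
            version="accepted manuscript",
            manifestations=[
                Manifestation(
                    media_type="application/pdf",
                    copies=[Copy("https://repository.example.org/1/paper.pdf")],
                )
            ],
        )
    ],
)
```

    The structured form costs more to populate, but it can record several versions and formats of one work without duplicating the title and creators in each record.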

    Why should Drupal be any better than the repository platforms that already exist? In many ways, it depends on what you need to do with it, and on the resources at your disposal. But the advantage is that Drupal is a flexible Content Management Framework that is designed to be leveraged for any sort of content, and for new modules to be designed easily for new purposes. After all, what does a repository actually do that other websites cannot? Repositories put metadata records and bitstreams (the actual documents or files) on the Web, and add a few additional services such as OAI-PMH, SWORD and statistics. But repositories are only a particular specialised subset of content management systems. Drupal is accessible to any PHP developer without any initial requirement for specialist platform knowledge, which is in any case relatively easy to obtain. The community is large and support is quite easily available, as are modules that can be adapted for local purposes. It is designed to be easy to customise and theme.
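
    To illustrate how thin one of these additional services is on the client side, the sketch below builds an OAI-PMH `ListRecords` request, which is an ordinary HTTP GET with a handful of query parameters. The base URL and the function name `list_records_url` are hypothetical; `verb`, `metadataPrefix`, `from` and `set` are arguments defined by the protocol, and `oai_dc` is the metadata prefix every OAI-PMH repository is required to support.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build the URL of an OAI-PMH ListRecords request (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
url = list_records_url("https://repository.example.org/oai", from_date="2010-01-01")
```

    The harder part is on the server side, where the platform must map its stored metadata onto the requested format.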

    Sarah Currier recently talked about the idea of a “fauxpository”. If I remember correctly, she pointed out that it could even be based on WordPress. This is clearly a workable idea, although hardly suitable for production use as a university service. I would maintain that Drupal could easily be suitable for such a use with relatively little work, and could make use of and adapt application profiles in a way that the major open source repository platforms have been slow to do, and are still only just beginning to enable as something of an afterthought. UKOLN are investigating how Drupal can be used to implement the JISC’s Dublin Core Application Profiles (DCAPs); using Drupal is intended to show how they can work independently of tie-in to any specific platform.

  • Linked data and Dublin Core Application Profiles in EPrints 3.2.0

    Posted on March 23rd, 2010 Talat Chaudhri 2 comments

    EPrints 3.2.0 was released on 10th March 2010. It has some remarkable new features relating to linked data and, consequently, to Dublin Core Application Profiles based on multiple entity domain models such as SWAP, IAP and TBMAP (the GAP does not have a domain model). Here are the key points:

    Linked Data Support

    • Ability to establish arbitrary relations between objects or provide additional metadata in triple form.
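
    As a rough illustration of what “metadata in triple form” means, the following Python sketch stores subject-predicate-object statements as plain tuples and filters them by pattern. The repository URIs and the helper `match` are hypothetical and say nothing about how EPrints stores relations internally; only the Dublin Core terms namespace is real.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URIs, for illustration only.
EPRINT = "http://repository.example.org/id/eprint/1"
DOCUMENT = "http://repository.example.org/id/document/7"
AUTHOR = "http://repository.example.org/id/person/42"

# The Dublin Core terms namespace.
DCTERMS = "http://purl.org/dc/terms/"

triples: List[Triple] = [
    (EPRINT, DCTERMS + "title", "An example paper"),
    (EPRINT, DCTERMS + "creator", AUTHOR),
    (EPRINT, DCTERMS + "hasPart", DOCUMENT),
    (DOCUMENT, DCTERMS + "format", "application/pdf"),
]


def match(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every argument that is not None."""
    return [
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Everything stated about the eprint, then everything that points at the document.
about_eprint = match(triples, subject=EPRINT)
links_to_document = match(triples, obj=DOCUMENT)
```

    Because a relation is just another triple, adding a new kind of link between two objects requires no change to the record structure.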

    Semantic Web / Linked Data (RDF)

    We have made a (difficult) decision to move these features to 3.2.1 (due out soon after 3.2.0) because testing showed it caused a significant slow down.

    We’re rewriting it to do the same thing but with much less overhead!

    However, as may be seen on the EPrints wiki, the latter section read as follows until 11th March 2010:

    Semantic Web Support

    • RDF+XML Format
    • N3 Format
    • URIs for all objects, including non dataobjs. [sic] eg. Authors, Events, Locations.
    • BIBO Ontology
    • Extendable
    • URIs now use content negotiation to decide which export plugin to redirect to, based on mime-types supplied by plugins and the “accept” header.
    • Relations between eprints and documents
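
    The content negotiation item in the list above can be sketched as follows: the server reads the “accept” header, ranks the requested mime-types by their q-values and picks the first export plugin that declares a matching type. This is a simplified sketch under stated assumptions, not the EPrints implementation: the plugin names are hypothetical, and partial wildcards such as `text/*` are deliberately ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical registry mapping each supported mime-type to an export plugin name.
EXPORT_PLUGINS: Dict[str, str] = {
    "application/rdf+xml": "rdfxml_export",
    "text/n3": "n3_export",
    "text/html": "html_summary",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (mime-type, q) pairs, highest q first."""
    entries: List[Tuple[str, float]] = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        mime = pieces[0].lower()
        if not mime:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((mime, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_plugin(accept_header: str, default: str = "text/html") -> str:
    """Pick the export plugin for the best acceptable mime-type."""
    for mime, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if mime in EXPORT_PLUGINS:
            return EXPORT_PLUGINS[mime]
        if mime == "*/*":
            return EXPORT_PLUGINS[default]
    # Nothing matched: fall back to the human-readable page.
    return EXPORT_PLUGINS[default]
```

    With this approach a single URI per object can serve a browser and an RDF-aware client alike, which is what makes the URIs usable as linked data identifiers.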

    If this is taken at face value, it appears that there has been significant progress in enabling features that would allow the full implementation of the JISC’s DCAPs based on the simplified FRBR model, although we must wait for some important details until the promised version 3.2.1, which is to be released “soon after 3.2.0” according to the statement above. Although objects may be described with “arbitrary relations” and “additional metadata” (additional to what?) can be described in triple form, there are not yet URIs for all entities, such as Authors and so on. Presumably, the support for BIBO would be more demanding than the support required for the cut-down version of FRBR as seen, for example, in SWAP.

    This is all very promising, especially in the light of the same functionality being promised in DSpace 2.0, which was not yet implemented in the recent release of DSpace 1.6.0. However, all of this must come with the caveat that, until this is tried out in practice, it is not certain which levels of implementation are possible: clearly, the actual metadata fields can easily be adopted by any repository, but what about the relationships between entities, and the relationships with other complex objects? How exactly will these be implemented in practice? For the purposes of linked data, we also have to wait until EPrints 3.2.1 for metadata in the RDF+XML format.

    To this end, although UKOLN cannot offer a publicly accessible test repository with user access, we hope wherever possible to implement and test these pieces of repository software for their usability with SWAP, IAP, TBMAP, GAP and DC-Ed in the first instance, since the majority of repositories in the UK HE sector use these platforms. Of course, we would also like to do the same with Fedora at some point in the future. However, if you have evidence of any such implementations, even for test purposes, and if you are happy for us to evaluate these, we would be very happy to hear from you.

  • JISC Repositories and Preservation Programme Meeting, 6-7 May 2009

    Posted on May 8th, 2009 Talat Chaudhri 1 comment

    Application profiles received considerable attention at the two-day Repositories and Preservation Programme Meeting held by JISC at the Aston Business School, Birmingham.

    Workshop: Application Profiles in Practice, 6 May 2009

    This was an event in two parts: firstly, an introduction to the user testing methodology being developed by the AP Support project in collaboration with the IEMSR and the IE Demonstrator project; secondly, an iteration of the paper prototyping element of the user testing. On this occasion the audience consisted largely of experts rather than an especially representative group of typical users – quite understandably, given the nature of the meeting. (While it is very helpful to engage repository managers in user testing, it is more difficult to involve entirely non-specialist users, so there is a need for further work in facilitating this.) The session proved to be a success in raising considerable interest in current developments in application profiles.

    It was always the intention to use this particular event as a platform for consulting colleagues in the repositories community about the usefulness of the approach. In this respect, the workshop was highly successful: attendees responded positively to the intention of engaging users in order to analyse and address the strengths and weaknesses of the various application profiles, raising some insightful questions and contributing to an animated debate. Rachel Bruce of JISC commended the workshop in her speech closing the Programme Meeting on the following day.

    “Working with the Repositories Community: WRAP Project” (Jenny Delasalle, Warwick University), 6 May 2009

    Jenny Delasalle referred to the difficulties faced in pioneering an implementation of SWAP in an institutional repository based on EPrints 3.0. Unlike in its successor EPrints 3.1, versioning was unsupported at the time, which to a great extent hampered the SWAP effort in WRAP at Warwick. She considered that in its present form, SWAP represents too complex a metadata model for adoption by the typical IR. But since there is not necessarily a need to employ all of the SWAP metadata terms (any more than one would necessarily need to employ all of the terms in DC Simple or Qualified DC), it must be presumed that the FRBR structure and the lack of automated means to populate fields with structural metadata represent a significant part of the problem. It would be useful to get a clarification from Jenny on this.

    That the feasibility of complex metadata schemas could be radically improved by the use of text mining to autopopulate metadata fields, thus requiring far less input and/or correction from the user, was raised later in the Forum in the discussion “How can text mining support repository tasks?”, convened by James Farnhill of JISC and led principally by Brian Rea of NaCTeM, University of Manchester. This would be of obvious and immediate relevance to the likelihood of SWAP being more widely implemented, whether in its present form or following the recommendations from the user testing effort.

    Repositories Roadmap Session (Rachel Heery, external consultant for JISC), 7 May 2009

    Rachel Heery gave a summary of her Digital Repositories Roadmap Review, revised from the original version by herself and Andy Powell in 2006. Recommendation 11 referred to SWAP specifically, proposing a cut-down version without the FRBR entity-relationship model and a re-analysis of the sort undertaken in the current user testing programme; Recommendation 12 made an interesting reference to OAI-ORE in the context of SWAP.

    Recommendation 11: Explore deployment of a cut down version of SWAP, possibly at the copy level, retaining the cataloguing rules to ensure a consistent approach to linking to full text. Evaluate whether use of SWAP is consistent with a Web architecture approach to repositories.

    Recommendation 12: Explore use of OAI-ORE to enable applications to handle complex objects. Demonstrate how OAI-ORE facilitates the re-use of research outputs and research data. Clarify different roles of OAI-ORE and SWAP.

    Outcomes

    There was considerable discussion of SWAP on Twitter among colleagues at Eduserv, UKOLN and elsewhere on both days of the meeting, focussing on both the structure and implementation of SWAP as it was originally intended, and in response to Rachel Heery’s recommendations. The need to solve the lack of implementation of the Dublin Core Application Profiles appears to have regained significant impetus from the interest in the series of user testing events planned by UKOLN. In particular, new impetus has been given to the SWAP implementation effort, in which expectations had previously subsided. Given Rachel Heery’s review, it is clear that SWAP may need to be considered once more as an ongoing project rather than a past product that failed to gain support, and one that may need substantial revision in future iterations. It is important to keep an open mind about the nature of those revisions, which should be conditioned by the results of the ongoing user testing effort.