Application profiles and metadata for repositories
  • Practical metadata solutions using application profiles

    Posted on September 21st, 2010 by Talat Chaudhri

    Past and present

    To date, a number of application profiles have been developed by metadata experts with the support of the JISC, intended to address the needs of practitioners and service providers (and thus ultimately their users) across the UK higher education sector. The most significant of these have been aimed at particular resource types that have an impact across the sector.

    Their names indicate the approach that has been taken to date, e.g.:

    • SWAP – Scholarly Works Application Profile
    • IAP – Images Application Profile
    • GAP – Geospatial Application Profile
    • LMAP – Learning Materials Application Profile (scoping study only: also the DC Education AP)
    • SDAPSS – Scientific Data Application Profile Scoping Study
    • TBMAP – Time-Based Media Application Profile

    Problems with this approach

    However, a particular resource type, set of resource types, or even general subject domain does not in itself constitute a real, identified problem space facing large sections of the information community in the UK higher education sector today. Geospatial resources can be any type of resource that has location metadata attached (e.g. place of creation, or a location as the subject of the resource). Learning materials can be any type of resource that has been created or re-purposed for educational use, to which metadata describing that particular use or re-use has been attached: presentations, academic papers, purpose-made educational resources of many types, images, or indeed almost anything else that could be used in an educational context. Images might need all sorts of different metadata: for instance, images of herbs might need very different metadata from images of architecture. The same applies to time-based media: what is the purpose of these recordings, and what are they used for? Why and how will people search for them? Likewise, the type of science in question, of which there are almost innumerable categories and sub-categories, will largely determine which metadata are useful for describing scientific data.

    Of all of the above, only scholarly works, which might more usefully be called scholarly publications, form an entirely focussed, specific set of resource types with a common purpose. The others are loose and sometimes ill-defined collections of resources or resource types that fit into a particular conceptual category. Only in the case of scholarly publications is there an implicit problem space: discovery and re-use in repositories and similar systems, usually but not exclusively as Open Access resources. There are other related problem spaces, such as keeping accurate information about funders and projects for the auditing required by funding bodies and university authorities. Access to these resources through new technologies could be a further area of study, and is one in which UKOLN is taking an active interest. Again, the question must be: “what do users want to do with these resources?”

    Current Approaches

    This is not to say that the work of creating the application profiles mentioned above has been wasted. Nevertheless, these application profiles are general-purpose solutions that do not target specific problems affecting identifiable communities of practice across the sector. Considerable work continues in Dublin Core Metadata Initiative (DCMI) circles on how metadata modelling should best be carried out, for instance on the Dublin Core Abstract Model (DCAM) and on the overlap between application profiles and linked data, where those application profiles contain relationships that can better enable resource discovery in a linked data world.

    New Approaches

    These approaches remain useful. However, more immediate, specific problem spaces face particular university services (not all of which are necessarily repositories): describing resources so that they can be discovered; providing copyright and other licensing information so that they can be re-used; providing funding information so that work can be audited and cases can be made for funding new projects; and so on. Some of these resources may be textual, but others increasingly include images (of many types and for many purposes), music, film, audio recordings, learning objects of many types, and large-scale corpora of data. Any metadata solution that is tailored to a particular purpose (and which is therefore usually, de facto, an application profile) needs to address specific aspects of the Web services that practitioners and other service providers are seeking to develop for their users, rather than simply providing generic, catch-all metadata.

    Key to all this is consultation with those communities: first, to scope the two or three most significant problem spaces facing the largest number of resource providers in serving their users; second, to bring those practitioners together with developers to draw up practical, workable recommendations and perhaps demonstrations; third, to provide tangible evidence to the developers of existing software platforms, and to engage with them to help solve such problems in practice. To do this, it is necessary to involve practitioners and developers in practical, hands-on activities that can move the discussion forward and produce concrete solutions.

  • Drupal, RDFa and the “fauxpository”

    Posted on May 19th, 2010 by Talat Chaudhri

    Drupal 7 is likely to be released soon, and will include native support for RDFa. The RDF module for Drupal 6 already provides this functionality. Why is this important? Because it makes relationships between resources much easier to describe through Drupal’s user-friendly interface and, in the process, would allow documents to be published as linked data.

    In Drupal terminology, a “node” is effectively a metadata record, and various Drupal modules make it easy to customise metadata. In effect, you could build a repository on the basis of Drupal, bypassing the need for platform-specific knowledge tied to open source software that has increasingly moved into the “enterprise solution” space, with all of the technical tie-in that this usually entails. For the service provider, this is not dissimilar to the tie-in experienced with commercial software, especially for information librarians and other professionals who are not developers, or even for developers who are not part of that particular open source development team.
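    To make the idea of a node-as-metadata-record concrete, the sketch below renders a simple record as an HTML fragment carrying RDFa 1.1 attributes, so that each field becomes a machine-readable triple. It is a hypothetical illustration, not the markup Drupal itself emits: the record URI, the field names and the choice of Dublin Core Terms as the vocabulary are all assumptions.

```python
from html import escape

# Assumption: Dublin Core Terms is the vocabulary; a real site could map its
# fields to any RDF vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def record_to_rdfa(record_uri: str, fields: dict[str, list[str]]) -> str:
    """Render a metadata record (a hypothetical stand-in for a Drupal node)
    as an HTML fragment annotated with RDFa 1.1 attributes."""
    # `prefix` declares the CURIE prefix; `about` sets the subject of every
    # property nested inside the div.
    lines = [f'<div prefix="dcterms: {DCTERMS}" about="{escape(record_uri)}">']
    for prop, values in fields.items():
        for value in values:
            # Each value yields one triple: <record_uri> dcterms:prop "value".
            lines.append(
                f'  <span property="dcterms:{escape(prop)}">{escape(value)}</span>'
            )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(record_to_rdfa(
        "http://repository.example.org/node/42",
        {"title": ["Practical metadata solutions"], "creator": ["A. N. Author"]},
    ))
```

    Because the subject and properties sit in ordinary HTML attributes, the same page serves human readers and RDFa-aware harvesters without a separate export step.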

    Application profiles are essentially structured metadata comprising elements and (usually) relationships, and are therefore inherently linked data solutions. They vary in complexity according to their particular functional requirements. In the world of scholarly publications, for instance, there is a spectrum running from the straightforward, unstructured way that DSpace implements Dublin Core (which should perhaps be called the DSpace Application Profile), through the simplified FRBR structure of the Scholarly Works Application Profile (SWAP), to the complex entity-relationship model of CERIF, the standard developed for Current Research Information Systems (CRISs). The latter is a de facto application profile, even if it is not normally referred to as such.
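    The difference between a flat record and a multi-entity model can be sketched as triples. The example below loosely follows SWAP's simplified FRBR chain (ScholarlyWork, Expression, Manifestation, Copy, plus an Agent) and flattens it into (subject, predicate, object) triples. The `ex:` URIs and the predicate names such as `expressed_as` are simplified labels invented for illustration, not SWAP's own vocabulary terms.

```python
from dataclasses import dataclass, field

# A triple: (subject, predicate, object). Objects may be URIs or literals.
Triple = tuple[str, str, str]


@dataclass
class Entity:
    uri: str
    kind: str  # entity type, e.g. "ScholarlyWork" or "Expression"
    attrs: dict[str, str] = field(default_factory=dict)


def to_triples(entities: list[Entity], links: list[Triple]) -> list[Triple]:
    """Flatten each entity's type and attributes, plus the links between
    entities, into a single list of triples."""
    triples: list[Triple] = []
    for entity in entities:
        triples.append((entity.uri, "type", entity.kind))
        triples.extend((entity.uri, key, value) for key, value in entity.attrs.items())
    triples.extend(links)
    return triples


# Hypothetical URIs (ex:...) and simplified predicate names, not SWAP's own terms.
work = Entity("ex:work/1", "ScholarlyWork", {"title": "On Metadata"})
expression = Entity("ex:expr/1", "Expression", {"version": "accepted manuscript"})
manifestation = Entity("ex:manif/1", "Manifestation", {"format": "application/pdf"})
held_copy = Entity("ex:copy/1", "Copy", {"location": "http://repository.example.org/1.pdf"})
author = Entity("ex:agent/1", "Agent", {"name": "A. N. Author"})

links: list[Triple] = [
    ("ex:work/1", "expressed_as", "ex:expr/1"),
    ("ex:expr/1", "manifested_as", "ex:manif/1"),
    ("ex:manif/1", "available_as", "ex:copy/1"),
    ("ex:work/1", "created_by", "ex:agent/1"),
]

swap_triples = to_triples([work, expression, manifestation, held_copy, author], links)

if __name__ == "__main__":
    for triple in swap_triples:
        print(triple)
```

    A flat, DSpace-style record would put every field on a single subject; here the version, file format and copy location each hang off their own entity, which is what lets a repository describe several versions or formats of one work without repeating it.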

    Why should Drupal be any better than the repository platforms that already exist? In many ways, it depends on what you need to do with it, and on the resources at your disposal. But its advantage is that Drupal is a flexible Content Management Framework designed to be leveraged for any sort of content, and to make it easy to design new modules for new purposes. After all, what does a repository actually do that other websites cannot? It puts metadata records and bitstreams (the actual documents or files) on the Web, and adds a few additional services such as OAI-PMH, SWORD and statistics. Repositories are only a particular, specialised subset of content management systems. Drupal is accessible to any PHP developer without any initial need for specialist platform knowledge, which in any case is relatively easy to obtain. The community is large and support is readily available, as are modules that can be adapted for local purposes. It is designed to be easy to customise and theme.
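    As an illustration of how modest one of those "additional services" is at the data level, the sketch below builds the Simple Dublin Core (`oai_dc`) block that an OAI-PMH `GetRecord` or `ListRecords` response carries inside its `<metadata>` element. The namespace URIs are the standard ones; the function name is hypothetical, and the surrounding OAI-PMH envelope (header, datestamp, `xsi:schemaLocation`) is deliberately left out.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of Simple Dublin Core, the only ones oai_dc permits.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

# Serialise with the conventional prefixes instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def oai_dc_metadata(fields: dict[str, list[str]]) -> str:
    """Build the <oai_dc:dc> block that sits inside an OAI-PMH <metadata>
    element. The OAI-PMH envelope itself is omitted."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(oai_dc_metadata({"title": ["On Metadata"], "creator": ["A. N. Author"]}))
```

    The validation step also shows the limit of `oai_dc`: anything richer than the fifteen flat elements, such as a funder or the entity relationships of an application profile, needs another metadata format alongside it.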

    Sarah Currier recently talked about the idea of a “fauxpository”. If I remember correctly, she pointed out that it could even be based on WordPress. This is clearly a workable idea, although hardly suitable for production use as a university service. I would maintain that Drupal could be made suitable for such use with relatively little work, and could make use of and adapt application profiles in a way that the major open source repository platforms have been slow to do, and are still only just beginning to enable, as something of an afterthought. UKOLN is investigating how Drupal can be used to implement the JISC’s Dublin Core Application Profiles (DCAPs); using Drupal is intended to show how they can work independently of tie-in to any specific platform.

  • Linked data and Dublin Core Application Profiles in EPrints 3.2.0

    Posted on March 23rd, 2010 by Talat Chaudhri

    EPrints 3.2.0 was released on 10th March 2010. It has some remarkable new features relating to linked data and, consequently, to Dublin Core Application Profiles based on multiple entity domain models such as SWAP, IAP and TBMAP (the GAP does not have a domain model). Here are the key points:

    Linked Data Support

    • Ability to establish arbitrary relations between objects or provide additional metadata in triple form.
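    What "arbitrary relations ... in triple form" means in practice can be shown with a toy in-memory triple store that supports pattern matching, with `None` acting as a wildcard. This is a generic sketch of the idea, not EPrints' implementation or API; the identifiers and predicate names are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """A toy in-memory store for arbitrary (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # A set means adding the same statement twice has no effect.
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Triple]:
        """Return triples matching the pattern, sorted; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical identifiers and predicates, for illustration only.
store = TripleStore()
store.add("eprint/1", "hasDocument", "document/1")
store.add("eprint/1", "presentedAt", "event/7")
store.add("eprint/2", "presentedAt", "event/7")
store.add("event/7", "location", "Example venue")

if __name__ == "__main__":
    # Which eprints were presented at event/7?
    print(store.match(predicate="presentedAt", obj="event/7"))
```

    Because no schema constrains the predicates, new kinds of relationship (to events, funders or other objects) can be recorded without changing the data model, which is the flexibility multi-entity application profiles need.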

    Semantic Web / Linked Data (RDF)

    We have made a (difficult) decision to move these features to 3.2.1 (due out soon after 3.2.0) because testing showed it caused a significant slow down.

    We’re rewriting it to do the same thing but with much less overhead!

    However, as may be seen on the EPrints wiki, the latter section read as follows until 11th March 2010:

    Semantic Web Support

    • RDF+XML Format
    • N3 Format
    • URIs for all objects, including non dataobjs. [sic] eg. Authors, Events, Locations.
    • BIBO Ontology
    • Extendable
    • URIs now use content negotiation to decide which export plugin to redirect to, based on mime-types supplied by plugins and the “accept” header.
    • Relations between eprints and documents
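    The content negotiation described in the list above can be sketched as follows: parse the client's `Accept` header into media ranges with q-values, let the most specific matching range decide each export format's weight, and pick the highest-weighted format. This is a Python sketch of the general technique, not EPrints' own code; the plugin names in `EXPORT_PLUGINS` are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from MIME type to export plugin, in server preference order.
EXPORT_PLUGINS = {
    "text/html": "HTML",
    "application/rdf+xml": "RDFXML",
    "text/n3": "RDFN3",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an HTTP Accept header into (media range, q-value) pairs."""
    prefs = []
    for part in header.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        if not media:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media.lower(), q))
    return prefs


def specificity(media_range: str) -> int:
    """*/* is least specific, type/* in between, type/subtype most specific."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range: str, mime: str) -> bool:
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = mime.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def negotiate(header: str, plugins: dict[str, str]) -> Optional[str]:
    """Pick the plugin whose MIME type the client weights highest; ties go to
    the plugin listed first. Returns None if nothing is acceptable."""
    prefs = parse_accept(header)
    best_plugin, best_q = None, 0.0
    for mime, plugin in plugins.items():
        ranges = [(specificity(r), q) for r, q in prefs if matches(r, mime)]
        if not ranges:
            continue
        _, q = max(ranges)  # the most specific matching range sets the weight
        if q > best_q:
            best_plugin, best_q = plugin, q
    return best_plugin


if __name__ == "__main__":
    print(negotiate("text/html;q=0.5, application/rdf+xml", EXPORT_PLUGINS))
```

    A linked data client asking for `application/rdf+xml` is sent to the RDF export, while a browser sending `text/html` gets the human-readable page, both from the same object URI.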

    Taken at face value, this suggests significant progress in enabling features that would allow full implementation of the JISC’s DCAPs based on the simplified FRBR model, although we must wait for some important details until the promised version 3.2.1, which is to be released “soon after 3.2.0” according to the statement above. Although objects may be described with “arbitrary relations” and “additional metadata” (additional to what?) can be provided in triple form, there are not yet URIs for all entities, such as Authors. Presumably, support for BIBO would be more demanding than the support required for the cut-down version of FRBR seen, for example, in SWAP.

    This is all very promising, especially in the light of the same functionality being promised in DSpace 2.0, which was not yet implemented in the recent release of DSpace 1.6.0. However, all of this must come with the caveat that, until it is tried out in practice, it is not certain which levels of implementation are possible. Clearly, the actual metadata fields can easily be adopted by any repository, but what about the relationships between entities, and the relationships with other complex objects? How exactly will these be implemented in practice? For the purposes of linked data, we also have to wait until EPrints 3.2.1 for metadata in RDF+XML format.

    To this end, although UKOLN cannot offer a publicly accessible test repository with user access, we hope wherever possible to implement and test these pieces of repository software for their usability with SWAP, IAP, TBMAP, GAP and DC-Ed in the first instance, since the majority of repositories in the UK HE sector use these platforms. Of course, we would also like to do the same with Fedora at some point in the future. However, if you have evidence of any such implementations, even for test purposes, and if you are happy for us to evaluate these, we would be very happy to hear from you.