Posted on February 11th, 2013 No comments
This blog has now been frozen. Comments have been disabled and we do not intend to publish further posts. The Application Profile Support Project was funded by the JISC from September 2008 to February 2011.
We have published the following statistics for future reference. They are intended to inform others about the lifecycle of the blog and assist people wishing to reuse resources by identifying the authors of articles etc.
Active Dates: From 12 December 2009 to 1 March 2011
Number of posts: 12 published posts
Number of comments: 11
Akismet statistics: 6,221 spams caught, 11 legitimate comments, and an overall accuracy rate of 99.79%.
Author of posts: Talat Chaudhri, UKOLN – 12 posts
Details of blog theme: Gear 1.3.6 by Mr Mobiles
Details of technology used:
Embedded Vimeo was used inside iframes in some posts, and some images are linked to their originals uploaded to Flickr.
Details of type and version of software used:
The blog was running on WordPress 3.4.2 at the time of archiving.
This blog is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License. Comments are NOT covered by this licence, except those made by the blog author. Some embedded items (such as images) may not be covered by this licence unless they were produced by UKOLN staff. Vimeo and Flickr content belonging to UKOLN is covered by this licence, except where the terms and conditions of those services give them rights that restrict its use.
Posted on March 1st, 2011 No comments
The Picture This! event on image metadata was held at Dev8D+ on 15 February 2011 at the University of London Union (ULU) in Bloomsbury, London. It led into the Picture This! Developer Challenge at Dev8D on 16/17 February 2011.
The morning began with a brief, practical introduction to application profiles for image metadata by Talat Chaudhri of the Application Profiles Support Project, aimed at getting the attendees to think about the kinds of metadata solutions required for the specific problems that face them in dealing with images within the public-facing systems that they run in their institutions. He invited the attendees to think about the sorts of metadata and the kinds of relationships between images as related web resources that might be required in these systems.
The attendees then delivered some lightning talks, outlining the sorts of problem spaces that they are seeking to deal with in order to deliver image resources more successfully to users. Most of these centred around EXIF, IPTC, as well as ISO 19139 for geospatial metadata. Other metadata standards that were mentioned included NISO MIX and VRA. The talks were focussed on a range of issues including embedding image metadata within images, extracting metadata from images on services such as Flickr using the relevant API, auto-generation and enrichment of metadata, and visual surfacing of copyright and other information for users from embedded metadata within images. There was surprisingly little interest in managing relationships between images, for example where various different types of post-processing of a particular image has resulted in multiple related images that stem from the same original. It was also quite notable that comparatively little attention was given to subject metadata by attendees at the event. Holding the event at an event for developers may explain the relative lack of interest in these areas, which have been more significant issues at meetings of the Metadata Forum.
After lunch, the attendees formed into groups that included both metadata practitioners and developers, to address the issues raised by the lightning talks. They later delivered pitches based on these ideas, several of which fed into the Picture This! Developer Challenge at Dev8D. In addition to the attendees themselves, a number of additional developers who attended Dev8D+ offered their advice and collaboration, and dropped in and out of the afternoon session. This sharing of expertise highlighted the value of the collaborative approach taken at Dev8D, as well as directly helping practitioners with the problems that they had outlined during the morning session. Particular mention should be made of Ben O’Steen and Ben Charlton for the considerable help that they gave to developers and practitioners throughout the day.
Ben O’Steen talks about Picture This! and Dev8D
As part of the Dev8D Developer Challenge, the Picture This! event offered Amazon vouchers as first and second prizes for the most innovative and practical solutions to identified problems using image metadata.
The first prize of a £50 Amazon voucher went to Robert Baker and Roger Greenhaigh for their work in extracting embedded copyright information from images and dynamically modifying the images to add a banner with a logo displaying the licence, e.g. the specific type of Creative Commons licence, and the copyright holder. The judges felt that this relatively simple but highly effective idea had enormous potential within the UK HE sector, not least as a time-saving device with instant visual impact that could be used widely by anybody wanting to know whether and how they could re-use a particular image.
Robert Baker and Roger Greenhaigh’s Entry for the Developer Challenge
The second prize of a £25 Amazon voucher went to Bharti Gupta for her work in embedding geospatial image metadata within map images, for example climate data, an idea that has enormous potential for re-use within the UK HE sector. By embedding the metadata in this way, the problem of managing images and metadata separately is removed and machine processing and transmission of map images over the web is significantly enriched without the need for metadata harvesting.
“Before”: Bharti Gupta talks about Picture This!
“After”: Bharti Gupta’s Entry for the Developer Challenge
Ianthe Hind and Scott Renton worked on enriching image metadata using a range of techniques, the most ambitious of which was image recognition. Ianthe’s work on this challenge at Dev8D deserves special mention for the huge effort and wide range of technologies that she investigated for auto-generation of metadata. She showed that commonly available image recognition software is not yet capable of delivering the functionality that developers need to be able to make use of existing rich metadata on the web to describe new images of known objects, places or landmarks, which would avoid the need for constant duplication and time-consuming repetitious metadata entry.
“Before”: Ianthe Hind talks about Picture This!
“After”: Ianthe Hind’s Entry for the Developer Challenge
The organisers of the event intend to follow up on and document these outputs, and ensure that they feed back into future meetings of the Metadata Forum. The day was highly successful, the attendees were enthusiastic and motivated, and the Dev8D event format was at its best in bringing practitioners with practical problems together with developers to address real, tractable problems and produce immediate solutions and demonstrations to solve them.
Posted on January 31st, 2011 No comments
The AP Support project and the Metadata Forum will be holding an event at Dev8D+ on 15 February (the day before Dev8D, 16-17 February), to bring practitioners who work in services dealing with images and image metadata together with developers to build practical solutions to their problems!
There will be prizes as part of the Dev8D Developer Challenge. More details can be found here on the Dev8D wiki.
Posted on December 10th, 2010 No comments
EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale.
That is to say that ePub contains within it the Open Packaging Format (for convenience, we can ignore the other structural parts for the purposes of this discussion), which defines the structure of both the metadata for the item contained within the file and the presentational (XML, XHTML, CSS) elements of the standard. It is similar in many ways to a .docx file (MS Word 2007 onwards) in being effectively a specialised type of .zip file.
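To make the "specialised .zip file" point concrete, here is a minimal sketch in Python. It builds a tiny ePub-like archive in memory and then locates the OPF package document by reading META-INF/container.xml, which is where the container format records it. The file name OEBPS/content.opf and the empty package element are illustrative assumptions only, and the archive is nowhere near a complete, valid book.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the ePub container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def build_minimal_epub() -> bytes:
    """Build a tiny ePub-like zip in memory (illustrative, not a valid book)."""
    container_xml = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", container_xml)
        # Hypothetical, empty package document standing in for real OPF.
        zf.writestr("OEBPS/content.opf", "<package/>")
    return buf.getvalue()


def find_opf_path(epub_bytes: bytes) -> str:
    """Return the path of the OPF package document inside the archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")
```

Calling find_opf_path(build_minimal_epub()) gives the path of the package document, which is where the metadata discussed in this post lives.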
So why is ePub of interest from the point of view of metadata and application profiles? The IDPF’s Open Packaging Format gives this description:
Dublin Core metadata is designed to minimize the cataloging burden on authors and publishers, while providing enough metadata to be useful. This specification supports the set of Dublin Core 1.1 metadata elements (http://dublincore.org/documents/2004/12/20/dces/), supplemented with a small set of additional attributes addressing areas where more specific information is useful. For example, the OPF role attribute added to the Dublin Core creator and contributor elements allows for much more detailed specification of contributors to a publication, including their roles expressed via relator codes.
Content providers must include a minimum set of metadata elements, defined in Section 2.2, and should incorporate additional metadata to enable readers to discover publications of interest.
In which case, how is the metadata contained within ePub any different to Dublin Core 1.1? This is the interesting part:
Because the Dublin Core metadata fields for creator and contributor do not distinguish roles of specific contributors (such as author, editor, and illustrator), this specification adds an optional role attribute for this purpose. See Section 2.2.6 for a discussion of role.
To facilitate machine processing of Dublin Core creator and contributor fields, this specification adds the optional file-as attribute for those elements. This attribute is used to specify a normalized form of the contents. See Section 2.2.2 for a discussion of file-as.
This specification also adds a scheme attribute to the Dublin Core identifier element to provide a structural mechanism to separate an identifier value from the system or authority that generated or defined that identifier value. See Section 2.2.10 for a discussion of scheme.
This specification also adds an event attribute to the Dublin Core date element to enable content providers to distinguish various publication specific dates (for example, creation, publication, modification). See Section 2.2.7 for a discussion of event.
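As a rough illustration of the four attributes just quoted, the following Python sketch parses a small, made-up OPF metadata fragment and reads back the role, file-as, scheme and event values. The names, the identifier value and the date are invented for the example; the relator codes "aut" and "ill" are the usual codes for author and illustrator. Only the standard library is used.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "opf": "http://www.idpf.org/2007/opf",
}

# Invented sample data: people, identifier and date are placeholders.
SAMPLE_OPF_METADATA = """
<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:title>An Example Thesis</dc:title>
  <dc:creator opf:role="aut" opf:file-as="Smith, Jane">Jane Smith</dc:creator>
  <dc:contributor opf:role="ill" opf:file-as="Jones, Sam">Sam Jones</dc:contributor>
  <dc:identifier opf:scheme="ISBN">0000000000</dc:identifier>
  <dc:date opf:event="publication">2010-12-10</dc:date>
</metadata>
"""


def opf_attr(elem, name):
    """Read a namespaced opf:* attribute, or None if it is absent."""
    return elem.get(f"{{{NS['opf']}}}{name}")


def describe_agents(metadata_xml):
    """List creators and contributors with their role and normalised name."""
    root = ET.fromstring(metadata_xml)
    agents = []
    for tag in ("creator", "contributor"):
        for elem in root.findall(f"dc:{tag}", NS):
            agents.append({
                "element": tag,
                "name": elem.text,
                "role": opf_attr(elem, "role"),
                "file_as": opf_attr(elem, "file-as"),
            })
    return agents


def qualified_values(metadata_xml, tag, attr):
    """Return (attribute value, element text) pairs for one DC element."""
    root = ET.fromstring(metadata_xml)
    return [(opf_attr(e, attr), e.text) for e in root.findall(f"dc:{tag}", NS)]
```

For instance, qualified_values(SAMPLE_OPF_METADATA, "date", "event") separates the kind of date from its value, which plain Dublin Core 1.1 cannot express.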
Using these additional attributes, it is possible to define more accurately what certain fields contain: roles that distinguish different kinds of agent, a standard, normalised format for agent metadata such as personal names, schemes that link an identifier to the system or authority that generated it, and events that describe more accurately what has occurred during the life cycle of the item. By applying such constraints, which are beyond the scope of DC 1.1, the ePub format effectively contains a de facto application profile, identified by its own namespace. Further, ad hoc metadata can be added using the (X)HTML meta element:
One or more optional instances of a meta element, analogous to the XHTML 1.1 meta element but applicable to the publication as a whole, may be placed within the metadata element [...]. This allows content providers to express arbitrary metadata beyond the data described by the Dublin Core specification. Individual OPS Content Documents may include the meta element directly (as in XHTML 1.1) for document-specific metadata. This specification uses the OPF Package Document alone as the basis for expressing publication-level Dublin Core metadata.
It would seem, however, that this last option suffers from the weakness that such metadata is invented on the fly, and does not have to follow the constraints of any schema or authority.
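A small sketch may help to show the weakness. The Python below reads name/content pairs from meta elements in a made-up OPF fragment and flags any name that is not in a locally agreed list. The list AGREED_NAMES and all of the sample values are hypothetical: nothing in the ePub format itself supplies or enforces such a vocabulary, which is exactly the point.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Invented fragment: two publishers' worth of ad hoc names for one idea.
SAMPLE_META = """
<metadata xmlns="http://www.idpf.org/2007/opf">
  <meta name="funder" content="JISC"/>
  <meta name="Funding-Body" content="JISC"/>
  <meta name="thesis-supervisor" content="A. N. Other"/>
</metadata>
"""

# Hypothetical local vocabulary; the format does not define one.
AGREED_NAMES = {"funder", "thesis-supervisor"}


def read_meta(metadata_xml):
    """Return (name, content) pairs from the meta elements."""
    root = ET.fromstring(metadata_xml)
    return [(m.get("name"), m.get("content"))
            for m in root.findall(f"{{{OPF_NS}}}meta")]


def unrecognised(pairs, agreed=AGREED_NAMES):
    """Names that no agreed schema or authority accounts for."""
    return [name for name, _ in pairs if name not in agreed]
```

Here "funder" and "Funding-Body" carry the same information under different invented names, and only a check applied from outside the format can notice the difference.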
Nonetheless, it would seem overall that the ePub “application profile” does significantly add to the functionality of DC 1.1 in a potentially useful way. Different types of agent defined in DC, such as creator and contributor, can be further defined, for example as author, editor, illustrator and thesis supervisor for higher degrees. Potentially, this could be leveraged for use with a number of different types of resources and for various purposes, although ePub by its very nature is designed for reflowable content, which by and large means textual resources such as books, articles, manuals and so on. Illustrations, tables, charts, images and other non-reflowable content can potentially create a problem on the small screens of mobile devices such as ebook readers.
The structure of this application profile is very simple and easy to use, unlike for example the classic form of SWAP, whose structure is based directly upon its conceptual data model, a simplified version of FRBR. It would be extremely interesting to compare the two, since they are fundamentally similar, relatively simple solutions that are limited in scope to online publications and similar resources. It would be most revealing to see whether what SWAP seeks to achieve can be done in a simpler way, and whether either SWAP or the ePub application profile has functionality that the other cannot provide.
Ultimately, the purpose of this investigation could be to provide online textual content, for example in repositories, via increasingly popular hand-held devices, and to capitalise on the rapid growth of commercial ebooks. It would probably be necessary to provide .epub files in such systems as well as the usual .pdf and .doc(x) formats that are common in publishing, and consequently in institutional repositories. This would need to be done either by converting the existing content, and likewise new content after it is deposited, or additionally by providing tools that make the ePub format more immediately accessible to service providers and depositors in future.
UKOLN is holding an ePub event (unfortunately postponed due to inclement winter weather: new date to be announced), as a collaboration between the Application Profile Support Project and DevCSI, to investigate exactly these issues in a hands-on, practical way: the aim is to get repository managers and other information professionals together with developers and investigate the feasibility of demonstrator solutions that could encourage software development to enable repository content to be available to ebook readers in future.
The details of the rescheduled event will be announced here in due course. Watch this space!
Posted on November 10th, 2010 1 comment
Please note: postponed due to bad weather and cancellations.
What is ePub and what has it got to do with application profiles? ePub is a standard file format for ebooks. It contains some metadata fields that include some simple mandatory elements and can include more elements as required. Any local metadata solution drawing elements from various sources is effectively an application profile. So ePub itself is a file format, but it contains some simple core metadata.
What we would like to do is investigate how useful and achievable it is to make repository content available and discoverable on ebook readers using the ePub format. We are bringing developers together with library and repository professionals to participate in a practical, hands-on hackday designed to find solutions to converting repository content into the ePub format and evaluating its metadata requirements in the context of ebooks.
More information is available here. If you are interested in getting repository content onto ebook readers, and in metadata, this event is for you!
Posted on September 21st, 2010 2 comments
Past and present
Up until the present, a number of application profiles have been developed by various metadata experts, with the support of the JISC, with the intention of addressing the needs of practitioners and service providers (and thus ultimately their users) across the higher education sector in the UK. The most significant of these have been aimed at particular resource types that have an impact across the sector.
- SWAP – Scholarly Works Application Profile
- IAP – Images Application Profile
- GAP – Geospatial Application Profile
- LMAP – Learning Materials Application Profile (scoping study only: also the DC Education AP)
- SDAPSS – Scientific Data Application Profile Scoping Study
- TBMAP – Time-Based Media Application Profile
Problems with this approach
However, it cannot be said that a particular resource type, set of resource types, or even general subject domain actually constitutes a real, identified problem space that faces large sections of the information community in the UK higher education sector today. Geospatial resources can be any type of resource that has location metadata attached (e.g. place of creation, location as the subject of the resource). Learning materials can be any type of resource that has been created or re-purposed for educational uses, which can include presentations, academic papers, purpose-made educational resources of many types, images, or indeed almost anything else that could be used in an educational context, to which metadata describing that particular use or re-use has been attached. Images might need all sorts of different types of metadata: for instance, images of herbs might need very different metadata from images of architecture. The same applies to time-based media: what is the purpose of these recordings and what are they used for? Why and how will people search for them? Likewise, the type of science in question, of which there are almost innumerable categories and sub-categories, will to a large extent determine the specific metadata that will be useful for describing scientific data.
Of all of the above, only scholarly works, which might more usefully be called scholarly publications, are an entirely focussed, specific set of resource types with a common purpose. The others are loose and sometimes ill-defined collections of resources or resource types that fit into a particular conceptual category. Only in the case of scholarly publications is there an unspoken problem space: discovery and re-use in repositories and similar systems, usually but not exclusively as Open Access resources. There are other related problem spaces such as keeping accurate information about funders and projects for the purposes of auditing that is required by funding bodies and university authorities. The ability to access these resources with new technologies could be a further area of study, and is one that UKOLN is taking an active interest in. Again, the question must be “what do users want to do with these resources?”
This is not to say that the work in creating the application profiles mentioned above has been wasted. At the same time, the above application profiles constitute general purpose solutions that do not target specific problems affecting identifiable communities of practice across the sector. There is considerable work continuing in Dublin Core Metadata Initiative (DCMI) circles on how metadata modelling should best be carried out, for instance on the Dublin Core Abstract Model (DCAM) and on the overlap between application profiles and linked data, where those application profiles contain relationships that can better enable resource discovery in a linked data world.
These approaches remain useful. However, more immediate, specific problem spaces face particular university services (not all of which are necessarily repositories) in trying to describe resources so that they can be discovered, providing copyright and other licensing information so that they can be re-used, providing funding information so that work can be audited and cases can be constructed for funding new projects, and so on. Some of these resources may be textual, but others are increasingly including images (of many types and for many purposes), music, film, audio recordings, learning objects of many types, and large scale corpora of data. Any metadata solution that is tailored to a particular purpose (and, thus, which is usually de facto an application profile) needs to address specific aspects of the Web services that practitioners and other service providers are seeking to develop for their users, not simply provide general catch-all metadata of relatively generic use.
Key to all this is consultation with those communities: first, to scope the most significant two or three problem spaces that face the largest number of resource providers in serving their users; second, to get those practitioners together with developers to draw up practical, workable recommendations and perhaps demonstrations; third, to provide tangible evidence to the developers of existing software platforms, and to engage with them to help solve such problems in practice. To do this, it is necessary to engage practitioners and developers in practical, hands-on activities that can bring the discussion forward and provide tangible solutions.
Posted on July 16th, 2010 No comments
Talat Chaudhri and Stephanie Taylor submitted an entry to the Developer Challenge at Open Repositories 2010 in Madrid, which was received with some interest because it used Open Calais to automatically create links to related content. This “quick and dirty” entry was made at the last minute, so only the main features worked. It comes out of UKOLN’s work in creating a new, interactive Drupal site (soon to be launched) as a focus for their work on various metadata activities, including but not limited to application profiles, aimed at providing a hub of user-facing, community documentation. The demand for such a central focus of metadata information was raised separately at the first meeting of the Metadata Forum.
Posted on May 19th, 2010 2 comments
Drupal 7 is likely to be released soon, and will include native support for RDFa. The RDF module for Drupal 6 already allows this functionality. Why is this important? Because it makes relationships between resources much easier to describe through Drupal’s user-friendly interface and, in the process, would allow documents to be available as linked data.
In Drupal terminology, a “node” is effectively a metadata record, and various Drupal modules enable the easy customisation of metadata. In effect, you could build a repository on the basis of Drupal, by-passing the need for platform-specific knowledge tied to open source software that has increasingly moved towards the “enterprise solution” space, along with all of the technical tie-in that it usually entails. For the service provider, it is not dissimilar to the tie-in experienced with commercial software, especially in the case of information librarians or other professionals who are not developers, or even developers who are not part of that particular open source development team.
Application Profiles are essentially structured metadata comprising elements and (usually) relationships, and are therefore inherently linked data solutions. They vary in complexity according to their particular functional requirements: for instance, in the world of scholarly publications, there is a spectrum between the straightforward, unstructured way that DSpace implements Dublin Core (which should perhaps be called the DSpace Application Profile), the simplified FRBR structure of the Scholarly Works Application Profile (SWAP) and the complex entity-relationship model of CERIF, the standard developed for Current Research Information Systems (CRISs). This latter is a de facto application profile, even if it is not normally referred to as such.
Why should Drupal be any better than the repository platforms that already exist? In many ways, it depends on what you need to do with it, and on the resources at your disposal. But the advantage is that Drupal is a flexible Content Management Framework that is designed to be leveraged for any sort of content, and for new modules to be designed easily for new purposes. After all, what does a repository actually do that other websites cannot? Repositories put metadata records and bitstreams (the actual documents or files) on the Web, and add a few additional services such as OAI-PMH, SWORD and statistics. But they are only a particular specialised subset of content management systems. Drupal is accessible to any PHP developer without any initial requirement of particular specialist platform knowledge, which is relatively easy to obtain. The community is large and support is quite easily available, as are modules that can be adapted for local purposes. It is designed to be easy to customise and theme.
Sarah Currier recently talked about the idea of a “fauxpository”. If I remember correctly, she pointed out that it could even be based on WordPress. This is clearly a workable idea, although hardly suitable for production use as a university service. I would maintain that Drupal could easily be suitable for such a use with relatively little work, and could make use of and adapt application profiles in a way that the major open source repository platforms have been slow to do, and are still only just beginning to enable as something of an afterthought. UKOLN are investigating how Drupal can be used to implement the JISC’s Dublin Core Application Profiles (DCAPs), with the intention of showing that they can work without tie-in to any specific platform.
Posted on March 23rd, 2010 2 comments
EPrints 3.2.0 was released on 10th March 2010. It has some remarkable new features relating to linked data and, consequently, to Dublin Core Application Profiles based on multiple entity domain models such as SWAP, IAP and TBMAP (the GAP does not have a domain model). Here are the key points:
Linked Data Support
- Ability to establish arbitrary relations between objects or provide additional metadata in triple form.
Semantic Web / Linked Data (RDF)
We have made a (difficult) decision to move these features to 3.2.1 (due out soon after 3.2.0) because testing showed it caused a significant slow down.
We’re rewriting it to do the same thing but with much less overhead!
However, as may be seen on the EPrints wiki, the latter section read as follows until 11th March 2010:
Semantic Web Support
- RDF+XML Format
- N3 Format
- URIs for all objects, including non dataobjs. [sic] eg. Authors, Events, Locations.
- BIBO Ontology
- URIs now use content negotiation to decide which export plugin to redirect to, based on mime-types supplied by plugins and the “accept” header.
- Relations between eprints and documents
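The content negotiation item in the list above can be sketched roughly as follows in Python. This is not EPrints code: the plugin names in EXPORT_PLUGINS are hypothetical, and the Accept header handling is deliberately simplified (it honours q-values and the */* wildcard, but ignores partial wildcards such as text/* and the finer precedence rules of HTTP).

```python
# Hypothetical mapping from mime-types to export plugin names.
EXPORT_PLUGINS = {
    "application/rdf+xml": "RDFXML",
    "text/n3": "RDFN3",
    "text/html": "SummaryPage",
}


def parse_accept(header):
    """Return (mime_type, q) pairs, highest preference first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        mime = fields[0].lower()
        if not mime:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((mime, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_plugin(accept_header, plugins=EXPORT_PLUGINS, default="SummaryPage"):
    """Pick the export plugin that best matches the Accept header."""
    for mime, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if mime in plugins:
            return plugins[mime]
        if mime == "*/*":
            return default
    return default
```

A client asking for "text/html;q=0.5, application/rdf+xml" would be sent to the RDF+XML export under this sketch, while a browser sending only "text/html" would get the ordinary page for the same URI.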
If this is taken at face value, it appears that there has been significant progress in enabling features that would allow the full implementation of the JISC’s DCAPs based on the simplified FRBR model, although we must wait for some important details until the promised version 3.2.1, which is to be released “soon after 3.2.0” according to the statement above. Although objects may be described with “arbitrary relations” and “additional metadata” (additional to what?) can be described in triple form, there are not yet URIs for all entities, such as Authors and so on. Presumably, the support for BIBO would be more demanding than the support required for the cut-down version of FRBR as seen, for example, in SWAP.
This is all very promising, especially in the light of the same functionality being promised in DSpace 2.0, which was not yet implemented in the recent release of DSpace 1.6.0. However, all of this must come with the caveat that, until this is tried out in practice, it is not certain which levels of implementation are possible: clearly, the actual metadata fields can easily be adopted by any repository, but what about the relationships between entities, and the relationships with other complex objects? How exactly will these be implemented in practice? For the purposes of linked data, we also have to wait until EPrints 3.2.1 for metadata in the RDF+XML format.
To this end, although UKOLN cannot offer a publicly accessible test repository with user access, we hope wherever possible to implement and test these pieces of repository software for their usability with SWAP, IAP, TBMAP, GAP and DC-Ed in the first instance, since the majority of repositories in the UK HE sector use these platforms. Of course, we would also like to do the same with Fedora at some point in the future. However, if you have evidence of any such implementations, even for test purposes, and if you are happy for us to evaluate these, we would be very happy to hear from you.
Posted on October 6th, 2009 No comments
One of the anecdotal remarks often made about SWAP in particular, but also as a general opinion about the JISC DCAPs, is that they are based on domain models that are too complex. But too complex for what?
Too complex to fit with how real repositories work?
Too complex to create usable input forms?
Too complex for users to understand? Do we mean real end users, or service providers such as repository managers? Do we mean anybody who is using the web forms to input metadata about digital objects, both content providers (who may also be users of the content) and repository managers?
It seems that there is an evidential problem innate in all of these assertions. It’s also worth remembering that not all resource types, and hence application profiles, are equal in this regard – nor are all users, content providers and repository managers. It’s also fair to say that sufficient work has not yet been done on investigating interface design and usability to be able to say for certain that the complexity of a data model necessarily makes the input forms difficult to use. There is an aspect of back-end software design to this question as well: the input forms may very well be simplified if the software can intelligently suggest relationships for the user to agree or reject, and generate as much of the record behind the scenes as possible.
Current work at UKOLN is aimed at solving these evidential problems by providing a methodology for investigating the best way to construct metadata
I can’t unequivocally answer the question of whether the JISC DCAPs have too complex data models to fit with the way that the most common repository platforms organise their records. It appears, however, that DSpace 1.5 does not yet support entity-relationship models, and that EPrints has its own data model. However, the use of DCAPs as exchange formats has already been shown to be a fruitful alternative approach, as EPrints has already got a SWAP export plug-in to do this. It is generally asserted that Fedora can already handle any data model. It is for the repository platform developers, ultimately, to provide the final answer to these questions. It’s clear that a lot of work is going on to address some of these issues. For example, it has been said that DSpace 2.0 will support entity-relationship models.
What I can say, however, is that the inability to support a back-end entity-relationship model does not by any means restrict a particular software platform from making use of an application profile, although there may well be a considerable demand in terms of development time in making the necessary functionality available. This is because there is clearly another alternative, namely emulating the entity-relationship model. To begin to understand this possibility, it’s necessary to take a close look at how the JISC DCAPs have been constructed, and the different classes of metadata that you find within them:
- metadata about the digital object(s) themselves, i.e. the usual stuff in any repository
- metadata elements relating to the semantic relationships between entities, i.e. isExpressedAs, isManifestedAs, isAvailableAs and variations thereof. These exist purely for the sake of the particular model that has been chosen, here a reduced form of FRBR. It is interesting that the dc:creator field, which is “real” metadata about the object, is the only link to the Agent entity, which may be seen, from the perspective of the object, as an entity that exists to express more detail about an item of metadata describing the object itself. In fact, it is an independent entity that could relate to multiple unrelated objects, of course.
- identifiers: these are specific to the repository instance and application profile in question. Of course, all digital objects on the Web require at least one URI to identify them (in practice, nearly always a URL that also locates them). However, the entity model required by the application profile, if it isn’t emulated as described hereafter, cuts up a compound digital object in such a way that it is possible to apply further identifiers to each entity as discrete metadata records.
It must be said that this is NOT the only way to do relationships between digital objects. It’s perfectly possible to use, for example, OAI-ORE (or plain RDF) resource maps in place of the second of these types of “metadata” here. In fact, they are really not metadata about the object at all, because they describe the relationships of different parts of that metadata to each other: so they are really meta-metadata! It could be said that identifiers don’t describe the actual objects either, merely locate their metadata descriptions, so they are also meta-metadata. Change the way you do the modelling, and the meta-metadata may change – however, this is NOT true of the “real” metadata (title, author, image size etc) that describe the object itself.
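One way to picture the distinction drawn above is the following Python sketch. It holds a handful of made-up triples for a single scholarly work, separates the "real" descriptive metadata from the relationship statements, and then emulates the entity model as a single flat record by following the relationship links. The identifiers (work:1, agent:1 and so on) and the values are invented, the property names are shortened for readability, and this is not how SWAP or any particular repository platform actually stores its data.

```python
from collections import defaultdict

# The structural properties of the reduced FRBR model discussed above.
RELATIONSHIP_PROPERTIES = {"isExpressedAs", "isManifestedAs", "isAvailableAs"}

# Invented (subject, property, object) triples for one compound object.
TRIPLES = [
    ("work:1", "title", "An Example Paper"),
    ("work:1", "creator", "agent:1"),
    ("work:1", "isExpressedAs", "expression:1"),
    ("expression:1", "isManifestedAs", "manifestation:1"),
    ("manifestation:1", "format", "application/pdf"),
    ("manifestation:1", "isAvailableAs", "copy:1"),
    ("agent:1", "name", "Jane Smith"),
]


def split_triples(triples):
    """Separate descriptive metadata from relationship 'meta-metadata'."""
    descriptive, structural = [], []
    for triple in triples:
        if triple[1] in RELATIONSHIP_PROPERTIES:
            structural.append(triple)
        else:
            descriptive.append(triple)
    return descriptive, structural


def flatten(start, triples):
    """Emulate the entity model as one flat record for a given work."""
    by_subject = defaultdict(list)
    for subject, prop, obj in triples:
        by_subject[subject].append((prop, obj))
    record, queue, seen = {}, [start], set()
    while queue:
        subject = queue.pop(0)
        if subject in seen:
            continue
        seen.add(subject)
        for prop, obj in by_subject[subject]:
            if prop in RELATIONSHIP_PROPERTIES:
                queue.append(obj)  # follow the link to the next entity
            else:
                record.setdefault(prop, []).append(obj)
    return record
```

Changing the modelling would change the structural triples and the identifiers, but the flat record of title, creator and format that flatten("work:1", TRIPLES) produces would stay the same, which is the sense in which only the latter is metadata about the object itself.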