Posted on December 10th, 2010
EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale.
In practice, ePub contains within it the Open Packaging Format (OPF; for convenience, we can ignore the other structural parts for the purposes of this discussion), which defines both the structure of the metadata for the item contained within the file and the presentational (XML, XHTML, CSS) elements of the standard. It is similar in many ways to a .docx file (MS Word 2007 onwards) in being effectively a specialised type of .zip file.
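To make the ".zip file" point concrete, here is a minimal sketch in Python using only the standard library. It builds a tiny stand-in for an .epub in memory and then locates the OPF package document through META-INF/container.xml, which is how the container format points at it. The path OEBPS/content.opf, the placeholder package content and the function names are assumptions made for illustration; a real file would also carry a manifest, a spine and the content documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

# The OPF location below is an illustrative choice, not a required path.
CONTAINER_XML = """<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_sample_epub() -> bytes:
    """Build a skeletal, in-memory stand-in for an .epub file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        # Placeholder only: a real package document holds metadata,
        # manifest and spine.
        zf.writestr("OEBPS/content.opf", "<package/>")
    return buf.getvalue()


def find_opf_path(epub_bytes: bytes) -> str:
    """Return the archive path of the OPF package document."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find(
        f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile"
    )
    return rootfile.get("full-path")


if __name__ == "__main__":
    print(find_opf_path(build_sample_epub()))
```

Nothing here is specific to ebooks beyond the two file names: any tool that can read a .zip archive can reach the metadata, which is part of what makes the format attractive for repository use.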
So why is ePub of interest from the point of view of metadata and application profiles? The IDPF’s Open Packaging Format specification gives this description:
Dublin Core metadata is designed to minimize the cataloging burden on authors and publishers, while providing enough metadata to be useful. This specification supports the set of Dublin Core 1.1 metadata elements (http://dublincore.org/documents/2004/12/20/dces/), supplemented with a small set of additional attributes addressing areas where more specific information is useful. For example, the OPF role attribute added to the Dublin Core creator and contributor elements allows for much more detailed specification of contributors to a publication, including their roles expressed via relator codes.
Content providers must include a minimum set of metadata elements, defined in Section 2.2, and should incorporate additional metadata to enable readers to discover publications of interest.
In which case, how is the metadata contained within ePub any different to Dublin Core 1.1? This is the interesting part:
Because the Dublin Core metadata fields for creator and contributor do not distinguish roles of specific contributors (such as author, editor, and illustrator), this specification adds an optional role attribute for this purpose. See Section 2.2.6 for a discussion of role.
To facilitate machine processing of Dublin Core creator and contributor fields, this specification adds the optional file-as attribute for those elements. This attribute is used to specify a normalized form of the contents. See Section 2.2.2 for a discussion of file-as.
This specification also adds a scheme attribute to the Dublin Core identifier element to provide a structural mechanism to separate an identifier value from the system or authority that generated or defined that identifier value. See Section 2.2.10 for a discussion of scheme.
This specification also adds an event attribute to the Dublin Core date element to enable content providers to distinguish various publication specific dates (for example, creation, publication, modification). See Section 2.2.7 for a discussion of event.
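As a rough illustration of the four attributes described in these passages, the sketch below embeds a small OPF 2.0 style metadata fragment in a Python string and reads the qualified values back with the standard library. The people, dates and identifier are invented for the example, the fragment omits the manifest and spine that a real package document needs, and the helper name qualified is hypothetical. The role values aut and ths are MARC relator codes for author and thesis advisor.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# Illustrative fragment only: all names, dates and the identifier are made up.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>An Example Thesis</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId" opf:scheme="URI">http://example.org/thesis/1</dc:identifier>
    <dc:creator opf:role="aut" opf:file-as="Doe, Jane">Jane Doe</dc:creator>
    <dc:contributor opf:role="ths" opf:file-as="Roe, Richard">Richard Roe</dc:contributor>
    <dc:date opf:event="creation">2010-11-01</dc:date>
    <dc:date opf:event="publication">2010-12-10</dc:date>
  </metadata>
</package>"""


def qualified(opf_xml, dc_name, opf_attr):
    """Return (element text, OPF attribute value) pairs for one DC element."""
    root = ET.fromstring(opf_xml)
    return [
        (el.text, el.get(f"{{{OPF}}}{opf_attr}"))
        for el in root.iter(f"{{{DC}}}{dc_name}")
    ]


if __name__ == "__main__":
    print(qualified(SAMPLE_OPF, "creator", "role"))
    print(qualified(SAMPLE_OPF, "contributor", "file-as"))
    print(qualified(SAMPLE_OPF, "identifier", "scheme"))
    print(qualified(SAMPLE_OPF, "date", "event"))
```

The element names are still plain Dublin Core 1.1; all of the extra meaning is carried by attributes in the OPF namespace, so a consumer that only understands DC can simply ignore them.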
Using these additional attributes, it is possible to define more accurately what certain fields contain: roles that distinguish the different kinds of agent; a standard, normalised form for agent metadata such as personal names; schemes that link an identifier to the system or authority that generated or defined it; and events that describe more accurately what has occurred during the life cycle of the item. By applying such constraints, which are beyond the scope of DC 1.1, the ePub format effectively contains a de facto application profile, identified by its own namespace. Further, ad hoc metadata can be added using a meta element modelled on that of (X)HTML:
One or more optional instances of a meta element, analogous to the XHTML 1.1 meta element but applicable to the publication as a whole, may be placed within the metadata element [...]. This allows content providers to express arbitrary metadata beyond the data described by the Dublin Core specification. Individual OPS Content Documents may include the meta element directly (as in XHTML 1.1) for document-specific metadata. This specification uses the OPF Package Document alone as the basis for expressing publication-level Dublin Core metadata.
It would seem, however, that this last option suffers from the weakness that such metadata is invented on the fly, and does not have to follow the constraints of any schema or authority.
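The weakness can be shown in a few lines. The sketch below reads the name and content pairs from meta elements in an OPF 2.0 style fragment and compares the names against a locally agreed list. The sample values, the AGREED_NAMES set and both function names are hypothetical: nothing in the format itself supplies such a list, which is exactly the point being made.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"

# Illustrative fragment: both meta names are invented for this example.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta name="price" content="USD 19.99"/>
    <meta name="thesis-type" content="PhD"/>
  </metadata>
</package>"""

# Hypothetical local agreement on which meta names are acceptable.
AGREED_NAMES = {"price"}


def read_meta(opf_xml):
    """Return a dict mapping each meta name to its content."""
    root = ET.fromstring(opf_xml)
    return {m.get("name"): m.get("content") for m in root.iter(f"{{{OPF}}}meta")}


def unrecognised(meta, agreed):
    """Return the meta names that the local agreement does not cover."""
    return sorted(set(meta) - set(agreed))


if __name__ == "__main__":
    meta = read_meta(SAMPLE_OPF)
    print(meta)
    print(unrecognised(meta, AGREED_NAMES))
```

Any checking of this kind has to be supplied by whoever consumes the file, so two publishers can quite legitimately use different names for the same thing.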
Nonetheless, it would seem overall that the ePub “application profile” does significantly add to the functionality of DC 1.1 in a potentially useful way. The different types of agent defined in DC, such as creator and contributor, can be further specified, for example as author, editor, illustrator or thesis supervisor for higher degrees. Potentially, this could be leveraged for use with a number of different types of resources and for various purposes, although ePub by its very nature is designed for reflowable content, which by and large means textual resources such as books, articles, manuals and so on. Illustrations, tables, charts, images and other non-reflowable content can potentially create a problem on the small screens of mobile devices such as ebook readers.
The structure of this application profile is very simple and easy to use, unlike for example the classic form of SWAP, whose structure is based directly upon its conceptual data model, a simplified version of FRBR. It would be extremely interesting to compare the two, since they are fundamentally similar, relatively simple solutions that are limited in scope to online publications and similar resources. It would be most revealing to see whether what SWAP seeks to achieve can be done in a simpler way, and whether either SWAP or the ePub application profile has functionality that the other cannot provide.
Ultimately, the purpose of this investigation could be to provide online textual content, for example in repositories, via increasingly popular hand-held devices, and to capitalise on the rapid growth of commercial ebooks. It would probably be necessary to provide .epub files in such systems as well as the usual .pdf and .doc(x) formats that are common in publishing, and consequently in institutional repositories. This would need to be done either by converting the existing content (and likewise new content after it is deposited) or, in addition, by providing tools that make the ePub format more immediately accessible to service providers and depositors in future.
UKOLN is holding an ePub event (unfortunately postponed due to inclement winter weather: new date to be announced), as a collaboration between the Application Profile Support Project and DevCSI, to investigate exactly these issues in a hands-on, practical way: the aim is to get repository managers and other information professionals together with developers and investigate the feasibility of demonstrator solutions that could encourage software development to enable repository content to be available to ebook readers in future.
The details of the rescheduled event will be announced here in due course. Watch this space!
Posted on August 6th, 2009
We’ve recently started trying out various methodologies for testing whether the different bits of application profiles work for the people trying to use those resources. The main thing to remember is that the approach must not be too technical: anybody ought to be able to understand what the metadata terms and the relationships between digital objects on the web are trying to achieve. This is hands-on metadata for real people!
So we’ve been to various meetings lately. The first one is perhaps the least relevant from most people’s point of view, the Metadata Registries Meeting at the Novotel Centre, York, 23-24 July 2009. We were seeking feedback and discussion of our methodology, as well as talking about a few technical possibilities, which was a useful thing to do. You may ask, what are registries? Well they aren’t the subject of this blog, but in brief they are places that allow people to share their metadata schemas, application profiles and so on, as well as to find tools to help them develop, build and maintain them over time. Remember that application profiles are living structures that should change as the metadata needs of your users in dealing with the resources that you provide change over time. We (UKOLN) operate a registry called IEMSR.
So what did we do for the people?! Well, first of all we went to the Institutional Web Managers Workshop 2009 (IWMW), 28-30 July 2009 because we felt that they are people who are focussed on making services work for users. It may have been an advantage in some ways that they weren’t by and large repository-related people and could look on things from a fresh perspective. It’s always good to get a range of different approaches: after all, won’t the users come to a repository, VRE, VLE or other service with a whole range of points of view and things they want to do? You can see a slightly ad hoc and only mildly embarrassing interview with me, Talat Chaudhri of UKOLN, explaining in about 20 seconds of profound unreadiness over coffee what it is that application profiles (should) do. (Why on earth did the kind editor choose that particular first frame to stop the video?!) Not a bad attempt, given the lack of coffee, I hope you may agree.
What we did was to get people to think about resources, and reasons why users would want to be looking at them. We played with post-it notes (also called stick-it notes elsewhere?) that had metadata terms written on them, and arranged them in logical groups that would help a person who was trying to perform focussed searches for resources. Any resource type will do: for instance, we tested it out before the session on “beach life: what you might find on a beach and what you might want to know about it”. It doesn’t even have to be sensible! In real life application profiles, however, you obviously need to think of the whole range of things that people will want to do with your resource. The best way: ask them! Don’t engineer things that people won’t want to use. The extra complexity creates the very real danger of making your structures difficult to search, which will put off the very users that are supposed to be using the service.
This method is called card sorting, and is quite well known. It does have some limitations, but we have already shown its usefulness in focussing attention on what users need to do. One limitation, for instance, is that it’s rather hard not to prejudice the process from the beginning. If you ask the participants to think first of the scenarios in which users might search for resources, then participants will come with pre-ordained ideas that will tend to undermine the fresh analysis of user requirements that we are looking for. On the other hand, if you don’t let them know until they have already thought of the terms that they need to describe the resource, on the first try they will tend to organise them in ways that don’t work with the scenarios. Let us remember, though, that this is just the first iteration of a development cycle for metadata solutions. You need to take every new version back to the users and check that it does what they need it to.
A second limitation is that paper prototyping can’t produce the complex cross-links that you’d find in a real database. A third one is that it doesn’t begin to touch the importance of interface design to usability testing of metadata terms and structures. You may (or may not) need a complex data structure. However, your user should only see what they need to see in order to accomplish what they want. Anything else will actually hinder their use of the service, be it a web page or a repository deposit interface. That complexity can be generated behind the scenes by software, so that users are asked understandable, intuitive and above all useful questions that facilitate their end user experience. We’re also working on these areas.
We then went to the Repositories Fringe 2009 in Edinburgh, 30-31 July 2009. (You will see from the above dates that this was a bit of a marathon!) The session was broadcast live on the Web, and I hope that the recording will be made available before long. I will add a link here if/when that happens. Having learnt a little from the above session, we did more of the same. We learnt a lot about how to get user requirements, and even more about how not to do it!
We were asked if we were running a focus group. If people want application profiles like SWAP, IAP, GAP, TBMAP and so on to be implemented, we will certainly have to consult focus groups, but we will tell people when that is what we’re doing. First, however, we are trying to raise discussion about how we can analyse user requirements on an ongoing basis and transmit that hard evidence to developers, so that they will have a reason to go to the trouble of incorporating it into their software releases. At present, we can’t show them sufficient evidence that these APs do what they are intended for, which is why repository software developers in particular have been understandably agnostic about APs. But the other thing that is crucial is to engage service providers and users. Why do they want to come? If they don’t get something out of the event that will improve their service or their knowledge, preferably both, they won’t come. This was as much an outreach and training event as a focus group.
We’re hoping that this is a good start towards an iterative, user-driven method for analysing existing APs for various purposes, as well as for designing new APs from scratch. We’re confident that it’s going well at the moment and that we are beginning to get answers. But the task of making your metadata fit the service that you provide is ongoing, because services also change over time. It’s best not to be too prescriptive, as different institutional or web services take different approaches to achieving the same things. We are aiming at a flexible, iterative, toolkit approach that works for as many people as possible, and offers a range of tools to implement relevant parts of an overall solution that work for the services and users concerned.
Lastly, the fact that we are reviewing APs should not be taken as a criticism of the ones that we have, even if deficiencies are found that need to be rectified, or new approaches taken. The work that was done in creating them has laid the groundwork for this new activity, which is aimed precisely at making the results of that work more useful in the community of web services that they were intended for. Change should be welcomed because needs and requirements change, along with our understanding of how best to analyse them.
Posted on May 8th, 2009
Workshop: Application Profiles in Practice, 6 May 2009
This was an event in two parts: firstly, an introduction to the user testing methodology being developed by the AP Support project in collaboration with the IEMSR and the IE Demonstrator project; secondly, an iteration of the paper prototyping element of the user testing. On this occasion the audience consisted largely of experts rather than an especially representative group of typical users, quite understandably given the nature of the meeting. (While it is very helpful to engage repository managers in user testing, it is more difficult to involve entirely non-specialist users, so there is a need for further work in facilitating this.) The session proved to be a success in raising considerable interest in current developments in application profiles.
It was always the intention to use this particular event as a platform for consulting colleagues in the repositories community about the usefulness of the approach. In this respect, the workshop was highly successful: attendees responded positively to the intention of engaging users in order to analyse and address the strengths and weaknesses of the various application profiles, raising some insightful questions and contributing to an animated debate. Rachel Bruce of JISC commended the workshop in her speech closing the Programme Meeting on the following day.
“Working with the Repositories Community: WRAP Project” (Jenny Delasalle, Warwick University), 6 May 2009
Jenny Delasalle referred to the difficulties faced in pioneering an implementation of SWAP in an institutional repository based on EPrints 3.0. Unlike in its successor EPrints 3.1, versioning was unsupported at the time, which to a great extent hampered the SWAP effort in WRAP at Warwick. She considered that in its present form, SWAP represents too complex a metadata model for adoption by the typical IR. But since there is not necessarily a need to employ all of the SWAP metadata terms (any more than one would necessarily need to employ all of the terms in DC Simple or Qualified DC), it must be presumed that the FRBR structure and the lack of automated means to populate fields with structural metadata represent a significant part of the problem. It would be useful to get a clarification from Jenny on this.
The suggestion that the feasibility of complex metadata schemas could be radically improved by the use of text mining to autopopulate metadata fields, thus requiring far less input and/or correction from the user, was raised later in the Forum in the discussion “How can text mining support repository tasks?”, convened by James Farnhill of JISC and led principally by Brian Rea of NaCTeM, University of Manchester. This would be of obvious and immediate relevance to the likelihood of SWAP being more widely implemented, whether in its present form or following the recommendations from the user testing effort.
Repositories Roadmap Session (Rachel Heery, external consultant for JISC), 7 May 2009
Rachel Heery gave a summary of her Digital Repositories Roadmap Review, revised from the original version by herself and Andy Powell in 2006. Recommendation 11 referred to SWAP specifically, proposing a cut-down version without the FRBR entity-relationship model and a re-analysis of the sort undertaken in the current user testing programme; Recommendation 12 made an interesting reference to OAI-ORE in the context of SWAP.
Recommendation 11: Explore deployment of a cut down version of SWAP, possibly at the copy level, retaining the cataloguing rules to ensure a consistent approach to linking to full text. Evaluate whether use of SWAP is consistent with a Web architecture approach to repositories.
Recommendation 12: Explore use of OAI-ORE to enable applications to handle complex objects. Demonstrate how OAI-ORE facilitates the re-use of research outputs and research data. Clarify different roles of OAI-ORE and SWAP.
There was considerable discussion of SWAP on Twitter among colleagues at Eduserv, UKOLN and elsewhere on both days of the meeting, focussing on both the structure and implementation of SWAP as it was originally intended, and in response to Rachel Heery’s recommendations. The need to solve the lack of implementation of the Dublin Core Application Profiles appears to have regained significant impetus from the interest in the series of user testing events planned by UKOLN. In particular, new impetus has been given to the SWAP implementation effort, in which expectations had previously subsided. Given Rachel Heery’s review, it is clear that SWAP may need to be considered once more as an ongoing project rather than a past product that failed to gain support, and one that may need substantial revision in future iterations. It is important to keep an open mind about the nature of those revisions, which should be conditioned by the results of the ongoing user testing effort.
Posted on April 29th, 2009
On Monday 27 April, the first trial of methods for user testing of SWAP was conducted at UKOLN. This was very much an internal “dry run”, the success of which leaves us in a strong position to take every opportunity to repeat the exercise more widely within the repositories community.
In collaboration with the IEMSR and IE Demonstrator projects, which also have an interest in developing and implementing application profiles in repositories, we are very interested in developing methodologies for the evaluation of the Dublin Core Application Profiles (DCAPs) funded by JISC. Of these, SWAP has the best developed online presence and content in institutional repositories, as well as a strong and developed user community focussed on developing Open Access content on the Web.
Our current work is therefore focussed on SWAP in the first instance, but we naturally intend to develop the process of practical user testing for the other DCAPs. We are of course aware that the needs of different resource types and repository communities will differ very widely. This is the reason that we are interested in practical user testing within those communities, to ensure that theoretical approaches to constructing application profiles actually fulfil the needs and requirements that underpin the development of such content in repositories and other related services on the Web.
In many ways, it is fair to say that SWAP is the “lowest hanging fruit” for this endeavour, but the impact of getting application profiles right for such a large and growing proportion of repository content should not be underestimated. Being largely textual resources, scholarly works are likely to be an area where significant lessons can be learnt for other resource types with more specific constraints and requirements. It is intended that we conduct user testing of several other DCAPs during the summer of 2009, if possible, following initial work on SWAP.
It is perhaps worth remembering that developments in repository technology have come a long way since SWAP was first developed, the example of which the other DCAPs have tended to follow, especially in the matter of using the FRBR structure. It is by no means certain that this is the only way, or the best way, to create relationships in so-called complex objects, which is to say sets of resources that relate to each other as versions. In particular, OAI-ORE is an exciting development that may provide an alternative, although its relevance for this purpose needs to be carefully evaluated and compared to existing approaches and technologies. It will not do to simply adopt the newest, coolest approach without a careful analysis of how the needs of users relate to the functionality that is presently available. If these do not correlate well within the software contexts currently in use in the community, the application profiles will fail accordingly.
It has become obvious that implementation of the DCAPs has been slow. In the case of SWAP, which has been around the longest, that lag has become a profound apathy towards efforts to implement the application profile widely in repositories. It is not even clear how best this should be done, as neither methods nor benefits have been convincingly demonstrated. It would be a great shame if the investment of expertise in improving the metadata vocabulary were wasted because the structure has not been successfully integrated into repositories. This mismatch must be understood and resolved if the situation is to be turned round and the expected gains of SWAP and the other DCAPs are to be realised.
There are a variety of technological approaches to application profiles that will need careful study, once the user testing brings a better understanding of how users need to relate particular types of resources together in repositories. These may include the Description Set Profile approach, traditional XML with XML Schema, and OAI-ORE. But more importantly, it must be shown beyond doubt how the DCAPs fit the applications that users need them for, and which changes may be required, before software developers will have the motivation to address those needs by implementing the DCAPs in the major repository platforms.
Posted on March 12th, 2009
At present we are working on practical evaluation and user testing of SWAP, the first and most fully developed of the DCAPs funded by JISC. The aim of this work is to enable us to report on how well SWAP fits the real needs of repository managers in their day-to-day work. On this basis, we intend to organise further practical events, both for repository managers and for repository platform developers. The hope is to provide an impetus for SWAP to be supported by the major repository platforms in an appropriate form. This feedback should provide a sound basis to continue the development cycle of SWAP and improve it over time.
We are not forgetting, however, the needs of the other resource types for which DCAPs have been developed, and the process is intended to be an iterative one, learning from the experience of SWAP in the first instance and inviting domain specialists for each resource type to help adapt the process for the needs of the section of the repository community involved with that particular resource. We invite those user communities and specialists to engage with us in the same process as outlined for SWAP above.
Web resources have living, changing needs and user communities, so we believe that their application profiles should reflect this. Obviously, there is a balance to be struck between developing and maintaining useful standards that can be relied upon, and meeting these changing requirements. The only way to do this properly is to use inclusive methods, consulting domain specialists and real users as much as possible.