Developer Labs

PIMMS workshop, Bristol: Web engineering for research data management

I’m attending the PIMMS (Portable Infrastructure for Metafor Metadata System) workshop today, in an anonymous office off Bristol’s Berkeley Square where, unusually, the sun is shining and the room is warm. The Metafor project referenced in the PIMMS acronym is all about metadata for climate data, so the weather is on everybody’s minds today. It’s a slightly cosier meeting than expected, because it turns out that there’s a huge conference on climate science simulation currently taking place somewhere in India and relatively few people have been able to resist its siren call.

PIMMS is a software stack designed to support the development and collection of metadata. Initially, it was developed to “document the provenance of computer simulations of real world earth system processes”[*], but as is traditional for almost any infrastructure designed to support different types of experiment, the thought has occurred that the approach may be more broadly generalisable. It’s designed to allow for direct user (=scientist) involvement and the platform consists of, essentially, a sequence of stages of form-filling and design  processes, each of which fulfil a purpose:

  • ‘experiment’-level description documents (at a high level, describing an overall scientific investigation in an area) – these include information such as description, author, rationale and so on and are encoded into CIM documents
  • ensemble-level descriptions (datasets, versus individual data files, although as is so often true with collection-level metadata, opinions may vary on how this works in any given context).
  • run-level descriptions: detailed component – based descriptions of individual experimental runs or even sessions.

Unusually and rather charmingly, PIMMS uses mindmapping as a design tool, viewing it as accessible to users. Whilst PIMMS clearly contains elements of the thinking that underlies UML-based design and uses UML vocabulary and tools in places, UML is ‘useful within a development team’, says Charlotte Pascoe, the PIMMS project manager, but it is not meant for end-users.

PIMMS counts among its potential benefits an increase in sheer quantity, quality and consistency of metadata provided. The underlying methods and processes can, in theory at least, also be generalised. A mindmap could be built for any domain, parsed into a formal data structure, automagically built or compiled into a web form and applied on your metadata. The process for building a PIMMS form goes more or less as follows.

  1. Get someone to install a copy of the PIMMS infrastructure (software stack) for you, or indeed do it yourself.
  2. Work out what you are trying to achieve. Write an introductory explanation, and export it to XML.
  3. Identify and invite along stakeholders.
  4. Invite them to build a visual representation (a mindmap) of a domain with you.
  5. Press ‘Go’ and generate a web form suitable for input of metadata.

If this sounds somewhat familiar, folks, it is because the concepts underlying PIMMS have a long and honourable background in software engineering. Check out the Web Semantics Design Method (De Troyer et al, 2008), which specifies the following process for engineering a web application – my own comments in parentheses to the right:

  1. Mission statement (what, when you get down to it, are you trying to achieve?)
  2. Audience modeling (for whom?)
  3. Conceptual design (with what? about what? doing what?)
  4. Implementation design (using what?)
  5. Implementation (Well, what are you waiting for? Go on, do it.)

WSDM, as described here, owes much to the waterfall model of software engineering (although, one would assume, there is nothing stopping subsequent iteration through phases) – see for example Ochoa et al (2006). To my eyes, the PIMMS metadata development process would appear to implement about half of WSDM in a less analytical and more user-centric model, encouraging direct input from the scientists likely to use the data.

The distinction, primarily, is in the implementation design and implementation phase; the PIMMS infrastructure compiles your conceptual design/structure, as represented in the mind map you have developed, into an XML structure from which PIMMS can build user-facing forms. After that compilation phase, further implementation work is essentially cosmetic, presentational work such as skinning the form. PIMMS removes the majority of implementation decisions from the user by making them in advance. Much as SurveyMonkey consciously limits the user’s design vocabulary to elements that may be useful for your average survey, PIMMS essentially offers a constrained vocabulary of information types and widgets.

I don’t make the comparison between PIMMS and SurveyMonkey lightly. The PIMMS project itself uses the terminology of ‘questionnaires’. PIMMS-based forms have a lot in common with SurveyMonkey, too;  incrementally developing the form whilst still retaining your previously collected data is not a straightforward operation. That may be a good thing – that way, you know which version of the input questionnaire your data came from – but on the other hand, incremental tinkering can sometimes be a useful design approach too…

The day continues. The sun subsides and the room is cooling fast. The geographers in the room, climate modellers of anything from the Jurassic to the Quaternary, go through a worked example of developing a PIMMS questionnaire. They discover a minor problem: the dates in the PIMMS forms don’t represent the usage of dates in palaeoclimate research, which are measured in ‘ka’ – thousands of years. This is a problem inherited from the UM, the Met. Office Unified Model [numerical modelling system].

Faster than you could say ‘Precambrian’, we are out of time. There has not been a chance to view the generated metadata collection form in use, which I regret slightly as it is the most common scenario in which the attendees will work with the system. Still, it was a worthwhile day. Workshop attendees  have voiced an interest in working with the software in future. As for me, after this glimpse into the future of palaeoclimate data management, I find myself thinking back to my past life in web engineering. I wonder whether, like  palaeoclimatologists, research data managers  could develop their expectations of the future by exploring the literature of times past…

De Troyer, O.,  Casteleyn, S.  and Plessers, P. (2008). WSDM: Web Semantics Design Method. In: Rossi et al (eds.), Web Engineering: Modeling and Implementing Web Applications. [see]

Sergio F. Ochoa, José A. Pino, Luis A. Guerrero, César A. Collazos (2006). SSP: A Simple Software Process for Small-Size Software Development Projects. IFIP Workshop on Advanced Software Engineering 2006: 94-107