JISC Beginner's Guide to Digital Preservation

…creating a pragmatic guide to digital preservation for those working on JISC projects

DCA Guidelines for a Long-term Preservation Strategy

Posted by Marieke Guy on March 26th, 2012

Digitising Contemporary Art (DCA) is a 30-month EU digitisation project for contemporary art. One of its deliverables is a set of Guidelines for a Long-term Preservation Strategy for Digital Reproductions and Metadata. The document has been described as “very readable: they use accessible language; no jargon, a pleasure to read” by one of my colleagues. The guidelines are available in PDF format from the DCA site.

Posted in Project news | Comments Off

Launch Workshop for DataFlow and ViDaaS

Posted by Marieke Guy on March 5th, 2012

Mark Thorley, data management co-ordinator for NERC, set the tone for the day when he explained that “Data management is too important to leave to the data managers; it needs to be an important part of research”. The launch event for two new UMF-funded infrastructure projects, held at the Saïd Business School, University of Oxford, on Friday 2nd March 2012, was all about embedding research data management (RDM) into workflows using shared services. The UMF programme aims to help universities and colleges deliver better efficiency and value for money through the development of shared services.

Data Management at Oxford

Paul Jeffreys, Director of IT at the University of Oxford, gave an introduction to current data management practice at Oxford. Activities there are currently varied and rarely co-ordinated. Although there is an RDM portal comprising a research skills toolkit, an RDM checklist, a University statement on research data management (based on the University of Edinburgh’s ’10 commandments’) and a training programme, there are many people and areas these fail to reach. One area of concern is non-funded research (i.e. people for whom their research is their life’s work). It remains very tricky to build in generic support, and activities need to be flexible.

Introduction to DataFlow

DataFlow was introduced by David Shotton, the DataFlow PI. DataFlow is a collaborative project led by the University of Oxford. It is a two-tier data management infrastructure that allows users to manage and store research data. The project builds on a prototype developed in the JISC-funded ADMIRAL project.

The first tier, called DataStage, is a file store which can be accessed through private network drives or the web. Users can upload research data files and the service is backed up nightly. DataStage is likely to be used by single research groups, and deployment can be on a local server or on an institutional or commercial cloud. There is optional integration with Dropbox and other web services.

The second tier is DataBank, which, through a web submission interface, allows users to select and package files for publication. Files are accompanied by simple metadata and an RDF manifest, which is then displayed as linked open data. They are packaged using the BagIt format. DataBank is a scalable data repository where data packages are published and released under a CC0 licence, though users can choose to keep data private or add an optional embargo period.
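The BagIt packaging mentioned above is a deliberately simple layout: payload files sit under a `data/` directory next to a checksum manifest and a short version declaration. As a rough illustration of that structure only (not the actual DataBank implementation, which has its own tooling), a minimal BagIt-style bag can be assembled with the Python standard library:

```python
import hashlib
from pathlib import Path

def make_bag(src_dir: str, bag_dir: str) -> None:
    """Build a minimal BagIt-style bag: copy the payload under data/,
    record MD5 checksums in manifest-md5.txt, and write bagit.txt."""
    src, bag = Path(src_dir), Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for f in sorted(src.rglob("*")):
        if f.is_file():
            rel = f.relative_to(src)
            target = data / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(f.read_bytes())
            digest = hashlib.md5(target.read_bytes()).hexdigest()
            manifest_lines.append(f"{digest}  data/{rel.as_posix()}")
    # One "<checksum>  <path>" line per payload file
    (bag / "manifest-md5.txt").write_text("\n".join(manifest_lines) + "\n")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
```

A receiving repository can then re-hash `data/` and compare against the manifest to verify that nothing was corrupted in transit, which is the whole point of the format.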

DataFlow is now at beta release v0.1. The DataFlow team are keen to build a user community and have lots of processes in place allowing users to comment on developments.

Introduction to ViDaaS

James Wilson, ViDaaS project manager, introduced us to ViDaaS. Virtual Infrastructure with Database as a Service (ViDaaS) comprises two separate elements. DaaS is a web-based system that enables researchers to quickly and intuitively build an online database from scratch, or import an existing database. The virtual infrastructure (VI) enables the DaaS to function within a cloud computing environment; it is known as the ORDS (Online Research Database Service). ViDaaS builds on ideas developed in the JISC-funded Sudamih project. The service currently has three business models:

  • £600 per year for a standard project (25GB)
  • £2,000 per year for a large project (100GB)
  • A public cloud hosting option to follow later

ViDaaS is officially launching this summer.

Further details on interoperability between ViDaaS and DataFlow are contained within the Data Management Rollout at Oxford (DaMaRO) Project.

Both services are seen as examples of ‘sheer curation’: an approach to digital curation where curation activities are quietly integrated into the normal workflow of those creating and managing data and other digital assets. http://en.wikipedia.org/wiki/Digital_curation#Sheer_curation

So Why Use these Services?

Many of the other speakers during the day attempted to convince us of why we should use these services. It seems that, despite the efforts of many (including the DCC), data curation is often seen as a ‘fringe activity’. There are negligible rewards for creating metadata and there is a noticeable metadata skills barrier: researchers have raw data, while institutions have repositories that sit empty. The principle of ‘sheer curation’ is to have tools work with you rather than against you. Both DataFlow and ViDaaS offer integration with simple workflows and immediate benefits.

Use of shared infrastructure services is supported by JISC. They offer potential cost savings, transferability and reuse of tools.

The key to getting people to use the services lies in getting buy-in from users and allowing flexibility. As user Chris Holland explained, “we are inherently creative people; we are going to do things in our own way”. Services need to be flexible and intuitive, as no system can be all things to all researchers.

What about the Cloud?

Peter Jones, Shared Infrastructure Services Manager at Oxford University Computing Services, began his session introducing the Oxford cloud infrastructure with a quote from Randy Heffner: “The trouble with creating a ‘cloud strategy’? You’re focusing on technology, not business benefit.” He explained that the main barriers to cloud adoption include understanding costs, reliability (network), portability (lock-in), control, performance and security. However, the biggest issue was inertia and reluctance to change. He concluded that a local private cloud overcomes a number of these issues and that the most likely approach is a public-private hybrid.

It is becoming apparent that the cloud exposes a cost that was previously hidden. However, research institutions need to stand by the data they create, so these costs need to be acknowledged and paid. James Wilson, ViDaaS project manager, observed that this is how libraries work; it is not yet recognised in the research world, where people are still trying to offload costs onto others.

The afternoon breakout allowed more interaction and discussion around some of the highlighted issues, primarily cost, the cloud and national services.

Resources from the day are available on the DataFlow Website.

Posted in Archiving, Events, irg, rdm | Comments Off

DPC Report: Preserving Email

Posted by Marieke Guy on February 21st, 2012

The Digital Preservation Coalition (DPC) has released a new report on Preserving Email, authored by Chris Prom, Assistant University Archivist, University of Illinois. The report (available as a PDF at http://dx.doi.org/10.7207/twr11-01) provides a comprehensive advanced introduction to the topic for anyone who has to manage a large email archive over the long term, and offers practical advice on how to ensure email remains accessible.

Email is a defining feature of our age and a critical element in all manner of transactions. Industry and commerce depend upon email; families and friendships are sustained by it; government and economies rely upon it; communities are created and strengthened by it. Voluminous, pervasive and proliferating, email fills our days like no other technology. Complex, intangible and essential, email manifests important personal and professional exchanges. The jewels are sometimes hidden in massive volumes of ephemera, and even greater volumes of trash. But it is hard to remember how we functioned before the widespread adoption of email in public and private life.

The report is published by the DPC in association with Charles Beagrie Ltd.

Posted in Archiving | 1 Comment »

Getting Started in Digital Preservation

Posted by Marieke Guy on February 7th, 2012

A colleague of mine (Sarah Jones) from the Digital Curation Centre has pointed out a presentation she gave last year entitled ‘Getting Started in Digital Preservation’. The presentation slides are available on Slideshare and are also embedded below. They provide an excellent introduction for people just starting out in this area.

Posted in definition | Comments Off

Seasons Greetings!

Posted by Marieke Guy on December 22nd, 2011

As I’m now working for the Digital Curation Centre (DCC) I’m going to make use of their Christmas Card to wish you all a Merry Christmas and a Happy New year!

Posted in dcc | Comments Off

DCC Roadshow in Cardiff

Posted by Marieke Guy on December 15th, 2011

Snow, sleet, hailstones, rain and sunshine! The Cardiff weather couldn’t make up its mind, but the Digital Curation Centre (DCC) roadshow carried on regardless. Although I have attended various days of the travelling roadshow (Bath and Cambridge), I’ve never actually managed to catch day one. The opening day is an opportunity to hear an overview of the research data management landscape and is also the day on which local case studies make it onto the agenda, so I was looking forward to it.

Welcome: Janet Peters, Cardiff University

Janet Peters, Director of University Libraries and University Librarian for Cardiff University, opened the day by saying how keen she was to have the roadshow take place locally; feeling it to be very timely given current research data management (RDM) work in Cardiff. Janet explained that her attendance of the Bath roadshow had kick-started Cardiff’s work in this area. Cardiff have recently revitalised their digital preservation group and have been providing guidance and assisting departments with implementing changes to their RDM processes – more on this later. They have also recently rolled out an institutional repository, though it doesn’t cover data sets (at the moment).

The Changing Data Landscape: Liz Lyon, UKOLN

Liz Lyon on The Changing Data Landscape

Liz set the scene for the day by outlining the current data landscape. She began by introducing the new BIS report, Innovation and Research Strategy for Growth, which expresses the government’s support for open data and introduces the Open Data Institute (ODI). Only last week David Cameron suggested that “every NHS patient should be a “research patient” with their medical details “opened up” to private healthcare firms”. Openness and access to data are two of the biggest challenges of the moment and have stimulated much debate. Liz gave the controversial example of one tobacco company’s FOI request to the University of Stirling for information relating to a survey on the smoking habits of teenagers. She explained that a proposed amendment to FOI legislation would allow institutions to ask for exemption from FOI requests while research is ongoing. It’s often the case that researchers don’t want to share data, and there have been instances where governments have placed restrictions on data use (e.g. the Bring Your Genes to Cal project).

Liz then shared some more positive cases where research data is shared, e.g. Alzheimer’s research, the 1000 Genomes Project, the Personal Genome Project and openSNP. She also offered some citizen science examples: BBC Nature, Project Noah (http://www.projectnoah.org/), Galaxy Zoo, Patients Participate and BBC Lab. The Panton Principles are a recent set of guidelines that offer possible approaches: open knowledge, open data, open content and open service. To some degree the key to all of this is knowing about data licensing, and the DCC offer advice in this area.

Liz then moved on to what is often seen as the biggest challenge of all: the sheer volume of data now created, e.g. by the Large Hadron Collider. In genomics there are lots of shocking statistics on the growth of data and its implications. A new PHG Foundation report, Next Steps in the Sequence, sets out the implications of this data deluge for the NHS. The text The Fourth Paradigm highlights data-intensive research as the next step in research. The DCC are working with Microsoft Research Connections to create a community capability model for data-intensive research.

It is apparent that big data is being lost, but so is small data (like Excel spreadsheets), and part of the challenge is working out how scientists can deal with the long tail. The gold standard is data where both the code and the data can be fully replicated; reproducible research is the second-best approach. Data storage needs to be scalable, cost-effective, secure, robust and resilient, have a low entry barrier and be easy to use. Liz also asked us to consider the role of cloud services, giving DataFlow (http://www.dataflow.ox.ac.uk/), ViDaaS, BRISSKit and lab notebook as four JISC projects to follow in this area.

Liz then talked a little about policy, giving research council examples. The most relevant is the fairly demanding set of EPSRC expectations, which have serious implications for HEIs: institutions must provide an RDM roadmap by 1st May 2012 and must be compliant with the expectations by 1st May 2015. At the University of Bath, where Liz is based, there is a new project called Research360@Bath with a particular emphasis on faculty-industry engagement. There will also be a new data scientist role based at UKOLN. A full list of funders and their requirements is available from the DCC Web site.

Resources are available: back in 2010 the Incremental project (http://www.lib.cam.ac.uk/preservation/incremental/) found that many people felt institutional policies were needed in the RDM area. Edinburgh have developed an aspirational data management policy, the DCC have pulled together exemplars of data policy information (http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies), and ANDS also have a page on local policy.

It is also important to consider how to incentivise data management. There is quite a lot of current work on impact, data citation and DOIs. Some example projects: Total Impact (http://total-impact.org/) and SageCite.

And what about the cost? Useful resources include the Charles Beagrie report Keeping Research Data Safe (http://www.beagrie.com/jisc.php); Neil Beagrie has also done some work on helping people articulate the benefits through use of a benefits framework tool.

In conclusion Liz asked delegates to think about the gaps in their institution.

Digital Data Management Pilot Projects: Sarah Philips, Cardiff University

Sarah explained that Cardiff has retention requirements for quite a lot of corporate and permanent records, as well as requirements to keep some research data for 5-30 years. In response to feedback on its digital preservation policy, the University set up three pilot projects: in the cultural area, in the School of Biosciences (using genomic data) and in the School of History and Archaeology. Work in the School of History and Archaeology is now coming to a close, and this is the area Sarah concentrated on.

Three projects within the department were used as a test bed. The South Romanian Archaeological Project (SRAP) had collected excavation data and the team were keen to make it available. The Magura Past and Present Project had artists coming in and creating art; because it was an engagement project its outputs were required to be made available, though not necessarily the data. The final project was on auditory archaeology. All three projects were run by Dr Steve Mills.

Records management audits were carried out through face-to-face interviews with staff using the DCC’s Data Asset Framework. Questions included: what records and data are held? How are they managed and stored? What are the staff member’s requirements? A data asset register was created that dealt with many IP and ownership issues. Once this information was collected, potential risks were identified: Dr Mills had been storing data on whatever hard drives were available without a systematic approach, some metadata was available but file structure was an issue, proprietary formats were used and no file naming procedures were in place. Dr Mills was keen to make the data accessible, so the RDM team have been looking at depositing it with the Archaeology Data Service; if this isn’t feasible they will have to use an institutional solution.

High Performance Computing and the Challenges of Data-intensive Research: Martyn Guest, Cardiff University

Martyn started off by giving an introduction to Advanced Research Computing @ Cardiff (ARCCA), which was established in 2008. Chemistry and physics have been the biggest users of high performance computing so far, but the data problem is relatively new and has really arisen since the explosion of data use by the medical and humanities schools.

He sees the challenges as technical (quality, performance, metadata, security, ownership, access, location and longevity), political (centralisation vs departmental provision, governance, ownership), personal, financial (sustainability), and legal and ethical (data protection, FOI). Martyn showed us their data-intensive supercomputer (‘Gordon’) and a lot of big numbers (for file sizes) were bandied about! Gordon runs large-memory applications (supermode) on 512 cores, 2 TB of RAM and 9.6 TB of flash. It has been the case that NERC has spent a lot of time moving data, leaving less effort for analysing it.

Martyn shared a couple of case studies. With Positron Emission Tomography (PET) imaging data, the biggest issue was that the data was raw: researchers weren’t interested in patient-identifiable data and just wanted the images, while clinicians wanted both. He also talked about sequencing data; sequencing is now relatively easy, and the hard part is running bioinformatics on the data. As Martyn explained, it now costs more to analyse a genome than to sequence it, and the big issue is sharing that data. Martyn joked that the “best way to share data is by FedEx”, and many agreed that this may often be the case! The case studies showed that in HPC it’s often a computational problem. HPC Wales has several components, including awareness building around HPC and the creation of a Welsh network that can be accessed from anywhere and is globally distributed.

Martyn concluded that the main issues are around how to do the computing efficiently while the archiving issues continue to be secondary.

Research Data Storage at the University of Bristol: Caroline Gardiner, University of Bristol

Caroline Gardiner explained that at the University of Bristol her team had originally carried out a lot of high performance computing but were increasingly storing research data. She noted that the arts subjects are increasingly creating huge data sets.

Caroline admitted to collecting horror stories of lost data and using them as a way to leverage support. The Bristol solution has been BluePeta, a petascale facility created with £2m of funding. The facility is purely for research data at the moment, not learning and teaching data, though it is expandable.

Caroline explained that their success in this area came from many directions. Bristol already had a management structure in place for HPC and for research data storage, and they had access to the strategy people and those who held the purse strings. Bristol also have a research data storage and management board, and there continues to be buy-in from academics.

The process in place is that the data steward (usually the principal investigator, or PI) applies and can register one or more projects. There is then academic peer review, and storage policies are applied. A cost model is in place: the data steward gets 5TB free and then pays £400 per TB per annum for disk storage. PIs are encouraged to factor in these costs when writing their research grant applications. The facility is more for data that needs to be stored over the long term than for active data.
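The tiered charge quoted above is easy for a PI to model when costing a grant application. The sketch below simply encodes the figures from the talk (5TB free, then £400 per TB per annum); it is an illustration, not an official Bristol costing tool:

```python
def annual_storage_cost_gbp(total_tb: float,
                            free_tb: float = 5.0,
                            rate_per_tb: float = 400.0) -> float:
    """Annual disk-storage charge under a tiered model: the first
    `free_tb` terabytes are free, then `rate_per_tb` pounds per TB per year."""
    return max(0.0, total_tb - free_tb) * rate_per_tb

# e.g. a project holding 10TB would budget
# annual_storage_cost_gbp(10) pounds per year for storage
```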

Bristol are also exploring options for offsite storage and will also be looking at an annual asset holding review. They are also looking at preparing an EPSRC roadmap and starting to address wider issues of data management.

In answer to a question, Caroline explained that they had carried out a cost analysis against third-party solutions, but with the big players (like Google and Amazon) the cost of moving the data was the issue. There was some discussion of peer-to-peer storage, but delegates were concerned that it would kill the network.

Data.bris: Stephen Gray, University of Bristol

Following on from Caroline’s talk, Stephen Gray talked about what was happening on the ground through data.bris. Stephen explained that the drivers for the project were meeting funder requirements (not just EPSRC’s), meeting publisher requirements, using research data in the REF and increasing successful applications. Bristol have agreed a digital support role alongside the data.bris project, though this is all initially limited to the department of arts and humanities.

The team will initially meet with researchers and use the DMPOnline tool to establish funder requirements and ethical, IPR and metadata issues. After the planning comes the research application and then, hopefully, research funding. Projects will then have access to BluePeta storage. Curation is planned to happen at the end of each project, with high-value data identified for curation. Minimal metadata should be added at this stage, though there is a balancing act between resourcing and how much metadata is added. Bristol have a PURE research management system and a data.bris repository where they can check the data, carry out metadata extraction and assign DOIs. They will then promote and monitor data use.

In the future the team also want to look into external data centres use. A theme running through the project is ongoing training and guidance and advocacy and policy. Training will need to go to all staff including IT support and academic staff and they are hoping for some mandatory level of training.

Bristol are also planning on using the DCC’s CARDIO and DAF tools.

In the Q&A session delegates were interested in how Bristol had received so much top-down support for this work. It was explained that the pro-VC for research was a scientist and understood the issues. While there was support for research data, it was felt that there could be more support for outputs.

Herding Cats – Research Publishing at Swansea University: Alexander Roberts, Swansea University

Alexander Roberts started off his presentation by saying that Swansea wants it all: all data, big data, notes scribbled on the back of fag packets, ideas, all searchable and mineable. Not only this, but Swansea would like it all in one place; currently they have a lot of departmental databases and various file formats in use. Swansea looked at a couple of different systems, including PURE, but wanted an in-house content management system; they also inherited a DSpace repository. They wanted this system to integrate with their TerminalFour web CMS and with their DSpace system, Cronfa, and to provide RSS feeds for staff research profiles, Twitter feeds, Facebook updates etc. There was a consultation process that allowed lots of relationships to be formed and the end users to be involved. People were concerned that if they passed over their data they wouldn’t be able to get it back.

A schema was created for the system. They started off using SharePoint and were clear that they wanted everything in a usable format for the REF. The end result was built from the ground up: a form-based research information system that allows researchers to add their outputs as easily as possible. It is a simple form-based application that integrates with the HR database and features DOI resolving and MathML support. The ingest formats are RSS, XML, Excel, Access and others. It provides an Open Data Protocol (OData) endpoint which feeds the web CMS and personal RSS feeds.

Alexander ended by saying that in 2012 they would like to implement automatic updates to DSpace via SWORD and a searchable live directory of research outputs. They also want enhanced data visualisation tools for administrators. Mobile considerations are also a high priority, as Swansea have a mobile-first policy.

Michael Day and Alexander

Delivering an Integrated Research Management & Administration System: Simon Foster, University of Exeter

A Research Management and Administration System (RMAS) is more about managing data about projects, but can also deal with research data. The Exeter project has been funded under the UMF (funded by HEFCE through JISC) and is part of HEFCE’s bigger vision of cloud computing and joined-up systems. HE USB, a test cloud environment from Eduserv, is being used. Simon Foster described how the project had started with a feasibility study which looked at whether there was demand for a cradle-to-grave RMAS system; 29 higher education institutions expressed interest. The project was funded, and it was calculated that 29 HEIs phased in over ten years could save £25 million. A single-supplier approach was avoided after concerns that it could kill all others in the market. The steering group looked at the processes involved and fed these into a user requirements document. It was necessary that systems be cloud-enabled and compliant with CERIF data exchange. Current possible systems include Pure, Avida etc. Specific modules were suggested. The end result will be a framework that allows institutions to put out a mini-tender for RMAS systems asking specific institution-related questions; institutions should be able to do this in 4 weeks rather than 6 months.

The next steps for the project are proof of concept deliverables using CERIF data standards and use of externally hosted services. They also want to work with other services, such as OSS Watch.

There followed a panel session which included questions around the cost implications of carrying out this work. One suggestion was to consider the cost of failed bids due to lack of data management plans.

What can the DCC do for You?: Michael Day, UKOLN

Michael Day finished off the day with an overview of the DCC offerings and who they are aimed at (from researchers to librarians, from funders to IT services staff). He reiterated that part of RDM is bringing together different people from disparate areas and clarifying their roles in the RDM process. The DCC tools include CARDIO, DAF, DMP Online and DRAMBORA; the services include policy development, training, costing, workflow assessment etc. DCC resources are available from the DCC Website.


So after a day talking about data deluge while listening to a deluge of the more familiar sort (loud hail and rain) we were left with a lot to think about.

One interesting insight for me was that while the data deluge originally came from certain science areas (astronomy, physics etc.), more and more subjects (including arts and social sciences) are now creating big data sets. One possible approach, advocated by a number of the day’s presenters, is to use HPC as a starting point from which to kick-start research data management. However, there will continue to be a lot of data ‘outside of the circle’. As ever, join-up is very important. Getting all the stakeholders together is essential, and that is something the DCC roadshows do very well. All presentations from the day are available from the DCC Web site.

The next roadshow will take place from 7 – 8 February 2012 in Loughborough. It is free to attend.

Posted in Conference, dcc, irg | Comments Off

30 Seconds to Comply..

Posted by Marieke Guy on December 13th, 2011

Or a month in the case of TwapperKeeper….

The TwapperKeeper site, which I’ve mentioned many a time on this blog, now features a worrying message –

Twapper Keeper’s archiving is now available in HootSuite! As a result, we will be shutting down Twapper Keeper. Existing archives will be kept running until Jan 6, 2012, after which you will not be able to access your archives anymore. Thanks for using TwapperKeeper – we look forward to seeing you at HootSuite.

This could cause people to panic but luckily help is at hand. The following posts all offer advice on how to extract your archives and preserve them for the future.

I actually used Martin Hawksey’s excellent (and easy to use) Google Spreadsheet [Twitteralytics v2] to pull out a number of Digital Curation Centre archives. It took me a relatively short amount of time and they are now available as public Google Docs spreadsheets and as Excel files in our SharePoint.
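For anyone scripting their own export rather than using a spreadsheet, the general pattern is the same: fetch your tweets while the service is still up, then write them somewhere durable such as CSV. A minimal, hypothetical sketch of the writing step (the field names here are illustrative, not TwapperKeeper’s actual schema):

```python
import csv

def archive_tweets_to_csv(tweets, path):
    """Write a list of tweet dicts to a CSV file for long-term keeping.
    Expected keys per tweet: id, user, created_at, text (illustrative only);
    any extra keys in the dicts are ignored."""
    fields = ["id", "user", "created_at", "text"]
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(tweets)
```

CSV is a good target for this sort of rescue job precisely because it is plain text: it will still open in a spreadsheet (or anything else) long after the originating service has gone.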

It’s clear that our reliance on third-party services increasingly requires us to keep on our toes when it comes to digital preservation. Who knows which services will disappear in 2012…?

Posted in Project news | Comments Off

UK Web Archive Advent Calendar

Posted by Marieke Guy on December 1st, 2011

Feeling festive yet?

The UK Web Archive has just launched their 2011 Advent Calendar. Every day they will post about a different resource that’s freely available in the UK Web Archive. You’ll be able to find out a little bit about the resource and browse it via the UKWA website.

Day one has kicked off with the ‘Stolen Votes: Save Democracy’ site which was archived as part of the UK General Election 2005 collection.

For more information see the UK Web archive blog.

Posted in Project news | Comments Off

Alliance for Permanent Access Conference

Posted by Marieke Guy on November 16th, 2011

Last week (8th – 9th November) I attended the Alliance for Permanent Access (APA) annual conference in London. The APA aims to develop a shared vision and framework for a sustainable organisational infrastructure for permanent access to scientific information.

The event was held at the British Medical Association House, a fantastic setting for an event. It was a really interesting conference which provided a chance to hear about lots of great digital preservation projects.

There were a lot of really interesting plenaries so I’ve summarised a few of my personal favourites:

Digital Preservation What Why Which When With? – Prof. Keith Jeffery, Chair of APA Executive Board.

Unfortunately the European Commissioner Neelie Kroes couldn’t make it, so Keith, outgoing chair of the Alliance, gave the keynote instead. Keith reflected on the history of digital preservation, starting with the legendary story of the BBC Domesday Project and the CAMiLEON project. Keith talked about the importance of keeping digital resources accessible, understandable and easy to find. He gave an overview of some of the value judgements that need to be made, the standards (OAIS) and best practice (looking at projects like PARSE.Insight and APARSEN). Keith also emphasised the role of the APA in this area, pulling together digital preservation research.

ODE Project – Dr Salvatore Mele, CERN

Salvatore Mele introduced the Opportunities for Data Exchange (ODE) project, which is about sharing data stories. Currently there are lots of incentives for research but not for preservation, and the transition from science to e-Science has resulted in a data deluge that needs serious attention! Salvatore talked about the impossible triangle of reuse, (open) access and preservation: each leans heavily on the others. ODE has considered both carrot and stick approaches; these have some value (the carrot of sharing big data, for example, incentivises research rather than preservation) but aren’t enough. Mele explained that if there is no stick and no carrot we may have to work one by one with researchers to encourage sharing. ODE offers a way to reduce the friction in research data management through awareness raising. The ODE Project booklet Ten Tales of Drivers & Barriers in Data Sharing is definitely worth a read.

Mr Mark Dayer, Consultant Cardiologist, Taunton & Somerset NHS Trust

It was really refreshing to hear the view of an outsider. Mark Dayer is not involved in digital preservation; he is a consultant cardiologist – he operates on hearts. Mark gave an incredibly open and entertaining presentation on the state of play in the National Health Service (NHS). He began by giving some background for the non-UK residents in the audience: "The NHS is a beloved institution that no political party dare dismantle" – or at least it used to be. Unfortunately, the NHS and IT have made for grim headlines in the recent past. The NHS holds enormous quantities of data across an enormous number of diverse systems working locally and in unconnected ways, and many people are still working with paper-based systems. On top of this, the NHS needs to make £20 billion of savings. Mark explained how the growing number of systems (120 different clinical systems in use in one area alone) and poor IT planning have added to the problem. Other issues such as data security add to the mix: the 'Spine' personal records system should hold over 50 million records but has only 5 million so far.

After the disaster story Mark moved on to the small successes that have started to happen. He explained that trusts are starting to build data centres, use the cloud (e.g. Chelsea and Westminster Hospital) and use integration engines (which give an idea of the number of data standards in play). He talked about the systems and standards involved, including CDA, HL7, ICD-10 (a classification system), OPCS and SNOMED CT, and about the new N3 VPN. Mark concluded by saying that it isn't just about the right software, but about the right hardware too, and that you need to bring people with you, all the way.

Networks as Evolving Infrastructure for Digital Preservation – Dr Martha Anderson, Director of the NDIIPP, US Library of Congress

Martha Anderson started off by showing us a picture of the biggest spider web ever seen. She explained that the old African proverb "when spiders unite they can take down a lion" applies here. Almost a dozen spider families were involved in the creation of this web; the population had exploded due to wet conditions. Martha applied this analogy to digital preservation networks, telling us that our networks will evolve if the conditions are right. The National Digital Information Infrastructure and Preservation Program (NDIIPP) was created to help create networks between people to undertake preservation – communities working together as bilateral and multilateral alliances.

Many different institutions are now involved in digital preservation and in developing alliances across communities. A good example is the Blue Ribbon Task Force, which cut across sectors including finance, science, aerospace and HE. Other sectors have much to offer us; for example, Martha has learnt about video metadata annotation from Major League Baseball! The Data-PASS network gives a picture of what networks are doing. Martha concluded that it is all about setting up and supporting social and local interaction to build networks – finding common stories. She felt that if there is no local benefit from work then it cannot be sustained and cannot last past the funding. Martha observed that it is interesting that groups of institutions will act in the public interest, but on their own they act in their own interest. Networks are beneficial to all.

UK Government Views – Nigel Hickson, Head of EU and International ICT Policy, DCMS

Nigel Hickson was there to talk about the government's responsibility for the digital infrastructure, which includes the take-up of broadband and copyright issues. Nigel began by singing the praises of the Riding the Wave report, released in 2010 by the High Level Expert Group on Scientific Data. He talked about the importance of having a framework and a holistic approach. For many, broadband is an economic driver; mobile data continues to be a disruptive element (doubling every year) and all this spells game change for the public sector. The problem is that mobile data is increasing; the proposed solution is an 'auction' of spectrum to increase capacity. The current UK approach is that the market should lead and that competition is vital. Britain's superfast broadband strategy has £530 million to spend by 2015 and the potential for an extra £300 million before 2017; projects require match funding from the private sector. The government also wants things to be digital by default, with the option of doing them offline if necessary. Other key priorities are a rights management infrastructure and the proposal on orphan works.

Nigel also outlined the European Digital Agenda, where broadband is again a critical element. The key European targets are basic broadband for 100% of citizens by 2013; by 2020, 50% of households should have subscriptions of 100 Mbit/s or above.

The report A Surfboard for Riding the Wave builds on the 2010 report. It presents an overview of the current situation with regard to research data in Denmark, Germany, the Netherlands and the United Kingdom, and offers broad outlines for a possible action programme for the four countries in realising the envisaged collaborative data infrastructure.

Posted in Conference, Events | Comments Off

Had your Heart Broken by Data Loss?

Posted by Marieke Guy on October 13th, 2011

Then maybe it’s time to share the pain…

The National Digital Stewardship Alliance (NDSA) Outreach group is collecting stories about data loss and preservation. If you've had your heart broken or uplifted by data loss or preservation, please fill out the form at http://j.mp/datastories.

There’s no deadline, but they will probably be taking a look at what they have in November.

Posted in Case studies | Comments Off