JISC Beginner's Guide to Digital Preservation

…creating a pragmatic guide to digital preservation for those working on JISC projects

Archive for May, 2010

Preserving your Tweets

Posted by Marieke Guy on 28th May 2010

Recently there has been a lot of talk about preserving tweets, especially since the Library of Congress agreed to take on the Twitter archive.

I’ve just written an article for FUMSI on this (probably out in September). FUMSI are the FreePint people who publish tips and articles to help information professionals do their work.

One area I looked at was the tools available to archive tweets. We wrote quite a lot about this on the JISC PoWR project. Here is a taster of the most current tools out there:

  • Print Your Twitter: a service that creates a PDF file of an account’s tweets.
  • WordPress Lifestream plugin: integrates Twitter with your blog, so tweets can be archived using the blog’s own capabilities.
  • What the Hashtag: creates an HTML archive and RSS feed based on a hashtag.
  • Tweetdoc: a service that creates a PDF file bringing together all the tweets from a particular event or search term.
  • Twapper Keeper: allows users to create a notebook of tweets for a hashtag.
  • The Archivist Desktop: a desktop application that runs on your local system and archives the tweets for any given search, for later data mining and analysis.
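None of these tools publish their internals, but the basic idea behind a hashtag archive such as What the Hashtag can be sketched in a few lines of Python. This is a minimal sketch, assuming the tweets have already been collected as simple records; the field names (`user`, `created_at`, `text`), the `#bgdp` tag and the sample data are all hypothetical, and no Twitter API is called.

```python
from html import escape

def hashtag_archive(tweets, hashtag):
    """Return an HTML list of every tweet that contains the given hashtag."""
    tag = hashtag.lower()
    # Case-insensitive whole-word match, ignoring trailing punctuation,
    # so "#bgdp." matches but "#bgdpx" does not
    matches = [
        t for t in tweets
        if tag in (w.lower().rstrip(".,!?:;") for w in t["text"].split())
    ]
    # Oldest first; ISO-style timestamps sort correctly as plain strings
    matches.sort(key=lambda t: t["created_at"])
    items = "\n".join(
        f"  <li>{escape(t['created_at'])} @{escape(t['user'])}: {escape(t['text'])}</li>"
        for t in matches
    )
    return f"<ul>\n{items}\n</ul>"

# Hypothetical sample data standing in for an exported set of tweets
tweets = [
    {"user": "alice", "created_at": "2010-05-28T10:00", "text": "Preserving tweets #bgdp."},
    {"user": "bob", "created_at": "2010-05-27T09:00", "text": "Nothing to see <here>"},
    {"user": "carol", "created_at": "2010-05-26T16:30", "text": "#BGDP case studies wanted"},
]
print(hashtag_archive(tweets, "#bgdp"))
```

Sorting on the timestamp string only works for ISO-style dates; a real archive would parse dates properly and keep the raw data alongside the HTML, since the HTML alone loses the structure you would want for later analysis.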

Other approaches include:

  • Searching Twemes: the site mashes up Twitter with Flickr, Delicious and other services.
  • Searching FriendFeed: this brings back much older tweets than Twitter’s own search, but you are reliant on users being members of the service.
  • Subscribing to selected Twitter feeds by email and then applying an email filter to them.
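The email route in the last bullet can also be scripted rather than set up as a mail-client rule. Below is a minimal sketch using Python’s standard `email` package, assuming the notification mails come from an address at `twitter.com` (an illustrative assumption; check the actual sender on the mails you receive) and are plain, non-multipart messages. The sample messages are made up.

```python
from email import message_from_string
from email.utils import parseaddr

def filter_twitter_mail(raw_messages, sender_domain="twitter.com"):
    """Return (subject, body) pairs for messages sent from sender_domain."""
    kept = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        # parseaddr splits "Name <user@host>" into display name and address
        _, address = parseaddr(msg.get("From", ""))
        if address.lower().endswith("@" + sender_domain):
            # get_payload() returns the body as a string for non-multipart mail
            kept.append((msg.get("Subject", ""), msg.get_payload()))
    return kept

# Hypothetical sample messages
inbox = [
    "From: Twitter <notify@twitter.com>\nSubject: New tweet from @example\n\nPreserving tweets #bgdp\n",
    "From: A Colleague <colleague@example.org>\nSubject: Lunch?\n\nSee you at one.\n",
]
for subject, body in filter_twitter_mail(inbox):
    print(subject, "|", body.strip())
```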

Some of these tools are covered in more detail in the JISC PoWR blog post: Tools For Preserving Twitter Posts.

Anyone got any other suggestions?

Posted in Archiving, Web | Comments Off

Getting a Structure in Place

Posted by Marieke Guy on 24th May 2010

Last week I took some time out to brainstorm on a structure for the JISC Beginner’s Guide to Digital Preservation. The nice weather meant I could sit out in the garden and use my felt tips to create some useful mind maps. I now have a working structure which I hope to confirm this week after I’ve run it by a colleague.

The relatively short timescale and limited staff resources for this project mean that my engagement with my audience is limited. I have had to be selective about what I do and am relying heavily on data that is already out there rather than gathering my own. I probably won’t be able to run a survey of project workers to establish where their digital preservation needs lie, though a lot of information is already available in this area (for example, the Digital Preservation Coalition’s Mind the Gap survey was very useful; there’s no need to read it all, as this Ariadne article offers a useful summary). However, I do intend to engage JISC project staff slightly further down the line and will be asking for feedback on the areas I propose to cover.

Of course people are always welcome to comment on this blog.

Posted in Project news | 1 Comment »

Theory to Practice: Digital Preservation Case studies

Posted by Marieke Guy on 21st May 2010

I’m sure you’d agree that experience counts for a lot. In the digital preservation world, when you need to do something a little tricky that you haven’t done before, it can really help to have a case study close to hand. I am hoping that we will be able to include a number of these in the JISC Beginner’s Guide to Digital Preservation. Although we are on the hunt for new case studies, some are already available:

Digital Preservation Coalition Case Notes
The DPC have published a series of four case studies looking at The National Archives’ approach to the UK’s Cabinet Papers, the Freeze Frame project’s use of its institutional repository, the Archival Sound Recordings 2 project’s use of METS and a complex digitisation project at the National Library of Wales
SCARP Project Case Studies
The Digital Curation Centre SCARP project (2007-2009) used a series of immersive case studies to identify disciplinary approaches to data deposit, sharing and re-use, curation and preservation
DCC Case Studies
The DCC also have several other case studies, from the Integrative Biology, JHOVE, PrestoSpace, CARMEN and Wide Field Astronomy Unit (WFAU) projects
JISC Digital Preservation Policies Study
Useful case studies covering institutional preservation policies are listed in the JISC Digital Preservation Policies Study, carried out in 2008
JISC Preservation of Web Resources Case Studies
The JISC Preservation of Web Resources blog and handbook both offer case studies in the area of Web preservation
JISC Digital Media Case studies
JISC Digital Media hold case studies in their ‘Learning Lessons from Other Digitisation Projects’ area; although these are primarily about digitisation, many also cover preservation
AHDS Case Studies
The Arts and Humanities Data Service has now closed, but its Web site still houses case studies on digitisation and preservation
National Organisations
A number of national libraries and other large national organisations have case studies available, including the National Library of New Zealand, the Theater Instituut Nederland (TIN) and the National Library of Australia

Does anyone know of any other useful case studies?

Posted in Case studies | 4 Comments »

KRDS2 and the cost of Digital Preservation

Posted by Marieke Guy on 20th May 2010

I’ve been taking a look at the final report for Keeping Research Data Safe 2 (KRDS2), which is now available from the JISC Web site. The study, conducted by Charles Beagrie Ltd. and associates, presents the results of a survey of available cost information, a validation and further development of the KRDS activity cost model, and a new taxonomy to help assess benefits alongside costs.

One of the key findings of the report is on the long-term costs of digital preservation for research data:

The costs of archiving activities (archival storage and preservation planning and actions) are consistently a very small proportion of the overall costs and significantly lower than the costs of acquisition/ingest or access activities for all our case studies in KRDS2. As an example the respective activity staff costs for the Archaeology Data Service are Access (c.31%), Outreach/Acquisition/Ingest (c.55%), Archiving (c.15%).
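The staff-cost split quoted for the Archaeology Data Service is easier to compare as ratios. The short calculation below uses only the three figures in the quotation; because they are approximate (“c.”), they add up to 101% rather than 100%, so the ratios are rough.

```python
# Approximate ADS staff-cost shares quoted from KRDS2 (percentages)
shares = {
    "Access": 31,
    "Outreach/Acquisition/Ingest": 55,
    "Archiving": 15,
}

total = sum(shares.values())  # 101, because the quoted figures are rounded
for activity, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{activity:<28} {pct:>3}%")

ingest_to_archiving = shares["Outreach/Acquisition/Ingest"] / shares["Archiving"]
access_to_archiving = shares["Access"] / shares["Archiving"]
print(f"Acquisition/ingest costs: about {ingest_to_archiving:.1f}x archiving")
print(f"Access costs: about {access_to_archiving:.1f}x archiving")
```

On these figures, acquisition and ingest take more than three and a half times the staff effort of archiving itself, and access roughly twice as much.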

The conclusions are drawn from 13 survey responses covering different cost datasets. Bearing in mind the Blue Ribbon Task Force report and its economic framework, it seems to me that research into preservation costing tools and cost-benefit analysis is fairly key at this moment in time.

Posted in Reports | Comments Off

The Planets Survey on Managing Digital Content

Posted by Marieke Guy on 17th May 2010

A White Paper summarising the findings of the Planets (Preservation and Long-term Access through Networked Services) Market Survey is available to download from the Planets Web site.

The survey of 200 organisations worldwide, conducted by Tessella, aimed to understand the requirements for long-term management of digital content. Contributors spanned libraries, archives, government, providers of digital library systems, museums and commercial organisations.

Key Findings

  • The volume of digital content organisations expect to archive will increase 25-fold in the next ten years, from a median of less than 20TB to over 500TB.
  • Over 80% of organisations already need to preserve content in simple formats, such as documents and images, for the long term; by 2019, 70% will also need to preserve databases, websites, and audio and video files.
  • Ninety-three per cent of organisations recognise the challenges of preserving digital content for the long term and many plan for it: 76% include it in their operational planning, 71% in business continuity planning and 62% in financial planning.
  • A digital preservation policy is a vital first step in preserving digital content. Organisations with a policy are more likely to include digital preservation in their operational, financial and business continuity plans, three times more likely to budget for it and four times more likely to be investing in a solution.
  • National libraries and archives with large volumes of, and variation in types of, digital content, as well as a legal and moral imperative to preserve it, currently lead the way. However, all organisations will face similar challenges as the volume and variety of content they hold rises.
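The 25-fold figure in the first finding follows directly from the two medians (500 / 20 = 25). The sketch below also works out the annual growth rate this implies, assuming steady compound growth over the ten years; that smoothing is an assumption made here for illustration, not something the survey reports, and the “less than” and “over” qualifiers are ignored.

```python
# Planets survey medians (TB); "less than 20TB" and "over 500TB" are treated
# as exact values here, which is a simplification
start_tb = 20
end_tb = 500
years = 10

growth_factor = end_tb / start_tb               # 25.0
# Assumption: steady compound growth, i.e. start * (1 + r) ** years == end
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.0f}-fold")
print(f"Implied compound growth: about {annual_rate:.0%} per year")
```

Under that assumption, holdings would need to grow by close to 40% every year.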

Posted in Reports | Comments Off

Blue Ribbon Task Force Symposium

Posted by Marieke Guy on 14th May 2010

Last week, on election day (May 6th 2010), I attended the Blue Ribbon Task Force Symposium on Sustainable Digital Preservation and Access.

The symposium provided an opportunity for stakeholders to respond to the recent Blue Ribbon Task Force report entitled Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information. The report is available to download in PDF format.

Panel Session: Clifford Lynch, Adam Farquhar, Matthew Woollard, Graham Higley, John Zubrzycki

Introduction to the Report

Neil Grindley, JISC Digital Preservation Programme Manager, opened the symposium by introducing the two UK members of the Blue Ribbon Task Force: Paul Ayris, Director of Library Services at University College London, and the recently retired Chris Rusbridge, an independent consultant.

Paul Ayris explained that the taskforce had been set up to answer three key questions: 1) What shall we preserve? 2) Who will preserve it? 3) Who will pay for it? Chris Rusbridge followed with a summary of Blue Ribbon activity and recommendations. He explained that, despite what some might think, sustainability of resources is not just about finding money; it is about creating incentives. Yet access to digital information is not a clear-cut case: those who pay for it, those who provide it and those who benefit from it are not necessarily the same. With this in mind, the Blue Ribbon Task Force report has been written around an economic framework. Rusbridge also explained that the report sets down the principle that the case for preservation is the case for use. People don’t want digital preservation; they want access to resources, so digital preservation is effectively a derived demand. The report’s conclusions offered an agenda for further action, including looking at economies of scale and scope, chains of stewardship and investigation of public partnerships. It had laid down the foundations for a further report taking the next steps.

Brian Lavoie, Research Scientist at OCLC and fellow taskforce member, then talked a little about the US launch; the products of the launch are available online. Lavoie explained that clarity of licensing and devices like Creative Commons have been valuable in making resources preservable: they encourage third-party curation by enshrining the right to preserve.

A panel session on what the task force had actually achieved followed. The initial questions were posed by Paul Ayris and centred around the fact that while open access is now so high on everyone’s agenda, digital preservation remains low, almost invisible. It is very much a case of open access being today’s problem and digital preservation being tomorrow’s.

Different Perspectives

After a much-needed coffee break, the symposium moved on to session two, chaired by Clifford Lynch of the Coalition for Networked Information, which considered different sector perspectives. The view from the heritage sector was offered by Graham Higley, Head of Library and Information Services at the Natural History Museum. Higley introduced the Biodiversity Heritage Library (BHL) at the Natural History Museum, which holds about 1 million books. Many of the resources are very old: more than half of all named species are documented in literature published before 1900. Preservation is considered a core part of BHL’s work, and their long-term access approach is LOCKSS-based, relying on international partnership guarantees and built entirely on open source software.

John Zubrzycki, Principal Technologist and Archives Research Section Leader at BBC Research, followed with a view from public broadcasting. The BBC have 650k hours of video, 350k hours of audio, 2 million stills, 3 million items of sheet music, 400k “pronunciations”, 1.5 million titles in the “grams library” and 100 km of shelves. That’s a lot of stuff, and it will take up to 16 years to digitise all 65 petabytes of existing content. The BBC Charter places obligations on the BBC to preserve its output, and the BBC is aiming to provide public web access to all its archived content by 2020.
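To put the BBC figures in perspective, digitising 65 petabytes in 16 years means an average of about 4 petabytes a year. Since 16 years is the upper estimate, this is the minimum average pace. The calculation assumes decimal units (1 PB = 1,000 TB) and an even rate on every day of the year, which no real digitisation programme will have.

```python
total_pb = 65    # existing BBC content quoted above, in petabytes
years = 16       # upper estimate for digitising it all

# Assumptions: decimal units (1 PB = 1000 TB) and an even rate all year round
pb_per_year = total_pb / years
tb_per_day = pb_per_year * 1000 / 365

print(f"{pb_per_year:.2f} PB per year, or roughly {tb_per_day:.1f} TB per day")
```

That works out at a little over 11 TB of newly digitised content every single day.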

Lunch was really good and gave us a chance to network and put faces to Twitter IDs. We then all proceeded back to the lecture theatre. The data manager’s perspective was given by Matthew Woollard, Director-Designate of the UK Data Archive (UKDA). The UK Data Archive is a department at the University of Essex and provides infrastructure and shared services for various data archives. Woollard argued that it is a fallacy that researchers want to keep everything, and that priorities for selection, curation and retention are key. In reality it costs the UKDA more to restrict access than to open it. Woollard is currently involved in formulating the ESRC research data policy, which will hopefully be influenced by the Blue Ribbon Task Force report. He ended with the suggestion that data archives should use the arguments in the report to leverage not necessarily more money, but more sustainable money.

The final perspective was that of the national library, given by Adam Farquhar, Head of Digital Library Technology at the British Library, where “preservation is their day job”. The Library has to ask for permission to archive Web sites; of the 13,000 people asked, only 100 have said ‘no’, but then only 4,000 have responded. It is this copyright investigation that costs time and money, so establishing the right legislative foundation is a priority. Farquhar talked about the Library’s use of DataCite and Dryad to support researchers by providing methods for them to locate, identify and cite research datasets with confidence. The British Library also has an interest in Planets and the Open Planets Foundation.
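The permission figures show why the process is so costly: refusals are rare, but most requests simply go unanswered. The calculation below uses only the three numbers reported; it assumes, purely for illustration, that every response that was not a refusal granted permission.

```python
asked = 13_000
responded = 4_000
refused = 100

response_rate = responded / asked              # share of requests answered
refusal_rate = refused / responded             # share of answers that were "no"
# Assumption: every response that was not a refusal granted permission
granted = responded - refused
unanswered = asked - responded

print(f"Response rate: {response_rate:.1%}")
print(f"Refusals among responses: {refusal_rate:.1%}")
print(f"Granted: {granted:,} of {asked:,} ({granted / asked:.1%})")
print(f"Still unanswered: {unanswered:,}")
```

Only 2.5% of those who replied said no, yet 9,000 requests are still waiting for any answer at all.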

There followed a discussion on free-riders (those who use content but do not contribute to its upkeep): who exactly they are and whether they are a problem. Brian Lavoie explained that taxes pay for public bodies to perform preservation, so free use of these services is not really ‘free’. The report itself is fairly critical of free-riders, though those of us working in academia might believe that any use of resources should be encouraged. Matthew Woollard pointed out that the costs of excluding free-riders can be greater than the costs of letting them in.

Higher Level Views

The final talks gave two higher level views: that of the European Commission and the JISC. Pat Manson, Acting Director of Digital Content and Cognitive Systems at the European Commission, talked about policy initiatives at European level and how they are tackling the sustainability challenge.

The JISC vision for digital preservation was provided by Sarah Porter, Head of Innovation at the JISC. The JISC are keen to ensure that organisations are prepared to undertake preservation and to embed preservation practice. Currently the JISC has taken no formal position in this area, but one possibility is that they, as funders, create an explicit mandate for projects to follow. They are also considering whether funders in different countries could work together on further actions, and whether they should create financial incentives for private entities to preserve in the public interest.

Chris Rusbridge sums up

The session chair, Brian Lavoie, then facilitated a discussion on ‘where do we go from here?’ Suggestions included engaging beyond academia and the cultural sector at a high political and governmental level, and promoting digital preservation as one of the ‘big society’ challenges (how apt on the day of the election). Chris Rusbridge closed with the thought that the report offered something for us to build on, but that the scale of the challenge required us to move on quickly.

After the symposium there was a drinks reception for those who didn’t need to rush back to cast their vote. I had an interesting chat with the BBC team; most of our talk focused on where a possible new government would leave not just digital preservation but the public sector as a whole.

A longer version of this trip report will appear in Ariadne Web magazine.

More photos are available from the UKOLN Flickr site.

Posted in Events | 2 Comments »

Reading for Today…

Posted by Marieke Guy on 5th May 2010

To get started on the project I will be reading a number of recent digital preservation reports including:

All the resources I use for the Beginner’s Guide will be tagged with jisc-bgdp and added to Delicious.

If you know of any other useful resources please do let me know.

Posted in Reports | 5 Comments »