JISC Beginner's Guide to Digital Preservation

…creating a pragmatic guide to digital preservation for those working on JISC projects

Archive for June, 2011

Digital Obsolescence – buzz phrase or a real issue?

Posted by Marieke Guy on 20th June 2011

Deborah Wilson is currently undertaking a research study into digital obsolescence, the results of which she will be happy to share once the analysis is complete. To gather good-quality data, she is asking people to complete an online survey. The survey is tailored to those working in records management, but anyone working with digital data is welcome to fill it in.

The survey will take no longer than 10 minutes to complete and your participation is greatly appreciated.

Please note that any personal data collected as a result of this survey will be made anonymous and will not be disclosed to any third party or processed for any other purpose.

Posted in Project news | Comments Off

Digital Archaeology

Posted by Marieke Guy on 13th June 2011

An exhibition on the history of the Web and Web design will run in New York this month.

The exhibition, Digital Archaeology, debuted at Internet Week Europe 2010 and “charts the disruptive moments of web design and celebrates the characters behind its radical evolution”.


The exhibition will showcase 28 important Web sites, including The Project – the first website, published by Tim Berners-Lee at CERN in 1991, and Word.com – one of the earliest and most influential e-zines, a true multimedia experience, incorporating games, audio, and chat.

An introduction to the exhibition explains…

The web is just 20 years old, yet it has transformed our lives utterly, down to the bone. We do, see, hear, share, copy, sell, buy, interact, relate with authority and participate in society differently. Things will never be the same again. Over this short time, technological and communications developments have been so fast that the groundbreaking work of the early creative pioneers, produced on now defunct hardware and software, have disappeared almost as soon as they appeared, like Mayflies in spring doomed to die as the daylight fades.

Soon we will know less about these HTML blossomings than we do about the relief carvings in Mohenjo-Daro or the Yucatán. While they helped define our new culture, almost none of the websites of less than two decades ago can be seen at all. Today, when almost a quarter of the earth’s population is online, this most recent artistic, commercial and social history is being wiped from the face of earth and a hundred million hard drives lie festering in recycling yards or rusting in landfills.

In his article on the exhibition, entitled “Internet history is vanishing into thin air”, Allan Hoffman asks:

Does your company have a working, surfable copy of its first website or the second or third versions? Probably not. But if your company published an annual report or even a newsletter in 1954 or 1999, you can bet someone saved it, and you could dig up a copy.

It seems Web preservation has at last hit the mainstream.

Posted in Project news | Comments Off

Digital Preservation Benefits Toolset Workshop

Posted by Marieke Guy on 10th June 2011

UKOLN have announced that registration is now open for the Workshop to disseminate the Digital Preservation Benefits Toolset and accompanying materials such as user guides and factsheets to the research community.

Workshop Details

Tuesday, 12 July 2011: 12.30-16.00
London South Bank University
Main Conference Room
The Keyworth Centre
Keyworth Street
London
SE1 6NG

Workshop registration is free, but please note that places are limited and early registration is advised. At least 24 hours’ notice of cancellation is required; otherwise a fee of £50 will be charged to recover costs.

The Digital Preservation Benefit Analysis Tools Project is funded by the Joint Information Systems Committee (JISC) and runs from 1 February to 31 July 2011.

The project has tested and reviewed the combined use of the Keeping Research Data Safe (KRDS) Benefits Framework and the Value Chain and Impact Analysis tool, which were first applied in the I2S2 Project for assessing the benefits and impact of digital preservation of research data. We have extended their utility to, and adoption within, the JISC community by providing user review and guidance for the tools and by creating an integrated toolset. The project consortium consists of a mix of user institutions, projects, and disciplinary data services committed to the testing and exploitation of these tools and the lead partners in their original creation.

A project Web site and the project plan are available and further outputs will be available from the Web site during the summer. The project partners are UKOLN and the Digital Curation Centre at the University of Bath, Centre for Health Informatics and Multi-professional Education (CHIME) at University College London, UK Data Archive (University of Essex), Archaeology Data Service (University of York), OCLC Research, and Charles Beagrie Limited.

Details concerning the Workshop programme, venue and registration are all available from the UKOLN Web site.

Posted in Events, Workshops | Comments Off

Update on the LOC Twitter Archive

Posted by Marieke Guy on 3rd June 2011

It’s all been very quiet on the Twitter front at the Library of Congress since their announcement last year, so it was good to see an update written by Audrey Watters from the O’Reilly Radar. The article, entitled “How the Library of Congress is building the Twitter archive”, is a write-up by Audrey following a conversation with Martha Anderson, the head of the LOC’s National Digital Information Infrastructure and Preservation Program (NDIIPP), and Leslie Johnston, the manager of the NDIIPP’s Technical Architecture Initiatives. It gives us a little insight into how the LOC is dealing with the challenges and opportunities of archiving digital data of this kind.

The article cites the biggest challenges as the size of the archive (we are now producing 140 million tweets per day!), the composition of a tweet (a JSON file with a lot of Twitter metadata) and the layers of complexity (e.g. dealing with all the URLs).
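
To give a feel for what “a JSON file with a lot of Twitter metadata” might mean in practice, here is a minimal Python sketch. The sample record and its field names (`entities`, `urls`, `expanded_url` and so on) are assumptions made up for illustration, not the schema the Library of Congress actually receives, and the helper functions are hypothetical. It pulls the links out of one tweet-like record and turns the 140 million tweets per day figure into an average per-second rate.

```python
import json

# Illustrative record only: these field names are assumptions for this
# sketch, not the actual format of the archive described in the article.
SAMPLE_TWEET = """
{
  "id": 1,
  "created_at": "2011-06-03T09:00:00Z",
  "user": {"screen_name": "example_user"},
  "text": "Reading about the Twitter archive http://t.co/abc",
  "entities": {
    "urls": [
      {"url": "http://t.co/abc", "expanded_url": "http://example.org/article"}
    ]
  }
}
"""


def extract_links(raw_json):
    """Return (short_url, expanded_url) pairs found in one tweet record."""
    tweet = json.loads(raw_json)
    # Missing keys fall back to empty containers, so records without
    # links simply yield an empty list.
    urls = tweet.get("entities", {}).get("urls", [])
    return [(u.get("url"), u.get("expanded_url")) for u in urls]


def tweets_per_second(tweets_per_day):
    """Convert a daily tweet count into an average per-second rate."""
    return tweets_per_day / (24 * 60 * 60)


links = extract_links(SAMPLE_TWEET)
rate = tweets_per_second(140_000_000)
print(links)
print(round(rate))
```

Even in this toy record the text of the tweet is a small part of the whole, and every shortened link is one more thing that has to be resolved and kept if the archive is to stay meaningful.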

Dealing with these complexities efficiently is a big job.

This requires a significant technological undertaking on the part of the library in order to build the infrastructure necessary to handle inquiries, and specifically to handle the sorts of inquiries that researchers are clamoring for….Expectations also need to be set about exactly what the search parameters will be — this is a high-bandwidth, high-computing-power undertaking after all.

No decision has been made yet on which tools to use, but the library is “testing the following in various combinations: Hive, ElasticSearch, Pig, Elephant-bird, HBase, and Hadoop”.

We wait with bated breath!

For those who like analogies, Martha Anderson has just written an interesting post on how saving digital information is a lot like jazz. In “Digital Preservation Jazz”, Martha talks about the creative, diverse, and collaborative nature of digital preservation.

Posted in Archiving | Comments Off