JISC Beginner's Guide to Digital Preservation

…creating a pragmatic guide to digital preservation for those working on JISC projects

Archive for August, 2010

Where does the future of digital archiving lie?

Posted by Marieke Guy on 27th August 2010

So where does the future of digital archiving lie? According to Steve Bailey it’s in Google’s hands.

This answer has sparked off some discussion on the records management JISCMail list, firstly about whether this is truly the case, and if so what it means. So let’s peel back the discussion and start at the beginning by watching Steve’s excellent talk given at the 8th European Conference on Digital Archiving, 28 – 30 April 2010, Geneva.

A warning: the talk is excellent but unfortunately the embedded video isn’t very user friendly and won’t allow you to enlarge it or watch it from a specific point. Any mishap and you’re back to the beginning again. It’s all or nothing so set aside 20 minutes for this one!

The presented paper (there are no slides) starts off with a hypothetical analogy. Imagine if Samuel Pepys, the 17th century diarist, had had to rely on individual businesses to store and preserve his maps, his notebooks, his vellum manuscripts and so on. These were businesses that dealt with individual formats and had little interest in the content of Pepys’s records. Luckily this wasn’t the case and much of what he wrote has been recorded by the National Archive.

Bailey points out that we now find ourselves in a world where the responsibility for archiving many of our office 2.0 documents lies at the feet of 3rd parties. Documents are stored according to format, regardless of their commonality of content: text documents are now stored on Google Docs, videos on YouTube, photos on Flickr and so on. Although cloud services have brought us much flexibility, they have left us with a Pandora’s box, and ‘no regard for preservation’ is one of the evils that has flown out. They are externally hosted services with very different agendas from ours; they may notify us if they are going to delete all our content, but they don’t necessarily have to do so. The title of Brian Kelly’s post 5 Days Left to Choose a New Ning Plan is enough to show that there may be very little time in which to rescue your digital objects.

And so Bailey concludes that the future of digital archiving lies with Google.

Bailey also outlines this theory in a post on his Records management futurewatch blog – Is the Cloud aware that it has ‘the future of digital archiving in its hands’?

For him it is not a case of whether this is the right place for it to lie, it is just so.

It is at this point perhaps worth pausing to note that the question I have just offered an answer to is not in whose hands should the future of digital preservation lie, but in whose hands does it lie – a very important distinction indeed.

At another point he says:

“Once again, I do not say that this is right or wrong, foolish or wise – simply that it appears inevitable and that we would do well to prepare ourselves for it.”

Steve asks us to hold back from lamenting this situation and to consider engaging in a dialogue with cloud-based service providers. He offers a possible four point plan that might help us:

  1. Take a risk management approach to choice of Web 2.0 services – look at issues like IPR
  2. Consider what to do if your provider closes down – have a back-up strategy
  3. Work with service providers to establish ways of searching information (this looks at areas like retention schedules)
  4. Consider asking Google if they are happy to fulfil this role

Much of this rings true with work we have carried out at UKOLN on projects like the JISC Preservation of Web Resources project. The final point is an interesting one though.

Perhaps we should actually stop to ask Google and their peers whether they are indeed aware of the fact that the future of digital preservation lies in their hands and the responsibilities which comes with it and whether this is a role they are happy to fulfil. For perhaps just as we are in danger of sleepwalking our way into a situation where we have let this responsibility slip through our fingers, so they might be equally guilty of unwittingly finding it has landed in theirs.

If so, might this provide the opportunity for dialogue between the archival professions and cloud based service providers and in doing so, the opportunity for us to influence (and perhaps even still directly manage) the preservation of digital archives long into the future

Bailey even suggests the possible maintenance of a public sector funded meta-repository “within which online content can be transferred, or just copied, for controlled, managed long term storage whilst continuing to provide access to it to the services and companies from which it originated”.

In reply someone from the Records Managers list makes the following point:

“In terms of where the future of digital preservation does lie, I doubt it is with the major providers in part because that is not their business case. Just as newspapers are not in the archive business, (although they may have archives) neither are the web service providers (yet) in that business. The challenge is that archives as opposed to storage, is guided by the key question of who and why. To archive something is based upon a distinct community fixed in time and space. Archives as opposed to mass storage has to work by what it refuses as much as by what it includes”.

The cloud may be a mass storage device but it is not yet an archive.

So it seems that the future of digital archiving continues to lie in the hands of those who care about it – the records managers, the archivists, the librarians, the JISC project managers – it is just that they now need to either include others in the dialogue about how to preserve digital objects or (and a part of me thinks this is the more realistic approach) think in a more lateral way about how you continue to preserve when you’ve lost control of your digital objects.

Other interesting posts/articles relating to preservation and the cloud include:

Digital preservation: a matter for the clouds? by Maureen Pennock, British Library

DuraCloud – A hosted service and open technology developed by DuraSpace that makes it easy for organizations and end users to use cloud services. DuraCloud leverages existing cloud infrastructure to enable durability and access to digital content.

Posted in Archiving

Mirroring sites with WinHTTrack

Posted by Marieke Guy on 26th August 2010

Earlier this week Brian Kelly published a post on how he has used WinHTTrack to create a copy of the Institutional Web Management Workshop 2008 social network. The social network was created using Ning, who have recently cancelled provision of free social networks. In his post – 5 Days Left to Choose a New Ning Plan – Brian talks us through the process taken to mirror the service and also discusses some of the wider implications of use of externally hosted services.

Brian says:

The use of such services to support events, in particular, raises some interesting issues. I have previously suggested that “The lesson I’ve learnt – there’s a need to change the settings for social networks set up to support events after the event is over. I still prefer to make it easy to subscribe to such services, however, in order to avoid any delays caused by the need to accept new subscriptions manually”. But as well as tightening up on access after an event is over in order to avoid spam, are further measures needed? Should the content be replicated elsewhere? Should the social networking site be closed? Or should we be happy with the default option of simply doing nothing – after all, although the announcement stated that the free service would be withdrawn on 20 August, it is still available today.

HTTrack is one of the tools I talked about in my post Web Archiving: Tools for Capturing. It is always interesting to hear case studies of use.

Posted in Archiving

Preserving Digital Lives

Posted by Marieke Guy on 23rd August 2010

The @jisckeepit Twitter account alerted me to a really interesting article on downsizing your personal world from physical to digital (Cult of less: Living out of a hard drive). The gist of it is that many people are getting rid of their CD, DVD, and book collections and replacing them with digital versions. On an extreme level this has led to some people getting rid of nearly all of their physical possessions and living a ‘minimalist life’.

The article really struck me on a number of levels. Firstly I have been having quite a few discussions with a friend who is in the process of down-sizing to a smaller house. She’d already sold off her CDs on eBay (after adding them to her MP3 player) but has now gone one step further and got rid of her books too. She can get the information she needs from the Web or by using an e-book reader. To many living in a house of clutter this might appeal; personally I’m not quite ready to let go. However we both agreed that on an environmental level any moves away from ‘creating stuff’ must be a good thing.

Secondly, and of more relevance to this blog, there is the digital preservation angle. As @jisckeepit put it “note how rapidly preservation becomes critical…”. In fact there is no mention of ‘digital preservation’ in the article per se but there is recognition that back-ups are vital.

Mr Yurista says he frequently worries he may lose his new digital life to a hard drive crash or downed server. “You have to really make sure you have back-ups of your digital goods everywhere,” he said.

The article mentions the new role of data crisis counsellors, who help individuals claw back their data: “data recovery services will become rather like the firefighters of the 21st Century – responders who save your valuables.”
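
Having back-ups ‘everywhere’ only helps if the copies are actually intact. As a rough illustration (a minimal sketch of my own, not something described in the article), the Python below compares each back-up copy of a file against the original using SHA-256 checksums. The function names sha256_of and verify_copies are hypothetical, and the sketch assumes the original file is itself still good.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original, copies):
    """Compare each back-up copy against the original file.

    Returns a dict mapping each copy's path (as a string) to
    "ok", "missing" or "mismatch". Assumes the original is trustworthy.
    """
    expected = sha256_of(original)
    report = {}
    for copy in copies:
        copy = Path(copy)
        if not copy.is_file():
            report[str(copy)] = "missing"
        elif sha256_of(copy) == expected:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "mismatch"
    return report
```

Calling verify_copies with the path to a photo and a list of the places it has been backed up to (an external drive and a synced folder, say) would flag any copy that has gone missing or no longer matches. It is only a spot check; it does nothing to guard against the original itself being damaged.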

Digital Lives

Back in 2007-2009 the British Library carried out the Digital Lives Research Project. The project team, made up of the British Library, University College London and University of Bristol, created a major pathfinding study of personal digital collections.

One of the primary research questions asks: How should curators approach selection, preservation and access to personal digital collections? What aspects of existing practice can be applied? What needs to be changed?

The Digital Lives project blog offers some interesting insights. The beta synthesis of the project was released early this year and is available as a PDF (it is a hefty 259 pages long but well worth a read!).

It is concluded that the role of personal archives in daily life and their research value have never been more profound. The potential benefits to society and to individuals are both deep and far reaching in their capacity to empower research and human well being and advancement….The project has outlined the concept of Personal Informatics to encapsulate the three concerns of digital capture, preservation and utility in the context of personal digital objects, and to embrace the study of digital personal information in all its manifestations.

So how does preservation of our own digital lives fit in with JISC? The answer is still unclear, but as the lines between work and home life, real and digital, continue to blur, many may feel that the digital preservation thread cuts right the way across.

JISC Keep It

Note that the JISC Keep It project aims to enable a diverse range of digital content presented by institutional repositories – research papers, science data, arts, teaching materials and theses – to be managed effectively today, tomorrow and beyond. Their Web site and blog are useful for anyone interested in a repository’s role in digital preservation.

Posted in Project news

DCC Roadshow 2010 – 2011

Posted by Marieke Guy on 20th August 2010

The Digital Curation Centre have carried out digital preservation training in the past (for example the Digital Curation 101 course) but they have now committed to running a series of data curation roadshows. These are likely to be very useful to anyone involved in digital curation, from senior managers to researchers and librarians.

Institutional Challenges in the Data Decade

The DCC Roadshows will comprise a series of inter-linked workshops aimed at supporting institutional data management, planning and training.

The first will take place 2-4 November in Bath and will be open to participants from Higher Education Institutes (HEIs) in the south-west of England. The roadshow will run over 3 days and comprise a series of day and half day workshops.

For more details see the DCC Web site. Registration will open in September 2010.

Posted in dcc, Project news

Making Digital Preservation Fun…

Posted by Marieke Guy on 16th August 2010

…isn’t always that easy but DigitalPreservationEurope (DPE) are having a good go. They have created Team Digital Preservation – a wacky cartoon crew who “embody all aspects of digital preservation”. Digiman leads his team against Blizzard and his band of evil cronies, Team Chaos, who “embody all aspects of threats to digital preservation”.

It’s all good clean fun but still gets over a very clear message – Digital Preservation is good!

DPE have so far uploaded 5 Team Digital Preservation videos to their Wepreserve account and they are getting a good number of hits. The latest is Team Digital Preservation and the Planets Testbed.

Blizzard and his band of evil cronies, Team Chaos, have developed a devastating new weapon. But Never Fear trusty Viewers, tune in now to find out what those wonderful whizz-kids at the top-secret Team Digital Preservation research lab have cooked up to protect Digiman this time!

All animations are free to use by those wishing to raise awareness and understanding about digital preservation.


Planets (Preservation and Long-term Access through NETworked Services) is a four-year, €15 million project, co-funded by the European Commission under the Information Society Technologies (IST) priority of the 6th Framework Programme (IST-033789). The Open Planets Foundation has been established to build on the investment to provide practical solutions and expertise in digital preservation.


DigitalPreservationEurope (DPE), building on the earlier successful work of ERPANET, facilitates pooling of the complementary expertise that exists across the academic research, cultural, public administration and industry sectors in Europe. It fosters collaboration and synergies between many existing national and international initiatives across the European Research Area. DPE addresses the need to improve coordination, cooperation and consistency in current activities to secure effective preservation of digital materials. DPE’s success will help to secure a shared knowledge base of the processes, synergy of activity, systems and techniques needed for the long-term management of digital material.

Posted in trainingmaterials

Using WordPress

Posted by Marieke Guy on 3rd August 2010

Owen Stephens, a friend and colleague of mine, has just started work on a JISC-commissioned Guide to Open Bibliographic Data for use by managers, practitioners and developers in the library community.

He’s planning to create the guide in WordPress; he wants the guide to be a useful and powerful online resource; he wants commenting on different sections; he wants to have different views on the sections…

It’s all starting to sound very familiar. These are some of my intentions with the JISC Beginner’s Guide to Digital Preservation.

Owen’s thoughts on what he calls Multi-faceted document navigation are available from his blog. He’s also created a demonstrator site.

There is a lot going on. He’s used a Query Multiple Taxonomies plugin and is using custom taxonomies and taxonomy templates – a much more detailed approach than using just tags and categories.

He’s also using an inline post plugin to allow him to embed content from one post in another post.

I know I need to spend a lot more time looking at the plugins WordPress has available. At a recent event I attended a workshop session on WordPress beyond blogging.

The session, presented by Joss Winn from the University of Lincoln, left me inspired but slightly unsure about what to do next. My blogs are housed by our systems team, so I have to ‘ask’ for things I’d like. This isn’t a problem (by that I mean they are keen to help out) but it can mean a delay and I don’t really get to play as much as I’d like to (or should).

I think my final guide won’t be as ambitious as Owen’s but I hope it is a useful and powerful online resource.

Posted in Wordpress