UKOLN Cultural Heritage Documents » Digital Preservation http://blogs.ukoln.ac.uk/cultural-heritage-documents A commentable and syndicable version of UKOLN's cultural heritage briefing documents Fri, 17 Sep 2010 09:32:22 +0000 en-US

What To Do When a Service Provider Closes
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/what-to-do-when-a-service-provider-closes/
Thu, 02 Sep 2010 14:09:44 +0000 Brian Kelly

Introduction

This seven point checklist presents some steps that creators and managers of community digital archives might take to make sure that their data is available in the long term. It is useful for many circumstances but it will be particularly relevant to community archives that depend on third party suppliers to provide technical infrastructure.

The economic downturn and poor trading conditions mean that some technology providers are unable to continue providing the services upon which community groups have depended. Because hardware, software and services are often very tightly integrated, the failure of a technology company can be very disruptive to its customers. This is especially true if systems are proprietary and customers are ‘locked in’ to particular services, tools or data types. The key message is that community archives need to retain sufficient control of content in order that services can be moved from one service provider to another. Change brought about through insolvency is disruptive and unwelcome: the more control that a group has over content, the less disruptive it will be.

Consideration of the following seven points might help reduce disruption in the event that a content management company withdraws its services.

1. Keep the Masters

Many community groups hold a mix of photographs, sound recordings, video and text in digital form. Some of these are digital copies that have been scanned, such as old photographs and letters; others are ‘born digital’, created using digital cameras or digital sound recording equipment. In every case the underlying data will be captured in one of a series of file formats. A simple rule of thumb is to retain a high quality ‘original’ which has not been processed or edited, and to make sure that the community group has direct access to it without relying on the content management company.
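
One way to check over time that the retained masters are still unprocessed and unedited is to record a checksum for each file and compare it later. The following is a minimal sketch only, not part of the original checklist: the function names and the manifest layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(masters_dir: Path) -> dict[str, str]:
    """Map each master file's relative path to its checksum."""
    return {
        p.relative_to(masters_dir).as_posix(): sha256_of(p)
        for p in sorted(masters_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(masters_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are missing or no longer match."""
    current = build_manifest(masters_dir)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

A group might build the manifest when masters are first deposited, keep a copy of it separately from the files, and run the comparison whenever a copy is received back from a supplier.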

2. Know What’s What

The rapid proliferation of digital content means that it can be hard to keep track of content, even in a relatively small organisation. Typically a content management company will use a database to catalogue content and then use the database to drive a Web site that makes it available to the public. So, to retain control over content, community archives should keep a copy of the catalogue. Be aware that the database can be complex and, even when it is implemented in open source software, it can be proprietary.

The tools used to describe a collection depend on the nature of the collection. For example, archives are often described in ‘Encoded Archival Description’ while images might best be described using the ‘VRA Core’ standard. It’s useful to know a little about the standards that apply in your area.

3. There Should be a Disaster Plan

Most content management companies will have some kind of disaster plan – a backup copy which can be made available in the event of some unforeseen break of service. Good practice means that the content management company should keep multiple copies of data in multiple locations. It is reasonable for a community group to see a copy of the disaster plan and for parts of the disaster plan to be written into the contract between the contractor and the community group. You should ask for evidence that the disaster plan has been tried out and agree how quickly your data would be restored should a disaster occur. It is also reasonable to request or keep a copy of your data for safekeeping, though you may need to plan how and in what format you receive this and you may want to update it periodically.

A common approach to backups is called the ‘Grandfather – Father – Son’ approach. A complete copy is taken every month and stored remotely (Grandfather). A complete copy is taken every week but kept locally (Father) and a daily backup is made of recent changes (Son). The frequency of backups should be dictated by the frequency of changes. Ask your service provider how they approach this.
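
The rotation described above can be sketched as a simple rule that decides which kind of backup to take on a given day. This is an illustration under stated assumptions: the briefing does not say on which days the monthly and weekly copies run, so the choice of the first of the month and Sundays, and the function name `backup_tier`, are hypothetical.

```python
from datetime import date


def backup_tier(day: date) -> str:
    """Classify the backup to take on a given day.

    Assumed schedule (not from the briefing): the monthly copy runs on the
    first day of the month and the weekly copy runs on Sundays.
    """
    if day.day == 1:
        return "grandfather"  # complete copy, stored remotely
    if day.weekday() == 6:    # Monday is 0, so Sunday is 6
        return "father"       # complete copy, kept locally
    return "son"              # daily backup of recent changes
```

When a monthly and a weekly backup fall on the same day, this sketch takes only the monthly copy; a real provider may handle that case differently.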

4. Agree a Succession Plan

A good content management company will also have a succession plan and be willing to involve you in this. Although it is not a happy topic, a shared understanding of rights and expectations of what should happen when either partner is no longer able to maintain a contractual relationship can go a long way to reassuring both parties. This is particularly important where a hosting company is employed to deliver content which is not theirs. It is not unreasonable to include a note within the contract clearly identifying that content provided to the hosting company remains the property of the party supplying it and that, should there be any break in the contract, the contractor will be obliged to return it. In reality this does not guarantee that you will get content back if a company goes into liquidation, but it does secure your right to ask the administrator for it; if that is not successful, you are at least clear about your rights to use the masters and backups which have been lodged with you.

5. Know Your Rights

Rights management can be daunting but it is important to be clear when engaging a third party contractor of the limits of what they are entitled to do with content that a community archive might produce. A good content management contract is likely to give the content management company a licence to distribute content on your behalf for a given period – and it should also specify that technical parts of the service such as software are the property of the content management company. In reality this can be complicated because the community archive may itself be depending on agreements from the actual copyright holders and elements of design and coding will be shared. But so long as you are clear that the content provider will not become the owner of the content once it’s on their site, and that you can terminate their licence after appropriate notice, then it will be easier for you to pass the masters to a new company.

6. Find a Digital Preservation Service

A small number of services exist to look after data for you: either funded as part of existing infrastructure or as a service you can buy. Many local government archives and libraries are developing digital preservation facilities for their own use and might welcome an approach from a community group. Other types of partnership might also make sense: many universities now maintain digital archives for research so it might be useful to talk to a university archivist. Facilities also operate thematically – for example there is a national facility allowing archaeologists to share short reports of excavations. Image and sound libraries may also be able to provide an archival home to data or provide advice, while other services provide digital preservation on a commercial basis. In the same way publishers have started sharing some of their content to reduce their risks and risks to their clients. Having a preservation partner can be very useful for you in the short term and in the long term and will make you a lot more confident that your data will be safe even if the content management company is not around to service it.

7. Put a Copy of your Web Site in a Web Archive

There are a number of services that can make copies of online content before a supplier goes into liquidation. A free service from the British Library called the UK Web Archive exists to ‘harvest’ Web sites in the UK. It can create a simple static copy of your Web site and present this back to you under certain limitations. The UK Web Archive is free but it is based on a recommendation: you need to ask them to take a copy and need to give them permission to do so. Once you’ve given them permission they can harvest the site periodically and so build up a picture of your Web site through time. The UK Web Archive is ideal for relatively static Web sites but is less good with sites that require passwords, that change quickly or that contain lots of dynamic content. Similar services exist: for example, the US-based Internet Archive has paid-for services that allow users to control the harvesting of content and allow more complicated data types to be managed. Considering the ease of use and how quickly it can gather content, every community archive should consider registering with a service like this as a way to offset the risks of a supplier going into liquidation.

See the briefing paper on Web Archiving for further information [1].

The UK Web Archive is one of a number of services that can make a copy of your Web site. So, in the worst case, users can be directed to a version of your site fixed at one point in time [2].

Acknowledgements

This briefing paper was written by William Kilbride of the Digital Preservation Coalition [3].

References

  1. Web Archiving, Cultural Heritage briefing paper no. 53, UKOLN, <http://www.ukoln.ac.uk/cultural-heritage/documents/briefing-53/>
  2. UK Web Archive, <http://www.webarchive.org.uk/>
  3. Digital Preservation Coalition, <http://www.dpconline.org/>
Web Archiving
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/web-archiving/
Thu, 02 Sep 2010 11:03:03 +0000 Brian Kelly

Introduction

Archiving is a confusing term and can mean the backup of digital resources and/or the long-term preservation of those records. This document talks about the physical archiving of your Web site as the last in a series of steps after selection and appraisal of Web resources has taken place. This will be part of a ‘preservation policy’.

Approaches

Before archiving it is important to consider approaches to preserving your Web site:

What to do now
This includes quick-win solutions, actions that can be performed now to get results, or to rescue and protect resources that you have identified as being most at risk. Actions include domain harvesting, remote harvesting, use of the EDRMS, use of the Institutional Repository, and ‘2.0 harvesting’. These actions may be attractive because they are quick, and some of them can be performed without involving other people or requiring changes in working. However, they may become expensive to sustain if they do not evolve into strategy.
Strategic approaches
This class includes longer-term strategic solutions which take more time to implement, involve some degree of change, and affect more people in the Institution. These include approaches adapted from Lifecycle Management and Records Management and also approaches which involve working with external organisations to do the work (or some of it) for you. The pay-off may be delayed in some cases, but the more these solutions become embedded in the workflow, the more Web-archiving and preservation becomes a matter of course, rather than something which requires reactive responses or constant maintenance, both of which can be resource-hungry methods.

Domain Harvesting

Domain harvesting can be carried out in two ways: 1) Your Institution conducts its own domain harvest, sweeping the entire domain (or domains) using appropriate Web-crawling tools. 2) Your Institution works in partnership with an external agency to do domain harvesting on its behalf. Domain harvesting is only ever a partial solution to the preservation of Web content. Firstly, there are limitations to the systems which currently exist. You may gather too much, including pages and content that you don’t need to preserve. Conversely, you may miss out things which ought to be collected such as: hidden links, secure and encrypted pages, external domains, database-driven content, and databases. Secondly, simply harvesting the material and storing a copy of it may not address all the issues associated with preservation.
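
To illustrate why a domain harvest tends to miss external domains, the sketch below shows how a crawler might decide which links on a page fall within the domain being swept. It is an assumption-laden illustration rather than a description of any particular Web-crawling tool: the class and function names are hypothetical, and `example.org` stands in for your own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in_scope(page_url: str, html: str, domain: str) -> list[str]:
    """Return absolute links on the page whose host lies within the domain."""
    collector = LinkCollector()
    collector.feed(html)
    in_scope = []
    for href in collector.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        host = parts.hostname or ""
        same_domain = host == domain or host.endswith("." + domain)
        if parts.scheme in ("http", "https") and same_domain:
            in_scope.append(absolute)
    return in_scope
```

Anything filtered out here is never fetched, and links that appear only after a login or are generated from a database query never reach the parser at all, which is why such content needs separate treatment.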

Migration

Migration of resources is a form of preservation. Migration is moving resources from one operating system to another, or from one storage system to another. This may raise questions about emulation and performance. Can the resource be successfully extracted from its old system, and behave in an acceptable way in the new system?

Getting Other People to Do it for You

There are a number of third party Web harvesting services which may have a role to play in harvesting your Web site:

UKWAC
The UK Web-Archiving Consortium [1] has been gathering and curating Web sites since 2004. To date, UKWAC’s approach has been very selective, determined by written selection policies which are in some ways quite narrow; it currently only covers UK HE/FE. However, it is now possible to nominate your Institutional Web site for capture with UKWAC.
The Internet Archive
The Internet Archive [2] is unique in that it has been gathering pages from Web sites since 1996. It holds a lot of Web material that cannot be retrieved or found anywhere else. There are a number of issues to consider when using the Internet Archive. To date it lacks any sort of explicit preservation principle or policy and may not have a sustainable business model and so its use cannot guarantee the preservation of your resources. There are also issues with the technical limitations of the Wayback Machine e.g. gaps between capture dates, broken links, database problems, failure to capture some images, no guarantee to capture to a reliable depth or quality. The National Archives use a model where they contract out collection to the Internet Archive, but also maintain the content themselves.
HANZO
Hanzo Archives is a commercial Web-archiving company [3]. They claim to be able to help institutions archive their Web sites and other Web-based resources. They offer a software as a service solution for Web archiving. It’s possible for ownership to be shared at multiple levels; for instance, one can depend on a national infrastructure or service to do the actual preserving, but still place responsibility on the creator or the institution to make use of that national service.
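
The gaps between capture dates mentioned above for the Wayback Machine can be measured once you have a list of capture times for a page. Wayback Machine captures are identified by 14-digit timestamps of the form YYYYMMDDhhmmss; the sketch below assumes you have already gathered such a list by some means, and the function name is hypothetical.

```python
from datetime import datetime


def largest_gap_days(timestamps: list[str]) -> int:
    """Return the longest interval, in whole days, between consecutive captures."""
    captures = sorted(datetime.strptime(t, "%Y%m%d%H%M%S") for t in timestamps)
    gaps = [
        (later - earlier).days
        for earlier, later in zip(captures, captures[1:])
    ]
    return max(gaps, default=0)
```

A long gap suggests a period in which changes to the page may have gone unrecorded, which is one reason not to rely on a third-party archive alone.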

References

  1. UKWAC, <http://www.webarchive.org.uk/>
  2. The Internet Archive, <http://www.archive.org/>
  3. HANZO, <http://www.hanzoarchives.com/>
Selection for Web Resource Preservation
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/selection-for-web-resource-preservation/
Thu, 02 Sep 2010 11:00:06 +0000 Brian Kelly

Introduction

This document provides some approaches to selection for preservation of Web resources.

Background

Deciding on a managed set of requirements is absolutely crucial to successful Web preservation. It is possible that, faced with the enormity of the task, many organisations decide that any sort of capture and preservation action is impossible and it is safer to do nothing.

It is worth remembering, however, that a preservation strategy won’t necessarily mean preserving every single version of every single resource and may not always mean “keeping forever”, as permanent preservation is not the only viable option. Your preservation actions don’t have to result in a “perfect” solution but once decided upon you must manage resources in order to preserve them. An unmanaged resource is difficult, if not impossible, to preserve.

The task can be made more manageable by careful appraisal of the Web resources, a process that will result in selection of certain resources for inclusion in the scope of the programme. Appraisal decisions will be informed by understanding the usage currently made of organisational Web sites and other Web-based services and the nature of the digital content which appears on these services.

Considerations

Some questions that will need consideration include:

  • Should the entire Web site be archived or just selected pages from the Web site?
  • Could inclusion be managed on a departmental basis, prioritising some departmental pages while excluding others?

You will also be looking for unique, valuable and unprotected resources, such as:

  • Resources which only exist in web-based format.
  • Resources which do not exist anywhere else but on the Web site.
  • Resources whose ownership or responsibility is unclear, or lacking altogether.
  • Resources that constitute records, according to definitions supplied by the records manager.
  • Resources that have potential archival value, according to definitions supplied by the archivists.

Resources to be Preserved

(1) RECORD

A traditional description of a ‘record’ is:

“Recorded information, in any form, created or received and maintained by an organisation or person in the transaction of business or conduct of affairs and kept as evidence of such activity.”

A Web resource is a record if it:

  • Constitutes evidence of business activity that you need to refer to again.
  • Is evidence of a transaction.
  • Needs to be kept for legal reasons.

(2) PUBLICATION

A traditional description of a publication is:

“A work is deemed to have been published if reproductions of the work or edition have been made available (whether by sale or otherwise) to the public.”

A Web resource is a publication if it is:

  • A Web page that’s exposed to the public on the Web site.
  • An attachment to a Web page (e.g. a PDF or MS Word Document) that’s exposed on the Web site.
  • A copy of a digital resource, e.g. a report or dissertation, that has already been published by other means.

(3) ARTEFACT

A Web resource is an artefact if it:

  • Has intrinsic value to the organisation for historical or heritage purposes.
  • Is an example of a significant milestone in the organisation’s technical progress, for example the first instance of using a particular type of software.

Resources to be Excluded

There are some resources that can be excluded such as resources that are already being managed elsewhere e.g. asset collections, databases, electronic journals, repositories, etc. You can also exclude duplicate copies and resources that have no value.

Selection Steps

Selection of Web resources for preservation requires two steps:

  1. Devise a selection policy: define a selection policy in line with your organisational preservation requirements. The policy could be placed within the context of high-level organisational policies, and aligned with any relevant or analogous existing policies.
  2. Build a collection list.

Selection Approaches

Approaches to selection include:

Unselective approach
This involves collecting everything possible. This approach can create large amounts of unsorted and potentially useless data, and commit additional resources to its storage.
Thematic selection
A ‘semi-selective’ approach. Selection could be based on predetermined themes, so long as the themes are agreed as relevant and useful and will help in preserving the correct resources.
Selective approach
This is the most narrowly defined method, and it tends to make implicit or explicit assumptions about the material that will not be selected and therefore not preserved. The JISC PoWR project recommends this approach [1].

Resource Questions

Questions about the resources which should be answered include:

  • Is the resource needed by staff to perform a specific task?
  • Has the resource been accessed in the last six months?
  • Is the resource the only known copy, or the only way to access the content?
  • Is the resource part of the organisation’s Web publication scheme?
  • Can the resource be re-used or repurposed?
  • Is the resource required for audit purposes?
  • Are there legal reasons for keeping the resource?
  • Does the resource represent a significant financial investment in terms of staff cost and time spent creating it?
  • Does it have potential heritage or historical value?
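
The questions above can be recorded as a simple checklist for each resource so that appraisal decisions are documented consistently. The sketch below is one possible way of doing this, not a method prescribed by the briefing: the class and field names are hypothetical, and how the answers are weighed is left to your selection policy.

```python
from dataclasses import dataclass, fields


@dataclass
class Appraisal:
    """Yes/no answers to the resource questions for one Web resource."""

    needed_for_task: bool = False
    accessed_in_last_six_months: bool = False
    only_known_copy: bool = False
    in_publication_scheme: bool = False
    reusable: bool = False
    required_for_audit: bool = False
    legal_reason_to_keep: bool = False
    significant_investment: bool = False
    heritage_value: bool = False

    def reasons_to_keep(self) -> list[str]:
        """Return the names of the questions answered 'yes', in list order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]
```

For example, `Appraisal(only_known_copy=True, legal_reason_to_keep=True).reasons_to_keep()` lists the two reasons recorded for that resource, which can then be noted alongside it in the collection list.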

An example selection policy is available from the National Library of Australia [2].

Decision Tree

Another potentially useful tool is the Decision Tree [3] produced by the Digital Preservation Coalition. It is intended to help you build a selection policy for digital resources, although we should point out that it was intended for use in a digital archive or repository. The Decision Tree may have some value for appraising Web resources if it is suitably adapted.

Aspects to be Captured

It is possible to make a distinction between preserving an experience and preserving the information which the experience makes available.

Information = content (which could be words, images, audio, …)

Experience = the experience of accessing that content on the Web, with all its attendant behaviours and aspects

Making this decision should be driven by the question “Why would we want to preserve what’s on the Web?” When deciding upon the answer it might be useful to bear in mind drivers such as evidence and record-keeping, repurposing and reuse and social history.

References

  1. JISC PoWR, <http://jiscpowr.jiscinvolve.org/>
  2. Selection Guidelines for Archiving and Preservation by the National Library of Australia, National Library of Australia, <http://pandora.nla.gov.au/selectionguidelines.html>
  3. Digital Preservation Coalition Decision Tree, Digital Preservation Coalition, <http://www.dpconline.org/graphics/handbook/dec-tree-select.html>
Preserving Your Home Page
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/preserving-your-home-page/
Thu, 02 Sep 2010 09:58:40 +0000 Brian Kelly

Introduction

An organisation’s home page provides the doorway to its Web site. How it changes over time reflects both how an organisation has changed and how the Web has changed. Keeping a record of both the visual and structural changes of the home page could be very important in the future.

Scenario

Suppose your organisation is about to commemorate an important anniversary (10 years, 50 years or 250 years since it was founded). Your director wants to highlight the fact that the organisation is actively engaging with new technologies and would like to provide an example of how the organisation’s Web site has developed since it was launched. The challenge:

How has your organisational home page changed over time? Have you kept records of the changes and the decisions which were made? If the above scenario took place in your organisation, do you feel you would be able to deliver a solution?

Although most Web managers will be aware of the most significant changes (such as a CMS brought in, search added, changes in navigation, branding, accessibility, language, content, interactive elements and multimedia), there is currently likely to be only anecdotal evidence and tacit knowledge of them.

Internet Archive

One option may be to use the Internet Archive (IA) [1] to view the recorded occurrences of the organisation’s home page. The IA is a non-profit organisation founded to build an Internet library, with the purpose of offering access to historical collections that exist in digital format. There are a number of issues to consider when using the IA e.g. it lacks explicit preservation principles and may not have a sustainable business model and so its use cannot guarantee the preservation of your resources.

Example: As part of the JISC PoWR project an interactive display was created of the University of Bath’s home page using IA screenshots [2]. In addition to this display a brief video with accompanying commentary was also created, which discusses some of the changes to the home page over the 11 years.

Compiled History

Building a compiled history is another approach. A 14-year history of the University of Virginia’s Web site, covering 1994-2008 [3], is available from their site. They provide details of the Web usage statistics in the early years, with screen images shown of major changes to the home page from 1997. There is also a time line and access to archived sites from 1996 onwards.

Preserving for the Future

The best way that you can ensure that your organisation’s home page is preserved is ensuring that it gets documented in a preservation policy or as part of a retention schedule. Once this has been agreed there are a number of available options.

Domain harvesting of the site:
Your home page could be captured as part of a harvesting programme. Your organisation could conduct its own domain harvest, sweeping the entire domain (or domains) using appropriate Web-crawling tools or work in partnership with an external agency to do domain harvesting on its behalf.
UKWAC:
The UK Web-Archiving Consortium (UKWAC) [4] has been gathering and curating Web sites since 2004. To date, UKWAC’s approach has been selective, although you can now nominate Web sites for capture with UKWAC.
Adobe Capture:
There is a built-in part of Adobe Acrobat which allows Web sites to be captured to a PDF file.
Exploration of your Content Management System options:
There may be some scope for preservation using your CMS.

Conclusions

Responsibility for the preservation of your organisation’s Web site may fall in many places but will ultimately require shared ownership. Although there may be ways to easily access snap shots of your home page, if you would like long-term access you will need to embark upon some sort of preservation strategy.

References

  1. Internet Archive, <http://www.archive.org/>
  2. Visualisation of University of Bath Home page changes, UKOLN, <http://www.ukoln.ac.uk/web-focus/experiments/experiment-20080612/>
  3. History of UVA on the Web, <http://www.virginia.edu/virginia/archive/>
  4. UKWAC, <http://www.webarchive.org.uk/>
Preserving Web 2.0 Resources
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/preserving-web-2-0-resources/
Thu, 02 Sep 2010 09:54:55 +0000 Brian Kelly

Introduction

We have become increasingly familiar with the term Web 2.0, referring in a very general way to the recent explosion of highly interactive and personalised Web services and applications. Collaboration and social networking are a key feature, for example through contributing comments or sharing write access and collaborating. Many of these applications have now crossed the threshold between private, personal use and applications used at work.

Web 2.0 Applications

In a briefing paper for JISC, Mark van Harmelen defined seven types of Web 2.0 applications [1]: blogs, wikis, social bookmarking, media sharing services, social networking systems, collaborative editing tools and syndication and notification technologies.

Some of these applications and services listed above are still at an ‘experimental’ stage and (at time of writing) being used in organisations primarily by early adopters of new technologies. But it is possible to discern the same underlying issues with all these applications, regardless of the software or its outputs.

Web 2.0 Issues

Preservation of Web 2.0 resources presents a number of challenges that differ from those of preserving standard Web resources. These include:

  • Use of third party services: data may be held on a provider’s server.
  • More complex ownership, IPR and authentication issues.
  • Data held may be personal and difficult to extract.
  • Emphasis on collaboration and communication rather than access to resources.
  • Richer diversity of services.
  • Is the data worth preserving at all?

Ownership and Responsibility

Quite often these applications rely on the individual to create and manage their own resources. A likely scenario is that the user creates and manages his or her own external accounts in Flickr, Slideshare or WordPress.com; but they are not organisational accounts. By contrast, one would expect blogs and wikis hosted by the organisation to offer more commitment to maintenance, in line with existing policies on rights, retention and reuse, as expressed in IT and information policy, conditions of employment, etc.

Third-party sites such as Slideshare or YouTube are excellent for dissemination, but they cannot be relied on to preserve your materials permanently. If you have created a resource (a slideshow, moving image, audio or anything else) that requires retention or preservation, then someone needs to make arrangements for the ‘master copy’. Ideally, you want to bring these arrangements in line with the larger Web archiving programme. However, if there is a need for short-term action, and the amount of resources involved is (though important) relatively small, then remedial action for master copies may be appropriate. Some possible remedial actions are:

  • Store it in the Electronic Document Records Management System
  • Store it on the Institution Web site
  • Store it in the Institutional Repository
  • Store it on a local networked drive

In the case of blogs, wikis and collaborative tools, content is created directly in them, and access is normally dependent on the availability of the host and the continued functioning of the software. Users of such tools should be encouraged and assisted to ensure significant outputs of online collaborative work are exported and managed locally.
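
As one illustration of exporting content for local management, many blogs publish an RSS 2.0 feed from which post titles, links, dates and bodies can be extracted. The sketch below assumes a feed that carries full post bodies in the `content:encoded` element, as WordPress feeds commonly do; the function name is hypothetical, and because a feed usually lists only recent items it is not a substitute for a full export from the hosting service.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines content:encoded.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def posts_from_feed(feed_xml: str) -> list[dict[str, str]]:
    """Extract title, link, date and body from each item of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "body": item.findtext(CONTENT_NS + "encoded", default=""),
        })
    return posts
```

The resulting records could then be written to files and stored in one of the locations listed above.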

Conclusion

It is unclear at this stage if Web 2.0 offers a new set of challenges or an enhancement of existing ones. The really challenging problems are organisational, e.g. how can an organisation identify “its content” on something like Slideshare? Who ultimately “owns” content? How, and indeed whether, should things be “unpublished”? A number of case studies of preservation of Web 2.0 resources are available from the JISC PoWR Web site [2].

References

  1. An Introduction to Web 2.0, Cultural Heritage briefing paper no. 1, UKOLN, <http://www.ukoln.ac.uk/cultural-heritage/documents/briefing-1/>
  2. JISC PoWR Web 2.0, JISC PoWR Blog, <http://jiscpowr.jiscinvolve.org/category/Web-20/>
Introduction to Web Resource Preservation
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/introduction-to-web-resource-preservation/
Thu, 02 Sep 2010 09:52:43 +0000 Brian Kelly

Introduction

Institutions now create huge amounts of Web-based resources and the strategic importance of these is finally being recognised. Long-term stewardship of these resources by their owners is increasingly becoming a topic of interest and necessity.

What is Web ‘Preservation’?

Digital preservation is defined as a “series of managed activities necessary to ensure continued access to digital materials for as long as necessary” [1]. In the case of Web resources you may choose to go for:

  • Protection: Protecting a resource from loss or damage, in the short term, is an acceptable form of “preservation”, even if you don’t intend to keep it for longer than, say, five years.
  • Perpetual preservation: It is best to think of this as long-term preservation, where ‘long-term’ is defined as “long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community” [2].

Why Preserve?

There are a number of drivers for Web resource preservation:

  • To protect your organisation: Web sites may contain evidence of organisational activity which is not recorded elsewhere and may be lost if the Web site is not archived or regular snapshots are not taken. There are also legal requirements to comply with legislation such as the Freedom of Information Act (FOI) and the Data Protection Act (DPA).
  • It could save you money: Web resources cost money to create, and to store; failing to repurpose and reuse them will be a waste of money.
  • Responsibility to users: Organisations have a responsibility to the people who use their resource and to the people who may need to use their resources in the future. People may make serious choices based on Web site information and there is a responsibility to keep a record of the publication programme. Many resources are unique and deleting them may mean that invaluable scholarly, cultural and scientific resources (heritage records) will be unavailable to future generations.

Whose Responsibility is it?

There are a number of parties who may have an interest in the preservation of Web resources. These may include the producer of the resource (Individual level), the publisher of the resource, the organisation, the library (Organisational Level), the cultural heritage sector, libraries and archives, the government, consortiums (National Level) or international organisations, commercial companies (International level). Within organisations the Web team, records management team, archives and information managers will all need to work together.

What Resources?

The JISC Preservation of Web Resources (PoWR) project [3] recommends a selective approach (as opposed to full domain harvesting). This won’t necessarily mean preserving every single version of every single resource and may not always mean “keeping forever”, as permanent preservation is not the only viable option. Your preservation actions don’t have to result in a “perfect” solution, but once you have decided on an approach you must manage resources in order to preserve them. An unmanaged resource is difficult, if not impossible, to preserve. Periodic snapshots of a Web site can also be useful and could sit alongside a managed solution.

How Do I Preserve Web Resources?

Web preservation needs to be policy-driven. It is about changing behaviour and consistently working to policies. As a start an organisation might go about creating a Web resource preservation strategy. Some of the following questions will be worth considering: What Web resources have you got? Where are they? Why have you got them? Who wants them? For how long? What protection policies do you have?

Ways of finding out the answers to these questions include carrying out a survey, doing research and asking your DNS manager. Once you have found your resources you need to appraise them and select those which require preserving. The next step is to move copies of your resources into archival storage. Once this process is completed the resources will need to be managed in some way. For further information see the Web Archiving briefing paper [4].

References

  1. Digital Preservation Coalition Definitions, Digital Preservation Coalition, <http://www.dpconline.org/graphics/intro/definitions.html>
  2. Digital preservation, Wikipedia, <http://en.wikipedia.org/wiki/Digital_preservation#cite_note-1>
  3. JISC PoWR blog site, <http://jiscpowr.jiscinvolve.org/>
  4. Web Archiving, Cultural heritage briefing paper no. 53, UKOLN, <http://www.ukoln.ac.uk/cultural-heritage/documents/briefing-53/>
]]>
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/introduction-to-web-resource-preservation/feed/ 0
Developing Your Digital Preservation Policy http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/developing-your-digital-preservation-policy/ http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/developing-your-digital-preservation-policy/#comments Thu, 02 Sep 2010 09:17:28 +0000 Brian Kelly http://culturalheritagedocs.wordpress.com/?p=140 Why Do I Need a Preservation Policy?

The digital world is one of continual change and rapid development of technology. Web sites change content, are radically restructured or disappear. Software is released in new versions, which may not be (fully) compatible with resources created using the earlier versions. Recording media for digital resources also deteriorate, often with data loss. Some resources are designed for use with specific hardware – which may break down, perhaps irretrievably, and/or go out of production.

This combination of factors means that you need to consider the preservation aspects of these resources at the earliest possible moment – ideally before they are created.

Before You Create a Policy

Before creating your policy on digital preservation you should first address the following issues:

Listing of resources
All types of digital resources that you either currently or plan to create, own or subscribe to should be documented.
Identification
Document the risks for each type of resource – e.g. Web site changes, software version changes, media degradation, hardware failure and replacement unavailability.
Implications
Consider the implications for your service in the worst case scenario. Are the resources intended to be ephemeral or permanent?
Assessment
Assess the value of groups of resources and the impact on your service if these no longer exist or are inaccessible.
Solutions
For each case, identify what the options are, how much they will cost and what they will require in terms of staff time and skills.
Decide
Decide on the strategies which are most appropriate for each type of resource.

Preservation Strategies

An appropriate strategy will depend on the resource and the type of failure. Strategies include:

Refresh
Transfer data between two instances of the same type of storage medium e.g. creating a new preservation CD from the previous one.
Migrate
Transfer data from one format (operating system, programming language) to another format so the resource remains functional and accessible e.g. conversion from Microsoft Word to PDF or OpenDocument.
Replicate
Create one or more duplicates as insurance against loss or damage to one or more of the copies e.g. back-up copies on CD for resources available from a Web site.
Emulate
Replicate the functionality of an obsolete application, operating system or hardware platform e.g. emulating WordPerfect 1.0 on a Macintosh system.
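
As an illustration of the Replicate strategy, the following is a minimal sketch in Python, not a recommended tool. It assumes that the master file and the backup locations are ordinary folders on disk; the function names `sha256_of` and `replicate` are hypothetical and chosen only for this example. Each copy is compared with the master using a checksum, so that a faulty copy is noticed at the time it is made.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(master: Path, destinations: list[Path]) -> list[Path]:
    """Copy `master` into each destination folder and verify every copy."""
    expected = sha256_of(master)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves file timestamps where the platform allows it
        copy = Path(shutil.copy2(master, folder))
        if sha256_of(copy) != expected:
            raise OSError(f"Copy at {copy} does not match the master")
        copies.append(copy)
    return copies
```

In practice at least one of the destination folders would be on separate media or at a separate site, since copies held on the same disk share the same risks.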

Your Preservation Policy

Having done the preparatory work, you are now in the position to be able to make decisions on your preservation policy, based on your particular combination of digital resources, funding, and technical platform and skills. Having made the decisions, record them and make sure all appropriate staff have access to the information.

The key characteristics of a preservation policy are:

Clarity
Different digital resources will require different preservation strategies. Deal with each type separately within the policy.
Risks
Each digital resource type is listed with its attendant risk.
Solution
The solution currently to be applied in the context of a specific (set of) resource(s).
Revision
As circumstances change, the preservation policy will need to change too, so build in a regular review.
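
One possible way of recording such a policy so that it can be checked automatically is sketched below in Python. This is an assumption about how a policy might be stored, not a prescribed format: the class name `PolicyEntry`, its field names, the example entries and the yearly review interval are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PolicyEntry:
    resource_type: str   # Clarity: one entry per type of resource
    risks: list[str]     # Risks: what could go wrong for this type
    solution: str        # Solution: the strategy currently applied
    last_reviewed: date  # Revision: when the entry was last checked
    review_interval_days: int = 365  # assumed yearly review

    def review_due(self, today: date) -> bool:
        """True if the entry has not been reviewed within the interval."""
        return today - self.last_reviewed >= timedelta(days=self.review_interval_days)


# Hypothetical example entries
policy = [
    PolicyEntry("Project Web site", ["restructuring", "host closure"],
                "replicate", date(2010, 1, 15)),
    PolicyEntry("Word-processed reports", ["software version changes"],
                "migrate", date(2010, 8, 1)),
]

# List the resource types whose review is overdue on a given date
overdue = [e.resource_type for e in policy if e.review_due(date(2011, 2, 1))]
```

Keeping the review date alongside each entry makes it straightforward to list which parts of the policy are due for revision.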

Acknowledgements

This document was based on materials produced by the JISC-funded PoWR (Preservation of Web Resources) project which was provided by UKOLN and ULCC (University of London Computer Centre).

]]>
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/09/02/developing-your-digital-preservation-policy/feed/ 0
Preservation and Sustainability http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/preservation-and-sustainability/ http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/preservation-and-sustainability/#comments Thu, 26 Aug 2010 15:59:53 +0000 Brian Kelly http://culturalheritagedocs.wordpress.com/?p=135 The Role of Planning

Digital media is well placed to be reused, and to be available for different applications e.g. as a source of images for marketing, a picture library resource or for an online collections database. There are several aspects to this:

  • The formats of documents or data files, by following established standards, remain discoverable and usable.
  • The media that they reside on is stored safely, is reliable, is refreshed and is backed up securely.
  • Systems are designed to remain available, affordable and supported.
  • Web sites are maintained and supported.

Strategies

The following strategies can be used in the preservation of digital assets:

Refreshing:
Media may need to be refreshed in line with its recommended life. A checking system may be put in place to identify problems.
Migration:
Data may need to be converted into a more accessible format. This carries a risk of data loss.
Emulation:
An emulator mimics the original software environment so that the data can still be read.
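
The “checking system” mentioned under Refreshing could take many forms. One common approach, sketched below in Python as an illustration only, is to record a checksum for every file and compare the files against that record at intervals. The function names and the structure of the manifest are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 checksum of a file (reads the whole file at once)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum."""
    return {
        p.relative_to(folder).as_posix(): file_digest(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def check_manifest(folder: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the folder's current state with a previously saved manifest."""
    current = build_manifest(folder)
    return {
        "missing": sorted(name for name in manifest if name not in current),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
    }
```

A manifest built when the media is first written can be stored separately and used each time the media is checked or refreshed. For very large files it would be better to read each file in chunks rather than all at once.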

Standards and File Formats

Using standard formats for data files (whether text, images, audio or video) will not prevent them being superseded but can help in maximising the opportunity for reuse.

Systems

There can be a breakdown of continuity in both hardware and software even if open standards are used. Systems that conform to standards (e.g. Spectrum for a collections management system) may ensure easier migration to a new system.

Media

In the life of the PC there have been rapid changes in removable media, from the original 5.25″ ‘floppy’ disk found in the earliest PCs, to the 3.5″ disk, CD, DVD and USB memory stick. Fixed or ‘hard’ disks have grown in size from being measured in megabytes to reaching terabytes.

Issues to consider regarding digital media include:

Capacity:
The increasing size of data files has put pressure on backups. Simple local backups may be made using multiple CDs or DVDs but such media is not likely to be reliable in the longer term.
Reliability:
Reputable brands of media should be chosen to maximise reliability. Different brands may be selected for different sets to protect from faulty batches. Media should be kept in a stable environment, away from dust and dirt and magnetic interference.
Identification:
Safe labelling is important to identify the purpose of the backups and the relationship to any digitisation programmes. This should take into account use beyond the immediate life of the project and the original personnel involved.
Archive Copies:
Should be handled as little as possible. Working copies should be used for regular access, such as for copying and publication.
Remote Backups:
To ensure safe-keeping, a remote backup outside of the building and immediate area is necessary. This may provide an opportunity to store the media in a specialist, supervised store.

Web Sites

Consideration needs to be taken of preservation and sustainability issues concerning Web sites. The design of the Web site should take into account how digital content may be used in other applications, rather than being focussed solely on one output.

There is a clear advantage in storing and managing digital assets within a collections management system which has the capability of exporting to the Web as data might be more easily migrated to another system.

Acknowledgements

This document has been produced from information contained within the Renaissance East Midlands Simple Guide to Digitisation that was researched and written by Julian Tomlin and is available from http://www.renaissanceeastmidlands.org.uk/. We are grateful for permission to republish this document under a Creative Commons licence. Anyone wishing to republish this document should include acknowledgements to Renaissance East Midlands and Julian Tomlin.

]]>
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/preservation-and-sustainability/feed/ 0
Top Ten Tips For Web Site Preservation http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/top-ten-tips-for-web-site-preservation/ http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/top-ten-tips-for-web-site-preservation/#comments Thu, 26 Aug 2010 15:12:45 +0000 Brian Kelly http://culturalheritagedocs.wordpress.com/?p=106 About This Document

This document provides top tips which can help to ensure that Web sites can be preserved.

The Top 10 Tips

1 Define The Purpose(s) Of Your Web Site
You should have a clear idea of the purpose(s) of your Web site and you should document these purposes. Your Web site could, for example, provide access to project deliverables for end users; could provide information about the project; could be for use by project partners; etc. A policy for preservation will be dependent on the role of the Web site.

2 Have A URI Naming Policy
Before launching your Web site you should develop a URI naming policy. Ideally you should contain the project Web site within its own directory, which will allow the project Web site to be processed (e.g. harvested) separately from other resources on the Web site.

3 Think Carefully Before Having Split Web Sites
The preservation of a Web site which is split across several locations may be difficult to implement. However also bear in mind tip 4.

4 Think About Separating Web Site Functionality
On the other hand it may be desirable to separate the functionality of the Web site to allow, for example, information resources to be processed independently of other aspects of the Web site. For example, the search functionality of the Web site could have its own sub-domain (e.g. search.foo.ac.uk), which could allow the information resources (under www.foo.ac.uk) to be processed separately.

5 Make Use Of Open Standards
You should seek to make use of open standard formats for your Web site. This will help you to avoid lock-in to proprietary formats for which access may not be available in the future. However you should also be aware of possible risks and resource implications in using open standards.

6 Explore Potential For Exporting Resources From A CMS
You should explore the possibility of exporting resources from a backend database or Content Management System in a form suitable for preservation. When procuring a CMS you should seek to ensure that such functionality is available.

7 Be Aware Of Legal, IPR, etc. Barriers To Preservation
You need to be aware of various legal barriers to preservation. For example: do you own the copyright of the resources to be preserved? Are there IPR issues to consider? Are confidential documents (such as project budgets, minutes of meetings, mailing list archives, etc.) to be preserved?

8 Ensure Institutional Records Managers Provide Input
You should ensure that staff from your institution’s records management teams provide input into policies for the preservation of Web site resources.

9 Provide Documentation
You should provide technical documentation on your Web site which will allow others to preserve your Web site and to understand any potential problem areas. You should also provide documentation on your policy of preservation.

10 Share Your Experiences
Learn from the experiences of others. For example read the case study on Providing Access to an EU-funded Project Web Site after Completion of Funding [1] and the briefing document on Mothballing Web Sites [2].
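
Tips 2 and 4 can be illustrated with a short Python sketch. It assumes, purely for the sake of example, that the project Web site lives under a directory called /myproject/ on www.foo.ac.uk; the directory name, the example URLs and the function name `in_project_space` are hypothetical. The point is that a clear URI naming policy lets software decide mechanically which resources belong to the project.

```python
from urllib.parse import urlparse


def in_project_space(url: str, host: str, directory: str) -> bool:
    """True if `url` is on `host` and inside `directory` (e.g. "/myproject/")."""
    parts = urlparse(url)
    prefix = "/" + directory.strip("/") + "/"
    return parts.hostname == host and parts.path.startswith(prefix)


# Hypothetical URLs used only to show the selection
urls = [
    "http://www.foo.ac.uk/myproject/reports/final.html",
    "http://www.foo.ac.uk/news/2010/launch.html",
    "http://search.foo.ac.uk/?q=final+report",
]

# Only the first URL is inside the project directory on the main host
to_harvest = [u for u in urls if in_project_space(u, "www.foo.ac.uk", "/myproject/")]
```

Because the search service has its own sub-domain and the project has its own directory, a harvesting tool given the same rule would collect the project's information resources and nothing else.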

References

]]>
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/top-ten-tips-for-web-site-preservation/feed/ 0
Mothballing Your Web Site http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/mothballing-your-web-site/ http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/mothballing-your-web-site/#comments Thu, 26 Aug 2010 15:10:33 +0000 Brian Kelly http://culturalheritagedocs.wordpress.com/?p=104 About This Document

This briefing document provides an introduction to digital preservation.

What Is Digital Preservation?

Digital preservation is the management of digital information over time. It takes the form of processes and activities that ensure continued access to information and all kinds of records, both scientific and cultural heritage, that exist in digital form.

The aim of digital preservation is long-term, error-free storage of digital information, with the means of retrieval and interpretation, for the period of time that information is required.

Why Do We Need Digital Preservation?

The digital world is a place of rapid technological and organisational changes, which impact on the continuing use of digital resources. In contrast to our physical written heritage, still readable today, digital information created only a few years ago is in danger of being lost.

Which Materials Need Preservation?

All types of digital resources need preservation including:

Digitally Reformatted
Digitised versions or surrogates of physical items.
Born Digital
Digital resources that have no analogue counterpart.
Individual resources
Texts, still and moving images, sound recordings, etc.
Collective resources
Web sites, e-journals, wikis, catalogues, etc.
Data Sets
Scientific and cultural data comprising multiple individual pieces of data.
Communication record
For example, email, instant messages, etc.

Preservation Metadata

The long-term storage of digital information is assisted by the inclusion of preservation metadata which records various features of the resource. For example:

Format
MS Word or Notepad? MS Word 2 or MS Word 6? JPEG or GIF?
Version
Pre-print, published.
Playback
Equipment or emulation device required.

Issues

Digital preservation encompasses a range of strategies, processes and activities, with a variety of associated issues to be considered. Examples are:

Long-term
May extend indefinitely and depends on the need for continuing access to a resource in one or more specific formats. The lifetime of a specific resource is determined by the degradation and/or format accessibility of that resource.
Retrieval
Obtaining digital files from storage without corrupting the stored files.
Interpretation
The digital files must be decoded and transformed into usable representations, for machine processing and/or human access.
Rendering
Making a digital file available for a human to access.
Re-digitising
Some early digitised resources are in formats that are, or are rapidly becoming, obsolete. Since it can be the case that poor results are obtained by migrating from the obsolete format to a newer format, it may sometimes be better to re-digitise from the original.
Emulation
Where specific playback equipment is no longer available, emulation software may need to be written in order to access the informational content using a different device.
Degradation
The process by which parts of a resource are lost over time. This may occur as a characteristic of a format (it becomes a less accurate representation over time) or a consequence of copying from another file or migrating from one format to another.
Effort
It appears that digital preservation requires more frequent and ongoing action than the preservation of other types of media. The consequent requirement in terms of effort, time and money is a major stumbling block for preserving digital information.
]]>
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/mothballing-your-web-site/feed/ 0
An Introduction To Digital Preservation http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/an-introduction-to-digital-preservation/ http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/an-introduction-to-digital-preservation/#comments Thu, 26 Aug 2010 15:08:38 +0000 Brian Kelly http://culturalheritagedocs.wordpress.com/?p=101 An Introduction To Digital Preservation

About This Document

This briefing document provides an introduction to digital preservation.

What Is Digital Preservation?

Digital preservation is the management of digital information over time. It takes the form of processes and activities that ensure continued access to information and all kinds of records, both scientific and cultural heritage, that exist in digital form.

The aim of digital preservation is long-term, error-free storage of digital information, with the means of retrieval and interpretation, for the period of time that information is required.

Why Do We Need Digital Preservation?

The digital world is a place of rapid technological and organisational changes, which impact on the continuing use of digital resources. In contrast to our physical written heritage, still readable today, digital information created only a few years ago is in danger of being lost.

Which Materials Need Preservation?

All types of digital resources need preservation including:

Digitally Reformatted
Digitised versions or surrogates of physical items.
Born Digital
Digital resources that have no analogue counterpart.
Individual resources
Texts, still and moving images, sound recordings, etc.
Collective resources
Web sites, e-journals, wikis, catalogues, etc.
Data Sets
Scientific and cultural data comprising multiple individual pieces of data.
Communication record
For example, email, instant messages, etc.

Preservation Metadata

The long-term storage of digital information is assisted by the inclusion of preservation metadata which records various features of the resource. For example:

Format
MS Word or Notepad? MS Word 2 or MS Word 6? JPEG or GIF?
Version
Pre-print, published.
Playback
Equipment or emulation device required.
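
As a simple illustration, the three features above could be recorded in a small file stored alongside each resource. The Python sketch below is one assumed way of doing this and does not follow any particular metadata standard; the class name, field names and example values are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PreservationMetadata:
    filename: str
    file_format: str  # Format, e.g. "MS Word 6" or "JPEG"
    version: str      # Version, e.g. "pre-print" or "published"
    playback: str     # Playback: equipment or emulation device required


# Hypothetical example record
record = PreservationMetadata(
    filename="annual-report.doc",
    file_format="MS Word 6",
    version="published",
    playback="Word processor able to open MS Word 6 files",
)

# Serialise the record as JSON text that can be saved next to the resource
sidecar = json.dumps(asdict(record), indent=2)
```

Plain text formats such as JSON are a reasonable choice for this kind of record because they can be read without specialist software.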

Issues

Digital preservation encompasses a range of strategies, processes and activities, with a variety of associated issues to be considered. Examples are:

Long-term
May extend indefinitely and depends on the need for continuing access to a resource in one or more specific formats. The lifetime of a specific resource is determined by the degradation and/or format accessibility of that resource.
Retrieval
Obtaining digital files from storage without corrupting the stored files.
Interpretation
The digital files must be decoded and transformed into usable representations, for machine processing and/or human access.
Rendering
Making a digital file available for a human to access.
Re-digitising
Some early digitised resources are in formats that are, or are rapidly becoming, obsolete. Since it can be the case that poor results are obtained by migrating from the obsolete format to a newer format, it may sometimes be better to re-digitise from the original.
Emulation
Where specific playback equipment is no longer available, emulation software may need to be written in order to access the informational content using a different device.
Degradation
The process by which parts of a resource are lost over time. This may occur as a characteristic of a format (it becomes a less accurate representation over time) or a consequence of copying from another file or migrating from one format to another.
Effort
It appears that digital preservation requires more frequent and ongoing action than the preservation of other types of media. The consequent requirement in terms of effort, time and money is a major stumbling block for preserving digital information.
]]>
http://blogs.ukoln.ac.uk/cultural-heritage-documents/2010/08/26/an-introduction-to-digital-preservation/feed/ 0