Any JISC project is likely to create a lot of text-based information, including PDFs, MS Word files, RTF, HTML (Web site) pages, XML, TeX and PostScript files.
There are two stages at which you will be able to make choices relating to the digital preservation of text-based information:
- Before you start creating documents – Your choice of format can have a significant effect on how easy the documents are to preserve.
- Once the files have been created and you need to preserve them – One option for formal text-based outputs (such as reports, articles, peer-reviewed papers, dissertations etc.) is to place them within a subject-based or institutional repository.
If the text-based information is for internal use only, you will need to consider using your intranet or document management system instead.
The Library of Congress has a useful Sustainability of Digital Formats Web site, which covers textual, Web and generic formats (including XML).
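Before deciding how to preserve files that already exist, it can help to know which text formats a project has actually produced. The following is a minimal sketch only, assuming that format can be judged from the file extension (a simplification; extensions can be missing or misleading). The function name `count_text_formats` and the extension mapping are illustrative assumptions, not part of any JISC or Library of Congress tool.

```python
from collections import Counter
from pathlib import Path

# Illustrative mapping only (an assumption); extend it to match
# the formats your own project uses.
TEXT_FORMATS = {
    ".pdf": "PDF",
    ".doc": "MS Word",
    ".docx": "MS Word",
    ".rtf": "RTF",
    ".html": "HTML",
    ".htm": "HTML",
    ".xml": "XML",
    ".tex": "TeX",
    ".ps": "PostScript",
}


def count_text_formats(root):
    """Count files under root by text format, judged by extension only."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Compare extensions case-insensitively (.PDF and .pdf are the same).
            fmt = TEXT_FORMATS.get(path.suffix.lower())
            if fmt is not None:
                counts[fmt] += 1
    return counts
```

Calling `count_text_formats("project_outputs")` on a hypothetical project folder returns a tally per format, which can then be checked against format guidance such as the Library of Congress site mentioned above.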
Further Resources
- DPC Technology Watch Report: File formats for Preservation
- ULCC: File formats…or data streams?
- Significant properties of text documents that will need to be preserved
- DPC Technology Watch Report: Preserving the Data Explosion: Using PDF
- LOC: Sustainability of Digital Formats: alphabetical list
- JISC: Archiving E-Publications
- ADS: Guides to Good Practice: Documents and Digital Texts