Archiving Old Dynamic Websites
Archiver is a web based application written in Java using the Spring Framework. Its main use is to produce static versions of websites which are no longer updated with new content. This is important for many reasons; firstly, dynamic sites will typically target specific software dependencies. As time passes certain dependencies will no longer receive patches, posing a security risk, or updates, meaning it may no longer function with the latest operating systems and code environments. This is almost always a problem for system administrators who sometimes end up maintaining software that can be tens of years old on outdated systems. Archiver offers an easy means of creating a static versions of of these dynamic sites that can deployed simply to a standard web server, typically, but not limited to, Apache.
Websites will run using different software; some will be written in plain HTML and CSS whilst others will run on CMS or WIKI platforms such as WordPress, MediaWiki or Drupal. Each of these methods will provide a slightly different way of performing some tasks, writing certain elements in HTML/CSS etc and laying out structure. After analysing the problem it was clear that we would need to target specific software in order to provide a high quality solution. For this reason, the ‘Strategy’ design pattern was chosen.
In this case an interface and super implementation provided a default set of methods for dealing with the processing of individual web elements written to work in the majority of cases. It can be thought of as standard web behavior. Subclasses of this Strategy were provided to account for software differences.
We currently support the following strategies -:
- Basic (Static websites for migration)
- Media Wiki
While existing solutions are available none of them provide the comprehensive rewriting capabilities of Archiver. All the user has to do is point the webapp at a site, choose a strategy and deploy the resulting zip.
Archiver also produces a README file which provides details of all the files which have been included in the archive and lists any errors such as missing pages.
Code is available from https://bitbucket.org/ukolnisc/archiver/src
While this is working code it has not received sufficient testing which is obviously vital for this type of project. With that in mind we would love to hear your feedback.