At my university we are exploring website archiving for the long-term preservation of institutional sites, as well as other potential uses. We would like to hear from participants about which institutions are actively archiving websites, which tools they are using, and what technical issues they have encountered.
Some other questions we would like to explore are:
- Quality control of website harvests can be challenging because of technical limitations of web crawlers. How does your institution handle QA for captured sites?
- What copyright and intellectual property issues arise when archiving websites?
- Are any institutions using web harvesting as a means to provide long-term access to static copies of “boutique” or one-time faculty/student DH or project sites?
- What is the potential for web archives to be used in data or text mining in DH? (The British Library is offering data visualizations of some of their captures.)
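On the QA question above, one common class of harvest defect is a page that was captured while the resources it references (images, stylesheets, linked pages) were not. The following is a minimal, stdlib-only sketch of that check; the toy capture, URLs, and function names are illustrative assumptions, not the behavior of any particular archiving tool, which would operate on real WARC files rather than an in-memory dict.

```python
from html.parser import HTMLParser

# A toy stand-in for a harvest: captured URI -> captured HTML payload.
# Real harvests (e.g. from Heritrix or wget --warc-file) are WARC files;
# this dict exists only to demonstrate the QA logic.
CAPTURE = {
    "http://example.edu/": (
        "<html><body>"
        "<img src='http://example.edu/logo.png'>"
        "<a href='http://example.edu/about.html'>About</a>"
        "</body></html>"
    ),
    "http://example.edu/about.html": "<html><body>About us</body></html>",
    # note: logo.png is referenced but was never captured
}

class RefCollector(HTMLParser):
    """Collect src/href references from a captured HTML page."""
    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.append(value)

def missing_resources(capture):
    """Return referenced URLs absent from the capture -- a basic QA pass."""
    missing = set()
    for uri, html in capture.items():
        parser = RefCollector()
        parser.feed(html)
        for ref in parser.refs:
            if ref not in capture:
                missing.add(ref)
    return sorted(missing)

print(missing_resources(CAPTURE))
# -> ['http://example.edu/logo.png']
```

A real QA pass would also need to resolve relative URLs, ignore off-site links that were deliberately out of scope, and check HTTP status codes recorded in the WARC, but the same "referenced but not captured" comparison is at the core of it.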
Looking forward to discussing this topic, and many others, at PhillyDH @PENN on Tuesday!
Digital Projects Librarian
Digital Library Initiatives
Temple University Libraries