Unconference topic proposal: Web Archiving

Hello Everyone,


At my university we are exploring options for website archiving, both for long-term preservation of institutional websites and for other potential uses. We would like to talk with participants to find out which institutions are actively archiving websites, which tools they are using, and what technical issues they have run into.
Some other questions we would like to explore are:


  • Quality control of website harvests can be challenging because of technical issues with web crawlers. How does your institution handle QA for captured sites?
  • What are some copyright and intellectual property issues surrounding archiving websites?
  • Are any institutions using web harvesting as a means to provide long-term access to static copies of “boutique” or one-time faculty/student DH or project sites?
  • What is the potential for web archives to be used in data or text mining in DH? (The British Library is offering data visualizations of some of their captures.)

Looking forward to discussing this topic, and many others, at PhillyDH @PENN on Tuesday!

Doreva Belfiore
Digital Projects Librarian
Digital Library Initiatives
Temple University Libraries

3 thoughts on “Unconference topic proposal: Web Archiving”

  1. Hi Doreva, We use Archive-It to archive our institutional website and our institutional social media sites: Facebook, Twitter, and YouTube channels. We’ve developed our own user interface for this, but it’s still in development. I hope to attend this session!

  2. Update from Caroline Young for this group:
    I attended the Web Harvesting Unconference Session and said that I would send the attached items to the group. The first is a permission email that the Library of Congress uses to address copyright issues when web harvesting. Please note that I did not write it or have any involvement in its creation or in any of the projects at the Library of Congress! I just did research on the topic for an article I wrote about the preservation of legal blogs (a.k.a. blawgs). I have also attached a PowerPoint that generally describes their web archiving process. Again, I have nothing to do with this other than a research interest!
    Email Permission Template from LOC: http://penn2013.phillydh.org/wp-content/uploads/sites/2/2013/06/permission-email.pdf
    PowerPoint from LOC: https://library.columbia.edu/content/dam/librarywebsecure/behind_the_scenes/digital_seminars/meehlieb_jun10.ppt
