Preserving History on the Web: How Does the Library of Congress Help?

Deanna B. Marcum, November 2009

In 2009 the Library of Congress (LC) really got “with it” out in cyberspace, where students and even their teachers increasingly gather. On January 27, LC launched a Twitter feed. On April 7, LC opened a YouTube channel to the public. On April 21, LC joined UNESCO and others in launching the World Digital Library. On June 25, LC opened an online “Teachers Page.” On June 30, LC established a site on iTunes U. On July 9, LC appeared on Facebook. And of course, throughout 2009 we continued our Library of Congress blog.¹

All this matters to historians because historical resources are a large part of what LC is putting out there. In official parlance, all these events are part of “an ongoing effort” to make LC’s “digital educational, historical, and cultural resources available to Web users across a broad spectrum of platforms.”

Material to which LC has called attention on these sites includes Edison films, Pearl Harbor oral histories, Civil War maps, the story of Rosie the Riveter, a “podcast” about slave narratives, a “Webcast” about publishing the Declaration of Independence, and more than one million pages of historical newspapers digitized with the help of the National Endowment for the Humanities (NEH). The World Digital Library gives access exclusively to cultural and historical primary sources, and LC’s Teachers Page offers help in using such sources in teaching.

All this, however, raises two large questions. First, can libraries like LC preserve for future scholars and students the electronic copies of all the material we are digitizing? Second, how do libraries deal with the blogs, online news, and other Web content that is created digitally, primary sources that will help document our era for future historians?

The First Large Question

Consider first the material we are digitizing. LC is a long way from getting all of the nearly 142 million items in its collections online, and probably will never find it feasible or affordable to digitize so many books, manuscripts, audiovisual items, and other materials. But in addition to joining the NEH in digitizing more than a million newspaper pages, now accessible in a free, searchable database (http://chroniclingamerica.loc.gov/), LC and other partners also provide online access to more than nine million items in our American Memory collection of digitized materials, which document many aspects of U.S. culture and history (http://memory.loc.gov/ammem).

Moreover, LC has digitally scanned more than 25,000 books in our Digitizing American Imprints program, with grant assistance from the Alfred P. Sloan Foundation. This program electronically copies aging, brittle books that are too fragile for researchers to handle. We are also trying to help historians, among others, by putting more catalog data online through a program focused on Hidden Special Collections and Archives, in which the Andrew W. Mellon Foundation provides grants through the Council on Library and Information Resources to LC and other libraries to “uncover” collections not well known outside their own institutions.

In addition, we have put 3,000 historic photos on the Flickr photo-sharing site (http://flickr.com/photos/library_of_congress), posted a large number of photographs, letters, and personal narratives on our Veterans History Project site (www.loc.gov/vets/about.html), and much else. LC is also collecting digital content from others to preserve for future use, using new electronic-transfer tools that in 2008 enabled the Library to add approximately 80 terabytes to its digital collections.

As Internet-savvy historians know, many other libraries in the United States and abroad have been just as busy adding to the stock of digitized resources available to scholars, teachers, and students. Contributors include all 37 members of the Digital Library Federation. Libraries are also collaborating on new Web sites that let researchers search catalogs and view digital resources from multiple institutions; examples include the Global Digital Library, the European Library, and the Pacific Rim Library. And Google, among other search companies, has contracted to digitize the collections of several major research libraries in part or in whole, giving digital copies to the libraries. Librarians are now trying to find ways to ensure that all of these expensively created and culturally valuable digital resources are kept safe for perpetual use.

The danger lies in the fragility of computer tapes and disks; in how quickly the software and hardware needed to read files become obsolete; in the possibility that files may be degraded or corrupted during repeated migrations; in the growing diversity and complexity of digital formats, with their hyperlinks, interactivity, and combinations of audio, visual, and textual elements; and in the sheer quantity of material to be saved and kept perpetually accessible. Historians obviously have a huge stake in the efforts of librarians and others to overcome these challenges.

Fortunately, the U.S. Congress has funded a national effort to help libraries and others prevent digital losses. The National Digital Information Infrastructure and Preservation Program (given the ungainly acronym NDIIPP), headquartered at the Library of Congress, coordinates and helps fund efforts by 130 libraries and other partners to identify and improve digital preservation techniques. The National Science Foundation is also at the forefront of digital preservation, as is the National Archives and Records Administration, which has developed a technologically sophisticated Electronic Records Archives. Major libraries abroad are joining in these pioneering efforts as well. Through all of this work we hope to develop a positive, multifaceted answer to the first large question I raised: Can libraries preserve for future scholars and students the growing quantities of electronic copies of the scholarly material we are digitizing?

The Second Large Question

Historians also have a stake in our successfully answering the second large question I posed: What do libraries do about the thousands of Web sites, blogs, online newscasts, and other electronic evidence documenting what people in our time will have thought, said, done, and looked like? These primary sources are not digitized copies of printed texts and filmed images; they are “born digital” and may exist only in electronic form.

In dealing with this kind of material, what and how much do we try to save? What priorities should we apply in making best use of dollars for digital collecting and preserving? What will historians and other future researchers most need out of the mounting digital mass?

Of course, librarians and archivists have dealt with such questions in collecting traditional materials. But the questions are posed anew by the newscasting, blogging, e-mailing, twittering, and the like continuously springing up on the Internet, and by how quickly and often Web pages, and even entire Web sites, change or disappear. Just last August, NDIIPP brought curators and public policy experts to the Library of Congress to explore strategies for preserving public policy material that has been available only on the Web, where it may be at risk.

Leadership in the preservation of Web materials has come from the Internet Archive, a private, nonprofit organization founded in 1996. Its stated intent is “to build an Internet library, with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format.” Since then it has archived some 150 billion Web pages, which can be accessed through its “Wayback Machine” (http://web.archive.org/collections/web.html).

In 2000, the Library of Congress joined the Web preservation effort by creating the Library of Congress Web Archives (LCWA, originally called MINERVA), now available at http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html. A multidisciplinary team of LC staff members began by studying methods to evaluate, select, collect, catalog, provide access to, and preserve Web materials. Our approach has been to collect Web sites as primary sources for the study of major topics in the history of the Web era. So far LC, sometimes in collaboration with the Internet Archive and other partners, makes available archives on the following topics:

  • Attack of September 11, 2001
  • Iraq War, 2003
  • Papal transition, 2005
  • The crisis in Darfur, Sudan, 2006
  • The U.S. national elections of 2000, 2002, and 2004

LCWA also provides access to our Law Library Legal “Blawgs” Web Archive, a Visual Image Web Sites Archives, and a Library of Congress Manuscript Division Archive of Organizational Web Sites. In addition, LC is working with several partners to preserve, for eventual access, U.S. government Web sites as they existed at the end of the presidential administration of George W. Bush.

We in the United States are far from alone in archiving Web material for future study. In 2003, the Internet Archive and the Library of Congress joined with the national libraries of 10 other countries to form the International Internet Preservation Consortium (IIPC). Since then, membership has expanded to at least 38 libraries and related institutions in 27 countries and regions, including Catalonia, Iceland, and Singapore. The consortium’s purpose is to foster international collaboration in collecting, preserving, and providing long-term access to “Internet content from around the world.”

None of this means that libraries will stop collecting books, manuscripts, photographs, and other materials of value to historians in traditional formats. It does mean that libraries also recognize the research potential of digital resources, and the value of ensuring long-term access to both digitally copied and digitally created materials that illuminate history.

—Deanna B. Marcum is associate librarian for library services at the Library of Congress and a former president of the Council on Library and Information Resources. She holds a PhD in American studies as well as an MLS degree.

Note

1. The blog is at www.loc.gov/blog/. All of the other developments are listed in, and can be accessed through, the 2009 section of a list of “milestones in the Library’s use of the Internet to share its resources with the public,” found at www.loc.gov/rr/program/bib/libsci/faq.html in answer to FAQ 1, accessed August 26, 2009. Source citations and Web site locations not given in the text of this article are available on request from the author at dmarcum@loc.gov.