Tag Archives: Internet Archive

Internet Archive backup in Canada?

Having a backup for the world’s digital memory is a good idea, whether or not the backup site is in Canada and regardless of who is president of the United States. The Internet Archive has announced that it is raising funds to create a backup site. Here’s more from a Dec. 1, 2016 news item on phys.org,

The Internet Archive, which keeps historical records of Web pages, is creating a new backup center in Canada, citing concerns about surveillance following the US presidential election of Donald Trump.

“On November 9 in America, we woke up to a new administration promising radical change. It was a firm reminder that institutions like ours, built for the long term, need to design for change,” said a blog post from Brewster Kahle, founder and digital librarian at the organization.

“For us, it means keeping our cultural materials safe, private and perpetually accessible. It means preparing for a Web that may face greater restrictions.”

While Trump has announced no new digital policies, his campaign comments have raised concerns his administration would be more active on government surveillance and less sensitive to civil liberties.

Glyn Moody in a Nov. 30, 2016 posting on Techdirt eloquently describes the Internet Archive’s role (Note: Links have been removed),

The Internet Archive is probably the most important site that most people have never heard of, much less used. It is an amazing thing: not just a huge collection of freely-available digitized materials, but a backup copy of much of today’s Web, available through something known as the Wayback Machine. It gets its name from the fact that it lets visitors view snapshots of vast numbers of Web pages as they have changed over the last two decades since the Internet Archive was founded — some 279 billion pages currently. That feature makes it an indispensable — and generally unique — record of pages and information that have since disappeared, sometimes because somebody powerful found them inconvenient.

Even more eloquently, Brewster Kahle explains the initiative in his Nov. 29, 2016 posting on one of the Internet Archive blogs,

The history of libraries is one of loss.  The Library of Alexandria is best known for its disappearance.

Libraries like ours are susceptible to different fault lines:

Earthquakes,

Legal regimes,

Institutional failure.

So this year, we have set a new goal: to create a copy of Internet Archive’s digital collections in another country. We are building the Internet Archive of Canada because, to quote our friends at LOCKSS, “lots of copies keep stuff safe.” This project will cost millions. So this is the one time of the year I will ask you: please make a tax-deductible donation to help make sure the Internet Archive lasts forever. (FAQ on this effort).

Throughout history, libraries have fought against terrible violations of privacy—where people have been rounded up simply for what they read.  At the Internet Archive, we are fighting to protect our readers’ privacy in the digital world.

We can do this because we are independent, thanks to broad support from many of you. The Internet Archive is a non-profit library built on trust. Our mission: to give everyone access to all knowledge, forever. For free. The Internet Archive has only 150 staff but runs one of the top-250 websites in the world. Reader privacy is very important to us, so we don’t accept ads that track your behavior.  We don’t even collect your IP address. But we still need to pay for the increasing costs of servers, staff and rent.

You may not know this, but your support for the Internet Archive makes more than 3 million e-books available for free to millions of Open Library patrons around the world.

Your support has fueled the work of journalists who used our Political TV Ad Archive in their fact-checking of candidates’ claims.

It keeps the Wayback Machine going, saving 300 million Web pages each week, so no one will ever be able to change the past just because there is no digital record of it. The Web needs a memory, the ability to look back.

My two most relevant past posts on the topic of archives and memories are this May 18, 2012 piece about Luciana Duranti’s talk on authenticity and trust in digital documents and this March 8, 2012 posting about digital memory, which also mentions Brewster Kahle and the Internet Archive.

Does digitizing material mean it’s safe? A tale of Canada’s Fisheries and Oceans scientific libraries

As has been noted elsewhere, the federal government of Canada has shut down a number of Fisheries and Oceans Canada libraries in a cost-saving exercise. The government is hoping to save some $440,000 in the 2014-15 fiscal year by digitizing, consolidating, and discarding the libraries and their holdings.

One would imagine that this is being done in a measured, thoughtful fashion but one would be wrong.

Andrew Nikiforuk’s December 23, 2013 article for The Tyee was one of the first to report on the closure of the fisheries libraries,

Scientists say the closure of some of the world’s finest fishery, ocean and environmental libraries by the Harper government has been so chaotic that irreplaceable collections of intellectual capital built by Canadian taxpayers for future generations has been lost forever.

Glyn Moody in a Jan. 7, 2014 post on Techdirt noted this,

What’s strange is that even though the rationale for this mass destruction is apparently in order to reduce costs, opportunities to sell off more valuable items have been ignored. A scientist is quoted as follows:

“Hundreds of bound journals, technical reports and texts still on the shelves, presumably meant for the garbage or shredding. I saw one famous monograph on zooplankton, which would probably fetch a pretty penny at a used science bookstore… anybody could go in and help themselves, with no record kept of who got what.”

Gloria Galloway in a Jan. 7, 2014 article for the Globe and Mail adds more details about what has been lost,

Peter Wells, an adjunct professor and senior research fellow at the International Ocean Institute at Dalhousie University in Halifax, said it is not surprising few members of the public used the libraries. But “the public benefits by the researchers and the different research labs being able to access the information,” he said.

Scientists say it is true that most modern research is done online.

But much of the material in the DFO libraries was not available digitally, Dr. Wells said, adding that some of it had great historical value. And some was data from decades ago that researchers use to determine how lakes and rivers have changed.

“I see this situation as a national tragedy, done under the pretext of cost savings, which, when examined closely, will prove to be a false motive,” Dr. Wells said. “A modern democratic society should value its information resources, not reduce, or worse, trash them.”

Dr. Ayles [Burton Ayles, a former DFO regional director and the former director of science for the Freshwater Institute in Winnipeg] said the Freshwater Institute had reports from the 1880s and some that were available nowhere else. “There was a whole core people who used that library on a regular basis,” he said.

Dr. Ayles pointed to a collection of three-ringed binders, occupying seven metres of shelf space, that contained the data collected during a study in the 1960s and 1970s of the proposed Mackenzie Valley pipeline. For a similar study in the early years of this century, he said, “scientists could go back to that information and say, ‘What was the baseline 30 years ago? What was there then and what is there now?’ ”

When asked how much of the discarded information has been digitized, the government did not provide an answer, but said the process continues.

Today, Margo McDiarmid’s Jan. 30, 2014 article for the Canadian Broadcasting Corporation (CBC) news online further explores digitization of the holdings,

Fisheries and Oceans is closing seven of its 11 libraries by 2015. It’s hoping to save more than $443,000 in 2014-15 by consolidating its collections into four remaining libraries.

Shea [Fisheries and Oceans Minister Gail Shea] told CBC News in a statement Jan. 6 that all copyrighted material has been digitized and the rest of the collection will be soon. The government says that putting material online is a more efficient way of handling it.

But documents from her office show there’s no way of really knowing that is happening.

“The Department of Fisheries and Oceans’ systems do not enable us to determine the number of items digitized by location and collection,” says the response by the minister’s office to MacAulay’s inquiry. [emphasis mine]

The documents also show that the department had to figure out what to do with 242,207 books and research documents from the libraries being closed. It kept 158,140 items and offered the remaining 84,067 to libraries outside the federal government.

Shea’s office told CBC that the books were also “offered to the general public and recycled in a ‘green fashion’ if there were no takers.”

The fate of thousands of books appears to be “unknown,” although the documents’ numbers show 160 items from the Maurice Lamontagne Library in Mont Jolie, Que., were “discarded.”  A Radio-Canada story in June about the library showed piles of volumes in dumpsters.

And the numbers prove a lot more material was tossed out. The bill to discard material from four of the seven libraries totals $22,816.76

Leaving aside the issue of whether or not rare books were given away or put in dumpsters, it’s not confidence-building when the government minister can’t say which books have been digitized or where they might be located online.

Interestingly, Fisheries and Oceans is not the only department/ministry shutting down libraries (from McDiarmid’s CBC article),

Fisheries and Oceans is just one of the 14 federal departments, including Health Canada and Environment Canada, that have been shutting physical libraries and digitizing or consolidating the material into closed central book vaults.

I was unaware of the problems with Health Canada’s libraries, but Laura Payton and Max Paris’ Jan. 20, 2014 article for CBC News online certainly raised my eyebrows,

Health Canada scientists are so concerned about losing access to their research library that they’re finding workarounds, with one squirrelling away journals and books in his basement for colleagues to consult, says a report obtained by CBC News.

The draft report from a consultant hired by the department warned it not to close its library, but the report was rejected as flawed and the advice went unheeded.

Before the main library closed, the inter-library loan functions were outsourced to a private company called Infotrieve, the consultant wrote in a report ordered by the department. The library’s physical collection was moved to the National Science Library on the Ottawa campus of the National Research Council last year.

“Staff requests have dropped 90 per cent over in-house service levels prior to the outsource. This statistic has been heralded as a cost savings by senior HC [Health Canada] management,” the report said.

“However, HC scientists have repeatedly said during the interview process that the decrease is because the information has become inaccessible — either it cannot arrive in due time, or it is unaffordable due to the fee structure in place.”

….

The report noted the workarounds scientists used to overcome their access problems.

Mueller [Dr. Rudi Mueller, who left the department in 2012] used his contacts in industry for scientific literature. He also went to university libraries where he had a faculty connection.

The report said Health Canada scientists sometimes use the library cards of university students in co-operative programs at the department.

Unsanctioned libraries have been created by science staff.

“One group moved its 250 feet of published materials to an employee’s basement. When you need a book, you email ‘Fred,’ and ‘Fred’ brings the book in with him the next day,” the consultant wrote in his report.

“I think it’s part of being a scientist. You find a way around the problems,” Mueller told CBC News.

Unsanctioned, underground libraries aside, the assumption that digitizing documents and books ensures access is false.  Glyn Moody in a Nov. 12, 2013 article for Techdirt gives a chastening example of how vulnerable our digital memories are,

The Internet Archive is the world’s online memory, holding the only copies of many historic (and not-so-historic) Web pages that have long disappeared from the Web itself.

Bad news:

This morning at about 3:30 a.m. a fire started at the Internet Archive’s San Francisco scanning center.

Good news:

no one was hurt and no data was lost. Our main building was not affected except for damage to one electrical run. This power issue caused us to lose power to some servers for a while.

Bad news:

Some physical materials were in the scanning center because they were being digitized, but most were in a separate locked room or in our physical archive and were not lost. Of those materials we did unfortunately lose, about half had already been digitized. We are working with our library partners now to assess.

That loss is unfortunate, but imagine if the fire had been in the main server room holding the Internet Archive’s 2 petabytes of data. Wisely, the project has placed copies at other locations …

That’s good to know, but it seems rather foolish for the world to depend on the Internet Archive always being able to keep all its copies up to date, especially as the quantity of data that it stores continues to rise. This digital library is so important in historical and cultural terms: surely it’s time to start mirroring the Internet Archive around the world in many locations, with direct and sustained support from multiple governments.

In addition to the issue of vulnerability, there’s also the issue of authenticity, which I discussed in my June 5, 2013 posting about science, archives and memories,

… Luciana Duranti [Professor and Chair, MAS {Master of Archival Studies} Program at the University of British Columbia and Director, InterPARES] and her talk titled, Trust and Authenticity in the Digital Environment: An Increasingly Cloudy Issue, which took place in Vancouver (Canada) last year (mentioned in my May 18, 2012 posting).

Duranti raised many, many issues that most of us don’t consider when we blithely store information in the ‘cloud’ or create blogs that turn out to be repositories of a sort (and then don’t know what to do with them; ça c’est moi, i.e., that’s me). She also previewed a Sept. 26 – 28, 2013 conference to be hosted in Vancouver by UNESCO (United Nations Educational, Scientific, and Cultural Organization), “Memory of the World in the Digital Age: Digitization and Preservation.” (UNESCO’s Memory of the World programme hosts a number of these themed conferences and workshops.)

The Sept. 2013 UNESCO ‘memory of the world’ conference in Vancouver seems rather timely in retrospect. The Council of Canadian Academies (CCA) announced that Dr. Doug Owram would be chairing their Memory Institutions and the Digital Revolution assessment (mentioned in my Feb. 22, 2013 posting; scroll down 80% of the way) and, after checking recently, I noticed that the Expert Panel has been assembled and it includes Duranti. Here’s the assessment description from the CCA’s ‘memory institutions’ webpage,

Library and Archives Canada has asked the Council of Canadian Academies to assess how memory institutions, which include archives, libraries, museums, and other cultural institutions, can embrace the opportunities and challenges of the changing ways in which Canadians are communicating and working in the digital age.

Background

Over the past three decades, Canadians have seen a dramatic transformation in both personal and professional forms of communication due to new technologies. Where the early personal computer and word-processing systems were largely used and understood as extensions of the typewriter, advances in technology since the 1980s have enabled people to adopt different approaches to communicating and documenting their lives, culture, and work. Increased computing power, inexpensive electronic storage, and the widespread adoption of broadband computer networks have thrust methods of communication far ahead of our ability to grasp the implications of these advances.

These trends present both significant challenges and opportunities for traditional memory institutions as they work towards ensuring that valuable information is safeguarded and maintained for the long term and for the benefit of future generations. It requires that they keep track of new types of records that may be of future cultural significance, and of any changes in how decisions are being documented. As part of this assessment, the Council’s expert panel will examine the evidence as it relates to emerging trends, international best practices in archiving, and strengths and weaknesses in how Canada’s memory institutions are responding to these opportunities and challenges. Once complete, this assessment will provide an in-depth and balanced report that will support Library and Archives Canada and other memory institutions as they consider how best to manage and preserve the mass quantity of communications records generated as a result of new and emerging technologies.

The Council’s assessment is running concurrently with the Royal Society of Canada’s expert panel assessment on Libraries and Archives in 21st century Canada. Though similar in subject matter, these assessments have a different focus and follow a different process. The Council’s assessment is concerned foremost with opportunities and challenges for memory institutions as they adapt to a rapidly changing digital environment. In navigating these issues, the Council will draw on a highly qualified and multidisciplinary expert panel to undertake a rigorous assessment of the evidence and of significant international trends in policy and technology now underway. The final report will provide Canadians, policy-makers, and decision-makers with the evidence and information needed to consider policy directions. In contrast, the RSC panel focuses on the status and future of libraries and archives, and will draw upon a public engagement process.

So, the government is shutting down libraries in order to save money, and they’re praying (?) that the materials have been digitized and that adequate care has been taken to ensure they will not be lost in some disaster or other. Meanwhile, the Council of Canadian Academies is conducting an assessment of memory institutions in the digital age. The approach seems backwards.

On a more amusing note, Rick Mercer parodies at least one way scientists are finding to circumvent the cost-cutting exercise in an excerpt (approximately 1 min.) from his Jan. 29, 2014 Rick Mercer Report telecast (thanks Roz),

Mercer’s comment about sports and the preferences of Canada’s Prime Minister, Stephen Harper, refers to Harper’s expressed desire to write a book about hockey. It may also be a veiled reference to Harper’s successful move to prorogue Parliament during the 2010 Winter Olympic Games in Vancouver, which many observers suggested was a strategy allowing him to attend the games at his leisure.

Whether or not you agree with the decision to shut down some libraries, the implementation seems to have been a remarkably sloppy affair.

Digital disasters

What would happen if we had a digital disaster? Try to imagine a situation where all or most of our information has been destroyed on all global networks. It may seem unlikely, but it’s not entirely impossible, as Luciana Duranti, then a professor at the University of British Columbia School of Library, Archival, and Information Sciences, suggested to reporter Mike Roberts in a 2006 interview. She cited a few examples of what we had already lost (excerpted from my March 9, 2010 posting),

… she commented about the memories we had already lost. From the article,

Alas, she says, every day something else is irretrievably lost.

The research records of the U.S. Marines for the past 25 years? Gone.

East German land-survey records vital to the reunification of Germany? Toast.

A piece of digital interactive music recorded by Canadian composer Keith Hamel just eight years ago?

“Inaccessible, over, finito,” says Duranti, educated in her native Italy and a UBC prof since 1987.

Duranti, director of InterPARES (International Research on Permanent Authentic Records in Electronic Systems), an international cyber-preservation project comprising 20 countries and 60 global archivists, says original documentation is a thing of the past.

Glyn Moody’s March 5, 2012 posting on Techdirt notes a recent attempt to address the possible loss of ‘memory’ along with other issues specific to the digitization of information (I have removed links),

But there’s a problem: as more people turn to digital books as their preferred way of consuming text, libraries are starting to throw out their physical copies. Some, because nobody reads them much these days; some, because they take up too much space, and cost too much to keep; some, even on the grounds that Google has already scanned the book, and so the physical copy isn’t needed. Whatever the underlying reason, the natural assumption that we can always go back to traditional libraries to digitize or re-scan works is looking increasingly dubious.

Fortunately, Brewster Kahle, the man behind the Alexa Web traffic and ranking company (named after the Library of Alexandria, and sold to Amazon), and the Internet Archive — itself a kind of digital Library of Alexandria — has spotted the danger, and is now creating yet another ambitious library, this time of physical books …

For some reason, this all reminded me of A Canticle for Leibowitz, a book I read many years ago and remember chiefly as a warning that information can be lost. There’s more about the book here. As for Kahle’s plan, I wish him the best of luck.