Millions of European artifacts -- like this page from a Book of Hours made in France around 1455 -- will be available online in November, thanks to the European Commission.
Not just a page with some facts and figures; the EU has plenty of those. What an EU commissioner has in mind is a rich digital encyclopedia of Europe's cultural heritage. "Europeana" is an ambitious project to digitize large portions of the continent's national libraries and put as much of European civilization as possible -- books, maps, paintings, photos, films -- online for free.
The Web site at the moment just has a demonstration tour. But Viviane Reding, EU Commissioner for Information Society and Media, has promised to have two million digitized "objects" available for full public browsing -- in English, German and French -- by November 20.
Part of the impetus for the project was to give Europe leverage against Google. The Google Library Project started in 2004 and has already scanned about 10 million books and other works from around the world. In the world-literature scanning derby, in other words, America is ahead, at least in sheer numbers. Former French President Jacques Chirac was disconcerted enough by Google to suggest a rival European-built search engine called Quaero. That project faltered in 2006. But in 2005 Chirac and five other heads of state also initiated the European Digital Library, an earlier and now-defunct project which gave rise to Europeana.
The mission to digitize European culture calls up a disturbing image of sweating EU functionaries in a basement somewhere, scanning Shakespeare folios and Picasso canvases. But not so: Martin Selmayr, a spokesman for Reding's office at the European Commission, said the project involves organizing various digital projects already underway at Europe's state libraries and national archives, so the books and films and photos and paintings can be clicked through on a single site.
No one at the European Commission scans books. But Reding's office does have to smooth out compatibility problems, and make sure all the scanned files work with all the other scanned files. The libraries also need to avoid duplicating their efforts, since Europeana has no particular need for twenty-seven digital copies of Goethe's Faust.
"That shows you what kind of Herculean task is before us," said Selmayr.
The other major problem is triage: What to scan first? Europeana wants to show "the main elements of a national culture," said Selmayr, so major, fragile old works like the Gutenberg Bible or the Magna Carta have obvious preference. "But it's very important that these decisions are not made by political entities," he said. The EU has left these decisions to librarians and archivists, and since they see digital collections as a form of backup or preservation for the libraries themselves, each institution has different priorities.
"For example, magazines from the German exile era," said Ute Schwens, a director at the German National Library. "During World War II many German intellectuals went into exile, and they published in many different countries. We have these publications, but they are wartime materials. They are rare, they were sometimes published underground, many items are not in good condition. We can't let the general public handle all of them. So to make these available, we are digitizing them first."
Data and Meta-Data
In early October, a library in St. Gallen, Switzerland, announced it would scan its valuable collection of handwritten medieval books with a grant from the Mellon Foundation in the United States. The director of the Stiftsbibliothek at the St. Gallen abbey, Ernst Tremp, told the International Herald Tribune that the first motivation to digitize its manuscripts was a flood in Dresden, Germany, in 2002, which threatened old works of art.
The St. Gallen Stiftsbibliothek is one of Europe's most venerable libraries. It also happens to lie outside the political borders of the European Union, since Switzerland is not an EU member. But its collection includes illuminated manuscripts as well as drinking songs, curses against book thieves, an Irish grammar from 904 and the oldest known book in German, a Latin-German glossary called Abrogans (after the first word in Latin) from about 770 AD.
On Europeana readers will be able to page through old manuscripts like the Missale Aboense, published in Lübeck, Germany, around 1500. The collection will include modern paintings, photos and even films, but for now it's also a rescue mission for rare and fragile items.
"We just haven't been asked (to join) yet," said Rafael Schwemmer, a computer scientist at the University of Fribourg in Switzerland who manages the whole Virtual Manuscript Library as well as the St. Gallen project. "We started five years ago, when Europeana didn't exist."
Jonathan Purday, a spokesman for Europeana at the National Library of the Netherlands, said there was no insurmountable political reason why the Swiss collections shouldn't turn up in Europeana. "I haven't heard of any cooperation yet," he said, "but I would hope that we would be able to harvest their images in the future." What's more important than EU membership is that the data be compatible -- in particular the meta-data that define it in a database or on the Internet.
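What compatible meta-data might look like in practice can be pictured with a small sketch in Python. It is only an illustration: the field names (title, creator, year, themes), the list of required fields and the sample records are assumptions made for this example, not Europeana's actual schema, which the project's spokesmen did not describe in detail.

```python
# Hypothetical catalog fields; not Europeana's real schema.
REQUIRED_FIELDS = ("title", "creator", "year")


def missing_fields(record):
    """Return the required catalog fields that are absent or empty in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def find_by_creator(records, creator):
    """Return records whose creator matches, ignoring case and stray spaces."""
    wanted = creator.strip().lower()
    return [
        r for r in records
        if str(r.get("creator") or "").strip().lower() == wanted
    ]


# Illustrative sample data only.
records = [
    {"title": "Abrogans", "creator": "unknown", "year": 770, "themes": ["glossary"]},
    {"title": "Untitled scan", "creator": "", "year": None},
]

for record in records:
    print(record["title"], "is missing:", missing_fields(record))
```

A scan whose record lacks a creator or a year, like the second one here, would exist on a server but could not be found by a search on those fields, which is the kind of gap a shared catalog has to close.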
"Say you scan a painting. The scan isn't the problem -- it's how you categorize it so it can be found on the Web," he said. The year of the work, the artist, its main colors or themes -- all the catalog information Europeana might use to find an object counts as meta-data. To fit a collection into the Europeana program, these meta-tags have to be specific.
"It's still been very difficult to make everything work in the same space," Purday said. "We're working on these (compatibility) problems as we go along. So when the project moves into its next phase" -- a push to get 10 million objects online for public viewing by 2010 -- "we hope things will go much more smoothly."
As these digitizing projects grow, and old artifacts crumble to dust, keeping the data safe will become more and more important. Never mind crashing hard drives: Anyone who has tried to open a ten-year-old Microsoft Word file -- and failed, because their version of Word was too advanced -- knows that keeping old data on a hard drive is no way to preserve it. In the case of Europeana, vast and valuable archives could be useless after a generation if technology advances too quickly. "We have thought about that," said Selmayr. "We try to make sure all the objects are digitized in a migratable way" -- in formats that have a future, as far as anyone can tell.
Ute Schwens, at the German National Library, said some of their less important documents were available only in outdated formats which no latter-day computers support, like Atari and Commodore systems from the 1980s. "When we can't migrate old data," she said, "we have to create a virtual version of the old environment (to read the files). We've had to do that with old Atari and Commodore files. These are things that were published in digital form at the time, like laws. But to read them, you also have to know the Atari system. That can't be translated."
It was the sort of strange problem -- early in the age of digital media -- that international projects like Europeana will need to avoid.
"It was a good test for us," said Schwens.