Online library reaches million-book milestone

An international venture called the Universal Library Project has
made more than one million books freely available in digitized
format. The joint project of researchers from China, India,
Egypt, and the US has the eventual aim of digitizing all published
works of man, freeing access to information from geographic
and socioeconomic boundaries, providing a basis for technological
advancement, and preserving published works against time and tide.

One and a half million books in more than 20 languages, including
Chinese, English, Arabic, and various Indian languages, are now
accessible via a single Web portal.
The online library includes rare and out-of-print books from private
and public collections around the world.

“There are plenty of books that are no longer in copyright, and
that have long been forgotten, but which would be useful to scholars,
students, and just the general population,” says Michael Shamos, a
copyright lawyer, computer science professor, and co-director of the
project at Carnegie Mellon University in the US.

“There is a tremendous amount of knowledge that we thought would be
lost to mankind if we didn’t start digitizing,” he says.

The project believes digital books on the Internet should be free
to read, instantly available, easily accessible, printable on-demand,
translatable into any language, and readable by both humans and
machines. Additionally, with the advent of low-cost technology like
the One Laptop Per Child project’s XO laptop and ebook readers,
digitized books are expected to reduce the cost of learning by
replacing the repetitive cost of books with a one-off computer
purchase and freely downloadable information.

According to the researchers’ estimates, the Universal Library
collection currently represents a mere one percent of the
approximately 100 million books ever published. Shamos estimates
that only half of those books can be found in physical libraries
around the world, so physically locating a rare book can be a
tedious process.

“The only way you can obtain an out-of-print book is to find a
library that has one, and either travel to that library, or obtain
that book through an interlibrary loan,” he says. “It’s a very slow
process, especially considering that without seeing the book, you
might not know if there’s anything interesting in it for you.”

When the project was initiated in 2002, members expected other
research and commercial projects to digitize only around 50,000
books. Google Book Search is one such project launched since then;
in recent years, it has come under fire for alleged breaches
of copyright. While Shamos expressed a high regard for Google’s
efforts and the publicity it has attracted to book digitization, he
said the Universal Library Project had “similar but different”
goals.

“We want to digitize all published works of man; I don’t think that
anybody at Google would ever say that’s what their goal is,” he says.
“Their goal is to sell advertising, and one of the ways that they find
to sell advertising is to create a Web site that has such rich content
that people want to visit it all the time. I don’t think that Google
has any interest in putting Sanskrit works up on their Web site.”

Like Google, the Universal Library Project faces issues in
publishing copyrighted books online. As such, books currently under
copyright are only available in part via the Web portal, while books
that are not bound by copyright restrictions are fully and freely
available online.

Citing a need for information to be freely available, Shamos
expects these copyright restrictions to become less of an issue in
time, as publishers adapt to the low-cost business model that digital
books offer.

“Copyright is going to become less and less significant [because]
through digitization, the cost of publishing is vanishingly small,” he
says. “As the cost of copying goes down, the value of works goes down,
and the ability to make profit from them goes down.

“There is a difference in reading for pleasure and reading for
information; what is going to happen, I think, is that copyright is
going to end up focusing on works of entertainment and not works of
information.”

High numbers

The Universal Library Project is the brainchild of researchers at
Carnegie Mellon University, and has received $3.5 million in seed
funding from the National Science Foundation. The project has also
received in-kind contributions from Zhejiang University in China
and the Indian Institute of Science in India that have been valued at
$10 million each, and has more recently forged a partnership with the
Library at Alexandria in Egypt.

With more than 1,000 workers in about 50 scanning and digitization
centres around the world, the Universal Library collection is growing
at an estimated 7,000 books per day. There is a fair way to go before
the project reaches its lofty book digitization goals; even so, the
researchers have set their sights on eventually including content like
music, artwork, lectures, and newspapers in the library.

“We believe that by having a universal library with all published
works of man, and having multiple sites all around the world that
house the entire content, it will be impossible to destroy these
works,” Shamos says.

“There can never again be a destruction of the library of
Alexandria. There could be a destruction of the building, but there
can’t be a destruction of the works, and so this makes the creation
of man impervious to changes in political regime, culture, Moirai.”
