How to Digitize Eight Million Books

Fascinating article about how Stanford University set about digitizing the books in its library:

bq. About two and a half years ago, a good friend by the name of Christopher Warnock, the CEO and founder of Ebrary.com, an e-book distributor, came to talk to me when we were digitizing some Stanford University Press books on Latin America. He said, "You have to meet a couple of guys about a pretty interesting robot." They were Ivo Iossiger and Danick Bionda, founders of 4Digital Books, which is based in Switzerland. They showed me a video of their robot scanner. I immediately realized that if we could achieve the speeds they were talking about with their robot, we would have a breakthrough in how fast and how consistently we could digitize our materials.

bq. When you're turning pages by hand, you can do maybe 150 to 200 pages per hour. It's slow. But the robot can easily do 600 to 1,200 pages per hour without damaging the books. And it's rigorously consistent -- the page is always flat, the image is always good, and software conversion allows you to index the text so you can search it.

bq. But it's not just the scanning robot that's needed. There are the servers, the software, the network, the storage. Right now, it is an investment that can only be made by a big place like Stanford that already has a lot of this capacity in place. Even for us, though, a big issue is the large scale required to deal with our collection. With eight million volumes, if we were to digitize everything, we would end up with about a petabyte and a half of data. A petabyte is 10 to the 15th power. Managing the metadata for each individual bibliographic entity and each volume, the coding that allows you to search in a book, or in a collection of books associated by various parameters -- classification, subject heading, author, publisher, place of publication and so forth -- is another petabyte and a half.

bq. We're talking about gargantuan-sized memories and massively parallel supercomputers to whiz through this stuff. Not many institutions in this country have that kind of capacity. Maybe it will require a national effort to really do this.

WOW! They have some photos of the robot as well as a description of how it works. The company that makes the robot, 4Digital Books, also has a website.
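
For a sense of scale, here is a quick back-of-the-envelope sketch in Python using only the numbers quoted above. The figure of 300 pages per volume is my own assumption, not something from the article, and the function names are made up for illustration.

```python
# Figures quoted in the interview.
VOLUMES = 8_000_000
TOTAL_BYTES = 15 * 10**14            # "about a petabyte and a half" (1 PB = 10**15 bytes)
MANUAL_PAGES_PER_HOUR = (150, 200)   # (low, high) when turning pages by hand
ROBOT_PAGES_PER_HOUR = (600, 1200)   # (low, high) for the robot scanner

# Assumption, NOT from the article: average page count of one volume.
ASSUMED_PAGES_PER_VOLUME = 300


def bytes_per_volume(total_bytes, volumes):
    """Average storage per volume implied by the quoted totals."""
    return total_bytes / volumes


def speedup_range(manual, robot):
    """Return (worst case, best case) speedup of robot over manual scanning."""
    worst = robot[0] / manual[1]   # slowest robot vs. fastest human
    best = robot[1] / manual[0]    # fastest robot vs. slowest human
    return worst, best


def scan_hours(volumes, pages_per_volume, pages_per_hour):
    """Hours a single scanning station needs at a steady page rate."""
    return volumes * pages_per_volume / pages_per_hour


if __name__ == "__main__":
    mb = bytes_per_volume(TOTAL_BYTES, VOLUMES) / 10**6
    print(f"Storage per volume: {mb:.1f} MB")

    worst, best = speedup_range(MANUAL_PAGES_PER_HOUR, ROBOT_PAGES_PER_HOUR)
    print(f"Robot speedup over manual: {worst:.0f}x to {best:.0f}x")

    # Depends entirely on the assumed page count above.
    hours = scan_hours(VOLUMES, ASSUMED_PAGES_PER_VOLUME, ROBOT_PAGES_PER_HOUR[1])
    print(f"One robot at top speed: {hours:,.0f} hours")
```

The quoted totals work out to about 187.5 MB per volume, and the robot comes out somewhere between 3 and 8 times faster than turning pages by hand. The hours figure is only as good as the assumed page count, so treat it as a rough illustration of why a single scanner would not be enough.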

About this Entry

This page contains a single entry by DaveH published on January 23, 2004 2:16 PM.
