DjVu

Scan to Web Technology

 

DjVu is an open-source digital imaging format developed by AT&T Research Labs and designed specifically for capturing scanned images and publishing them on the World Wide Web. According to its developers, DjVu lets a user store and download scanned documents up to 20 times faster than JPEG or GIF, at compression ratios as high as 500:1. This is accomplished by separating a document into a bitmap (black-and-white) portion and a color photo-imagery portion and encoding each with a different technique, which reduces storage requirements and download times. One of these is a new technique based on the mathematical theory of "wavelets". To speed download times further, DjVu uses a progressive coding technique that brings up an initial version of the page very quickly and improves its visual quality as more bits arrive. For example, the text of a typical magazine page appears in just three seconds over a 56 Kbps modem connection. In another second or two, the first version of the pictures and background appears. After a few more seconds, the final full-quality version of the page is complete.

 

 

Benchmark Testing for UTK E-Reserves: A Comparison of DjVu with PDF

 

The two major problems we have encountered using PDF for E-Reserves are file size and print speed. File size is the greater problem for remote users who access the system over a standard POTS (dial-up) line. Print speed appears to be most problematic for users who print to a network printer.

 

These problems have prompted us to explore other avenues for providing an electronic reserve service. I recently conducted a comparison test of the DjVu and PDF formats to determine whether DjVu might be better suited to an electronic reserve initiative. The document links below can be used to compare the download performance of the two file formats.

 

You’ll need the readers if you don’t already have them:

DjVu Web Browser Plugin

Acrobat Reader

PDF (IMAGE)

DjVu (IMAGE)

PDF (TEXT)

DjVu (TEXT)

Equipment and Settings Used:

HP Scanjet 5p

200 dpi

true color (image)

B/W (text)

7.96 x 10.89

 

Results

File Sizes

PDF (IMAGE) – 9,028KB

DjVu (IMAGE) – 78KB

PDF (TEXT) – 56KB

DjVu (TEXT) – 29KB
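
To put these file sizes in terms of the dial-up problem described earlier, the following is a rough sketch that converts each size into a size ratio and an estimated download time. It assumes the nominal 56 Kbps modem rate mentioned in the introduction, that the sizes were reported in units of 1,024 bytes, and that there is no protocol overhead; real dial-up throughput is typically lower. The names used here are illustrative only and not part of the original test.

```python
# File sizes in KB, copied from the results listed above.
FILE_SIZES_KB = {
    ("image", "PDF"): 9028,
    ("image", "DjVu"): 78,
    ("text", "PDF"): 56,
    ("text", "DjVu"): 29,
}

# Assumption: nominal modem line rate, ignoring protocol overhead.
MODEM_BITS_PER_SECOND = 56_000
# Assumption: 1 KB was reported as 1,024 bytes.
BYTES_PER_KB = 1024


def download_seconds(size_kb, bits_per_second=MODEM_BITS_PER_SECOND):
    """Estimate the transfer time in seconds for a file of size_kb kilobytes."""
    return size_kb * BYTES_PER_KB * 8 / bits_per_second


def size_ratio(kind):
    """Return how many times larger the PDF is than the DjVu file."""
    return FILE_SIZES_KB[(kind, "PDF")] / FILE_SIZES_KB[(kind, "DjVu")]


for (kind, fmt), size_kb in FILE_SIZES_KB.items():
    print(f"{fmt:5s} ({kind}): {size_kb:>5d} KB, "
          f"about {download_seconds(size_kb):,.1f} s at 56 Kbps")

for kind in ("image", "text"):
    print(f"PDF/DjVu size ratio ({kind}): {size_ratio(kind):.1f}x")
```

Under these assumptions the image PDF would take roughly 22 minutes to download and the DjVu version about 11 seconds; for the text documents the difference is closer to a factor of two.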

 

Print Spool Size

PDF (IMAGE) – 7,360KB

DjVu (IMAGE) – 9,920KB

PDF (TEXT) – 7,360KB

DjVu (TEXT) – 9,920KB
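
Because print speed was the second concern, it may help to compare the spool sizes directly. The sketch below uses only the figures reported above; the function names are hypothetical, and the calculation says nothing about actual print times, which were not measured here.

```python
# Print spool sizes in KB, copied from the results listed above.
# The reported spool size was the same for the image and text documents.
SPOOL_KB = {"PDF": 7360, "DjVu": 9920}

# Stored file sizes in KB, from the "File Sizes" results.
FILE_KB = {
    ("image", "PDF"): 9028,
    ("image", "DjVu"): 78,
    ("text", "PDF"): 56,
    ("text", "DjVu"): 29,
}


def spool_ratio():
    """Return the DjVu spool size relative to the PDF spool size."""
    return SPOOL_KB["DjVu"] / SPOOL_KB["PDF"]


def expansion(kind, fmt):
    """Return how many times larger the spooled job is than the stored file."""
    return SPOOL_KB[fmt] / FILE_KB[(kind, fmt)]


print(f"DjVu spool is {spool_ratio():.2f}x the PDF spool")
for (kind, fmt) in FILE_KB:
    print(f"{fmt:5s} ({kind}): spool is {expansion(kind, fmt):.1f}x the file size")
```

In these measurements the DjVu print job is about 35 percent larger than the PDF one, so the smaller DjVu file size does not carry over to the spooled job.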

Yann LeCun et al., "DjVu: A Compression Method for Distributing Scanned Documents in Color over the Internet," available from http://djvu.research.att.com/djvu/techpapers/lecun-98c/index.djvu (accessed 30 Nov 2000).

 

Feedback:  adsmith1@utk.edu