Project Gutenberg — Col Choat


Project Gutenberg (PG) began in 1971 when Michael Stern Hart[1] was given one hundred million dollars' worth of computer time by the operators of the Xerox Sigma V mainframe at the Materials Research Laboratory at the University of Illinois. Mr Hart suggests that he happened to be in the right place at the right time: there was more computer time than people knew what to do with, and the operators were encouraged to do whatever they wanted with that fortune in “spare time” in the hope that they would become more proficient at their jobs. After due reflection, Mr Hart decided that one of the most effective uses of computers would be the storage, retrieval, searching and reading of material held in computer libraries. He then proceeded to key in the American Declaration of Independence, producing the first electronic text (etext) in the PG library. The rest, as they say, is history.

Creation of an etext of the Declaration of Independence was followed by the American Bill of Rights, the US Constitution, the Bible, Shakespeare (a play at a time), and then by general works of literature and reference. From December 1971 to December 1993, one hundred etexts were produced. This was no mean feat when one considers that the list includes Shakespeare, the Bible and other substantial works, all of which had to be keyed in and then checked by proofreading against the printed edition. Appropriately, and not coincidentally, etext one hundred was The Complete Works of William Shakespeare.

Now, with the advent of computer scanners (which enable one to “read in” printed pages and convert them to editable electronic text) and the growing popularity of the Internet, there are over three thousand six hundred etexts in the Project Gutenberg library, and Mr Hart recently announced that, for the first time, more than twenty new etexts were posted to the library in a single week. This is a prodigious effort by the many volunteers who convert printed works into etexts. The aim is to reach ten thousand texts.

One might think that the pool of printed works will eventually run dry; however, this can never happen, because every year new works become available as their copyright expires. Furthermore, volunteers have begun converting the literary gems of other languages into etexts, opening further rich veins of literary ore for plundering.

Electronic Data

The premise on which Michael Hart based the Project Gutenberg concept was that electronic data stored in a computer can be reproduced indefinitely by passing it from computer to computer. Once a book or any other item (including pictures and sounds) has been stored in a computer, any number of copies can be made. Everyone in the world, or even beyond it (given satellite transmission), can have a copy of a book that has been entered into a computer. When people holiday on Mars later this century, they might have a copy of Homer's Iliad beamed to them: the book they always meant to read. They would only need to specify the language they wanted.

It was decided to store etexts in the simplest, easiest-to-use form available: “plain vanilla” ASCII[2] format, consisting of the basic characters one reads on a normal printed page. Italic, underlined and bold text would be rendered in capitals, as these styles are not supported by many basic text readers. This decision was made because 99% of the hardware and software in use around the world can read and search these files; any other system of etext storage would fall short of that audience. Furthermore, etexts stored in this format are easily converted to many other formats, such as those used by word processors and the one used to display text on Internet web pages (i.e. HTML[3]).
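To show how little work the conversion from plain text to HTML can take, here is a minimal sketch in Python. It assumes paragraphs in the etext are separated by blank lines, which is common in plain-text etexts but is an assumption here, and the function name `plain_text_to_html` is hypothetical rather than part of any Project Gutenberg tool.

```python
import html


def plain_text_to_html(text: str, title: str = "Etext") -> str:
    """Wrap the blank-line-separated paragraphs of a plain-text etext in <p> tags.

    Assumption: paragraphs are separated by one or more blank lines, and the
    hard line breaks inside a paragraph can be joined with single spaces.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line ends the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    # Escape &, <, > and quotes so the etext is never mistaken for markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>"
    )
```

For example, `plain_text_to_html("When in the Course\nof human events\n\nWe hold these truths", title="Declaration")` yields two `<p>` elements. Because plain ASCII carries no styling of its own, the converter only has to decide where paragraphs begin and end.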

Michael Hart has said that he wants people to be able to use PG etexts to look up quotations they have heard in conversation or in movies, or have read in other books. He envisages a compact disc (CD) containing every PG title, forming a library in which all these quotations can be found within the individual etexts. One could search the entire library with nothing more sophisticated than the plain search program found on every personal computer.
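As an illustration of how simple such a search can be when the library is plain text, here is a minimal Python sketch. It assumes the etexts sit as `.txt` files under one directory (a hypothetical layout, not PG's actual CD structure), and the function name `find_quotation` is likewise hypothetical.

```python
import re
from pathlib import Path


def find_quotation(library_dir: str, phrase: str, context: int = 40):
    """Search every .txt file under library_dir for phrase, ignoring case.

    Whitespace (including line breaks) is collapsed before matching, so a
    quotation that wraps across two lines of the etext is still found.
    Yields (file name, surrounding snippet) pairs.
    """
    needle = " ".join(phrase.split()).lower()
    if not needle:
        return  # An empty phrase would match everywhere; yield nothing.
    for path in sorted(Path(library_dir).rglob("*.txt")):
        # Assumption: etexts are plain ASCII; undecodable bytes are replaced.
        raw = path.read_text(encoding="ascii", errors="replace")
        text = re.sub(r"\s+", " ", raw)
        lowered = text.lower()  # Same length as text for ASCII input.
        start = lowered.find(needle)
        while start != -1:
            lo = max(0, start - context)
            hi = start + len(needle) + context
            yield path.name, text[lo:hi]
            start = lowered.find(needle, start + 1)
```

A call such as `for name, snippet in find_quotation("etexts", "to be, or not to be"): print(name, snippet)` would list each matching etext with a little surrounding text; the directory name `etexts` is only an example. Collapsing whitespace is a deliberate choice: plain-text etexts hard-wrap their lines, so a strictly line-by-line search would miss a quotation broken across two lines.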

The text of an average book will fit on a standard 3.5-inch floppy disk, which can be read by most personal computers. However, pictures such as those in Alice in Wonderland present special problems for electronic reproduction because of the disk space they take up. Nevertheless, Project Gutenberg is very interested in including pictures and other graphics and will continue to take advantage of developments in computer technology to add to the richness of its library of free, readily available literary and reference works.

Scope of the library

The cataloguing and indexing of the library is still under review and is, in itself, a major undertaking. However, works may be broadly classified as follows:

Light literature such as Alice in Wonderland, Through the Looking-Glass, Peter Pan and Aesop's Fables.

Heavy literature such as the Bible and other religious documents, Shakespeare, Moby Dick and Paradise Lost.

References such as Roget's Thesaurus, almanacs, a set of encyclopedias, dictionaries, and works of philosophy and natural history.

There is no substitute for a good book

Many people point out that there is no substitute for the look, feel and smell of a book, and that it is easy to browse through one, mark relevant passages and look at the illustrations. This is perfectly true, and one might say that the use of etexts has, until now, been largely restricted to finding specific references, since one needs to sit at a computer to view them. Until now, that is.

Sometimes we must wait for technology to catch up before we can make full use of what already exists. The Internet existed only in a crude form when Mr Hart started keying in the Declaration of Independence, and we had to wait for computers to become cheap and ubiquitous before the production of PG etexts could explode. In the same way, technology is only now making available portable electronic readers with which we will be able to read etexts, or have them read aloud to us by speech synthesis (text-to-speech) software, wherever we can now read a book. As one sits on Mars and uses a voice command to open The Iliad at a bookmarked position, one might issue the command “mouldy old paper” to have the reader exude the smell one most associates with old books.

It is part of Michael Hart's genius that he saw the potential of Project Gutenberg and persisted with the concept for over twenty years before technology turned the project into something beyond, dare I say, even his wildest dreams. There is no substitute for a good book. It is just that its present form may not matter all that much to future generations.


The continuing success of Project Gutenberg depends on volunteers. As Michael Hart has frequently pointed out, PG is made up entirely of volunteers who produce etexts, proofread them, post them to the PG Internet site, post copies on “mirror” sites around the world, maintain the computer hardware and software involved in the project, correct errors in the text reported by end-users, do copyright checks and attend to the many administrative tasks involved in any major co-operative project.

Volunteers choose which texts they wish to work on and hence which etexts are posted to the PG site. Since any book out of copyright[4] may be used, there is a bewildering choice of titles. Any title chosen is subject to copyright "clearance", after which it will usually be accepted for posting. Some volunteers prefer to proofread work prepared by others. Alternatively, one may become involved in “helping” Mr Hart put the finishing touches to texts before posting, such as adding headers and footers or making minor formatting changes.

When you are reading your etext of The Iliad whilst holidaying on Mars, spare a thought for the prodigious amount of work which has been undertaken by Michael S. Hart and the PG team to bring it to you just when and where you want it.

Project Gutenberg on the Internet

The official PG site may be found at A regular newsletter is produced and information is provided about volunteering.

The Bikwil site has a link to PG at It is rumoured that Tony Rogers exhibits unseemly enthusiasm about the PG site.

For a list of Australian texts on PG, try


When Johann Gutenberg invented the printing press he unleashed an unstoppable process which facilitated communication between members of the human race and the passing of knowledge and ideas in ways previously undreamed of. The invention of the computer and the expansion of the Internet have extended the capacity to pass on such knowledge and ideas. Project Gutenberg, as the repository of the condensed knowledge and ideas of some of the greatest minds in human history, contributes in no small way to this process.


[ Col runs the Australian Project Gutenberg site mentioned above, and he can be contacted via email at ]

1 Michael S. Hart, Professor of Electronic Text at Benedictine University (Illinois, U.S.A.) and Visiting Scientist at Carnegie Mellon University (Pennsylvania, U.S.A.), founded Project Gutenberg in 1971 and is currently its Executive Director. In a November 1998 article in Wired Magazine, Hart was chosen among The Wired 25: A Salute to Dreamers, Inventors, Mavericks, and Leaders. (See dated July 1999).

2 ASCII is an acronym for American Standard Code for Information Interchange, a standard for storing characters and numbers in computers.

3 HTML is an acronym for HyperText Markup Language.

4 In the U.S.A., books are generally out of copyright seventy-five years after publication. As a rule of thumb, books published before 1923 are eligible. Full details are provided on the PG site.

