These three articles have previously been published in the Swedish daily Svenska Dagbladet in 1994:
Swedish version from Svenska Dagbladet 1994.
I. The electronic age - on the verge of total memory loss?
II. The hacker - archaeologist of the future?
III. Dare we trust the authenticity of electronic texts?
III. Dare we trust the authenticity of electronic texts?
by Karl-Erik Tallmo
The great difficulties involved in preserving electronically published documents for the future have been discussed in two previous articles. Two issues related to computer-based texts have already become problematic: authenticity and copyright.
In the Middle Ages, books were copied by hand, mostly by monks. Different copies of a book contained different errors, due to misinterpreted dictation or deliberate but arbitrary "improvements" by scribes. The art of Gutenberg made it possible to freeze the text once it was printed. The edition became a constant, not the individual copy.
For more than 500 years we have been accustomed to the book being something relatively unchangeable and reliable. The emergence of the electronic book, however, takes us in a sense two steps backwards, since here not even the copy is a constant. Anybody who has ever worked with a word processor knows how easy it is to change a few words in a computer file - and that it is done without a trace.
If one logs on to the database Patrologia Latina and looks up something by Tertullian - how can one be sure that it is Migne's edition? The publisher Chadwyck-Healey of course certifies, as any other publisher would, with their hallmark, that the contents, as far as they can control and ascertain, are verbatim.
But what if someone has hacked his way into the database and made changes? If one copies the text onto a diskette and gives it to somebody, who in turn hands it over to a friend who is writing a dissertation - dare this candidate believe that the quotations he inserts into his text are correct?
The easy access to electronic texts is both their strength and their weakness. If texts are spread from person to person in a totally uncontrolled manner, the result might be a gradual alteration as is the case with oral transmission. The text will undoubtedly degenerate.
When text is digitized with a scanner and a program for optical character recognition (OCR), there are always errors. For instance, r+n may be interpreted as m and zero might come out as the letter O.
The staff at the American Memory project calculate that digitizing costs 5 to 6 dollars per page, if one allows a single error per page (99.95 percent accuracy). To achieve a rate of one error every four pages (99.99 percent accuracy), the costs would double.
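The relation between an accuracy percentage and errors per page is simple arithmetic, and can be sketched as below. The figure of 2,000 characters per page is an assumption, chosen because it is what "one error per page at 99.95 percent" implies; the function names are likewise invented for this illustration.

```python
def expected_errors_per_page(accuracy_percent, chars_per_page=2000):
    """Expected number of wrong characters on one page.

    chars_per_page=2000 is an assumed page size, not a figure
    reported by the American Memory project.
    """
    return chars_per_page * (100 - accuracy_percent) / 100


def pages_per_error(accuracy_percent, chars_per_page=2000):
    """How many pages one can expect to read before meeting one error."""
    return 1 / expected_errors_per_page(accuracy_percent, chars_per_page)


if __name__ == "__main__":
    print(round(expected_errors_per_page(99.95), 2))  # about one error per page
    print(round(pages_per_error(99.99), 1))           # about one error per five pages
```

Under this assumed page size, 99.99 percent works out at roughly one error every five pages, which is of the same order as the "every four pages" quoted above; a somewhat denser page would bring the two figures together.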
For economic reasons, some database editors are willing to accept accuracy levels as low as 60-80 percent. Such databases are generally intended for searching only, the idea being that when the document in question is found, one would turn to the printed source. But as long as texts are easy to download, the risk is of course that they will be disseminated and that uncritical writers will regard them as reliable source material. In an ongoing Internet discussion regarding copyright issues, the leader of the Thesaurus Linguae Graecae project, Ted Brunner of the University of California at Irvine, claims that there is evidence that texts from the database have been modified for specific purposes and then disseminated.
In many countries, codes of law are now being published electronically. It is rather easy to imagine new forms of computer crime emerging as a result. For instance, judges will perhaps no longer make reference to printed statutes, relying instead on computer databases. It will then be essential to secure texts from forgery.
Presently, CD-ROM is considered fairly secure since it is a "read-only" medium and thus cannot be changed. However, for unlawful purposes, it would not be too difficult to copy the CD onto a large hard disk, change a paragraph or two and then make a new CD-ROM. The costs for CD-ROM fabrication have declined substantially.
There are solutions, however. A contaminated text can "purify" itself with the help of special corrective codes. Electronic seals can guarantee that a text conforms with the original. Such seals link certain blocks of text to certain checksums. If a text suddenly adds up to the wrong checksum, then the text has been tampered with and is not to be trusted.
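The principle of such a seal can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any existing product: the function names `seal` and `tampered_blocks`, the 64-byte block size and the choice of SHA-256 as the checksum are all invented for the example, and the sketch only detects changes; it does not attempt the self-correcting codes mentioned above.

```python
import hashlib
from itertools import zip_longest


def seal(text, block_size=64):
    """Split the text into fixed-size blocks and compute one checksum per block.

    block_size and SHA-256 are illustrative choices, not a standard.
    """
    data = text.encode("utf-8")
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def tampered_blocks(text, seals, block_size=64):
    """Return the indices of blocks whose checksum no longer matches the seal.

    A missing or extra block (the text grew or shrank) also counts as a mismatch.
    """
    current = seal(text, block_size)
    return [
        index
        for index, (now, sealed) in enumerate(zip_longest(current, seals))
        if now != sealed
    ]


if __name__ == "__main__":
    sentence = "In the middle ages, books were copied by hand, mostly by monks. "
    original = sentence * 4
    seals = seal(original)

    # Change one word in the first sentence, keeping the length the same.
    altered = original.replace("hand", "foot", 1)

    print(tampered_blocks(original, seals))  # []
    print(tampered_blocks(altered, seals))   # [0]
```

A plain checksum of this kind only helps if the stored seals are themselves kept out of the forger's reach: anyone who can rewrite both the text and the seals can make them agree again. Keyed checksums and digital signatures address that problem, but they are beyond this sketch.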
A more difficult task is to define what an electronic publication really is or is not. What is a single copy of an electronic publication? And what would constitute an electronic edition?
The Online Journal of Clinical Trials, for instance, has no issues. The journal changes organically with each article added or deleted. There are no back issues and no lost issues. How does such a publication fit in with ordinary filing routines at a library?
Large dailies are now increasingly beginning to archive their own articles digitally in databases instead of on paper clippings. Since most newspaper production is now computerized, the most effective method is to create a "tap hole" in some stage of the production process, capturing the text and saving it.
The risk then is that changes or disturbances that occur after the "tapping" will not be reflected in the filing. Thus, one article might not be published at all or it might be published in another version, but still be filed as published at a certain date in the version that was never used. A physical paper clipping is undoubtedly stronger proof of publication than a computer file is.
But, of course, paper clippings sometimes also contain errors and mistakes. Most journalists know how easily false information in an archive takes on a life of its own and gets published repeatedly, when a reporter accepts an article as source material without double checking. With digital archives, critical judgment and a high level of security in the filing routines are of utmost importance.
Surely, electronic publishing will give new meaning to the notion of source criticism. Many people also believe that the concept of copyright must be changed, while others claim that present legislation will still be applicable to new media. A similar debate took place when television satellites came into use. But copyright survived.
Codes and electronic seals may guarantee authenticity through several lines of users, but there is not yet any copyright management system that reaches beyond the first downloaded copy.
Especially among smaller multimedia production companies, there is an enormous need for copyright-free material. Normally, productions that utilize several art forms require multiple contracts with musicians, artists and writers. But small-budget operations must resort to copyright-free material. A dynamic market has already emerged, where collections of licence-free jingles, ornaments, drawings, photographs, films and backgrounds can be purchased.
Some suggest a sort of A-copyright for conventional publications and a B-copyright, with lower royalties, for electronic publishing. According to this idea, copyright owners would not lose money, since their works would probably be more widely spread. A radical reform such as this would lay the groundwork for monographs with all sorts of works enclosed in extenso, instead of just short samples of text or music.
Many of the full-text databases in use today depend on material that has fallen into the public domain. The editors may freely scan and retype such texts and distribute them on computer media. But there are some complications here:
If you digitize an old copyright-free book for electronic publication, for instance a facsimile edition which an ambitious publisher has dug up and prepared for reprinting, no written law is violated. But could it be considered morally correct to benefit from the endeavors of others in that way? On the other hand, facsimile publishers also benefit from the fact that they don't have to pay for the text.
CD-ROM records are sometimes promoted in a dubious way. One claim you hear now and again is that you will be able to copy texts freely from the CD-ROM. But how can those texts be used? For personal use only or for distribution among students?
I mentioned earlier the publication of different countries' codes of law on electronic media. Someone could easily get the idea of publishing books on some legal subject under his or her own name, simply by copying large portions of a text and, if need be, interspersing a few personal commentaries. In form it may be correct, since law texts in most countries are in the public domain, but would it be ethical to benefit from hundreds of man-years of scanning and proof-reading done by others?
Take the Swedish author Strindberg, for instance. He has been dead for more than 50 years, and according to Swedish copyright law his works are unprotected. Could one publish his works on CD-ROM free of charge if, for instance, the national critical text edition was scanned? No, because in that edition there are notes and commentaries which make it a specifically copyright-protected edition. But what if one excluded the notes and published only Strindberg's own text? These questions are very complicated to unravel, since copyright must be applicable to otherwise unprotected works when some sort of "deliberate editorial intervention" has been made. Not only are annotations protected in a critical edition, but also the corrected text itself.
In the debate on the Internet, Neel Smith of the Perseus project suggests that we might end up in a situation where, in practice, it will be the errors of a certain edition that are protected, since nothing but the errors can tell which edition is a copy of which. In other words: I may publish books free of charge by authors whose work has fallen into the public domain, but if the original publisher can prove that the errors in my edition are the same as the errors in their edition, we will probably meet in court. Lawsuits of this kind are already taking place in the US, which might be a good thing, to establish a legal precedent.
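How shared errors could point to a common ancestry between two editions can be illustrated with a small Python sketch. It rests on a strong assumption that will often not hold in practice, namely that a trusted reference text exists against which both editions can be compared; the function names `deviations` and `shared_errors` and the sample sentence are invented for the illustration, and nothing here reflects how a court or the Perseus project would actually proceed.

```python
from difflib import SequenceMatcher


def deviations(reference, edition):
    """Collect the places where an edition departs from the reference text.

    Each deviation is recorded as (word position in the reference,
    reference wording, edition wording).
    """
    ref_words = reference.split()
    ed_words = edition.split()
    matcher = SequenceMatcher(None, ref_words, ed_words, autojunk=False)
    found = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            found.add((i1, " ".join(ref_words[i1:i2]), " ".join(ed_words[j1:j2])))
    return found


def shared_errors(reference, edition_a, edition_b):
    """Deviations that both editions have in common."""
    return deviations(reference, edition_a) & deviations(reference, edition_b)


if __name__ == "__main__":
    reference = "the modern reader may burn the corner of the page"
    # Two hypothetical editions with typical r+n -> m scanning errors.
    edition_a = "the modem reader may bum the corner of the page"
    edition_b = "the modem reader may burn the comer of the page"

    print(shared_errors(reference, edition_a, edition_b))
    # {(1, 'modern', 'modem')}
```

A single shared error proves little, since two scanners may well stumble on the same word independently; the argument only gains weight as the number of identical, unlikely errors grows.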
On the large computer networks, management systems for rights clearance, and sometimes also for ensuring document security, transaction confidentiality and user validation, are already under development.
When building such systems one must take into consideration that text is actually imported and exported through international networks between countries with different legislation. International trade is complicated enough when we deal with physical goods. Legislators and programmers will surely be busy for some time sorting out this virtual exchange.
Note: A correction in the above text was made in August 2000. In the third paragraph from the end, Neel Smith is the correct source for the idea of errors indicating editions. /KET
Go to [I. The electronic age - on the verge of total memory loss?]
Go to [II. The hacker - archaeologist of the future?]
Copyright Karl-Erik Tallmo 1993, 1994.