Proofreading

We use a version of the Distributed Proofreaders software, part of Project Gutenberg. It divides the work into small, easily managed pieces.

Each book goes through two processing “rounds”:

P1: Proofreading Round 1
Page texts are raw output from OCR software, typically with many scanning errors. Proofreaders carefully compare the texts to page images, fixing errors, adding a bit of formatting, and (optionally) flagging errors found in the images.
P2: Proofreading Round 2
Page texts have all been proofread once, and now need to be reexamined closely for small errors that might have been missed.

Once both rounds are complete, the texts go to an editor, who performs a third and final check, and then processes the text into formats suitable for ebook and print editions.

Rules

Above all, we’re trying to make the texts match the images. That means correcting scan errors and adding a small amount of formatting, but we do not correct errors in the original images. You can flag them, however, by placing an asterisk in square brackets [*] at the end of the relevant line. You may add a short note after the asterisk, as in [*por>per?].

Headers, Footers, and Page Numbers

  • Remove these entirely from the proofed text.

Spacing and Paragraphs

  • Insert a blank line before every new paragraph, even at the top of a page.

  • Remove indentation at the beginning of lines, for example at the start of paragraphs.

  • Collapse multiple spaces to single spaces.
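The two mechanical parts of these rules, removing indentation and collapsing spaces, can be illustrated with a short Python sketch. The function name `normalize_spacing` is hypothetical, and this is not part of the Distributed Proofreaders software. Deciding where a paragraph starts still takes a human eye, so the sketch leaves blank lines alone and keeps every line break.

```python
import re

def normalize_spacing(text: str) -> str:
    """Hypothetical helper: strip indentation and collapse repeated spaces.

    Line breaks are kept exactly as they are, since lines must not be re-wrapped.
    """
    cleaned = []
    for line in text.split("\n"):
        line = line.lstrip(" \t")            # remove indentation at the start of the line
        line = re.sub(r"[ \t]+", " ", line)  # collapse runs of spaces/tabs to one space
        cleaned.append(line)
    return "\n".join(cleaned)

print(normalize_spacing("    Dogler  kapneis kaj ŝi,\n  levinte la   ŝultrojn"))
```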

Line breaks

  • Leave the original line breaks: do not re-wrap lines.

Hyphenation

  • If a line ends with a hyphenated word, reassemble the word by moving its second half up to the end of the first line. Usually you should remove the hyphen in the reassembled word (e.g., est-as, fil-ino), but sometimes the hyphen is a normal part of the word (e.g., nigra-blanka). If you’re unsure what to do, leave the hyphen in and place an asterisk just after it, like this: nigra-*blanka.

  • If the last line on the page ends with a hyphenated word, don’t try to reassemble it; just add an asterisk after the hyphen.
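A program can’t tell whether a rejoined word should keep its hyphen. A mechanical version of these rules can therefore only apply the “unsure” form and leave the decision to you. The Python sketch below does this for one page of lines. `rejoin_hyphens` is a hypothetical name. Dropping a line that becomes empty after its only word moves up is an assumption, not a stated rule.

```python
import re

# A letter followed by a single hyphen at the end of the line ("--" is a dash, not a hyphen).
HYPHEN_AT_END = re.compile(r"\w-$")

def rejoin_hyphens(lines: list[str]) -> list[str]:
    """Hypothetical helper: rejoin end-of-line hyphens using the 'unsure' form.

    Every rejoined word keeps its hyphen followed by an asterisk
    (e.g. nigra-*blanka) so that a person can decide whether to remove it.
    """
    out = list(lines)
    i = 0
    while i < len(out):
        if HYPHEN_AT_END.search(out[i]):
            if i == len(out) - 1:
                # Last line on the page: don't reassemble, just flag the hyphen.
                out[i] += "*"
            else:
                # Move the first word of the next line (with any punctuation) up.
                first, _, rest = out[i + 1].lstrip().partition(" ")
                out[i] += "*" + first
                if rest:
                    out[i + 1] = rest
                else:
                    # Assumption: drop a line left empty so it isn't read as a paragraph break.
                    del out[i + 1]
        i += 1
    return out

print(rejoin_hyphens(["sed ŝi estis nigra-", "blanka kato, kaj ŝi", "ridis, fil-"]))
```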

Bold, Italics and Small Caps

  • Put markup tags around <b>bold text</b> and <i>italics</i>. For Small Caps, use <sc>Small Caps</sc>.

Accented Characters

  • The texts will contain the usual accented letters of Esperanto (ĉ, ĝ, ĥ, ĵ, ŝ, ŭ and their capitals). Look out for missing accents, especially on uppercase letters, and for uppercase letters that the scanner has read as lowercase. If you can’t type the accents, see Kiel tajpi en Esperanto (“How to type in Esperanto”).

  • If there are other strange characters that you can’t type, just put a flag at the end of the line to alert the editor in post-processing: [*]

Dashes

  • Many Esperanto texts mark quoted text using em-dashes (—) or en-dashes (–). We represent them both with two hyphens (--).
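In code, the dash rule is a plain character substitution. Here is a minimal Python sketch; the helper name is hypothetical:

```python
# Map both dash characters to two ASCII hyphens.
DASHES = str.maketrans({"\u2014": "--", "\u2013": "--"})  # em-dash, en-dash

def replace_dashes(text: str) -> str:
    """Hypothetical helper: represent em-dashes and en-dashes as '--'."""
    return text.translate(DASHES)

print(replace_dashes("\u2013 Li ja vivas. Kial mi pensis lin mortinta?"))
```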

Ellipses

  • Use three periods, with no space between them.
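The same rule can be sketched in Python. This assumes the OCR output may contain either the single ellipsis character (…) or periods separated by spaces; both become three unspaced periods. The helper name is hypothetical.

```python
import re

def normalize_ellipses(text: str) -> str:
    """Hypothetical helper: write every ellipsis as three unspaced periods."""
    text = text.replace("\u2026", "...")          # the single ellipsis character
    return re.sub(r"\.(?: +\.){2}", "...", text)  # ". . ." with spaces between periods

print(normalize_ellipses("Strange . . . Kial\u2026"))
```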

Horizontal Rules

  • You’ll sometimes find a divider separating blocks of text. Often it takes the form of a solid line, a long string of dashes, or three asterisks centered on the page.

    In all cases, replace these with the <hr> tag.
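Spotting dividers is a judgment call. A rough Python heuristic might treat any line made only of three or more dashes, underscores, or asterisks (spaces allowed) as a divider. The pattern and helper name below are assumptions and are not part of the proofreading software. A centered row of dots or a printed ornament would still need a human eye.

```python
import re

# Assumed heuristic: three or more dashes, underscores or asterisks, optionally spaced.
DIVIDER = re.compile(r"^\s*(?:[-_*\u2013\u2014]\s*){3,}$")

def mark_dividers(lines: list[str]) -> list[str]:
    """Hypothetical helper: replace divider-looking lines with the <hr> tag."""
    return ["<hr>" if DIVIDER.match(line) else line for line in lines]

print(mark_dividers(["teksto", "      * * *", "-----------", "-- Li ja vivas."]))
```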

Quoted text

  • Quoted text in older Esperanto books is often set apart with en-dashes (–). Treat these like any other en-dash and replace them with two hyphens.

  • Some quoted text is set off with other special characters, such as “American/British ‘quotes’,” „German quotes,“ «French guillemets,» and so on. Replace all of these with plain "straight ASCII 'quotes'".
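The quote rule amounts to a character mapping. This Python sketch lists the common curly, German, and French quote characters. The table is an assumption and may not cover every character you meet; anything else still gets flagged with [*]. Note that it also turns typographic apostrophes (’) into straight ones.

```python
# Assumed table of typographic quotes mapped to straight ASCII quotes.
QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"',  # American/British double quotes
    "\u201e": '"',                 # German low double quote (the closing one is \u201c)
    "\u00ab": '"', "\u00bb": '"',  # French guillemets
    "\u2018": "'", "\u2019": "'",  # single quotes (\u2019 is also the apostrophe)
    "\u201a": "'",                 # German low single quote
    "\u2039": "'", "\u203a": "'",  # single guillemets
})

def straighten_quotes(text: str) -> str:
    """Hypothetical helper: replace typographic quotes with ASCII quotes."""
    return text.translate(QUOTES)

print(straighten_quotes("\u201erubaŝkon\u201c"))
```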

Footnotes

  • Footnote markers go in square brackets at the point where they appear in the text. Keep letters or numbers if used in the book, but replace special symbols (daggers, stars, etc.) with asterisks.

    At the bottom of the page, proofread the footnote like normal text. Just make sure it has the same marker you used above, but without the square brackets.
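For a single marker, the footnote rule can be written as a tiny Python function (the name is hypothetical). Letters and digits are kept, and any other symbol becomes an asterisk. The marker is bracketed in the running text but left bare at the start of the footnote itself.

```python
def footnote_marker(symbol: str) -> tuple[str, str]:
    """Hypothetical helper: return (marker in the text, marker before the footnote)."""
    marker = symbol if symbol.isalnum() else "*"  # daggers, stars, etc. become "*"
    return f"[{marker}]", marker

print(footnote_marker("\u2020"))
```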

An example

Here’s an example that uses some of the tips above:

Page image

Proofed text


Dogler kapneis kaj ŝi, levinte la ŝultrojn, dediĉis sin al la knabeto.
Ŝi ridetis al li, kvazaŭ revante pri ia neatingebla feliĉo kaj
poste leviĝis. La poteton da lakto ŝi metis en la fornon. Ĝi degelu
kaj varmiĝu. El la pakaĵo ŝi prenis kukon kaj rostitan kokidaĵon.
Ŝi aĉetis ilin en la urbo por la mono de Doŝky. Poste el [*por>per?]
sub la lito ŝi prenis alian pakaĵon kaj komencis altranĉi infanan
<i>"rubaŝkon"</i> (bluzĉemizon) el soldata mantelo. Kiel sperte ŝi
faris ĉion. Ne unuafoje en la vivo. Ŝi ja havas kelkajn fratetojn.
Post la tondilo venis la vico de la kudrilo kaj la filigranaj manetoj
kun rapida lerteco aplikis, stebis, kudris.

La knabeto vekiĝis kaj la grasaj pugnetoj frotis la duone dormantajn
okulojn. La buŝeto malfermiĝis je oscedo, kies fino fariĝis
kaprica plorpepado. Fiza formetis la laboron kaj kuris al li.
Dogler dronis en konfuzaj pensoj.

-- Li ja vivas. Kial mi pensis lin mortinta? Strange! Kial?

Things to note:

  • A blank line before each paragraph (even at the top of the page)
  • Line indentation removed
  • Hyphenated words at ends of lines reassembled
  • A possible printing error flagged and noted at the end of a line
  • Straight quotes and tags for the italicized „rubaŝkon“
  • En-dash replaced with --
  • Page number removed

Tips for Newbies

Use the DP Sans Mono typeface

This is a special typeface designed for proofreading, making it easy to distinguish between letters that are visually similar and often confused.

Click on Prefs (upper right) and then the Proofreading tab. In the sections labeled Font Face, choose the radio button for DP Sans Mono, then click on Save Preferences and Quit.

Find a Project

To begin proofreading, click on P1 (top of page) and you’ll see a list of titles that are currently available for round 1. Click on a title to see the project page.

The project page has a lot of details you don’t need. The most important parts are the Project Comments and the Start Proofreading link.

Proofread!

For each page you’ll see a scanned page image along with the OCR text. Edit the text to match the image, using the rules above. When you’re done with a page, make sure to click on Save as ‘Done’ or Save as ‘Done’ & Proofread Next Page. Otherwise the page will remain “checked out” to you and unavailable for others to work on.

You can also click Return Page to Round to abandon any changes and free up the page for another proofreader.

WordCheck

The software has a dictionary of known Esperanto words, including many of the proper names used in the books we’re proofreading. After you’re done proofreading a page, you may want to click on WordCheck to double-check your work.

The software will flag words it thinks might contain errors and let you correct them. (Remember: don’t correct any errors that appear in the page images!) There will be many false positives; that’s normal. If a good word has been flagged by mistake, you can click the icon next to it to suggest adding it to the “good word” list.