The post “Proofreading” (below) describes how to use TextSTAT to list every word in a Microsoft Word document, which makes previously undetected typos and spelling errors easier to spot and correct. If you are using InDesign to create a book, the following steps produce the MS Word document that TextSTAT needs.
From the InDesign book palette, select “Export Book to PDF,” using the same preset you use to create your press-ready PDF.
Open the PDF in Adobe Acrobat Pro and save it as an HTML file (File > Save As > More Options > HTML Web Page). Before saving, click the “Settings” button and uncheck “Include Images” and “Run OCR.” Click “OK.” HTML is used here because saving the PDF directly as a Word file does not capture all of the text in the headers properly.
Save the HTML version of the PDF file where you can locate it. Open the HTML file in MS Word and save it as a .doc file. Now, follow the directions outlined in the “Proofreading” post.
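If you want a quick look at what text actually made it into the HTML export before opening it in Word, a short script can strip the tags. This is only a rough sketch using Python's standard library; it is not part of the TextSTAT workflow, and the names `TextExtractor` and `html_to_text` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical sample standing in for the Acrobat export.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Chapter One</h1><p>The Ulmann family.</p></body></html>"
)
print(html_to_text(sample))
```

For a real export you would read the saved file into a string first (for example with `open(path, encoding="utf-8").read()`, assuming the file is UTF-8) and skim the result to confirm that the header text is present.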
Proofreading is time-consuming, but it is one of the most important final steps before committing a book to print or online publication. Misspelled words get missed even with the most careful proofreading, and TextSTAT is an excellent tool for finding them. TextSTAT produces a list of every word in a Microsoft Word .doc file along with the number of times each word appears (a frequency list). When the word list is sorted alphabetically, variant spellings of a word or surname are easy to see because they are usually listed near each other. For example, the misspelling of the surname “Ulmann” as “Ullman” may not be easily seen by a proofreader, but it becomes evident in a frequency list. Exporting the frequency list as a CSV file and opening it in MS Word allows you to use Word’s dictionary to underline in red the words that may be misspelled. Here’s how:
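To see why an alphabetical frequency list exposes variant spellings, here is a minimal Python sketch of the idea. It is an illustration only, not how TextSTAT itself is implemented; the function name `frequency_list`, the word pattern, and the sample sentence are assumptions chosen for the example.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs sorted alphabetically.

    Counting is case-sensitive ("The" and "the" are separate entries);
    the sort ignores case so that variants end up next to each other.
    """
    # Runs of letters, optionally joined by an apostrophe or hyphen.
    words = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text)
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (item[0].lower(), item[0]))


# Made-up sample text containing one misspelled surname.
sample = (
    "Anna Ulmann married in 1890. The Ulmann farm passed to Karl Ullman. "
    "Karl Ulmann sold it."
)
for word, count in frequency_list(sample):
    print(f"{count:>3}  {word}")
```

In the printed list, “Ullman” (1 occurrence) sits directly above “Ulmann” (3 occurrences). A rare spelling right next to a common one is the kind of pattern to look for in the real list.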
Download the free English version of TextSTAT from Freie Universität Berlin.
Unzip the software. Double-click TextSTAT.exe to launch the application. Click the “Run” button. Create a New Corpus. Add the Microsoft Word file for your book. Click the “Word forms” tab. Select “sort alphabetically.” Click the “Frequency list” button. Export “Frequency list > CSV file.” Name and save the file.
Launch MS Word. Open the just-created .csv file. Click the “OK” button to accept the automatically selected “Unicode (UTF-8)” encoding. Save the file as a .doc or .docx file. Add a space to the first line; this turns on the underlining of possibly misspelled words. Select all [<Ctrl> a]. Change the case to “Capitalize Each Word.” Save.
Highlight and copy a word that you want to check. Paste the word into the “Define filter” field of the TextSTAT software, under the “Word forms” tab, and click the “Frequency list” button. Copy/paste may not work with Windows 7, in which case scroll down and locate the word in the TextSTAT list instead. The result is the word listed in context for each use. Double-click each line to read more. Click the link to the book file to open it and search for the misspelled word [<Ctrl> f]. Make corrections in the file as needed.
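The “word in context” view described above is a keyword-in-context listing. The sketch below shows the general idea in Python, in case you want to run the same check on plain text outside TextSTAT. It is not TextSTAT's own code, and `keyword_in_context`, its `width` parameter, and the sample text are hypothetical.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return one line per whole-word occurrence of `keyword`,
    with up to `width` characters of context on each side."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        left = text[start:match.start()].replace("\n", " ")
        right = text[match.end():end].replace("\n", " ")
        # Brackets mark the keyword so it stands out in each line.
        lines.append(f"{left}[{match.group()}]{right}")
    return lines


# Made-up sample text containing one misspelled surname.
sample = (
    "Anna Ulmann married in 1890. The Ulmann farm passed to Karl Ullman. "
    "Karl Ulmann sold it."
)
for line in keyword_in_context(sample, "Ullman", width=20):
    print(line)
```

The search is case-sensitive and matches whole words only, so looking up “Ullman” does not also return the correctly spelled “Ulmann.”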