How to normalize a Word document?

3

2

I was too cheap to hire someone to retype a really, really long scanned document full of legalese, so I OCRed it using OmniPage. The OCR output was kind of disappointing: I got a Word document with many different line spacings, and the spacing before and after paragraphs varies all over the place.

This would be easy if the entire document had the same paragraph settings, but it does not. There are probably half a dozen different styles going on.

What is the easiest way to normalize the document? For instance, if one paragraph has a line spacing of 20.4 pt and another has 20.9 pt, I'd like to treat them as the same style and set them to a single value. Really, any suggestion is welcome at this point.

[Image: screenshot of Word's paragraph format window]

AngryHacker

Posted 2010-04-20T19:58:24.527

Reputation: 14 731

Answers

13

I regularly get documents that are a complete mess, impossible to maintain, and mine to clean up.

You'll want to learn and use paragraph/character styles if you're not already using them.

On the Home tab of the Ribbon, look for the "Styles" group.

Selecting a paragraph or multiple paragraphs in Word and then selecting one of the Paragraph styles will apply the formatting of that style to all of the paragraphs you selected.

This also makes future changes easier: open the style definition for a given style, make your changes, and they will be reflected in every paragraph that uses that style.

Click the little arrow in the bottom-right corner of the Styles group in the Ribbon to see the full list of styles.

My Document Clean-up Process

  1. Make a clean start (discard all current formatting and apply a default paragraph style) - Select all text and choose the paragraph style that matches the greatest portion of the text. That is usually regular paragraphs, so I choose the "Body Text" paragraph style. Now all text should be styled consistently.
  2. Gradually rebuild document styling (apply headings) - Now I go through the document adding headings in order (scan the whole document for Heading 1s, then Heading 2s, etc.).
  3. Apply other paragraph styles - I repeat the above step for all other items that need paragraph styles applied (lists, additional paragraphs, captions, tables, images).
  4. Apply character styles - Then I scan for any text that needs a character style applied.
  5. Fine-tune styles - At this point the document should basically be done. I tweak any styles that need modifications for the given document.
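If you would rather script step 1 than do it by hand, here is a minimal sketch. It assumes the third-party python-docx package, which nobody in this thread has mentioned; `reset_to_style`, `clean_file`, and the default style name are hypothetical choices of mine. It defaults to "Normal" because "Body Text" may not be defined in a given file.

```python
def reset_to_style(doc, style_name="Normal"):
    """Apply one paragraph style everywhere and drop direct overrides.

    `doc` is expected to be a python-docx Document (assumption), and
    `style_name` must already exist in that document's styles.
    """
    style = doc.styles[style_name]  # raises KeyError if the style is missing
    for paragraph in doc.paragraphs:  # body paragraphs only, not table cells
        paragraph.style = style
        fmt = paragraph.paragraph_format
        # Assigning None removes the direct setting, so the style's value applies.
        fmt.line_spacing = None
        fmt.space_before = None
        fmt.space_after = None
        for run in paragraph.runs:
            run.font.size = None
            run.font.name = None


def clean_file(in_path, out_path, style_name="Normal"):
    """Load a .docx, reset it to one style, and save it under a new name."""
    from docx import Document  # third-party: pip install python-docx

    doc = Document(in_path)
    reset_to_style(doc, style_name)
    doc.save(out_path)
```

This only covers the "clean start"; headings and the other styles in steps 2 to 5 still need a human eye. Work on a copy of the file.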

Useful Word Shortcuts

CTRL+SPACEBAR   Strip character formatting that's not contained in the applied paragraph style.
CTRL+Q          Strip paragraph formatting that's not contained in the applied paragraph style.
CTRL+SHIFT+N    Apply Normal paragraph style.

jmohr

Posted 2010-04-20T19:58:24.527

Reputation: 2 167

5

I have found that sometimes Word refuses to fix this situation no matter what you try. If that's the case, copy the entire document into Notepad, trying to keep the bulk structure (like spaces and paragraph breaks), and then copy it back into Word with fresh settings.

nicorellius

Posted 2010-04-20T19:58:24.527

Reputation: 5 865

I would just add to copy it into a new Word doc, not the one you pulled the text from. – Joe Internet – 2010-04-25T21:45:31.723

That could work... Copying to notepad removes any formatting, though... – nicorellius – 2010-04-26T00:29:07.277

4

If you select the text of the whole document and then open the paragraph format window (as you show above), I believe that the values will be blank because there are multiple values. However, if you manually type in the desired values, then all will change to the new value.
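If one value for the whole document is too blunt, the "treat 20.4 pt and 20.9 pt as the same" idea from the question can be scripted. This is only a rough sketch: it assumes the third-party python-docx package, and `snap_spacings`, `normalize_line_spacing`, and the 1 pt tolerance are hypothetical names and values, not anything Word provides. Near-equal spacings are chained into a group, and each group is set to its most common value.

```python
from collections import Counter


def snap_spacings(values_pt, tolerance_pt=1.0):
    """Return {original value: representative value} for near-equal spacings.

    Sorted distinct values are chained into one group while each gap to the
    previous value is at most tolerance_pt; every group is replaced by its
    most frequent member.
    """
    counts = Counter(values_pt)
    groups = []
    for value in sorted(counts):
        if groups and value - groups[-1][-1] <= tolerance_pt:
            groups[-1].append(value)
        else:
            groups.append([value])
    mapping = {}
    for group in groups:
        representative = max(group, key=lambda v: counts[v])
        for value in group:
            mapping[value] = representative
    return mapping


def normalize_line_spacing(in_path, out_path, tolerance_pt=1.0):
    """Snap exact (point-based) line spacings in a .docx and save a copy."""
    from docx import Document  # third-party: pip install python-docx
    from docx.shared import Length, Pt

    doc = Document(in_path)
    # Only paragraphs with a fixed spacing; multiples (floats) and inherited
    # values (None) are left alone. Table cells are not visited.
    exact = [p for p in doc.paragraphs
             if isinstance(p.paragraph_format.line_spacing, Length)]
    mapping = snap_spacings(
        [round(p.paragraph_format.line_spacing.pt, 1) for p in exact],
        tolerance_pt,
    )
    for p in exact:
        current = round(p.paragraph_format.line_spacing.pt, 1)
        p.paragraph_format.line_spacing = Pt(mapping[current])
    doc.save(out_path)
```

Because groups are chained, a long run of values each 0.5 pt apart would collapse into one group, so pick the tolerance with your own document in mind.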

Doug Harris

Posted 2010-04-20T19:58:24.527

Reputation: 23 578