#1
Word formatting
Hi all, I have some manuals that I have scanned in and saved as PDF image files, and I have then run them through OCR software. The result is a fairly editable document. The document is 250+ pages in size; my problem is the formatting. The text is all grouped into hidden boxes. If I click on a section of text, the box appears in grey, and I can edit the text in this box within reason. What I want is to release all the text and make it a 'free' text document without these blocks of text within the boxes. Our work PCs have XP and Word 2003. I can send it to my own PC, which has Word 2010 on it.
#2
The problem isn't the version of Word; it is in the nature of OCR. OCR is a complex process and pays no attention to how Word works. When I want to actually use a scanned document in Word, I will usually save it as plain text (.txt) and then copy that text into Word. I then use styles to format the text.
This is a fair amount of work, but nothing compared to dealing with anomalous formatting created by the OCR process. (Some OCR programs are better than others, but none produces a document that edits like one that has been directly produced in Word.)
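If the plain-text export comes out with a hard line break at the end of every scanned line, a small script can tidy it up before it is pasted into Word. The following is only a rough sketch in Python, not something any OCR package provides: the function name `reflow_ocr_text` is made up for illustration, and it assumes paragraphs in the .txt file are separated by at least one blank line. It also rejoins words that were hyphenated across a line break, which means a genuine hyphenated compound split at a line end would lose its hyphen, so check the result.

```python
import re


def reflow_ocr_text(raw: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs separated by one blank line.

    Hypothetical helper: assumes blank lines mark paragraph boundaries.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words hyphenated across a line break: "docu-\nment" -> "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    paragraphs = []
    # One or more blank lines separate paragraphs.
    for block in re.split(r"\n\s*\n", text):
        # Collapse single newlines and repeated spaces into one space.
        flat = " ".join(block.split())
        if flat:
            paragraphs.append(flat)

    return "\n\n".join(paragraphs)
```

The cleaned text can then be written back to a .txt file, opened or pasted into Word, and formatted with styles as described above.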
#3
Unfortunately my PDF is an image PDF; I can't pull the text off it, as it is seen as a picture.
I have used ABBYY OCR and it has done a really good job in my opinion, far better than anything I have used in the past. I have worked out how to lose the boxes I was on about, and it has left me with the raw text. My document is 260 pages (headache), and I have gone through about 30 pages and hit the point where there is a double vertical row of text. All the text is there; I just have to reconstruct it as it was in the original manual. The manual was produced in 1998 and we have no way of getting the original electronic copies. Still, head down and plow on... thanks for the reply.
#4
Hi hawkeyefxr,
Most good scanning & OCR packages should be able to produce a textual output without producing PDF images as an intermediate step. Even when producing an OCR output from the PDF, your ABBYY OCR package probably has a setting to send the output to a text file. That should be enough to resolve the issue.
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#5
Hi,
I have got it into text and it is much easier to sort out. Many thanks.