Microsoft Office Forums

  #1  
07-25-2012, 01:03 AM
hawkeyefxr, Novice (Windows XP, Office 2003)
Join Date: Jul 2012
Posts: 12
Word formatting


Hi all,
I have some manuals that I have scanned and saved as an image PDF file, then run through OCR software. The result is a fairly editable document.
The document is 250+ pages long; my problem is the formatting. The text is all grouped into hidden boxes. If I click on a section of text, the box appears in grey, and I can edit the text in this box within reason.

What I want is to release all the text and make it a 'free' text document, without these blocks of text inside boxes.
Our work PCs have XP and Word 2003. I can send the document to my own PC, which has Word 2010 on it.
  #2  
07-25-2012, 06:31 AM
Charles Kenyon, Moderator (Windows Vista, Office 2010 32bit)
Join Date: Mar 2012
Location: Sun Prairie, Wisconsin
Posts: 9,159

The problem isn't the version of Word, it is in the nature of OCR. OCR is a complex process and pays no attention to how Word works. When I want to actually use a document in Word that has been scanned, I will usually save it as plain text (.txt) and then copy that text into Word. I then use styles to format the text.

This is a fair amount of work, but nothing compared to dealing with anomalous formatting created by the OCR process. (Some OCR programs are better than others, but none produces a document that edits like one that has been directly produced in Word.)
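
For anyone taking the plain-text route, part of the clean-up can be scripted before the text is pasted into Word. Below is a minimal Python sketch, not a feature of Word or of any OCR package. It assumes the OCR export is a .txt file with hard line breaks inside paragraphs and blank lines between paragraphs. The function name reflow_ocr_text is hypothetical. It joins the wrapped lines of each paragraph and rejoins words hyphenated across a line break.

```python
import re


def reflow_ocr_text(raw: str) -> str:
    """Join hard-wrapped OCR lines into one line per paragraph.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    # A run of blank (or whitespace-only) lines marks a paragraph break.
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            # "man-" + "ual" -> "manual". This is only a heuristic: it will
            # also merge genuinely hyphenated words split at a line end.
            if joined.endswith("-") and line[:1].islower():
                joined = joined[:-1] + line
            else:
                joined += " " + line
        # Collapse runs of spaces or tabs left over from column layout.
        paragraphs.append(re.sub(r"[ \t]+", " ", joined))

    # One blank line between paragraphs.
    return "\n\n".join(paragraphs)
```

Read the exported text file, pass its contents through the function, and write the result to a new file to paste into Word. Two-column pages will still need sorting out by hand, since the sketch keeps the text in whatever order the OCR program wrote it.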
  #3  
07-25-2012, 07:47 AM
hawkeyefxr, Novice (Windows XP, Office 2003)

Unfortunately my PDF is an image PDF; I can't pull the text off it, as it is seen as a picture.

I have used ABBYY OCR and it has done a really good job in my opinion, far better than anything I have used in the past.

I have worked out how to lose the boxes I was on about, and it has left me with the raw text. My document is 260 pages (headache); I have gone through about 30 pages and hit the point where there are two vertical columns of text.

All the text is there; I just have to reconstruct it as it was in the original manual. The manual was produced in 1998 and we have no way of getting the original electronic copies.

Still, head down and plow on... thanks for the reply.
  #4  
07-25-2012, 07:56 AM
macropod, Administrator (Windows 7 64bit, Office 2010 32bit)
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,963

Hi hawkeyefxr,

Most good scanning & OCR packages should be able to produce a textual output without producing PDF images as an intermediate step. Even when producing an OCR output from the PDF, your ABBYY OCR package probably has a setting to send the output to a text file. That should be enough to resolve the issue.
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
  #5  
07-27-2012, 05:16 AM
hawkeyefxr, Novice (Windows XP, Office 2003)

Hi,

I have got it into text and it is much easier to sort out. Many thanks.