#1
Help! Pages after OCR coming out in different sizes
OK, so I've had a major project on the go for a while now: scanning a document of a bit over 9000 pages and cleaning it up so that it can be shown to the public.
The document was in some old papers, and I used OCR to convert it into a Word document. The OCR did a fair-to-average job, and I have been steadily cleaning it up. But occasionally, for some reason, some pages come out in landscape mode rather than portrait mode. I'm not sure how to turn them the right way around, and I'm worried that anything I do will knock the page numbering out of kilter. I thought it might be an issue with the borders, but I can't figure out how that has gone askew, nor how to fix it. Take a look at the attached screenshot for an idea of what it looks like (the upper page is oriented the wrong way). Thanks in advance for any help anyone can offer.
#2
Often converting from non-Word formats results in random section breaks which needlessly complicate your document. I recommend doing a global search and replace to remove all section breaks (safest to replace with an empty paragraph).
So in Find and Replace:
Find what: ^b
Replace with: ^p
This will remove the varying page layouts, and then you can set the page setup and headers/footers ONCE.
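For anyone who would rather script this than run Find and Replace, here is a rough sketch of the same idea in Python, using only the standard library. It relies on how a .docx stores section breaks: every break before the final section is a `w:sectPr` element inside the `w:pPr` of the paragraph that ends the section. The function name `strip_section_breaks` is my own, and it works on the text of `word/document.xml` rather than on the .docx file itself. Unlike the ^b to ^p replace, it does not add an empty paragraph; the paragraph that carried the break simply stays.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def tag(name):
    """Build a namespace-qualified tag name such as {…}sectPr."""
    return f"{{{W}}}{name}"


def strip_section_breaks(document_xml):
    """Remove every mid-document section break.

    Returns (new_xml, number_of_breaks_removed). The final <w:sectPr>,
    which sits directly under <w:body>, is left alone, so its page setup
    ends up applying to the whole document.
    """
    root = ET.fromstring(document_xml)
    removed = 0
    # Take a list first so the tree is not modified while being walked.
    for ppr in list(root.iter(tag("pPr"))):
        for sect in ppr.findall(tag("sectPr")):
            ppr.remove(sect)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

A .docx is a zip archive, so the XML can be read from `word/document.xml` with the `zipfile` module. Treat this as an illustration only and work on a copy: ElementTree rewrites namespace prefixes when it serializes, and real Word files can be sensitive to that.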
__________________
Andrew Lockton, Chrysalis Design, Melbourne, Australia
#3
Conversion from any non-Word format results in formatting and editing anomalies. Andrew has suggested a very good way to get rid of the different page orientations or sizes.
If there are strange frames, textboxes, or random changes in fonts, I would also suggest selecting everything, pasting it into a new document as plain text, and then using styles to format that text to match the original documents (not the OCR version, but the paper documents). Images can be copied and pasted from the OCR version. OCR today is much better than it was twenty or even ten years ago, but it is imperfect. Many erroneous interpretations can be caught by spell check, but human proofreading is still required.
#4
Thanks so much for the considered replies. The difficulty I face is that some of the page breaks are actually very useful. The OCR seems to have put in page breaks at the end of pages, which keeps the pages in the correct order and lets me add returns without putting the pages out of kilter, as I believe would happen if I removed all the formatting.
I'm happy to change each landscape page individually; I'm just not sure how to do that. Annoyingly, the OCR has also added tables in places where the original document clearly had none. Is there a way to remove just the tables, without removing other formatting?
#5
Each section break is where the page setup can be changed. So the cleanest way to remove the landscape layout is to remove the section break that follows the landscape range. You can also adjust the page setup of each individual section but that doesn't remove the complexity and is more effort in my opinion.
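If you do want to keep the section breaks and fix each section's page setup instead, that can be scripted too. This is only a sketch under some assumptions: it works on the text of `word/document.xml`, it assumes page sizes are stored as whole numbers (twentieths of a point) in the `w:w` and `w:h` attributes of `w:pgSz`, and the function name `force_portrait` is made up for the example. Margins are left as they are, so they may still need a look afterwards.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

PGSZ = f"{{{W}}}pgSz"
WIDTH = f"{{{W}}}w"
HEIGHT = f"{{{W}}}h"
ORIENT = f"{{{W}}}orient"


def force_portrait(document_xml):
    """Turn every landscape section into portrait, keeping the breaks.

    Returns (new_xml, number_of_sections_changed).
    """
    root = ET.fromstring(document_xml)
    changed = 0
    for size in root.iter(PGSZ):
        width = int(size.get(WIDTH, "0"))
        height = int(size.get(HEIGHT, "0"))
        landscape = size.get(ORIENT) == "landscape" or width > height
        if not landscape:
            continue
        if width > height:
            # Swap the dimensions so the long edge becomes the height.
            size.set(WIDTH, str(height))
            size.set(HEIGHT, str(width))
        # Portrait is the default, so the orientation flag can go.
        size.attrib.pop(ORIENT, None)
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Because only attributes are edited, the section breaks and any page breaks stay exactly where they were. As with any direct XML edit, try it on a copy of the document first.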
__________________
Andrew Lockton, Chrysalis Design, Melbourne, Australia
#6
Thanks for your response. Yes, removing the section break does seem to change the page from landscape back to portrait.
Is there an easier way to deal with the places where the OCR has turned text into a table? Currently, when it happens, I copy the table into Notepad to strip all the formatting, tidy it up to look like the original text, cut the table out of the Word document, and then paste in the text from Notepad. I am sure there must be an easier way.
#7
Select the table in the Word document.
Cut it (Ctrl+X), then use Paste Special to paste it back as plain text (Unformatted Text).
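With 9000 pages there may be a lot of these tables, so here is a rough sketch of doing the unwrapping in bulk. It is not the same as pasting plain text: it lifts the paragraphs out of each top-level table and drops the table around them, so the text keeps its character formatting and each cell's paragraphs become ordinary paragraphs. It works on the text of `word/document.xml`, assumes table rows sit directly under `w:tbl` and cells directly under `w:tr`, and the names `cell_paragraphs` and `unwrap_tables` are invented for the example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

BODY, TBL, TR, TC, P = (f"{{{W}}}{n}" for n in ("body", "tbl", "tr", "tc", "p"))


def cell_paragraphs(tbl):
    """Yield the paragraphs of a table in reading order, row by row."""
    for row in tbl.findall(TR):
        for cell in row.findall(TC):
            for child in cell:
                if child.tag == P:
                    yield child
                elif child.tag == TBL:
                    # A table nested inside a cell is unwrapped as well.
                    yield from cell_paragraphs(child)


def unwrap_tables(document_xml):
    """Replace each top-level table with the paragraphs it contains.

    Returns (new_xml, number_of_tables_unwrapped).
    """
    root = ET.fromstring(document_xml)
    body = root.find(BODY)
    unwrapped = 0
    for tbl in body.findall(TBL):  # direct children of the body only
        position = list(body).index(tbl)
        paragraphs = list(cell_paragraphs(tbl))
        body.remove(tbl)
        for offset, para in enumerate(paragraphs):
            body.insert(position + offset, para)
        unwrapped += 1
    return ET.tostring(root, encoding="unicode"), unwrapped
```

Genuine tables in the original would be flattened too, so this only suits a document where every table is an OCR mistake. Run it on a copy and compare against the paper pages.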