Microsoft Office Forums

  #1  
04-04-2024, 09:48 AM
DruidCtba (Novice)
Join Date: Apr 2024
Posts: 4
Layout of different pages in the same WORD OFFICE 2021 document (Windows 11, Office 2021)

Hey guys, I have a PDF of a book that consists of page images, and I ran OCR on it with ABBYY FineReader. So far so good, but I don't understand why, after Optical Character Recognition, the output keeps the same page images it was scanned from. The text is there: if I highlight a paragraph and copy it, I can paste it into any text editor such as Notepad. It seems to have mixed the page images with the text recognized by OCR, and when I save the result from ABBYY FineReader, the PDF looks the same as the image-only PDF. Do you understand what I mean?

So I thought I would save the OCR'd PDF in Word (DOCX) format and everything would be fine, but now I'm facing another problem: some pages are in one layout and others in another. Since I don't know how to upload an image here, I'll describe it: some Word pages hold two pages of the book side by side, while others hold a single page of the book with larger type.

Image - https://i.imgur.com/yhMHeOq.jpg



Documents - https://mega.nz/folder/tIsXFKCB#ZpsbNnKy3zX65E33GQDvYQ


How can I solve this? Please help me.

Regards.

José Roberto Chaurais.
  #2  
04-04-2024, 11:50 AM
Charles Kenyon (Moderator)
Join Date: Mar 2012
Location: Sun Prairie, Wisconsin
Posts: 9,142

Anything that comes from another format is going to have formatting anomalies, which may or may not be apparent when you simply view or print a document.


Going from one format to another to another to another is just begging for problems.


What the OCR process generally does is add a hidden layer of text to a file, so that it holds both the images and the text. This is what ABBYY FineReader did.



Here is what I generally say about opening and editing PDF files in Word:

PDF files can be edited in Word, sort of… How was the file created originally, and by which program? It could have been created from a scan or a picture taken by a phone camera. Those are pictures of words saved as PDFs. It is just like having a picture of a car: you can see the car in the picture, but you can't change the timing of the engine in that picture. With a picture of text, you can't change the order of the text or otherwise edit it. Word can open such a file, but it can't edit it. You have a Word file that contains a picture of text rather than text.

In that case, you need to convert the picture to text. This is a process known as optical character recognition. This is built into Adobe Acrobat (but not the free Acrobat Reader) and is also in Office OneNote. Most scanner software comes with an OCR component as well.
How to OCR a PDF in OneNote
Once translated into text, it can be edited in Word but there will still be formatting anomalies.

If you simply want to write on the document (but not in it) you can add a Text Box floating on top of the document layer, whether or not it has been put through the OCR process.

Web pages or Word documents that have been saved as PDF will not need the OCR process; they retain their text, although not all of their Word structure and formatting. Documents created as PDF from other programs will likely be even more problematic.

Finally, documents converted from PDF (or really any other format) to Word can be tough to edit because the conversion process never has a one-to-one matching of how formatting is done under the hood. This means that a converted document will seldom be formatted in Word in a way that uses Word features well for that formatting. One example is multiple section breaks used to change margins, where in Word you would simply change the paragraph indent (see "Margins and Indents in Word"). Another example is that formatting of text in Word is best done using Styles, and those will not be used; it will all be direct formatting. That can make a huge difference in how easy the document is to edit (see "The Importance of Styles in Microsoft Word").
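To see how a converted file is actually divided up, one option is to look inside the .docx itself. What follows is a minimal sketch in Python, assuming the file is a standard .docx (a zip archive whose main part is word/document.xml). The function name list_sections is made up for illustration, and only the standard library is used. It reports the page size, orientation and column count of every section, which is where a mix of double and single pages would be expected to show up.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _attr(element, name):
    """Read a w:-prefixed attribute, or return None if the element is missing."""
    return None if element is None else element.get(f"{{{W_NS}}}{name}")


def list_sections(docx):
    """Return one dict per section found in a .docx (path or file-like object).

    Page sizes are reported in twips (twentieths of a point), as stored in
    the file. Sections recorded only inside tracked changes are counted too,
    so treat the result as a rough inventory rather than an exact count.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    sections = []
    for sect in root.iter(f"{{{W_NS}}}sectPr"):
        size = sect.find(f"{{{W_NS}}}pgSz")
        cols = sect.find(f"{{{W_NS}}}cols")
        sections.append({
            "width_twips": _attr(size, "w"),
            "height_twips": _attr(size, "h"),
            # Word omits the orient attribute for portrait pages.
            "orient": _attr(size, "orient") or "portrait",
            # A missing column count means a single column.
            "columns": _attr(cols, "num") or "1",
        })
    return sections
```

A converted book with dozens of sections, some landscape or two-column and some not, would fit the kind of mixed layout described in this thread. That is a guess about what such a file contains, not something checked against the posted documents.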

If possible, find the file from which the pdf was created and edit that file, using the program that created it. Then if you need it in Word format and it is not, convert it directly to Word. This will cut out one conversion process and make for fewer editing problems.

When I really need the document in Word format and intend to do much editing, I create a new Word file and paste the content into it as plain text. Then I format it to match the original using Styles for the formatting as much as possible. This takes time; for me, it is worth it and saves a lot of frustration.
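For a long book, the plain-text step can also be scripted instead of copied and pasted by hand. This is only a rough sketch under the same assumption of a standard .docx file; docx_to_plain_text is a hypothetical helper name, not a Word or ABBYY feature. It joins the text runs of each paragraph and drops empty paragraphs. Tabs and manual line breaks are ignored, and text that the converter placed in text boxes may come out twice, because text-box paragraphs are nested inside the paragraph that anchors them.

```python
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation prefix for the WordprocessingML namespace.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_plain_text(docx):
    """Return the text of a .docx (path or file-like object) as plain text.

    Each non-empty paragraph becomes one block, separated by a blank line.
    All formatting, section breaks and images are discarded.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W + "p"):
        # A paragraph's visible text is split across one or more w:t nodes.
        text = "".join(node.text or "" for node in para.iter(W + "t"))
        if text.strip():
            paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

The result can be pasted into a new Word document and then formatted with Styles as described above.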
  #3  
04-04-2024, 10:09 PM
DruidCtba

One thing I noticed, Charles Kenyon: after running ABBYY FineReader OCR on the image PDF and saving the result as a Word document, the text seems to have been mixed with the original page images. In Word, near the "black smudges" beside some pages, Word shows a drag icon, and if I press DEL at that point it deletes the whole drawing, leaving only the text.

What I thought (sorry if I'm wrong) was this: in Word the text is in Times New Roman, which isn't bad, so here I could sort out the pages with two columns. I'm not sure that is the right term for the fact that some pages are double and others single. I hoped someone here would tell me how to change the double pages into single pages, as that alone would be a step forward in fixing this ebook.

The ebook came from the internet; I assume someone scanned the whole book as images and made it available like that. That is why I used ABBYY FineReader, whose OCR is one of the best I've dealt with. Its screen has three areas: on the left, all the original pages as images; on the right, the OCR result, which turned out very well; and below, a larger, zoomed-in view, though I'm not sure which part it shows.

I thought that saving the document after OCR as a new PDF would give me a text PDF. Unfortunately it did not quite do that: it is a text PDF, but with the same appearance as the page images. I'm still wondering why a program would do this, since I believed it would save only the text, like a book you buy online, or at least offer that option when saving the OCR'd PDF, but I couldn't find such an option in the program.

But thank you very much for your response. It clarified a lot of what I already suspected. I never thought, though, that I wouldn't be able to change the double pages into single pages, or that saving to Word would create a layout that Word itself couldn't handle. In reality, they are double pages that I have no way of editing.

Regards,

José Roberto Chaurais.

Last edited by DruidCtba; 04-04-2024 at 10:10 PM. Reason: translation from Portuguese to English
  #4  
04-05-2024, 10:55 AM
DruidCtba

Guys, "SOLVED": I found the solution in ABBYY FineReader itself. When saving to PDF, there are three output options in the Searchable PDF Settings:

1. Text and Figures only

2. Text Over the Image of the page (Default)

3. Text Under the image of the page

Just use option 1 and it will create the PDF as text, like a regular ebook you download after buying it online.

Regards.
  #5  
04-05-2024, 11:01 AM
Stefan Blom (Moderator)
Join Date: Aug 2011
Posts: 3,908

Question was also asked in the Microsoft Community:

Layout of different pages in the same WORD OFFICE 2021 document -- Microsoft Community
__________________
Stefan Blom
Microsoft Word MVP

Microsoft 365 apps for business
Windows 11 Professional
  #6  
04-05-2024, 04:41 PM
DruidCtba

Quote:
Originally Posted by Stefan Blom

I am banned from the Microsoft community in Brazil.
  #7  
04-06-2024, 04:11 AM
Charles Kenyon

Quote:
Originally Posted by DruidCtba
I am banned from the Microsoft community in Brazil.

I do not know the reason for that, but I advise reading up on cross-posting etiquette: A Message to Forum Cross-Posters