OCR

Biblide · #1 12-29-2017, 04:31 AM

Hi!
I have a problem when I convert PDFs to Word using ABBYY FineReader.
In some of the converted files, the pages are displayed side by side, one smaller than the other.

I would like a normal Word file where each page follows the previous one. Can anyone help me?
I'm sorry if my explanation is not clear. I wanted to upload an image to show the problem, but the system doesn't allow me to.
Thank you in advance.

gmayor · #2 12-29-2017, 06:13 AM

FineReader is great as far as it goes, but it still has to overcome one of the most daunting tasks you might ask a PC to perform: OCR. This essentially means turning graphics into text, so you should not be too surprised that it rarely produces an accurate conversion.

Since you have not simply opened and converted the PDF in Word 2016, I assume the PDF is not intended for such copying.

Try printing the PDF file to paper and scanning the resulting pages into FineReader. You may get better results... but you may not.

Biblide · #3 12-29-2017, 06:48 AM

Thank you for your advice.
Could the problem be caused by the way the original pages were scanned?
Some of the PDFs come from images created with a scanner, and others from photos taken with a phone.

Charles Kenyon · #4 12-29-2017, 07:43 AM

The basic problem is that scanning, OCR and conversion do not give you a clean Word document. They give you a formatting mess that may look fine, but when you try to edit it, things can get confusing. This is true of every conversion and OCR process I am aware of. There have been great advances in this technology, but I would be surprised if it ever becomes perfect.

Generally, when I want to use a scanned document (or text from the web) to produce an editable document, I do the OCR and copy the result into a new Word document as plain text. I do the formatting there. Note also that even the best OCR is still not 100% accurate. There can be errors that are not apparent to the eye, such as the number 1 substituted for the letter l or vice versa. The blurrier the original, the more errors there will be.

Biblide · #5 12-29-2017, 08:57 AM

Thank you for your explanation. I think I should be quite satisfied with the result I got from FineReader, after all.