#1
Hi all,
I have some large documents that were exported from PDFs (a few thousand pages in total) to .docx so I can work on them. The export has left a few problems with the headers and footers:
1. Lots of 'breaks', where a header or footer is not linked to the previous one.
2. The section numbers make some large jumps, e.g. section 8 is followed by section 14.
3. The headers and footers are all different "sizes": the distance from the top or bottom of the page varies.
4. Where I've manually linked sections and then added new page numbers, there are some areas where the page numbering is set to start at a certain value.
Is there a way to clear all that section/header and footer information and start fresh? I've tried copying all the text and pasting it into a new document, but all the header and footer info is copied too. Or is there a way to work on all the headers and footers at once?
#2
1. No surprise there.
2. This is not possible. Sections are numbered sequentially; there can be no gaps. There could be some Continuous section breaks that make it SEEM like there is a jump, but a real gap is not possible. The Object Model does not permit it.
3. No surprise there either.
4. Not sure what you mean.
You could remove all section breaks using Find & Replace, thus making one section for the whole document.
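For readers who want to see what "removing all section breaks" means inside the file, here is a minimal Python sketch. In Word itself the same result comes from Find & Replace with `^b` (Section Break) in the Find box and nothing in the Replace box. The sketch relies on how WordprocessingML stores sections: every section except the last keeps its properties in a `w:sectPr` inside the `w:pPr` of its final paragraph, and the last section's `w:sectPr` sits directly under `w:body`. The function names and the `SAMPLE` fragment are made up for illustration. A real `document.xml` declares many more namespaces, which ElementTree may not round-trip faithfully, so treat this as an illustration of the structure rather than a tool to run on production files.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def count_sections(document_xml: str) -> int:
    """Count sections: one per w:sectPr element, wherever it sits."""
    root = ET.fromstring(document_xml)
    return len(root.findall(".//w:sectPr", NS))


def merge_sections(document_xml: str) -> str:
    """Drop every paragraph-level w:sectPr, keeping only the body-level one.

    The whole document then belongs to the final section.
    """
    ET.register_namespace("w", W)  # keep the familiar "w:" prefix on output
    root = ET.fromstring(document_xml)
    for ppr in list(root.iter(f"{{{W}}}pPr")):
        for sect in ppr.findall("w:sectPr", NS):
            ppr.remove(sect)
    return ET.tostring(root, encoding="unicode")


# Hypothetical three-section fragment: two continuous breaks plus the final section.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:sectPr><w:type w:val="continuous"/></w:sectPr></w:pPr>'
    "<w:r><w:t>One</w:t></w:r></w:p>"
    '<w:p><w:pPr><w:sectPr><w:type w:val="continuous"/></w:sectPr></w:pPr>'
    "<w:r><w:t>Two</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Three</w:t></w:r></w:p>"
    "<w:sectPr/>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(count_sections(SAMPLE), "->", count_sections(merge_sections(SAMPLE)))
```

After merging, the text is untouched and only the final section's header, footer and page-numbering settings remain.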
#3
#4
Using Find and Replace to get rid of all the section breaks seems to have worked a treat. I still need to look at some parts, but that has sorted most of my issues. Thanks a lot!
#5
For my own interest, is it possible to attach a copy of the document with the "gaps" in sections, with any sensitive information removed? I have been doing this a long time and I really did not think this was possible. I want to see it for myself and would very much like to confirm it. There must be continuous sections in there. If there are not, and there really are gaps, it will shake me up a lot.
#6
Just curious, what process did you use to "export" from PDF?
Converted documents are always painful to deal with, even ones that are converted from Word to another format and then back. They will generally have an incredible number of continuous section breaks, among other things. This is because most other programs use margin changes where Word would use indents. When such documents are converted to Word, they are converted with margin changes.
These days, conversion software does a passable job of creating documents that look the same. The underlying structure, though, is often radically different.
I suspect that in your screenshot of Section 8 there is a margin change at each paragraph. Each margin change, in Word, requires a section break (a new section), usually continuous. Each section in Word will contain three headers and three footers (even if never used) and can contain pagination instructions. The header/footer shown would be from the first section on the page.
I can reproduce the section numbering "jump" with continuous section breaks on a page.
Sections / Headers and Footers in Microsoft Word 2007-2013
Last edited by Charles Kenyon; 12-18-2013 at 08:29 PM.
#7
Like I stated, the object model does not permit non-sequential section numbers. The number is a dynamic sequential count, not an assigned property. There have to be continuous sections. Charles, I think you are right: if you count the indents from the top of the page at section 8, the top of the next page is "section 13". There are no "jumps", and no gaps.
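The apparent jump can be reproduced with a little arithmetic. Below is a minimal Python sketch (the function name and the break labels are invented for illustration) that treats the section number as a running count and records which section is showing at the top of each page. It assumes, for simplicity, that a new page starts only at a next-page section break and ignores ordinary page overflow.

```python
def section_at_top_of_each_page(breaks):
    """Return the section number in effect at the top of every page.

    `breaks` lists the section breaks in document order; each entry is
    either "continuous" or "next_page". Every break starts a new section,
    but only a "next_page" break starts a new page.
    """
    section = 1
    tops = [section]  # page 1 always opens in section 1
    for kind in breaks:
        section += 1  # the count is sequential, never assigned
        if kind == "next_page":
            tops.append(section)
    return tops


if __name__ == "__main__":
    # Seven page-level breaks, then a page containing four continuous breaks.
    layout = ["next_page"] * 7 + ["continuous"] * 4 + ["next_page"]
    print(section_at_top_of_each_page(layout))
```

With this layout, one page opens in section 8 and the next in section 13. Sections 9 to 12 all exist; they simply begin partway down the earlier page, so they never appear at the top of a page.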
#8
Sample attached.
Files were exported using Acrobat X Pro > Save As > Word Document. It does a terrible job, but it is a lot better than old versions of Acrobat Pro: tables, pictures and text don't get thrown all around the place as much as they used to!
It's a problem we've had for nearly 10 years now. As an education provider, every couple of years we have to update our course textbooks, which are PDFs. In the past I just exported small parts and worked on them. This year I've resigned myself to the task of getting everything into a good Word format and then producing PDFs as needed for printing.
#9
Re your attachment and:
Quote:
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#10
As I've probably cited before, the adage is "being given a .pdf and having to produce a source file is like being given a bowl of chicken soup and being asked to produce the chicken".
If you're a provider with the right to edit the content of these .pdf documents, you ought to be able to request source files instead of having to make them yourself: much more efficient for you, and less worry (that things are being accidentally changed by a complicated conversion process) for the original author. This is how we prepare and deliver user manuals for our products, which vendors can then customise as required.
#11
The original release of certain parts was only passed on in .pdf by the authors. Since then the accumulated document has been maintained and added to ad hoc, so nothing except the .pdf is correct these days.
That's why I'm setting about gluing the chicken back together!
#12
Quote:
Keep in mind that the pdf text may be the result of an OCR process. If it is, there will be errors in the typing as well as the other problems that come from converted documents.
Start with a Word template with numbering based on Styles, following the directions on Shauna Kelly's page: How to create numbered headings or outline numbering in Word 2007 and Word 2010. This may take some time to set up the first time, but it will be time well spent.
Use Styles for all of your formatting. That way, when you want to change how [an indented quotation] looks, you can change the look in one place and have it apply to everything.
Read:
- Tips for Understanding Styles in Word
- How styles in Microsoft Word cascade
- How to create a table of contents in Microsoft Word
- Understanding Styles in Microsoft Word
Skimming these is OK; you want to be familiar with the ideas and concepts, though. A couple of hours skimming these up front may save you weeks of work later.
#13
I'd agree with Charles there: bite the bullet, go the extra mile and produce a clean updated copy of the source. While you're at it, consider using the "insert and link" option for graphics; that way, if a diagram that's used several times changes, all you need to do is update the original diagram and refresh your .doc files. Your children (be they offspring or pupils) and your children's children will thank you, and perhaps even your future colleagues might remember to :-} Good luck with that chicken!
If you have access to full Acrobat, try the "save as text" options; the benefit is that you end up with paragraphs rather than lines. If not, there are postings elsewhere in this forum about converting blocks of lines to single paragraphs. My approach to this type of conversion, once the paragraphs are sorted, is to use tagging in the plain-text file to mark the different styles I'll want to apply; then I open the tagged file in Word and use search/replace. This may make more sense when you've checked the references Charles supplied.
____________________
Related question to contributors, included here since the answer may help the OP: because Word is not my primary tool, I've not had to trust any valuable content to linked graphics. What's your opinion of it, or experience with it: is it stable and reliable? And why is the belt-and-braces "insert and link" option there?
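The "blocks of lines to single paragraphs" step mentioned above can also be done outside Word before the text is brought in. Here is a minimal Python sketch, assuming the exported plain text uses one or more blank lines to separate paragraphs (an assumption; real exports may need a different rule, such as detecting a line that ends with a full stop). The function name is invented for illustration.

```python
def join_wrapped_lines(text):
    """Turn hard-wrapped lines into one string per paragraph.

    Consecutive non-blank lines are joined with single spaces; any run of
    blank lines is treated as a paragraph boundary.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:  # flush the last paragraph if the text has no trailing blank line
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    exported = "Each line of the\nexport ends early.\n\nA second\nparagraph.\n"
    for paragraph in join_wrapped_lines(exported):
        print(paragraph)
```

The result is one line per paragraph, which is a convenient form for adding style tags before opening the file in Word.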
#14
Let's make it unanimous... start clean as plain text. You may think it is more front-end work (and it is), but in the long term it saves you a potentially massive headache in the back-end workflow. And for deity's sake, use Styles!
#15
Thanks a lot for the suggestions and links. Sounds like some fun time over the holidays!
As I said, it is something that has been a pain for a long time now, and I've decided to bite the bullet. I already use styles and headings for content a little. I want to try to index everything; most likely easy, I've just never done it before. I'll read through the links provided and will no doubt have some questions along the way that the search function of this forum will answer!