Microsoft Office Forums

Converting the headers & footers into body

#1
06-07-2021, 05:47 AM
CaptainCsaba, Novice (Windows 10, Office 2016)

Hello,

I am converting PDFs to .docx format with Adobe Acrobat, automated through Python. I then read the data from the Word file to do some data analysis.

The conversion is almost perfect; my problem is that some parts get converted into headers and footers (whether they should be or not is not important). Headers and footers are not searchable with the methods I am using. My question is: is there a way in Word to convert all current headers and footers into body text? Manual Word settings, VBA code, or any other kind of solution would be appreciated.

#2
06-07-2021, 08:31 AM
Charles Kenyon, Moderator (Windows 10, Office 2019)

There is nothing built into Word for this.
Although simple in concept, headers and footers can be very complex.
Each section in Word has three headers and three footers (primary, first page, and even pages), independent of one another, which may or may not be active depending on section or document settings.
A single page can have multiple sections.
Here is my Header/Footer Settings Recap.
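
To see which of those header and footer types a converted file actually defines, one option is to inspect the .docx package directly from Python. The following is only a minimal sketch using the standard library: the function name `list_header_footer_types` is made up for illustration, and it assumes the file has no tracked section changes (which can add extra `w:sectPr` elements). In the file format the three types are named "default" (Word's primary header/footer), "first" and "even".

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_header_footer_types(docx_file):
    """Return one dict per section listing the header/footer types it defines."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    sections = []
    # Each w:sectPr element closes one section; iter() yields them in document order.
    for sect_pr in root.iter(f"{{{W}}}sectPr"):
        headers = [ref.get(f"{{{W}}}type")
                   for ref in sect_pr.findall(f"{{{W}}}headerReference")]
        footers = [ref.get(f"{{{W}}}type")
                   for ref in sect_pr.findall(f"{{{W}}}footerReference")]
        sections.append({"headers": headers, "footers": footers})
    return sections
```

Calling `list_header_footer_types("converted.docx")` (a hypothetical file name) returns a list with one entry per section, which makes it easy to check how many sections the conversion produced and which header/footer types each one carries.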


Simple suggestion for manual work.
Open the document in Word.
Edit the Header/footer.
Copy the header/footer.
Paste into the body of your document on one page.

#3
06-07-2021, 03:46 PM
Guessed, Expert (Windows 10, Office 2016)

In my experience, Acrobat conversions to Word typically put 'header' and 'footer' information into the body of the document anyway. This is because the structure of an Acrobat document doesn't usually contain a physical header or footer; it is just more unstructured content on the page.

Perhaps you could explore the options of your PDF>DOCX conversion to see why content is being placed in a header/footer.

I'm also unsure why you think it is a good idea to convert to Word when your data analysis could also be done by processing the PDF file - why change formats when Acrobat is also programmable?
__________________
Andrew Lockton
Chrysalis Design, Melbourne Australia

#4
06-08-2021, 06:54 AM
CaptainCsaba, Novice (Windows 10, Office 2016)

I need to collect the main headers from the document, convert it to Word and add bookmarks to those headers automatically. For this purpose it is better to do the mining in the Word format instead of the PDF, as the data is much more structured and the additional information I need about the text (color, font, size etc.) is available in an easier manner. Unfortunately there are no settings in Adobe Acrobat that change how headers are handled. Only some very basic options are available: Include Comments, Include Images, Recognize Text Where Needed, and Retain Flowing Text / Retain Page Layout.

#5
06-08-2021, 08:18 AM
Charles Kenyon, Moderator (Windows 10, Office 2019)

Are you opening the PDFs in Word, or are you using some other conversion method?

#6
06-08-2021, 09:02 AM
CaptainCsaba, Novice (Windows 10, Office 2016)

I am converting PDFs into .docx files with Adobe Acrobat.

#7
06-08-2021, 10:39 AM
Charles Kenyon, Moderator (Windows 10, Office 2019)

Try opening directly in Word and using Word to convert one.

#8
06-08-2021, 02:49 PM
Guessed, Expert (Windows 10, Office 2016)

I don't know how to convert VBA code to Python but this should give you an idea on how you could copy the headers and footers into the body of the document.
Code:
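Sub HFExtractor()
  ' Copy the content of every header and footer into the document body.
  Dim aSect As Section, aHF As HeaderFooter, aRng As Range
  For Each aSect In ActiveDocument.Sections
    ' Each section exposes three headers: primary, first page and even pages.
    For Each aHF In aSect.Headers
      Set aRng = aSect.Range
      ' Collapse the range to the end of the section; the copy is inserted there.
      aRng.Collapse Direction:=wdCollapseEnd
      aRng.FormattedText = aHF.Range.FormattedText
    Next aHF
    ' Repeat the same steps for the section's three footers.
    For Each aHF In aSect.Footers
      Set aRng = aSect.Range
      aRng.Collapse Direction:=wdCollapseEnd
      aRng.FormattedText = aHF.Range.FormattedText
    Next aHF
  Next aSect
End Sub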
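
If the goal is only to read or search the header and footer text from Python, copying it into the body may not be needed at all, because a .docx file is a zip package whose headers and footers are stored as separate XML parts. The sketch below is not a translation of the macro; it is a different, read-only technique using only the standard library. The function name `read_header_footer_text` is invented for illustration, and it assumes the conventional part names (`word/header1.xml`, `word/footer1.xml`, ...) that Word normally writes. Text inside nested text boxes may be reported more than once.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by header and footer parts.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Assumption: header/footer parts follow the usual naming convention.
PART_PATTERN = re.compile(r"word/(header|footer)\d*\.xml")


def read_header_footer_text(docx_file):
    """Map each header/footer part name to the list of its non-empty paragraph texts."""
    texts = {}
    with zipfile.ZipFile(docx_file) as package:
        for name in sorted(package.namelist()):
            if not PART_PATTERN.fullmatch(name):
                continue
            root = ET.fromstring(package.read(name))
            paragraphs = []
            for para in root.iter(f"{{{W}}}p"):
                # A paragraph's visible text is spread over its w:t elements.
                text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
                if text:
                    paragraphs.append(text)
            texts[name] = paragraphs
    return texts
```

The result can be searched like any other Python data, for example `"Confidential" in str(read_header_footer_text("converted.docx"))`, where the file name and search term are placeholders.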

#9
06-08-2021, 11:21 PM
CaptainCsaba, Novice (Windows 10, Office 2016)

Quote:
Originally Posted by Charles Kenyon
Try opening directly in Word and using Word to convert one.

Unfortunately it has the same result.

Quote:
Originally Posted by Guessed
I don't know how to convert VBA code to Python but this should give you an idea on how you could copy the headers and footers into the body of the document.
Code:
Sub HFExtractor()
  Dim aSect As Section, aHF As HeaderFooter, aRng As Range
  For Each aSect In ActiveDocument.Sections
    For Each aHF In aSect.Headers
      Set aRng = aSect.Range
      aRng.Collapse Direction:=wdCollapseEnd
      aRng.FormattedText = aHF.Range.FormattedText
    Next aHF
    For Each aHF In aSect.Footers
      Set aRng = aSect.Range
      aRng.Collapse Direction:=wdCollapseEnd
      aRng.FormattedText = aHF.Range.FormattedText
    Next aHF
  Next aSect
End Sub

I can call VBA code from Python, so it is definitely useful. I ran your code and it extracts the headers just as needed. The only problem is that they all get added in bulk at the beginning of the next section. I need to retain their original position as much as possible (at least so that they come after the text they come after in the PDF).

This might be a stupid idea, as I am unfamiliar with how Word works, but could we, for example, get the x,y coordinates of these paragraphs and insert the text "above them", or something similar, so that we keep the position as close as possible?

#10
06-09-2021, 01:18 AM
Guessed, Expert (Windows 10, Office 2016)

The way Word works does not lend itself to extracting a header and placing it in the page content. Word is made up of containers for content, and in a well-designed Word document the SAME header can appear on hundreds of pages with only minor differences (such as the page number incrementing or StyleRef fields updating). I can't understand what requirement would make you think you need to extract and repeat this information on every page.

The 'page' in Word is not a fixed structure. The formatting and content in the body of the document, when constrained by the page size and setup, determine what appears on each page. Adding content earlier pushes subsequent content down the page and potentially onto the next page. The headers/footers are associated with a section, which might contain a single paragraph, lots of paragraphs or the entire document, but they don't directly relate to 'pages' at all.
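
To illustrate that headers hang off sections rather than pages, here is a rough Python sketch (standard library only) that reports, for each section, which top-level body blocks it spans and which header parts it references. The function name `map_sections_to_headers` and the shape of the returned data are assumptions made for this example. As far as I understand the file format, a section that lists no header of a given type reuses the one from the previous section, so an empty result for a later section does not necessarily mean it has no header.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def map_sections_to_headers(docx_file):
    """Return, per section, its span of top-level body blocks and its header parts."""
    with zipfile.ZipFile(docx_file) as package:
        document = ET.fromstring(package.read("word/document.xml"))
        rels = ET.fromstring(package.read("word/_rels/document.xml.rels"))

    # Relationship id -> target part, e.g. "rId7" -> "header1.xml" (relative to word/).
    targets = {rel.get("Id"): rel.get("Target")
               for rel in rels.iter(f"{{{REL}}}Relationship")}

    body = document.find(f"{{{W}}}body")
    sections, first_block = [], 0
    for index, block in enumerate(body):
        if block.tag == f"{{{W}}}sectPr":
            # The body's trailing w:sectPr describes the final section.
            sect_pr, last_block = block, index - 1
        else:
            # Earlier sections end at the paragraph that carries w:pPr/w:sectPr.
            sect_pr, last_block = block.find(f"{{{W}}}pPr/{{{W}}}sectPr"), index
        if sect_pr is None:
            continue
        headers = {ref.get(f"{{{W}}}type"): targets.get(ref.get(f"{{{R}}}id"))
                   for ref in sect_pr.findall(f"{{{W}}}headerReference")}
        sections.append({"blocks": (first_block, last_block), "headers": headers})
        first_block = index + 1
    return sections
```

The block indexes count paragraphs and tables directly under the body, not pages, which is the only positional information the file itself stores about where a header applies.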

Perhaps you need to post a sample document and describe why you think this is necessary. If we could understand what it is you need to achieve, I would think we could suggest a better way of achieving it.

I'm still thinking that Word is not the best tool for this. Its concept of flowing content doesn't fit with the static pages that headers align with. I just watched a video on using LibreOffice Draw to open and edit a PDF. I think this might be a better fit for your requirements as it appears to be scriptable with Python and should retain the page content more accurately.

Last edited by Guessed; 06-09-2021 at 04:42 AM.

All times are GMT -7.