Microsoft Office Forums

  #1  
05-07-2020, 09:23 PM
Cendrinne (Competent Performer; Windows 10, Office 2013)
Join Date: Aug 2019
Location: Montreal Quebec Canada
Posts: 190
Extract Text from textboxes in converted PDFs

Hello, I've used the script in post #6 @ https://www.msofficeforums.com/word-...nd-shapes.html for extracting text from various textboxes, shapes with text, etc. I have three problems that I can't figure out how to fix.


1- How do I extract text from a textbox in a header?
2- Even if I copy and paste the text box into the body of the document, if there is a picture of a line in it, I get an error message. It fails here:
If Len(Trim(.TextFrame.TextRange.Text)) > 1 Then
3- What if the boxes are not inline? For instance, I sometimes have boxes with Wrapping set to "Behind Text".
Any hints?
  #2  
05-07-2020, 10:43 PM
macropod (Administrator; Windows 7 64bit, Office 2010 32bit)
Join Date: Dec 2010
Location: Canberra, Australia
Posts: 21,962

Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, StryRng As Range, Rng As Range, StrType As String
For Each StryRng In ActiveDocument.StoryRanges
  For i = StryRng.ShapeRange.Count To 1 Step -1
    With StryRng.ShapeRange(i)
      If Not .TextFrame Is Nothing Then
        On Error GoTo SkipShp
        If .TextFrame.HasText = True Then
          Select Case .Type
            Case msoAutoShape: StrType = "AutoShape"
            Case msoCallout: StrType = "Callout"
            Case msoCanvas: StrType = "Canvas"
            Case msoChart: StrType = "Chart"
            Case msoComment: StrType = "Comment"
            Case msoDiagram: StrType = "Diagram"
            Case msoEmbeddedOLEObject: StrType = "EmbeddedOLEObject"
            Case msoFormControl: StrType = "FormControl"
            Case msoFreeform: StrType = "Freeform"
            Case msoGroup: StrType = "Group"
            Case msoInk: StrType = "Ink"
            Case msoInkComment: StrType = "InkComment"
            Case msoLine: StrType = "Line"
            Case msoLinkedOLEObject: StrType = "LinkedOLEObject"
            Case msoLinkedPicture: StrType = "LinkedPicture"
            Case msoMedia: StrType = "Media"
            Case msoOLEControlObject: StrType = "OLEControlObject"
            Case msoPicture: StrType = "Picture"
            Case msoPlaceholder: StrType = "Placeholder"
            Case msoScriptAnchor: StrType = "ScriptAnchor"
            Case msoShapeTypeMixed: StrType = "ShapeTypeMixed"
            Case msoTable: StrType = "Table"
            Case msoTextBox: StrType = "TextBox"
            Case msoTextEffect: StrType = "TextEffect"
            Case Else: StrType = "Shape" ' avoid reusing the label from a previous shape
          End Select
          Set Rng = .Anchor
          With Rng
            .InsertBefore StrType & " start << "
            .Collapse wdCollapseEnd
            .InsertAfter " >> end " & StrType
            .Collapse wdCollapseStart
          End With
          Rng.FormattedText = .TextFrame.TextRange.FormattedText
          .Delete
        End If
SkipShp:
        On Error GoTo 0
      End If
    End With
  Next
  For i = StryRng.InlineShapes.Count To 1 Step -1
    With StryRng.InlineShapes(i)
      If Not .TextEffect Is Nothing Then
        On Error GoTo SkipiShp
        If Len(Trim(.TextEffect.Text)) > 1 Then
          Select Case .Type
            Case wdInlineShapeChart: StrType = "InlineChart"
            Case wdInlineShapeDiagram: StrType = "InlineDiagram"
            Case wdInlineShapeEmbeddedOLEObject: StrType = "InlineEmbeddedOLEObject"
            Case wdInlineShapeHorizontalLine: StrType = "InlineHorizontalLine"
            Case wdInlineShapeLinkedOLEObject: StrType = "InlineLinkedOLEObject"
            Case wdInlineShapeLinkedPicture: StrType = "InlineLinkedPicture"
            Case wdInlineShapeLinkedPictureHorizontalLine: StrType = "InlineShapeLinkedPictureHorizontalLine"
            Case wdInlineShapeLockedCanvas: StrType = "InlineLockedCanvas"
            Case wdInlineShapeOLEControlObject: StrType = "InlineOLEControlObject"
            Case wdInlineShapeOWSAnchor: StrType = "InlineOWSAnchor"
            Case wdInlineShapePicture: StrType = "InlinePicture"
            Case wdInlineShapePictureBullet: StrType = "InlinePictureBullet"
            Case wdInlineShapePictureHorizontalLine: StrType = "InlinePictureHorizontalLine"
            Case wdInlineShapeScriptAnchor: StrType = "InlineScriptAnchor"
            Case Else: StrType = "InlineShape" ' avoid reusing the label from a previous shape
          End Select
          Set Rng = .Range
          With Rng
            .Collapse wdCollapseStart
            .InsertBefore StrType & " start << "
            .Collapse wdCollapseEnd
            .InsertAfter " >> end " & StrType
            .Collapse wdCollapseStart
          End With
          Rng.Text = .TextEffect.Text
          .Delete
        End If
SkipiShp:
        On Error GoTo 0
      End If
    End With
  Next
Next
Application.ScreenUpdating = True
End Sub
The code processes inline and floating shapes - the latter regardless of whether they're positioned behind text (as does the code in post #6) - but also processes content anywhere in the document.
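
A point that may matter for multi-Section documents: as far as I know, looping over StoryRanges returns only the first story of each type, and the headers and footers of later Sections are reached through NextStoryRange. The sketch below is not part of the macro above; CountShapesInAllStories is a hypothetical helper that only counts shapes, so it can be run safely to check whether every header is being visited.

```vba
' Hypothetical helper (not from the thread): gives a rough count of floating and
' inline shapes in every story of Doc, following linked stories as well.
Function CountShapesInAllStories(ByVal Doc As Document) As Long
    Dim StryRng As Range, n As Long
    For Each StryRng In Doc.StoryRanges
        Do
            n = n + StryRng.ShapeRange.Count + StryRng.InlineShapes.Count
            ' Move to the next story of the same type, for example
            ' the header of a later Section; Nothing when there are no more.
            Set StryRng = StryRng.NextStoryRange
        Loop Until StryRng Is Nothing
    Next StryRng
    CountShapesInAllStories = n
End Function
```

Calling Debug.Print CountShapesInAllStories(ActiveDocument) from the Immediate window prints the total. If it is much larger than the number of shapes a macro actually processed, the linked stories are the likely gap.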
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]
  #3  
05-12-2020, 07:33 PM
Cendrinne
OMG you are brilliant. Fix it. Thank you so much....

Thank you Paul, you don't know how much time I've spent trying to figure out that issue. OMG, you are a brilliant person.


I'll try to analyze the difference between the two scripts to learn and understand. But how can I learn Word VBA programming in more depth?


As God is my witness, I've been trying. I've created over 100 macros, which I use from the ribbon to help me, but my programming is still at a novice level.


This is my typical Find and Replace programming (as a novice):


Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^13"
.Replacement.Text = "^p"
.Forward = False
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWildcards = False
End With
Selection.Find.Execute Replace:=wdReplaceAll


In the undo list, I often see ==> VBA-Find.Execute2007, which tells me I'm programming old style. LOL


For any advice, I will be forever in your debt. Thank you so much for fixing that script.


Cheers
  #4  
05-12-2020, 07:53 PM
Cendrinne
macropod, I've tried it to the whole document, doesn't work

Hello, macropod, I feel we are so close.


If you take a financial document in PDF and convert it to a Word document, you may find many text boxes in the headers and footers.


There is a primary page, and following pages that are often marked "(continued)".
I'm not interested in the following pages, only the primary pages.


Your recent script works when I copy and paste a few primary pages into a new Word document. But the documents I have to deal with can be 50 pages or more, and they have many primary headers.


I'll also try to modify it, but I may need help. Could you give me a hint about where to find the information?


Cendrinne
  #5  
05-12-2020, 08:04 PM
macropod

Quote:
Originally Posted by Cendrinne View Post
But how can I learn Word VBA programming in more depth?
There are doubtless some good books and tutorials around but, since I don't use any of that stuff, I can't recommend any. All my VBA expertise is self-taught, though studying code that others have posted on different forums over the years has been a great help, too.
Quote:
Originally Posted by Cendrinne View Post
This is my typical Find and Replace programming (as a novice):


Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^13"
.Replacement.Text = "^p"
.Forward = False
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWildcards = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Yes, that's typical macro-recorder code. The macro recorder's not much smarter than a box of rocks. For an idea of what's possible with Find/Replace coding, see: https://www.msofficeforums.com/140662-post2.html
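
As a minimal sketch of one step away from recorded code (not the code behind the link above), the same replacement can be run against a Range instead of the Selection, so the cursor does not move and nothing needs to be selected first. ReplaceParagraphMarks is a hypothetical name, and it assumes only the main body text needs processing; Forward and Wrap differ from the recorded version because the whole body is searched from the start.

```vba
' Hypothetical example: the recorded Find/Replace rewritten to use a Range.
Sub ReplaceParagraphMarks(ByVal Doc As Document)
    With Doc.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^13"               ' same search text as the recorded macro
        .Replacement.Text = "^p"    ' same replacement text
        .Forward = True
        .Wrap = wdFindStop          ' the range already covers the whole body
        .Format = False
        .MatchCase = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

It can be called as ReplaceParagraphMarks ActiveDocument. Headers, footers and text boxes are separate stories and are not touched by this sketch.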
  #6  
05-12-2020, 08:07 PM
macropod

Quote:
Originally Posted by Cendrinne View Post
If you take a financial document in PDF and convert it to a Word document, you may find many text boxes in the headers and footers.


There is a primary page, and following pages that are often marked "(continued)".
I'm not interested in the following pages, only the primary pages.
That requires a quite different approach. It would have been helpful if you had said what your aim was up front. Besides which, documents converted from PDFs typically have only a primary header, if any.
  #7  
05-12-2020, 08:14 PM
Cendrinne
sorry, I'm a novice, so I thought it would have fix all headers

I'm so sorry, Paul. I didn't want to lead you astray. Since I don't know everything about programming, I figured it would have resolved the issue. I'll try to think about my end goal next time.


Please accept my apology.


But thanks again.


I am an analytical person, so I guess with time I might get it too. I've been programming for 3-4 years now but, again, as a novice.


Cheers
  #8  
05-12-2020, 08:27 PM
Cendrinne
Thank you. When I have more time, I'll take a look :)

Very sweet of you to point me to a script I can analyze and learn from.


Cendrinne


Quote:
Originally Posted by macropod View Post
There are doubtless some good books and tutorials around but, since I don't use any of that stuff, I can't recommend any. All my VBA expertise is self-taught, though studying code that others have posted on different forums over the years has been a great help, too.

Yes, that's typical macro-recorder code. The macro recorder's not much smarter than a box of rocks. For an idea of what's possible with Find/Replace coding, see: https://www.msofficeforums.com/140662-post2.html
  #9  
05-12-2020, 08:39 PM
macropod

So what are you calling primary headers?
  #10  
05-12-2020, 09:50 PM
Cendrinne

Hello Paul, I was trying to find a way to show you a picture. I don't know how to do that without a web link. Anyway.


Whenever I get text boxes in headers, especially when a PDF is converted to Word, the header has a line spacing of Multiple at 0.06 99% of the time. I can't really explain "primary" since I don't fully understand it. But I was told there are different types of headers.


404 - Content Not Found | Microsoft Docs
Create headers and footers of all three types - VBA Visual Basic for Applications (Microsoft) - Tek-Tips
Word Layout - Headers & Footers


I've included 3 links that talk about it.


I have so many sections. I'm trying to extract all the header text that is in text boxes into the document, but only the ones that are not duplicates. Ahhh, OK, now I think I know how to explain it: the ones with no link to the preceding section, because the first page of a section is the main or primary page. Am I making sense?


Cendrinne

Last edited by Cendrinne; 05-13-2020 at 08:21 AM.
  #11  
05-12-2020, 10:58 PM
macropod

Quote:
Originally Posted by Cendrinne View Post
Hello Paul, I was trying to find a way to show you a picture. I don't know how to do that without a web link. Anyway.
You can attach images to posts here. You do that via the paperclip symbol on the 'Go Advanced' tab at the bottom of this screen.
Quote:
Originally Posted by Cendrinne View Post
But I was told there are different types of headers.
Yes, Word has three header (and footer) types:
• Primary - wdHeaderFooterPrimary
• First Page - wdHeaderFooterFirstPage
• Even Page - wdHeaderFooterEvenPages
and each one can exist in every Section in a document. But, other than the Primary one (which must exist in every Section), the First Page and Even Page headers (and footers) aren't necessarily used in any given Section. Whether your document uses all of them in every Section really depends on how the page layout is configured. Plus, the Primary, First Page and Even Page headers (and footers) for Sections 2 and later can be linked to the corresponding header (or footer) in the preceding Section.
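
The Exists and LinkToPrevious properties of a HeaderFooter make those rules checkable in code. What follows is only a sketch under assumptions: HeaderTypeName and ListUnlinkedHeaderTextBoxes are hypothetical names, the converted text boxes are assumed to report msoTextBox as their type, and the macro only prints to the Immediate window without changing the document.

```vba
' Hypothetical helper: turns a WdHeaderFooterIndex value into a readable label.
Function HeaderTypeName(ByVal Idx As Long) As String
    Select Case Idx
        Case wdHeaderFooterPrimary: HeaderTypeName = "Primary"
        Case wdHeaderFooterFirstPage: HeaderTypeName = "First Page"
        Case wdHeaderFooterEvenPages: HeaderTypeName = "Even Page"
        Case Else: HeaderTypeName = "Unknown"
    End Select
End Function

' Lists text-box text from headers that are in use and not linked
' to the corresponding header in the preceding Section.
Sub ListUnlinkedHeaderTextBoxes()
    Dim Sctn As Section, HdFt As HeaderFooter, i As Long
    For Each Sctn In ActiveDocument.Sections
        For Each HdFt In Sctn.Headers
            If HdFt.Exists Then
                ' Section 1 has no preceding Section to link to.
                If Sctn.Index = 1 Or Not HdFt.LinkToPrevious Then
                    For i = 1 To HdFt.Range.ShapeRange.Count
                        With HdFt.Range.ShapeRange(i)
                            If .Type = msoTextBox Then
                                If .TextFrame.HasText Then
                                    Debug.Print "Section " & Sctn.Index & ", " & _
                                        HeaderTypeName(HdFt.Index) & ": " & _
                                        .TextFrame.TextRange.Text
                                End If
                            End If
                        End With
                    Next i
                End If
            End If
        Next HdFt
    Next Sctn
End Sub
```

Running ListUnlinkedHeaderTextBoxes on a copy of the document shows which Sections and header types actually carry text boxes before anything is extracted or deleted.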
Quote:
Originally Posted by Cendrinne View Post
I have so many sections. I'm trying to extract all the header text that is in text boxes into the document, but only the ones that are not duplicates. Ahhh, OK, now I think I know how to explain it: the ones with no link to the preceding section, because the first page of a section is the main or primary page. Am I making sense?
OK, so you only want the header content from the first Section. But which of the three header types does that Section use?
  #12  
05-13-2020, 10:14 PM
Cendrinne
I'll get back to you shortly. Been busy with work at home

I'll have more time on Friday. Get back to you, Paul
  #13  
05-15-2020, 08:08 PM
Cendrinne
Request help for Textboxes in Primary headers...

Hello Paul,
Of the three types, the text boxes usually come from one of the two below:


Word has three header (and footer) types:
• Primary - wdHeaderFooterPrimary
• First Page - wdHeaderFooterFirstPage


I can get a combination in the same document: some sections have "Different First Page" checked and some sections don't.


So I'm not sure if a macro could be written that handles all of these factors.


I need to extract all the text from text boxes in headers. Ideally, the text would keep its format (color, size, style). I just need a way to remove the boxes.


Think it's doable?


Cendrinne
  #14  
05-15-2020, 09:03 PM
macropod

The code in post #2 already does all of that extraction - and more. So what is the problem?
  #15  
05-15-2020, 09:32 PM
Cendrinne

I'll try the code from post #2 again. On the large document of 174 pages, which has many headers and lots of text boxes with text in those headers, it didn't work last time. Let me try it again.


Cendrinne