#1
Extract Text from textboxes in converted PDFs
Hello, I've used the script in post #6 @ https://www.msofficeforums.com/word-...nd-shapes.html for extracting text from various textboxes, shapes with text, etc. I have three problems that I can't figure out how to fix:
1- How do I extract text from a textbox in a header?
2- Even if I copy and paste the text box into the body of the document, if there is a picture of a line in it, I get an error message. It breaks here: If Len(Trim(.TextFrame.TextRange.Text)) > 1 Then
3- What if the boxes are not inline? For instance, I sometimes have boxes whose wrapping is "Behind Text".
Any hints?
#2
Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim i As Long, StryRng As Range, Rng As Range, StrType
'Process every story range (body, headers, footers, etc.)
For Each StryRng In ActiveDocument.StoryRanges
  'Floating shapes, processed backwards because each one is deleted
  For i = StryRng.ShapeRange.Count To 1 Step -1
    With StryRng.ShapeRange(i)
      If Not .TextFrame Is Nothing Then
        On Error GoTo SkipShp
        If .TextFrame.HasText = True Then
          Select Case .Type
            Case msoAutoShape: StrType = "AutoShape"
            Case msoCallout: StrType = "Callout"
            Case msoCanvas: StrType = "Canvas"
            Case msoChart: StrType = "Chart"
            Case msoComment: StrType = "Comment"
            Case msoDiagram: StrType = "Diagram"
            Case msoEmbeddedOLEObject: StrType = "EmbeddedOLEObject"
            Case msoFormControl: StrType = "FormControl"
            Case msoFreeform: StrType = "Freeform"
            Case msoGroup: StrType = "Group"
            Case msoInk: StrType = "Ink"
            Case msoInkComment: StrType = "InkComment"
            Case msoLine: StrType = "Line"
            Case msoLinkedOLEObject: StrType = "LinkedOLEObject"
            Case msoLinkedPicture: StrType = "LinkedPicture"
            Case msoMedia: StrType = "Media"
            Case msoOLEControlObject: StrType = "OLEControlObject"
            Case msoPicture: StrType = "Picture"
            Case msoPlaceholder: StrType = "Placeholder"
            Case msoScriptAnchor: StrType = "ScriptAnchor"
            Case msoShapeTypeMixed: StrType = "ShapeTypeMixed"
            Case msoTable: StrType = "Table"
            Case msoTextBox: StrType = "TextBox"
            Case msoTextEffect: StrType = "TextEffect"
          End Select
          'Write the start/end markers at the shape's anchor,
          'copy the formatted text between them, then delete the shape
          Set Rng = .Anchor
          With Rng
            .InsertBefore StrType & " start << "
            .Collapse wdCollapseEnd
            .InsertAfter " >> end " & StrType
            .Collapse wdCollapseStart
          End With
          Rng.FormattedText = .TextFrame.TextRange.FormattedText
          .Delete
        End If
SkipShp:
        On Error GoTo 0
      End If
    End With
  Next
  'Inline shapes, also processed backwards
  For i = StryRng.InlineShapes.Count To 1 Step -1
    With StryRng.InlineShapes(i)
      If Not .TextEffect Is Nothing Then
        On Error GoTo SkipiShp
        If Len(Trim(.TextEffect.Text)) > 1 Then
          Select Case .Type
            Case wdInlineShapeChart: StrType = "InlineChart"
            Case wdInlineShapeDiagram: StrType = "InlineDiagram"
            Case wdInlineShapeEmbeddedOLEObject: StrType = "InlineEmbeddedOLEObject"
            Case wdInlineShapeHorizontalLine: StrType = "InlineHorizontalLine"
            Case wdInlineShapeLinkedOLEObject: StrType = "InlineLinkedOLEObject"
            Case wdInlineShapeLinkedPicture: StrType = "InlineLinkedPicture"
            Case wdInlineShapeLinkedPictureHorizontalLine: StrType = "InlineShapeLinkedPictureHorizontalLine"
            Case wdInlineShapeLockedCanvas: StrType = "InlineLockedCanvas"
            Case wdInlineShapeOLEControlObject: StrType = "InlineOLEControlObject"
            Case wdInlineShapeOWSAnchor: StrType = "InlineOWSAnchor"
            Case wdInlineShapePicture: StrType = "InlinePicture"
            Case wdInlineShapePictureBullet: StrType = "InlinePictureBullet"
            Case wdInlineShapePictureHorizontalLine: StrType = "InlinePictureHorizontalLine"
            Case wdInlineShapeScriptAnchor: StrType = "InlineScriptAnchor"
          End Select
          Set Rng = .Range
          With Rng
            .Collapse wdCollapseStart
            .InsertBefore StrType & " start << "
            .Collapse wdCollapseEnd
            .InsertAfter " >> end " & StrType
            .Collapse wdCollapseStart
          End With
          Rng.Text = .TextEffect.Text
          .Delete
        End If
SkipiShp:
        On Error GoTo 0
      End If
    End With
  Next
Next
Application.ScreenUpdating = True
End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#3
OMG you are brilliant. Fixed it. Thank you so much....
Thank you Paul, you don't know how much time I've spent trying to figure out that issue. OMG, you are a brilliant person.
I'll try to analyze the difference between the two scripts to learn and understand. But how can I understand Word VBA programming in more depth? I've been trying, as God is my witness, I've been trying. I've created hundreds of macros, which I use on the ribbon to help me, but my programming is novice programming. This is my typical Find and Replace programming (as a novice):
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
  .Text = "^13"
  .Replacement.Text = "^p"
  .Forward = False
  .Wrap = wdFindStop
  .Format = False
  .MatchCase = False
  .MatchWildcards = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
In the undo list, I often see ==> VBA-Find.Execute2007, which tells me I'm programming old style. LOL
Any advice? I will be forever in your debt. But thanks so much for fixing that script.
Cheers
#4
macropod, I've tried it on the whole document; it doesn't work
Hello, macropod, I feel we are so close.
If you take financial documents that are in PDF and convert them to Word documents, you may find many text boxes in the headers and footers. There is a primary page, and the following pages are often marked (continued). I'm not interested in the following pages, only the primary pages. Your recent script works when I copy and paste a few primary pages into a new Word document. The documents I have to deal with could be 50 pages or more, with many primary headers. I'll try to modify it as well, but I might need help. Could you give me a hint about where to find the info?
Cendrinne
#5
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#6
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#7
Sorry, I'm a novice, so I thought it would have fixed all headers
I'm so sorry, Paul. I didn't want to lead you astray. Since I don't know everything about programming, I figured it would have resolved the issue. I'll try to think about my end game next time.
Please accept my apology. But thanks again. I am an analytical person, so I guess with time I might get it too. I've been programming for 3-4 years now but, again, as a novice.
Cheers
#8
Thank you. When I have more time, I'll take a look :)
Very sweet of you to guide me with a script to analyze so I can understand.
Cendrinne
#9
So what are you calling primary headers?
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#10
Hello Paul, I was trying to find a way to show you a picture, but I don't know how without a web link. Anyway.
Well, whenever I get text boxes in headers, especially when a PDF is converted to Word, the header has a line spacing of Multiple at 0.06 99% of the time. The primary I can't really explain, since I don't fully understand it. But I was told there are different types of headers. I've attached 3 links that talk about it:
404 - Content Not Found | Microsoft Docs
Create headers and footers of all three types - VBA Visual Basic for Applications (Microsoft) - Tek-Tips
Word Layout - Headers & Footers
I have so many sections, and I'm trying to extract all the headers that are in text boxes into the document, but only the ones that are not duplicates. Ahhh, OK, now I think I know how to explain it: the ones with no link to the preceding section, because the first page of a section is the main or primary page. Am I making sense?
Cendrinne
Last edited by Cendrinne; 05-13-2020 at 08:21 AM.
#11
Yes, Word has three header (and footer) types:
• Primary - wdHeaderFooterPrimary
• First Page - wdHeaderFooterFirstPage
• Even Page - wdHeaderFooterEvenPages
and each one can exist in every Section in a document. But, other than the Primary one (which must exist in every Section), the First Page and Even Page headers (and footers) aren't necessarily used in any given Section. Whether your document uses all of them in every Section really depends on how the page layout is configured. Plus, the Primary, First Page and Even Page headers (and footers) for Sections 2 and later can be linked to the corresponding header (or footer) in the preceding Section.
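To see which headers in a document actually hold text boxes, something like the following might help. It is only a sketch, not the macro posted earlier in this thread: the name ListUnlinkedHeaderTextBoxes is made up for illustration, and it assumes that HdFt.Range.ShapeRange returns just the shapes anchored in that one header. It changes nothing in the document; it only prints to the Immediate window (Ctrl+G in the VBA editor), skipping header types a Section doesn't use and headers that are linked to the preceding Section.

```vba
Sub ListUnlinkedHeaderTextBoxes()
    ' Hypothetical helper for illustration: read-only, prints to the Immediate window.
    Dim Sctn As Section, HdFt As HeaderFooter
    Dim Shp As Shape, i As Long, HasTxt As Boolean
    For Each Sctn In ActiveDocument.Sections
        For Each HdFt In Sctn.Headers
            ' Skip First Page / Even Page headers that this Section doesn't use.
            If HdFt.Exists Then
                ' Section 1 has no preceding Section, so its headers are never duplicates.
                If Sctn.Index = 1 Or HdFt.LinkToPrevious = False Then
                    For i = 1 To HdFt.Range.ShapeRange.Count
                        Set Shp = HdFt.Range.ShapeRange(i)
                        ' Some shapes (lines, pictures) may raise an error when
                        ' their text frame is queried, so guard the test.
                        HasTxt = False
                        On Error Resume Next
                        HasTxt = Shp.TextFrame.HasText
                        On Error GoTo 0
                        If HasTxt Then
                            ' HdFt.Index: 1 = Primary, 2 = First Page, 3 = Even Pages
                            Debug.Print "Section " & Sctn.Index & _
                                ", header type " & HdFt.Index & ": " & _
                                Shp.TextFrame.TextRange.Text
                        End If
                    Next i
                End If
            End If
        Next HdFt
    Next Sctn
End Sub
```

Running it on a copy of the document first should show whether the text boxes sit in Primary or First Page headers, and in which Sections, before any extraction macro is run.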
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#12
I'll get back to you shortly. Been busy with work at home
I'll have more time on Friday. Get back to you, Paul
#13
Request help for Textboxes in Primary headers...
Hello Paul,
From the 3 examples, the text boxes often come from one of the two types below:
Word has three header (and footer) types:
• Primary - wdHeaderFooterPrimary
• First Page - wdHeaderFooterFirstPage
I could get a combination in the same document: some sections have a check mark on "Different First Page" and some sections have none. So I'm not sure if a macro could be written with all of these factors. I need to extract all text from the text boxes in headers. Hopefully, the text will also keep its format (color, size, style). I just want a way to remove the boxes. Think it's doable?
Cendrinne
#14
The code in post #15 already does all of that extraction - and more. So what is the problem?
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
#15
I'll try #15 again, but on the large document of 174 pages, where there are many headers and lots of textboxes with text in those headers, it didn't work the last time. Let me try it again.
Cendrinne