#1
Is it possible to copy and paste plain text with images?
I'm using Adobe Acrobat to convert a PDF into a Word doc, but that doc has its own styles, so if I directly copy and paste all the contents into my custom template, it flushes out my custom styles in that template. As a compromise, I currently copy and paste the text as plain text into my template and manually insert the individual images into the doc. Is there a way, through VBA code, to copy and paste the text as plain text but keep the images, such that their order is maintained? My current code below can only copy and paste the images without changing my template, and it sometimes skips images for reasons I don't understand. I also attached images and files at the very end as examples for clarification.
Code:
```vba
Sub CopyPasteToTemplate()
    ' Declare each variable's type; "Dim a, b As X" would leave a as a Variant
    Dim InputDoc As Word.Document, OutputDoc As Word.Document
    Dim para As Word.Paragraph
    Dim shp As Word.InlineShape

    Application.ScreenUpdating = False
    Set InputDoc = Application.ActiveDocument
    Set OutputDoc = Documents.Add(Template:="XXX\Templates\Normal.dotm")

    For Each para In InputDoc.Paragraphs
        For Each shp In para.Range.InlineShapes
            If shp.Type = wdInlineShapePicture Then
                'Copy and paste the images without formatting
                'copy via the range instead of the Selection, so the active window does not matter
                shp.Range.Copy
                With OutputDoc.Content
                    .InsertParagraphAfter
                    .Paragraphs.Last.Range.Paste
                End With
            End If
        Next shp
        ' para.Range.Copy
        ' With OutputDoc.Content
        '     .InsertParagraphAfter
        '     .Paragraphs.Last.Range.PasteSpecial wdFormatPlainText 'Sometimes gives an error
        ' End With
    Next para

    OutputDoc.Activate
    Application.ScreenUpdating = True
End Sub
```

Edit: What I mean by "flushing the styles" is simply that before pasting, my template and documents created from it have the styles shown in the left screenshot below, but after pasting in the contents, the styles section looks like the right screenshot below. In the attached files, "acrobat export.docx" is the example docx output by Acrobat, and "direct copy paste.docx" is the resulting doc if I simply press Ctrl+A and Ctrl+C to copy all contents in "acrobat export.docx" and paste them into my custom dotm. "manually processed.docx" is what I want the paste into my template to look like. You can see that it's quite different from the previous two docx files, including the page size/orientation, headers, etc.; these are the same as in my dotm template. Currently, I can only achieve this by copying and pasting "acrobat export.docx" as plain text into my template and manually inserting the images, using the "acrobat export.docx" file as a reference. Is there a way to iterate through all the objects (or paragraphs?) in "acrobat export.docx" and decide, for each one:
- If text, copy and paste it as plain text into "code processed.docx".
- If bullet list, copy and paste it as plain text into "code processed.docx" with the "·" at the front. Do the same for numbered lists to preserve the numbering.
- If image (whether anchored or not), copy and paste the image as it is into "code processed.docx".
- If text box, extract the content, then copy and paste it as plain text into "code processed.docx".
- Ignore all other cases.

I just want to maintain the order of text and images in a cleaner way.

Last edited by puff; 01-11-2023 at 07:00 PM.
#2
I'm not sure what you mean by your 'flush out my custom styles' statement, but you are trying to use a sledgehammer to crack an egg by stepping through the document a paragraph at a time. It is no surprise that you are missing graphics, because you only select inline shapes and then further restrict those to one specific type.
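To see on a given file how many graphics the loop in post #1 would miss, one option is to count them by kind. This is a hypothetical diagnostic (the name CountSkippedGraphics is made up), and it only looks at the main document body, not at headers, footers or the inside of text boxes.

```vba
Function CountSkippedGraphics(ByVal doc As Word.Document) As Long
    ' Hypothetical diagnostic: how many graphics would the post #1 loop ignore?
    Dim ils As Word.InlineShape
    Dim nPictures As Long, nOtherInline As Long

    ' The loop only copies inline shapes of type wdInlineShapePicture
    For Each ils In doc.InlineShapes
        If ils.Type = wdInlineShapePicture Then
            nPictures = nPictures + 1
        Else
            nOtherInline = nOtherInline + 1   ' e.g. linked pictures, charts, OLE objects
        End If
    Next ils

    ' Floating (wrapped) shapes, text boxes included, are in doc.Shapes,
    ' not in doc.InlineShapes, so the loop never sees them
    Debug.Print "Inline pictures copied: " & nPictures
    Debug.Print "Other inline shapes skipped: " & nOtherInline
    Debug.Print "Floating shapes skipped: " & doc.Shapes.Count

    CountSkippedGraphics = nOtherInline + doc.Shapes.Count
End Function
```

Running `Debug.Print CountSkippedGraphics(ActiveDocument)` from the Immediate window on the Acrobat export prints the breakdown and the total that would be skipped.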
I would prep the source file by applying the 'Normal' style to the entire content and removing local character and paragraph formatting, then bringing that cleaned content across into the target document.
Code:
```vba
Sub CopyPasteToTemplate()
    Dim InputDoc As Word.Document, OutputDoc As Word.Document
    Set InputDoc = ActiveDocument
    Set OutputDoc = Documents.Add   'Normal template by default
    With InputDoc.Range
        .Style = wdStyleNormal
        .ParagraphFormat.Reset
        .Font.Reset
        OutputDoc.Range.FormattedText = .FormattedText
    End With
    OutputDoc.Activate
End Sub
```
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia
#3
Thanks for the suggestion and sorry for the confusion. I have added example files and screenshots to make my points clearer.
I tried your code, but the paste still brings new styles into my template, and the template's paper size and header/footer also get changed.
#4
The paper size and headers/footers will only change if there are section breaks in what you are bringing across, so you can avoid that issue by deleting the section breaks before the transfer. The style list reset is repaired by refreshing the styles from the Normal template.
Try this version of the macro:
Code:
```vba
Sub CopyPasteToTemplate()
    Dim InputDoc As Word.Document, OutputDoc As Word.Document, i As Integer
    Set InputDoc = ActiveDocument
    Set OutputDoc = Documents.Add   'Normal template by default

    'remove section breaks from input, replace with a hard page break
    With InputDoc.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^b"
        .Replacement.Text = "^m"
        .Wrap = wdFindContinue
        .Execute Replace:=wdReplaceAll
    End With

    'make autonumbers and bullets hardcoded
    InputDoc.ConvertNumbersToText wdNumberAllNumbers

    'Make graphics inline, skip errors
    On Error Resume Next
    For i = InputDoc.Shapes.Count To 1 Step -1
        InputDoc.Shapes(i).ConvertToInlineShape
    Next i
    On Error GoTo 0

    'remove formatting from input
    With InputDoc.Range
        .Style = wdStyleNormal
        .ParagraphFormat.Reset
        .Font.Reset
        OutputDoc.Range.FormattedText = .FormattedText   'bring content into new doc
    End With

    OutputDoc.UpdateStyles   'restore style settings in output from Normal template
    OutputDoc.Activate
End Sub
```
Last edited by Guessed; 01-12-2023 at 06:54 PM. Reason: Edited code to deal with subsequent request
#5
Thank you so much! It works for most of the document I'm processing! I have some follow-ups:
- Is it possible to preserve the little black dot at the front if the text belongs to a bullet list? I want to rebuild the bullet lists in the output doc. If I copy and paste a bullet list as plain text, I still get that black dot, which is very useful for rebuilding the list in the template after pasting. Or, if the VBA could directly apply my bullet style called "XX_style" to rebuild those bullet lists in the output document, that would be perfect.
- I see that pretty much all the images are anchored in the output doc. Is it possible to make them wrap in line with the text instead?
- I want a way to tell content from different pages apart, for example a "page xx" line above each page's content in the output doc. Should I then iterate through each page and run your code inside the loop?

Again, I cannot thank you enough; the current code already saves me a lot of time converting these PDFs!
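If literal "Page n" labels are wanted in addition to the page breaks that the macro in post #4 now inserts, one option, rather than running the whole macro once per page, is to label every page in the source before the clean-up, so the labels travel across as ordinary text. This is a minimal sketch with a made-up macro name; if a page starts mid-paragraph, the inserted label splits that paragraph in two.

```vba
Sub LabelPagesInSource(ByVal doc As Word.Document)
    ' Hypothetical helper: run on the Acrobat export BEFORE the clean-up macro.
    ' Inserts a "Page n" paragraph at the top of every page.
    Dim n As Long
    Dim r As Word.Range

    ' Count down from the last page: text added at the top of page n can push
    ' content onto later pages (already labelled, and the labels move with
    ' their text) but never moves where an earlier page starts
    For n = doc.ComputeStatistics(wdStatisticPages) To 1 Step -1
        Set r = doc.GoTo(What:=wdGoToPage, Which:=wdGoToAbsolute, Count:=n)
        r.InsertBefore "Page " & n & vbCr
    Next n
End Sub
```

Usage: `LabelPagesInSource ActiveDocument`, then run the transfer macro. Because the transfer resets everything to Normal, the labels arrive as plain Normal paragraphs.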
#6
I've updated the code in post #4 to attempt to deal with your later requirements. Each of these is dodgy in its own special way and won't give perfect results in most documents, but at least most of the drama will be avoided. In the case of your sample doc, you will be able to spot the shortcomings, but it gets pretty close to what you probably wanted.
The replacement of section breaks with hard page breaks is a kludge that happens to work on your sample doc but will give non-ideal results on better-formatted documents. The conversion of graphics to inline shapes failed on one shape in your sample doc, so I just put an error skip on that rather than work out 'why' it failed.
#7
Thanks a lot! I didn't even know the ConvertNumbersToText method existed. It's really handy.
I was writing my own code as well and approached the inline wrapping a bit differently: after pasting, I iterate through all the shapes and set shape.WrapFormat.Type = wdWrapInline. I used your code to process another document and noticed that some images were still anchored. I can fix this by running my code on the output doc, but I don't really know why that is needed. I also tried putting InputDoc.Shapes(i).WrapFormat.Type = wdWrapInline after your InputDoc.Shapes(i).ConvertToInlineShape, but that didn't help.

The graphic conversion probably failed on that one shape because it is a text box. I ran into this situation several hours ago while processing another document, and it's really hard to spot.

I did notice that your code puts the inline shape at the beginning of the text that, in the original file, sits visually above the image. I'll try to iterate through the paragraphs after pasting and move each image after its text. Anyway, it's really close to what I want, so I will keep working on this and ask again later.
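The WrapFormat approach described above can be sketched as follows, run on the output document after the transfer. This is not the code from this thread: the macro name is hypothetical, text boxes are deliberately skipped (an assumption, since their text needs separate handling), and some shape kinds, such as grouped shapes, may still raise an error, which is why the macro in post #4 wraps its own conversion loop in On Error Resume Next.

```vba
Sub MakeFloatingShapesInline(ByVal doc As Word.Document)
    ' Hypothetical helper: set every non-text-box floating shape to "In Line with Text".
    Dim i As Long

    ' Count down: each converted shape leaves doc.Shapes, so a For Each loop
    ' (or counting up) can skip the shape that slides into the freed slot
    For i = doc.Shapes.Count To 1 Step -1
        With doc.Shapes(i)
            ' Assumption: text boxes are left alone here
            If .Type <> msoTextBox Then
                .WrapFormat.Type = wdWrapInline
            End If
        End With
    Next i
End Sub
```

Counting down is the main design choice: removing items from the collection while walking it forwards is a common reason for some shapes being left untouched.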
#8
The inline shape will be wherever the shape's anchor is located.
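Following on from that, when the anchor (and so the inline picture) ends up at the start of the paragraph whose text should appear above the image, the picture can be moved after that text once the transfer is done. A minimal sketch with a made-up macro name; it only handles a picture that is the first character of a paragraph that also contains other text, and it has not been checked against pictures inside tables.

```vba
Sub MovePicturesBelowText(ByVal doc As Word.Document)
    ' Hypothetical helper: where a paragraph starts with an inline picture and
    ' also contains text, move the picture into a new paragraph after the text.
    Dim i As Long
    Dim pic As Word.Range, para As Word.Range, target As Word.Range

    ' Count down: each move adds one copy and deletes one original, both at or
    ' after picture i, so the earlier pictures still to be checked keep their index
    For i = doc.InlineShapes.Count To 1 Step -1
        Set pic = doc.InlineShapes(i).Range
        Set para = pic.Paragraphs(1).Range

        ' Picture is the first character, and the paragraph holds more than
        ' just the picture plus its paragraph mark
        If pic.Start = para.Start And para.End - para.Start > 2 Then
            ' Collapsed range just before the paragraph mark
            Set target = doc.Range(para.End - 1, para.End - 1)
            target.InsertParagraphAfter                ' range now covers the new mark
            target.Collapse wdCollapseEnd              ' start of the new, empty paragraph
            target.FormattedText = pic.FormattedText   ' copy the picture there
            pic.Delete                                 ' remove the original
        End If
    Next i
End Sub
```

Range objects shift with edits made before them but not with edits made after them, which is why `pic` still points at the original picture when it is deleted.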
#9
I have further developed your code and it works great so far. Regarding the style flushing issue, I noticed that the Acrobat-exported Word document sometimes contains styles that don't exist in my template. Those styles are carried over during the copy-and-paste process. Is it safe for me to delete those styles before the transfer?
Also, the "Body Text" style from the source document still overwrites my template's Body Text style. Could I just delete this style from the source document before copying?
#10
When you copy from one file to another there are two ways that the styles might be updated in the target file.
1. Custom (non-built-in) styles get added to the target.
2. Style aliases in the source might get added to the built-in style names in the target.

To deal with both of these: since we are intentionally removing the style info from the source, there is no problem with blowing away the custom styles in the source before we copy. Similarly, you can remove all style aliases before that copy. The following macro could be applied to the source before copying.
Code:
```vba
Sub NukeStyles()
    Dim aSty As Style, i As Integer
    On Error Resume Next
    'Get rid of custom styles
    For i = ActiveDocument.Styles.Count To 1 Step -1
        Set aSty = ActiveDocument.Styles(i)
        If Not aSty.BuiltIn Then aSty.Delete
    Next i
    'remove any style aliases
    For Each aSty In ActiveDocument.Styles
        aSty.NameLocal = Split(aSty.NameLocal, ",")(0)
    Next aSty
End Sub
```

You can't delete 'Body Text' since it is a built-in style. Word's GUI pretends that you can delete it, but really it just hides the style. What attributes of the Body Text style are being imported from the source file that weren't already in the template's Body Text style?
#11
Now that you say Body Text is a built-in style, I suspect that what I did, i.e. deleting that style, simply asked Word to hide it. I mainly use the style gallery (the one in the toolbar) to see the styles, but a doc has more styles than that.
What I saw was that copying and pasting into my template added the Body Text style to the gallery. Deleting Body Text from the source doc before copying stopped this. So it's probably just hidden, which is fine as long as it doesn't change my template's styles. I compared the styles in the source doc and the template, and Body Text exists in both files; since the source doc allowed me to delete Body Text, I guess that after pasting into the template, the template's own Body Text style just took over.