#1
Separate strings in a text file
I have a text file with several sequences of this type:
<p class=NonNrPar align=center style='text-align:center;line-height: normal'><img width=453 height=255 id="Immagine 1" src="E_DIME_ORIG_file/image001.jpg"></p>
I'm trying to write a macro that will:
1) find the string |<img .... ><| (eventually to be deleted);
2) return the string |width= and its digits|;
3) return the string |height= and its digits|.
The code below doesn't work. Should I try Split? Can someone help? Thanks!
Code:
Sub ThreeStrings()
    Dim oRng As Range
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .Text = "<img width"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        While .Execute
            oRng.MoveEndUntil Cset:="><"
            MyString = oRng
            MsgBox MyString '1. this one works
            'oRng.Delete
            oRng.Collapse wdCollapseStart
            oRng.MoveEndUntil Cset:=" h", Count:=wdForward 'this doesn't work
            VarWd = oRng
            MsgBox VarWd
            oRng.Move Unit:=wdCharacter, Count:=1
            'oRng.Select
            oRng.Collapse wdCollapseEnd
            oRng.MoveUntil oRng Like "^# " 'this doesn't work
            VarHi = oRng
            MsgBox VarHi
        Wend
    End With
End Sub
#2
Is the content always valid XML? If you know the source is always valid XML, it might be better to work with it as XML.
Are there variations in the img tag that the code would need to deal with? For example, the order in your sample is width, height, id, src, but that order could easily change. Also, the pixel values may sometimes be enclosed in quotes and still be valid HTML.
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia
#3
Thank you, Guessed! All I can say is that the code in question comes from an HTML file whose extension has been changed to .txt. In my experience the recurring sequence is always the same, except for the image name and size. I am not familiar with XML.
#4
Hi, RobiNew! I have three suggestions:
1) oRng.Collapse wdCollapseStart can be replaced by oRng.Collapse, because wdCollapseStart is the default value (a little shorter);
2) oRng.MoveEndUntil Cset:=" h" stops at the first occurrence of ANY ONE character in the set (either ' ' or 'h'), not at the two-character sequence " h", because Cset is treated as a set of individual characters rather than as a string to match;
3) oRng.MoveUntil oRng Like "^# " is incorrect. I'd use oRng.MoveUntil Cset:="0123456789" to find any digit.
These are just minor suggestions; maybe someone will help with the whole code, because not everything in it is clear to me.
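To illustrate suggestions 2 and 3, here is a minimal sketch of how a character set can be used to pick out the width digits. It assumes the active document contains at least one tag starting with "<img width", as in the sample from post 1. The macro name CsetDemo is made up for this example, and pairing MoveUntil with MoveEndWhile is just one possible approach, not necessarily the best one.

```vba
Sub CsetDemo()
    ' Hypothetical demo: assumes the document contains "<img width=..." as in post 1.
    Dim oRng As Range
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .ClearFormatting
        .Text = "<img width"
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        If Not .Execute Then Exit Sub      ' no image tag found
    End With
    oRng.Collapse wdCollapseEnd            ' insertion point right after "<img width"
    oRng.MoveUntil Cset:="0123456789"      ' move forward until ANY digit is reached
    oRng.MoveEndWhile Cset:="0123456789"   ' extend the end over the consecutive digits
    MsgBox "width=" & oRng.Text
End Sub
```

Because Cset is a set of characters, the number of digits does not matter here: the range simply grows until the first non-digit character.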
#5
Thanks a lot, Vivka, for your corrections! But I've now come to the conclusion that I should use Split to extract the two strings I need ('width=###' and 'height=###') from the text below:
<p class=NonNrPar align=center style='text-align:center;line-height: normal'><img width=453 height=255 id="Immagine 1" src="E_DIME_ORIG_file/image001.jpg"></p>
So far I've had no luck with the usual Split procedures.
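For what it's worth, a Split-based approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the tag text is hard-coded in a variable (sTag) rather than read from the document, the attributes are separated by single spaces, and the values are unquoted, as in the sample. The names SplitImgTag, sTag, vParts, sWd and sHi are invented for the example.

```vba
Sub SplitImgTag()
    ' Assumption: sTag holds one <img ...> tag copied from the sample text.
    Dim sTag As String
    Dim vParts As Variant
    Dim i As Long
    Dim sWd As String
    Dim sHi As String

    sTag = "<img width=453 height=255 id=""Immagine 1"" src=""E_DIME_ORIG_file/image001.jpg"">"

    vParts = Split(sTag, " ")              ' break the tag at every space
    For i = LBound(vParts) To UBound(vParts)
        ' Keep only the pieces that start with the attribute names we want.
        If Left$(vParts(i), 6) = "width=" Then sWd = vParts(i)
        If Left$(vParts(i), 7) = "height=" Then sHi = vParts(i)
    Next i

    MsgBox sWd & vbCr & sHi                ' shows width=453 and height=255
End Sub
```

Splitting on spaces also cuts quoted values that contain a space (such as id="Immagine 1") into two pieces, which does no harm here because only the width and height pieces are kept.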
#6
RobiNew, what do you mean by "extract": highlighting, showing in a message box, adding to a collection, deleting...? Your task is not quite clear to me.
If you need to highlight the found strings, you may use the following code, which was originally written by gmaxey and slightly modified by me:
Code:
Sub Hilite_Array_T()
'In selection, hilite all instances of 'width=' and 'height=' followed
'by three-digit numbers.
    Application.ScreenUpdating = False
    Dim vFindTxt As Variant
    Dim oRng As Range
    Dim i As Long
    vFindTxt = Array("width=", "height=")
    For i = 0 To UBound(vFindTxt)
        Set oRng = Selection.Range
        With oRng.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            Do While .Execute(FindText:=vFindTxt(i), _
                              MatchWholeWord:=True, _
                              Forward:=True, _
                              Wrap:=wdFindStop) = True And _
                              oRng.End <= Selection.Range.End
                oRng.MoveEnd Unit:=wdCharacter, Count:=3
                oRng.HighlightColorIndex = wdYellow
                oRng.Collapse wdCollapseEnd
            Loop
        End With
        DoEvents
    Next
lbl_Exit:
    Set oRng = Nothing
    Application.ScreenUpdating = True 'placed before Exit Sub so that it is actually executed
    Exit Sub
End Sub
#7
Thank you, Vivka! Your code leaves everything unchanged. In any case, what I need is not to highlight all instances of 'width=' and 'height=' followed by three-digit numbers.
As I explained in the original post, I have a text file with several sequences of this type:
<p class=NonNrPar align=center style='text-align:center;line-height: normal'><img width=453 height=255 id="Immagine 1" src="E_DIME_ORIG_file/image001.jpg"></p>
I'm trying to write code that will:
1) find part of the above string: from <img to ><;
2) create a variable that contains the string width=### from the found string;
3) create a variable that contains the string height=### from the found string;
4) delete the found string: from <img to ><.
Hope someone can help. Thanks!
#8
I managed to make it work, for now only on a single instance of the text mentioned above.
Perhaps someone can make it less clumsy.
Code:
Sub ThreeStrings()
    Dim oRng As Range
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .Text = "<img width"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = False
        While .Execute
            oRng.MoveEndUntil Cset:="><"
            VarStrg1 = oRng
            'MsgBox VarStrg1
            oRng.Collapse
            oRng.MoveEndUntil Cset:=" "
            oRng.Move Unit:=wdCharacter, Count:=2
            oRng.MoveEndUntil Cset:=" "
            VarWd = oRng
            'MsgBox VarWd
            oRng.Move Unit:=wdCharacter, Count:=2
            oRng.MoveEndUntil Cset:=" "
            VarHi = oRng
            'MsgBox VarHi
            Set oRng = ActiveDocument.Range
            With oRng.Find
                .Text = "<img width"
                .Replacement.Text = ""
                .Forward = False
                .Wrap = wdFindStop
                .Format = False
                .MatchWildcards = False
                .Execute
            End With
            oRng.MoveEndUntil Cset:="><"
            oRng.Delete
            GoTo Finish 'label renamed from "End", which is a reserved word in VBA
        Wend
    End With
Finish:
End Sub
#9
As a simple variant:
Code:
Sub Test()
    Dim oRng As Range
    Dim vWd As Variant
    Dim vHi As Variant
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "width=^#^#^#"
        .Replacement.Text = ""
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        If .Execute Then vWd = oRng
    End With
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .Text = "height=^#^#^#"
        If .Execute Then vHi = oRng
    End With
    With ActiveDocument.Range.Find
        .Text = "\<" & "img width" & "*" & "\>\<"
        .Replacement.Text = "" 'set explicitly, because Find settings can persist between searches
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
    MsgBox vWd & vbCr & vHi
lbl_Exit:
    Set oRng = Nothing
End Sub
#10
Thank you, Vivka! That's a nice variant, but if you search for 'width' you might land on a non-image string, and ^#^#^# could be different (e.g. ^#^#). The code should operate only within the image string mentioned in post 7.
#11
Then what about this:
Code:
Sub Test()
    Dim oRng As Range
    Dim oRngD As Range
    Dim oRngDD As Range
    Dim vWd As Variant
    Dim vHi As Variant
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "\<img width*\>"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True
        If .Execute Then
            'oRng is now the whole image tag; keep two copies of it
            Set oRngD = oRng.Duplicate
            Set oRngDD = oRng.Duplicate
            .Text = "width=" & "[0-9]@ "
            .MatchWildcards = True
            If .Execute Then vWd = oRng
            With oRngD.Find
                .Text = "height=" & "[0-9]@ "
                .MatchWildcards = True
                If .Execute Then vHi = oRngD 'oRngD (not oRng) holds the height match
            End With
        End If
    End With
    'Delete the tag only if one was found
    If Not oRngDD Is Nothing Then oRngDD.Delete
    MsgBox "'" & vWd & "'" & vbCr & "'" & vHi & "'"
lbl_Exit:
    Set oRng = Nothing
    Set oRngD = Nothing
    Set oRngDD = Nothing
End Sub
1) Width and height are stored with a trailing space, which I think is insignificant;
2) If you want to delete the final '<' as well, change .Text = "\<img width*\>" to .Text = "\<img width*\>\<"
Last edited by vivka; 12-10-2023 at 09:31 AM.
#12
Thank you very much indeed, Vivka! It works like a charm!
#13
I'm glad I could suggest something workable! And thank you for another challenge!
#14
Hi Vivka! Perhaps you can help me again. If my text file contains more than one image, how can I repeat the procedure without messing up the various ranges? Thanks!
#15
RobiNew, please see the slightly changed code in post 11. The code will work properly if 'img' is always followed by a space and 'width'. One run, one replacement: each run of the macro processes a single image.
Last edited by vivka; 12-10-2023 at 12:45 PM.