![]() |
|
|
|
#1
|
|||
|
|||
|
Hello
I'm looking for a macro that will extract data from an HTML file, Here is the code that did the job, but in the HTML file the search code is changed and now the macro doesn't work. The macro worked pretty fast, if any one can fix this i would really appreciate it, attached is the input file with raw data and the output file that shows what it should look like. Thanks. Code:
Application.ScreenUpdating = False
Dim StrOut As String, wdDoc As Document
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "^34\>[!\<]@\</a\>^13[ ]@\</h3\>*address^34\>[!\<]@\</div\>*phone^34\>[!\<]@\</div\>"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchWildcards = True
.Execute
End With
Do While .Find.Found
StrOut = StrOut & Trim(Split(Split(.Text, "</a>")(0), vbCr)(1)) & vbTab
StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2) & vbTab
If InStr(.Text, "<span>") = 0 Then
StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(1), ">")(1)
End If
StrOut = StrOut & vbCr
.MoveStart wdCharacter, InStr(.Text, Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2))
.Collapse wdCollapseStart
.Find.Execute
Loop
End With
Set wdDoc = Documents.Add
wdDoc.Range.Text = StrOut
Application.ScreenUpdating = True
Dim I As Integer
For I = 1 To 100 ' Loop 100 times.
Beep ' Sound a tone.
Next I
Last edited by macropod; 06-28-2019 at 06:09 PM. Reason: Added code tags |
|
#2
|
||||
|
||||
|
I would do it differently e.g.
Code:
Sub Macro1()
Dim oSource As Document
Dim oDoc As Document
Dim oRng As Range, oParaRng As Range
Dim lngP As Long
Dim sName As String, sAdd As String, sPhone As String, sExtract As String
Set oSource = ActiveDocument
Set oRng = oSource.Range
Set oDoc = Documents.Add
oDoc.Range.Font.Name = "Courier New"
oDoc.Range.Font.Size = 10
With oRng.Find
Do While .Execute(FindText:="<div class=" & Chr(34) & "c-people-result__address" & Chr(34) & ">")
oRng.MoveEnd wdParagraph, 2
oRng.MoveStart wdParagraph, -4
For lngP = 1 To oRng.Paragraphs.Count
Select Case lngP
Case 1
Set oParaRng = oRng.Paragraphs(lngP).Range
oParaRng.End = oParaRng.End - 1
sName = Trim(Replace(oParaRng.Text, "</a>", ""))
sExtract = sName
Case 4
Set oParaRng = oRng.Paragraphs(lngP).Range
oParaRng.End = oParaRng.End - 1
oParaRng.MoveStartUntil ">"
oParaRng.Start = oParaRng.Start + 1
sAdd = Replace(oParaRng.Text, "</div>", "")
sExtract = sExtract & vbTab & sAdd
Case 5
Set oParaRng = oRng.Paragraphs(lngP).Range
oParaRng.End = oParaRng.End - 1
oParaRng.MoveStartUntil "("
sPhone = Replace(oParaRng.Text, "</div>", "")
sExtract = sExtract & vbTab & sPhone
End Select
Next lngP
oDoc.Range.InsertAfter Trim(sExtract) & vbCr
oRng.Collapse 0
Loop
End With
End Sub
__________________
Graham Mayor - MS MVP (Word) (2002-2019) Visit my web site for more programming tips and ready made processes www.gmayor.com |
|
#3
|
|||
|
|||
|
Hello Gmayor
You modified a macro earlier and it worked great, but the search code has changed, can you fix this macro, i will really appreciate it. blow is the link to the post. Thanks. https://www.msofficeforums.com/word-...tml#post144478 |
|
#4
|
|||
|
|||
|
Hello Gmayor
This worked, it extracted the data real quick. Thanks for all your help!! |
|
#5
|
||||
|
||||
|
I looked at that, but it is a hairball and does not readily lend itself to the same type of extraction. It would be better if the HTML creation was improved to make it easier to extract data, but that may be out of your control.
__________________
Graham Mayor - MS MVP (Word) (2002-2019) Visit my web site for more programming tips and ready made processes www.gmayor.com |
|
#6
|
|||
|
|||
|
I wrote a macro which does sort of job, but its too slow and takes too long, which is basically have 2 docs open and switch back and forth to extract from bigger file to the 2nd file where the new data appears, and i used the word marco recorder function to record keystrokes.
Below keys are the one that needs to be search since they appears as Fix all over the doc. loop search <a href="/name/ <span itemprop="streetAddress"> span itemprop="addressLocality"> <dt class="col-md-4">Phone Number</dt> <dt class="col-md-4">Email Address Here is a macro you wrote a while back and it was very accurate and less coding. Dim oSource As Document Dim oDoc As Document Dim oRng As Range, oParaRng As Range Dim lngP As Long Dim sName As String, sAdd As String, sPhone As String, sExtract As String Set oSource = ActiveDocument Set oRng = oSource.Range Set oDoc = Documents.Add oDoc.Range.Font.Name = "Courier New" oDoc.Range.Font.Size = 10 With oRng.Find Do While .Execute(FindText:="<div class=" & Chr(34) & "c-people-result__address" & Chr(34) & ">") oRng.MoveEnd wdParagraph, 2 oRng.MoveStart wdParagraph, -4 For lngP = 1 To oRng.Paragraphs.Count Select Case lngP Case 1 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 sName = Trim(Replace(oParaRng.Text, "</a>", "")) sExtract = sName Case 4 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 oParaRng.MoveStartUntil ">" oParaRng.Start = oParaRng.Start + 1 sAdd = Replace(oParaRng.Text, "</div>", "") sExtract = sExtract & vbTab & sAdd Case 5 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 oParaRng.MoveStartUntil "(" sPhone = Replace(oParaRng.Text, "</div>", "") sExtract = sExtract & vbTab & sPhone End Select Next lngP oDoc.Range.InsertAfter Trim(sExtract) & vbCr oRng.Collapse 0 Loop End With Let me know if anything can be done. Thanks.! |
|
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Extract data from HTML File. | donlincolnmsof | Word VBA | 0 | 03-07-2019 12:17 PM |
| Need help to extract specific data from text file using vba | victor92 | Excel Programming | 0 | 12-01-2017 12:53 AM |
Extract Data From Text file based on Pattern
|
PRA007 | Word VBA | 13 | 11-01-2015 11:20 PM |
| Macro to highlight repeated words in word file and extract into excel file | aabri | Word VBA | 1 | 06-14-2015 07:20 AM |
How to Extract key data from word
|
iliauk | Word | 3 | 11-08-2013 04:37 PM |