#1
|
|||
|
|||
Extract data from HTML File.
Hello
I'm looking for a macro that will extract data from an HTML file, Here is the code that did the job, but in the HTML file the search code is changed and now the macro doesn't work. The macro worked pretty fast, if any one can fix this i would really appreciate it, attached is the input file with raw data and the output file that shows what it should look like. Thanks. Code:
Application.ScreenUpdating = False Dim StrOut As String, wdDoc As Document With ActiveDocument.Range With .Find .ClearFormatting .Replacement.ClearFormatting .Text = "^34\>[!\<]@\</a\>^13[ ]@\</h3\>*address^34\>[!\<]@\</div\>*phone^34\>[!\<]@\</div\>" .Replacement.Text = "" .Forward = True .Wrap = wdFindStop .Format = False .MatchWildcards = True .Execute End With Do While .Find.Found StrOut = StrOut & Trim(Split(Split(.Text, "</a>")(0), vbCr)(1)) & vbTab StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2) & vbTab If InStr(.Text, "<span>") = 0 Then StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(1), ">")(1) End If StrOut = StrOut & vbCr .MoveStart wdCharacter, InStr(.Text, Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2)) .Collapse wdCollapseStart .Find.Execute Loop End With Set wdDoc = Documents.Add wdDoc.Range.Text = StrOut Application.ScreenUpdating = True Dim I As Integer For I = 1 To 100 ' Loop 100 times. Beep ' Sound a tone. Next I Last edited by macropod; 06-28-2019 at 06:09 PM. Reason: Added code tags |
#2
|
||||
|
||||
I would do it differently e.g.
Code:
Sub Macro1() Dim oSource As Document Dim oDoc As Document Dim oRng As Range, oParaRng As Range Dim lngP As Long Dim sName As String, sAdd As String, sPhone As String, sExtract As String Set oSource = ActiveDocument Set oRng = oSource.Range Set oDoc = Documents.Add oDoc.Range.Font.Name = "Courier New" oDoc.Range.Font.Size = 10 With oRng.Find Do While .Execute(FindText:="<div class=" & Chr(34) & "c-people-result__address" & Chr(34) & ">") oRng.MoveEnd wdParagraph, 2 oRng.MoveStart wdParagraph, -4 For lngP = 1 To oRng.Paragraphs.Count Select Case lngP Case 1 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 sName = Trim(Replace(oParaRng.Text, "</a>", "")) sExtract = sName Case 4 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 oParaRng.MoveStartUntil ">" oParaRng.Start = oParaRng.Start + 1 sAdd = Replace(oParaRng.Text, "</div>", "") sExtract = sExtract & vbTab & sAdd Case 5 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 oParaRng.MoveStartUntil "(" sPhone = Replace(oParaRng.Text, "</div>", "") sExtract = sExtract & vbTab & sPhone End Select Next lngP oDoc.Range.InsertAfter Trim(sExtract) & vbCr oRng.Collapse 0 Loop End With End Sub
__________________
Graham Mayor - MS MVP (Word) (2002-2019) Visit my web site for more programming tips and ready made processes www.gmayor.com |
#3
|
|||
|
|||
Hello Gmayor
This worked, it extracted the data real quick. Thanks for all your help!! |
#4
|
|||
|
|||
Hello Gmayor
You modified a macro earlier and it worked great, but the search code has changed, can you fix this macro, i will really appreciate it. blow is the link to the post. Thanks. https://www.msofficeforums.com/word-...tml#post144478 |
#5
|
||||
|
||||
I looked at that, but it is a hairball and does not readily lend itself to the same type of extraction. It would be better if the HTML creation was improved to make it easier to extract data, but that may be out of your control.
__________________
Graham Mayor - MS MVP (Word) (2002-2019) Visit my web site for more programming tips and ready made processes www.gmayor.com |
#6
|
|||
|
|||
I wrote a macro which does sort of job, but its too slow and takes too long, which is basically have 2 docs open and switch back and forth to extract from bigger file to the 2nd file where the new data appears, and i used the word marco recorder function to record keystrokes.
Below keys are the one that needs to be search since they appears as Fix all over the doc. loop search <a href="/name/ <span itemprop="streetAddress"> span itemprop="addressLocality"> <dt class="col-md-4">Phone Number</dt> <dt class="col-md-4">Email Address Here is a macro you wrote a while back and it was very accurate and less coding. Dim oSource As Document Dim oDoc As Document Dim oRng As Range, oParaRng As Range Dim lngP As Long Dim sName As String, sAdd As String, sPhone As String, sExtract As String Set oSource = ActiveDocument Set oRng = oSource.Range Set oDoc = Documents.Add oDoc.Range.Font.Name = "Courier New" oDoc.Range.Font.Size = 10 With oRng.Find Do While .Execute(FindText:="<div class=" & Chr(34) & "c-people-result__address" & Chr(34) & ">") oRng.MoveEnd wdParagraph, 2 oRng.MoveStart wdParagraph, -4 For lngP = 1 To oRng.Paragraphs.Count Select Case lngP Case 1 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 sName = Trim(Replace(oParaRng.Text, "</a>", "")) sExtract = sName Case 4 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 oParaRng.MoveStartUntil ">" oParaRng.Start = oParaRng.Start + 1 sAdd = Replace(oParaRng.Text, "</div>", "") sExtract = sExtract & vbTab & sAdd Case 5 Set oParaRng = oRng.Paragraphs(lngP).Range oParaRng.End = oParaRng.End - 1 oParaRng.MoveStartUntil "(" sPhone = Replace(oParaRng.Text, "</div>", "") sExtract = sExtract & vbTab & sPhone End Select Next lngP oDoc.Range.InsertAfter Trim(sExtract) & vbCr oRng.Collapse 0 Loop End With Let me know if anything can be done. Thanks.! |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Extract data from HTML File. | donlincolnmsof | Word VBA | 0 | 03-07-2019 12:17 PM |
Need help to extract specific data from text file using vba | victor92 | Excel Programming | 0 | 12-01-2017 12:53 AM |
Extract Data From Text file based on Pattern | PRA007 | Word VBA | 13 | 11-01-2015 11:20 PM |
Macro to highlight repeated words in word file and extract into excel file | aabri | Word VBA | 1 | 06-14-2015 07:20 AM |
How to Extract key data from word | iliauk | Word | 3 | 11-08-2013 04:37 PM |