Microsoft Office Forums

Go Back   Microsoft Office Forums > >

Reply
 
Thread Tools Display Modes
  #1  
Old 06-28-2019, 12:38 PM
donlincolnmsof donlincolnmsof is offline Extract data from HTML File. Windows 7 64bit Extract data from HTML File. Office 2003
Advanced Beginner
Extract data from HTML File.
 
Join Date: Oct 2011
Posts: 36
donlincolnmsof is on a distinguished road
Default Extract data from HTML File.

Hello

I'm looking for a macro that will extract data from an HTML file, Here is the code that did the job, but in the HTML file the search code is changed and now the macro doesn't work. The macro worked pretty fast, if any one can fix this i would really appreciate it, attached is the input file with raw data and the output file that shows what it should look like.



Thanks.

Code:
Application.ScreenUpdating = False
Dim StrOut As String, wdDoc As Document
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "^34\>[!\<]@\</a\>^13[ ]@\</h3\>*address^34\>[!\<]@\</div\>*phone^34\>[!\<]@\</div\>"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    StrOut = StrOut & Trim(Split(Split(.Text, "</a>")(0), vbCr)(1)) & vbTab
    StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2) & vbTab
    
    If InStr(.Text, "<span>") = 0 Then
      StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(1), ">")(1)
    End If
    StrOut = StrOut & vbCr
    
    

.MoveStart wdCharacter, InStr(.Text, Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2))
.Collapse wdCollapseStart

    .Find.Execute
  Loop
End With
Set wdDoc = Documents.Add
wdDoc.Range.Text = StrOut
Application.ScreenUpdating = True
 
 
Dim I As Integer
For I = 1 To 100   ' Loop 100 times.
   Beep   ' Sound a tone.
Next I
Attached Files
File Type: doc sample output.doc (19.0 KB, 12 views)
File Type: doc input file.doc (290.0 KB, 12 views)

Last edited by macropod; 06-28-2019 at 06:09 PM. Reason: Added code tags
Reply With Quote
  #2  
Old 06-29-2019, 12:49 AM
gmayor's Avatar
gmayor gmayor is offline Extract data from HTML File. Windows 10 Extract data from HTML File. Office 2016
Expert
 
Join Date: Aug 2014
Posts: 4,101
gmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud of
Default

I would do it differently e.g.

Code:
Sub Macro1()
Dim oSource As Document
Dim oDoc As Document
Dim oRng As Range, oParaRng As Range
Dim lngP As Long
Dim sName As String, sAdd As String, sPhone As String, sExtract As String
    Set oSource = ActiveDocument
    Set oRng = oSource.Range
    Set oDoc = Documents.Add
    oDoc.Range.Font.Name = "Courier New"
    oDoc.Range.Font.Size = 10
    With oRng.Find
        Do While .Execute(FindText:="<div class=" & Chr(34) & "c-people-result__address" & Chr(34) & ">")
            oRng.MoveEnd wdParagraph, 2
            oRng.MoveStart wdParagraph, -4
            For lngP = 1 To oRng.Paragraphs.Count
                Select Case lngP
                    Case 1
                        Set oParaRng = oRng.Paragraphs(lngP).Range
                        oParaRng.End = oParaRng.End - 1
                        sName = Trim(Replace(oParaRng.Text, "</a>", ""))
                        sExtract = sName
                    Case 4
                        Set oParaRng = oRng.Paragraphs(lngP).Range
                        oParaRng.End = oParaRng.End - 1
                        oParaRng.MoveStartUntil ">"
                        oParaRng.Start = oParaRng.Start + 1
                        sAdd = Replace(oParaRng.Text, "</div>", "")
                        sExtract = sExtract & vbTab & sAdd
                    Case 5
                        Set oParaRng = oRng.Paragraphs(lngP).Range
                        oParaRng.End = oParaRng.End - 1
                        oParaRng.MoveStartUntil "("
                        sPhone = Replace(oParaRng.Text, "</div>", "")
                        sExtract = sExtract & vbTab & sPhone
                End Select
            Next lngP
            oDoc.Range.InsertAfter Trim(sExtract) & vbCr
            oRng.Collapse 0
        Loop
    End With
End Sub
__________________
Graham Mayor - MS MVP (Word) (2002-2019)
Visit my web site for more programming tips and ready made processes www.gmayor.com
Reply With Quote
  #3  
Old 06-29-2019, 10:45 AM
donlincolnmsof donlincolnmsof is offline Extract data from HTML File. Windows 7 64bit Extract data from HTML File. Office 2003
Advanced Beginner
Extract data from HTML File.
 
Join Date: Oct 2011
Posts: 36
donlincolnmsof is on a distinguished road
Default

Hello Gmayor

This worked, it extracted the data real quick.

Thanks for all your help!!
Reply With Quote
  #4  
Old 08-26-2019, 01:45 PM
donlincolnmsof donlincolnmsof is offline Extract data from HTML File. Windows 7 64bit Extract data from HTML File. Office 2003
Advanced Beginner
Extract data from HTML File.
 
Join Date: Oct 2011
Posts: 36
donlincolnmsof is on a distinguished road
Default

Hello Gmayor

You modified a macro earlier and it worked great, but the search code has changed, can you fix this macro, i will really appreciate it. blow is the link to the post.

Thanks.

https://www.msofficeforums.com/word-...tml#post144478
Reply With Quote
  #5  
Old 08-26-2019, 08:26 PM
gmayor's Avatar
gmayor gmayor is offline Extract data from HTML File. Windows 10 Extract data from HTML File. Office 2016
Expert
 
Join Date: Aug 2014
Posts: 4,101
gmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud ofgmayor has much to be proud of
Default

I looked at that, but it is a hairball and does not readily lend itself to the same type of extraction. It would be better if the HTML creation was improved to make it easier to extract data, but that may be out of your control.
__________________
Graham Mayor - MS MVP (Word) (2002-2019)
Visit my web site for more programming tips and ready made processes www.gmayor.com
Reply With Quote
  #6  
Old 08-26-2019, 08:36 PM
donlincolnmsof donlincolnmsof is offline Extract data from HTML File. Windows 7 64bit Extract data from HTML File. Office 2003
Advanced Beginner
Extract data from HTML File.
 
Join Date: Oct 2011
Posts: 36
donlincolnmsof is on a distinguished road
Default

I wrote a macro which does sort of job, but its too slow and takes too long, which is basically have 2 docs open and switch back and forth to extract from bigger file to the 2nd file where the new data appears, and i used the word marco recorder function to record keystrokes.

Below keys are the one that needs to be search since they appears as Fix all over the doc.



loop search
<a href="/name/
<span itemprop="streetAddress">
span itemprop="addressLocality">
<dt class="col-md-4">Phone Number</dt>
<dt class="col-md-4">Email Address


Here is a macro you wrote a while back and it was very accurate and less coding.


Dim oSource As Document
Dim oDoc As Document
Dim oRng As Range, oParaRng As Range
Dim lngP As Long
Dim sName As String, sAdd As String, sPhone As String, sExtract As String
Set oSource = ActiveDocument
Set oRng = oSource.Range
Set oDoc = Documents.Add
oDoc.Range.Font.Name = "Courier New"
oDoc.Range.Font.Size = 10
With oRng.Find
Do While .Execute(FindText:="<div class=" & Chr(34) & "c-people-result__address" & Chr(34) & ">")
oRng.MoveEnd wdParagraph, 2
oRng.MoveStart wdParagraph, -4
For lngP = 1 To oRng.Paragraphs.Count
Select Case lngP
Case 1
Set oParaRng = oRng.Paragraphs(lngP).Range
oParaRng.End = oParaRng.End - 1
sName = Trim(Replace(oParaRng.Text, "</a>", ""))
sExtract = sName
Case 4
Set oParaRng = oRng.Paragraphs(lngP).Range
oParaRng.End = oParaRng.End - 1
oParaRng.MoveStartUntil ">"
oParaRng.Start = oParaRng.Start + 1
sAdd = Replace(oParaRng.Text, "</div>", "")
sExtract = sExtract & vbTab & sAdd
Case 5
Set oParaRng = oRng.Paragraphs(lngP).Range
oParaRng.End = oParaRng.End - 1
oParaRng.MoveStartUntil "("
sPhone = Replace(oParaRng.Text, "</div>", "")
sExtract = sExtract & vbTab & sPhone
End Select
Next lngP
oDoc.Range.InsertAfter Trim(sExtract) & vbCr
oRng.Collapse 0
Loop
End With


Let me know if anything can be done. Thanks.!
Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Extract data from HTML File. donlincolnmsof Word VBA 0 03-07-2019 12:17 PM
Need help to extract specific data from text file using vba victor92 Excel Programming 0 12-01-2017 12:53 AM
Extract data from HTML File. Extract Data From Text file based on Pattern PRA007 Word VBA 13 11-01-2015 11:20 PM
Macro to highlight repeated words in word file and extract into excel file aabri Word VBA 1 06-14-2015 07:20 AM
Extract data from HTML File. How to Extract key data from word iliauk Word 3 11-08-2013 04:37 PM

Other Forums: Access Forums

All times are GMT -7. The time now is 12:03 AM.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Search Engine Optimisation provided by DragonByte SEO (Lite) - vBulletin Mods & Addons Copyright © 2024 DragonByte Technologies Ltd.
MSOfficeForums.com is not affiliated with Microsoft