#1
Hi everyone, maybe someone can help me!

I'm trying to identify the class name of the element that contains each "toSearch" string at each URL. My logic is to loop through every link and search for text that is present at each specified level. With the code below I can identify the class names for level1 and level2, but it does not work for level3 and level4. The other issue is that some cities have only one location; in that case level4 exists but level3 does not.

Could someone show me how to identify the class names for level3 and level4, taking into account the cases where there are only 3 levels (level1, level2, level4)? I would also like to know whether there is a way to give only the first URL as input and have the macro work out the other 3 as needed at each stage.

Level1 = Name of the state
Level2 = Name of the city
Level3 = Some text (location) in a link that is present for each city
Level4 = The street address (not a link, just text)

Thanks in advance

Code:
    Sub GetClass()
        Dim url1 As String, url2 As String, url3 As String, url4 As String
        Dim toSearch1 As String, toSearch2 As String, toSearch3 As String, toSearch4 As String
        Dim HTMLDoc As New HTMLDocument

        'URL levels
        url1 = "https://locations.bojangles.com/"
        url2 = "https://locations.bojangles.com/al.html"
        url3 = "https://locations.bojangles.com/al/huntsville.html"
        url4 = "https://locations.bojangles.com/al/huntsville/11375-south-memorial-pkwy.html"

        'Text to search in each level
        toSearch1 = "Alabama"
        toSearch2 = "Huntsville"
        toSearch3 = "South Memorial Pkwy"
        toSearch4 = "11375 South Memorial Pkwy"

        'Print className for each level
        Call LoopElements(url1, toSearch1, "Level1")
        Call LoopElements(url2, toSearch2, "Level2")
        Call LoopElements(url2, toSearch3, "Level3")
        Call LoopElements(url4, toSearch4, "Level4")
    End Sub

    Function LoopElements(url As String, toSearch As String, level As String)
        Dim HTMLDoc As New HTMLDocument
        Dim links As Object
        Dim i As Integer

        With New ServerXMLHTTP60
            .Open "Get", url, False
            .send
            HTMLDoc.body.innerHTML = .responseText
        End With

        Set links = HTMLDoc.body.getElementsByTagName("a")

        With links
            For i = 0 To .Length - 1
                If .Item(i).innerText Like "*" & toSearch & "*" Then
                    Debug.Print level & " ClassName: " & .Item(i).className
                End If
            Next i
        End With
    End Function
#2
I'm not sure what you are asking on the levels question but this variation returns the values you found along the way so you can derive the subsequent URLs. I suspect getting an empty string back will answer your levels question.
Code:
    Sub GetClass()
        Dim url1 As String, url2 As String, url3 As String, url4 As String
        Dim toSearch1 As String, toSearch2 As String, toSearch3 As String, toSearch4 As String
        Dim HTMLDoc As New HTMLDocument

        'URL levels
        url1 = "https://locations.bojangles.com/"

        'Text to search in each level
        toSearch1 = "Alabama"
        toSearch2 = "Huntsville"
        toSearch3 = "South Memorial Pkwy"
        toSearch4 = "11375 South Memorial Pkwy"

        'Print className for each level
        url2 = url1 & LoopElements(url1, toSearch1, "Level1")
        url3 = url1 & LoopElements(url2, toSearch2, "Level2")
        url4 = url1 & LoopElements(url3, toSearch3, "Level3")
        url4 = Replace(url4, "../", "")
        Call LoopElements(url4, toSearch4, "Level4")
    End Sub

    Function LoopElements(url As String, toSearch As String, level As String) As String
        Dim HTMLDoc As New HTMLDocument
        Dim links As Object
        Dim i As Integer

        Debug.Print "Now searching: " & url, toSearch, level

        With New ServerXMLHTTP60
            .Open "Get", url, False
            .send
            HTMLDoc.body.innerHTML = .responseText
        End With

        Set links = HTMLDoc.body.getElementsByTagName("a")

        With links
            For i = 0 To .Length - 1
                If .Item(i).innerText Like "*" & toSearch & "*" Then
                    Debug.Print "", .Item(i), .Item(i).innerText
                    Debug.Print "", level & " ClassName: " & .Item(i).className
                    LoopElements = Split(.Item(i), ":")(1)
                End If
            Next i
        End With
    End Function
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia
#3
What I mean by levels is this: the test is against the city "Huntsville", which has 4 locations, and to get the class of the address you need to drill down to the 4th level.

Level1 = state = Alabama
Level2 = city = Huntsville
Level3 = location = 11375 South Memorial Pkwy (here the anchor text is an address, but it is a link)
Level4 = address = 11375 South Memorial Pkwy (the actual address, which is not a link <a></a> but plain text)

If we select Albertville instead of Huntsville, it has only one location and only 3 clicks are needed to reach the address: click the state Alabama (level1), then the city Albertville (level2), which takes us directly to the same page that appears at the 4th level for Huntsville. Only 3 levels exist (1, 2 and 4).

So, how do I get the class name when my input is a city that has only one location (3 levels)? The code should handle both cases, 3 and 4 levels. And how do I get the class name at the 4th level (url4)? Nothing is printed there because the address is text, not a link.

Thanks again
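For the level4 case, where the address is plain text rather than a link, one possible approach is to scan every element instead of only the `<a>` tags and report the class of the innermost element whose text contains the search string. The sketch below is only an illustration under assumptions: `FindTextClass` is a hypothetical helper name, it expects an `HTMLDocument` filled the same way as in `LoopElements` (so the "Microsoft HTML Object Library" reference is needed), and it assumes the whole address sits inside a single element. If the innermost match has no class of its own, it falls back to the nearest ancestor that has one.

```vba
' Hypothetical helper, not part of the code posted in this thread.
' Returns the class name of the innermost element whose text contains
' toSearch, or of its nearest ancestor that carries a class name.
Function FindTextClass(doc As HTMLDocument, toSearch As String) As String
    Dim el As Object, child As Object, node As Object
    Dim innermost As Boolean

    For Each el In doc.getElementsByTagName("*")
        If InStr(1, el.innerText, toSearch, vbTextCompare) > 0 Then
            ' The element is innermost if no direct child also contains the text
            innermost = True
            For Each child In el.Children
                If InStr(1, child.innerText, toSearch, vbTextCompare) > 0 Then
                    innermost = False
                    Exit For
                End If
            Next child

            If innermost Then
                ' Walk up until an element with a non-empty class name is found
                Set node = el
                Do While Not node Is Nothing
                    If Len(node.className) > 0 Then
                        FindTextClass = node.className
                        Exit Function
                    End If
                    Set node = node.parentElement
                Loop
                Exit Function   ' no class name anywhere on the path
            End If
        End If
    Next el
End Function
```

Inside `LoopElements` this could be called as `Debug.Print level & " ClassName: " & FindTextClass(HTMLDoc, toSearch)` once the response text has been loaded. An empty result would mean the text was not found, or that no element on the path has a class.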
#4
OK, now I understand what you are saying about levels, but I don't see the point of trying to apply that logic. If you are working with hard-coded search parameters (toSearchX), then you already know all four levels and the code is just revealing something you already know.
Regardless, this line shows that you have a key piece of information in the innerText value:
Code:
    Debug.Print "", .Item(i), .Item(i).innerText
Code:
    about:al.html               Alabama(33)
    about:al/huntsville.html    Huntsville(4)
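As a rough illustration of how that number could be used, the sketch below pulls the value in parentheses out of a link's innerText. `CountInParens` is a hypothetical helper name, and the interpretation is an assumption drawn from this thread: a city whose count is 1 (Albertville in the example above) links straight to the address page, so level3 can be skipped.

```vba
' Hypothetical helper, not part of the code posted in this thread.
' Returns the number inside the last pair of parentheses, or 0 if none.
Function CountInParens(ByVal linkText As String) As Long
    Dim openPos As Long, closePos As Long
    Dim inner As String

    closePos = InStrRev(linkText, ")")
    If closePos = 0 Then Exit Function
    openPos = InStrRev(linkText, "(", closePos)
    If openPos = 0 Then Exit Function

    inner = Trim$(Mid$(linkText, openPos + 1, closePos - openPos - 1))
    ' Accept digits only, so text such as "(new)" yields 0
    If Len(inner) > 0 And Not inner Like "*[!0-9]*" Then
        CountInParens = CLng(inner)
    End If
End Function

Sub DemoCountInParens()
    Debug.Print CountInParens("Alabama(33)")      ' 33
    Debug.Print CountInParens("Huntsville(4)")    ' 4
    Debug.Print CountInParens("Albertville")      ' 0, no parentheses
End Sub
```

With the count of the matched city link in hand, the calling code could treat a value of 1 as the three-level case and search the city URL for the address text directly, and treat anything greater as the four-level case.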
__________________
Andrew Lockton Chrysalis Design, Melbourne Australia
#5
Yes, when a state or city shows "(number)", that is the number of cities in the state or locations in the city.

Let me explain. There is already a macro (macro 2) that needs the classes of each level as manual user input; with only those inputs it retrieves the street addresses. I'm using this logic because I'm trying to build a more "generic" macro (macro 1) that could work for a few other sites with a similar structure (the same levels), given only some plain text content that is visible on the website (state, city, location, address) and no HTML at all.

The macro I'm trying to write will extract the classes that serve as input for macro 2. Macro 1 would feed the classes to macro 2, and macro 2 would get the text information needed. One of the goals is that anyone without knowledge of HTML could look at the desired website and type the text for state, city, location and address into 4 worksheet cells.

I know each site needs custom code when doing web scraping, but in this case my idea is to use this logic to take advantage of the visual similarities between some websites. I hope that makes sense.

Regards