10-26-2021, 03:10 PM
Belle (Windows 8, Office 2007)
Novice
 
Join Date: Oct 2021
Posts: 2
Extract text from website / HTML - anyone?

Hi there! For weeks I've been trying to solve this problem, and I have already asked for help on other forums. Unfortunately, nobody was able to help me.
I have a very long list of words, several thousand, that I collected over time, and I'd like to turn it into a personal dictionary (for personal use only).

My plan is to import the meanings from a specific website (Duden | Sprache sagt alles.) into Excel. Duden is a dictionary of the German language, and its website is well organized: the link names are very predictable.

For instance, the address for "Unboxing" is "https://www.duden.de/rechtschreibung/Unboxing".

(Note: The dictionary also contains a lot of English words, and "Rechtschreibung" means orthography.)

Now, I'm lucky to have ALL the links. I simply generated them by automatically combining the first part, "https://www.duden.de/rechtschreibung/", with each word (e.g. "Unboxing").
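
To illustrate the link generation described above, here is a minimal Python sketch. The names `BASE_URL`, `build_url` and `build_urls` are made up for this example. It simply appends each word to the fixed first part; `quote()` percent-encodes characters that are not URL-safe, and whether the site expects that form for words with umlauts or spaces is an assumption, not something verified here.

```python
from urllib.parse import quote

# Fixed first part of every link, as given in the post.
BASE_URL = "https://www.duden.de/rechtschreibung/"


def build_url(word):
    """Return the link for one word by appending it to the base address."""
    # quote() percent-encodes characters that are not URL-safe. Whether the
    # site expects this form for umlauts etc. is an assumption.
    return BASE_URL + quote(word.strip())


def build_urls(words):
    """Build links for a whole word list, skipping blank entries."""
    return [build_url(word) for word in words if word.strip()]


if __name__ == "__main__":
    for url in build_urls(["Unboxing"]):
        print(url)
```

For plain ASCII words such as "Unboxing" the result is the same as straightforward concatenation.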
This means that I have all the sources for the import. These links are meant to go into a formula that extracts the text beneath the subtitle "Bedeutung" (meaning), which appears on each of those pages.
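
Outside of Excel, "the text beneath a subtitle" could be pulled out of a page roughly as follows. This is only a sketch using Python's standard `html.parser`. It assumes the subtitle is an ordinary heading element (`<h1>` to `<h6>`) whose text starts with "Bedeutung" and that the section ends at the next heading; the real markup of the site has not been checked, and `SectionTextParser`, `extract_section` and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionTextParser(HTMLParser):
    """Collect the text that follows a heading starting with a given title."""

    def __init__(self, title):
        super().__init__()
        self.title = title
        self.in_heading = False   # currently inside an <h1>..<h6> element
        self.heading_text = []    # text pieces of the heading being read
        self.capturing = False    # True once the wanted heading has closed
        self.done = False         # True once the following heading begins
        self.chunks = []          # captured text pieces

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            if self.capturing:
                # The next heading ends the section being collected.
                self.capturing = False
                self.done = True
            self.in_heading = True
            self.heading_text = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.in_heading:
            self.in_heading = False
            heading = "".join(self.heading_text).strip()
            if not self.done and heading.startswith(self.title):
                self.capturing = True

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text.append(data)
        elif self.capturing:
            self.chunks.append(data)


def extract_section(html, title="Bedeutung"):
    """Return the text under the first matching heading, or '' if none."""
    parser = SectionTextParser(title)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    # Hypothetical page fragment, not copied from the real site.
    sample = (
        "<h1>Unboxing</h1><h2>Bedeutung</h2>"
        "<p>das Auspacken <em>eines</em> Produkts</p>"
        "<h2>Herkunft</h2><p>englisch</p>"
    )
    print(extract_section(sample))
```

If the site marks the subtitle up differently (for example with a `<div>` and a class name), the heading test in `handle_starttag` and `handle_endtag` would need to be adapted.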

Because of the vast number of words, it needs to be a formula that I can apply to all words, in some sort of order so as not to overstrain my PC, Excel, or the connection. The conventional way of importing data from an external source won't work here.
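
The "one after another, without overloading anything" requirement can be sketched as a loop that fetches the links in order, pauses between requests, and writes the results to a CSV file that Excel can open. This is a rough outline, not a tested solution for this site: `fetch_html`, `collect`, `write_csv`, the 2-second pause, the UTF-8 decoding and the semicolon delimiter (common for Excel in German locales) are all assumptions, and `extract` stands for any function that turns a page's HTML into the wanted text.

```python
import csv
import time
import urllib.request


def fetch_html(url, timeout=30):
    """Download one page and return it as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def collect(urls, extract, fetch=fetch_html, delay_seconds=2.0):
    """Fetch the URLs one at a time, in order, pausing between requests."""
    rows = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # throttle: wait before the next page
        try:
            rows.append((url, extract(fetch(url))))
        except OSError as error:
            # Network errors (including HTTP errors and timeouts) are
            # recorded in the row so that one bad page does not stop the run.
            rows.append((url, f"ERROR: {error}"))
    return rows


def write_csv(rows, handle):
    """Write (url, text) rows with a header line to an open text file."""
    writer = csv.writer(handle, delimiter=";")
    writer.writerow(["url", "text"])
    writer.writerows(rows)


if __name__ == "__main__":
    urls = ["https://www.duden.de/rechtschreibung/Unboxing"]
    # len is only a stand-in for a real extraction function.
    rows = collect(urls, extract=len)
    with open("meanings.csv", "w", newline="", encoding="utf-8-sig") as f:
        write_csv(rows, f)
```

Passing `fetch` as a parameter makes it possible to try the loop with a fake download function before pointing it at several thousand real links.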

I tried to modify formulas that I found on the web

=import(website;html string) (or something like that)

but because of my lack of knowledge when it comes to HTML, I did not get anywhere.

I give up. Can anyone please help me?

Last edited by Belle; 10-27-2021 at 04:47 AM.