@developergaurav-exe · Jul 12, 2022 · 2 min read · 342 views
We are finally grabbing elements from the HTML text we beautified last time! This will be the finale! Don't miss out on the core of Web Scraping!
Congratulations on completing Part 2 of this series! What? You didn't go through Part 2 yet? Worry not, here's the link. Quickly go through it and follow all the instructions; it will not take more than 15 minutes:
You weren't there for Part 1 either? I got you, check this out:
Also, we are following along with the GitHub repository which I created while learning Web Scraping. Here are the notebooks for you:
Wow, that was a lot of promotion! I am glad you didn't just fume at me :')
Now, this part requires some knowledge of HTML and CSS so that you can easily play around with tags and elements.
That doesn't mean you cannot do web scraping if you are a complete newbie in HTML and CSS.
This guy just made the job easier for you! Watch some parts of this video and you will get the essence! We will need no more than preliminary knowledge of it…
Grabbing elements off of HTML text
So, coming back: we want to grab elements from the beautified soup that we have from our last article.
Syntax for grabbing elements:
Say we want to grab the paragraph elements from the HTML text. Running the first command produces a list of all the paragraph elements.
The second command gives the first element of that list, the one at index 0.
The third one simply gives the text inside the selected paragraph tag.
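In case the three commands are hard to picture, here is a minimal sketch of what they could look like with Beautiful Soup. It assumes the commands are a `select` call, an index lookup and a `get_text` call; the original commands may be written differently. The tiny `html` string is a made-up stand-in for the page you fetched in Part 2.

```python
from bs4 import BeautifulSoup

# Made-up stand-in HTML; in the series, the soup comes from the page fetched in Part 2.
html = """
<html><body>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# First command: a list of all paragraph elements.
paragraphs = soup.select("p")

# Second command: the first element of that list (index 0).
first = paragraphs[0]

# Third command: only the text inside the selected paragraph tag.
text = first.get_text()

print(paragraphs)
print(first)
print(text)
```

Swap `"p"` for any other tag name or CSS selector to grab different elements.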
Side Activity
Grab the Table of Contents' sub-points from: https://en.wikipedia.org/wiki/Jonas_Salk
Wikipedia's content is freely licensed, so we can scrape the data without much difficulty!
You will have to find the class that holds the contents. In my case, the ".toclevel-2" class contained these subheadings.
This is the little hidden puzzle you will have to solve every time you want to scrape something specific!
Also, remember that the class names you see may differ from mine (in my experience they can vary from one IP address to another). So you will most probably find a different class containing these subheadings.
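Here is a rough sketch of the side activity, under a few assumptions. The `toc_html` fragment only imitates the shape of a table of contents, and its section names are invented placeholders, not the real headings of the Jonas Salk page. `sub_headings` is a hypothetical helper name, and the class to pass in is whatever you find when you inspect the page yourself.

```python
from bs4 import BeautifulSoup

# Invented fragment that imitates a table of contents; the section names are placeholders.
toc_html = """
<ul>
  <li class="toclevel-1"><a href="#Early_life"><span class="toctext">Early life</span></a>
    <ul>
      <li class="toclevel-2"><a href="#Education"><span class="toctext">Education</span></a></li>
      <li class="toclevel-2"><a href="#Research"><span class="toctext">Research</span></a></li>
    </ul>
  </li>
</ul>
"""

def sub_headings(html, css_class="toclevel-2"):
    """Return the text of every element carrying the given CSS class."""
    soup = BeautifulSoup(html, "html.parser")
    # A leading dot in a CSS selector means "match by class".
    return [item.get_text(strip=True) for item in soup.select("." + css_class)]

headings = sub_headings(toc_html)
print(headings)
```

If the class on your side is different, pass it as the second argument instead of editing the function.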
If you have done everything up to this point and it all worked fine, then congratulations! You can now scrape data from websites with your own Python script!
I have not explained too much, so as not to overwhelm you and for the sake of simplicity! I am expecting that you know the art of googling! :)
By the way,
You may have noticed master or main written like this in my terminal:
This means my web scraping folder is a Git-initialized repository. Don't know what Git is? Let me know if you want an easy explanation of it!
I promise I will explain it in one article! :)
Get to know all about Web Development in this short article:
~Follow Harsh Gaurav for more technical and interestingly random content :)