How to do Web Scraping in Python? | Part 3 (Finale)

This will require some knowledge of HTML and CSS so that you can easily play around with tags and elements.

We are finally grabbing elements from the HTML text we beautified last time! This is the finale! Don’t miss out on the core of Web Scraping!

Congratulations on completing Part 2 of this journey! What? You haven’t gone through Part 2 yet? Worry not, here’s the link. Quickly go through it and perform all the instructions; it won’t take more than 15 minutes:

How to do Web Scraping in Python? | Part 2

You weren’t there in Part 1 either? I’ve got you, check this out:

How to do Web Scraping with Python? | Part 1

We are also following along with the GitHub repository I created while learning Web Scraping. Here are the notebooks for you:

GitHub - developergaurav-exe/web-scraping-in-python: Web Scraping in Python through BeautifulSoup…

Wow, that was a lot of promotion! I’m glad you didn’t just fume at me :’)


We are obsessed with glasses in this series!


This finale part contains:
Grabbing elements from the beautified soup object, we created last time

Now, this will require some knowledge of HTML and CSS so that you can easily play around with tags and elements.

That doesn’t mean you cannot do web scraping if you are a complete newbie to HTML and CSS, though.

This guy just made the job easier for you! Watch some parts of this video and you will get the essence! We need no more than preliminary knowledge of it…


Grabbing elements off of HTML text

So, coming back: we want to grab elements off of the beautified soup object we have from our last article.

Syntax for grabbing elements:

soup.select()


Now suppose we want to grab paragraph elements from the HTML text. Running the first command below generates a list of all paragraph elements.

The second command gives the first element of that list, which is at index 0.

The third one simply gives the text inside the selected paragraph tag.

soup.select('p')
soup.select('p')[0]
soup.select('p')[0].get_text()
All commands in one go!

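If you want to run those three commands end to end by yourself, here is a minimal, self-contained sketch. The HTML string is made up purely for illustration; in this series we built `soup` from a real page instead:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the HTML text we beautified in the previous parts
html = "<html><body><p>First paragraph</p><p>Second paragraph</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.select('p'))                # list of all <p> elements
print(soup.select('p')[0])             # the first <p> element (index 0)
print(soup.select('p')[0].get_text())  # just the text inside that tag
```

The same three calls work unchanged on the soup object from our last article.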

Side Activity

Grabbing this Table of Contents’ sub points from: https://en.wikipedia.org/wiki/Jonas_Salk

Wikipedia’s content is openly licensed, so we can scrape the data out without any difficulties!

Grabbing Sub points: 1.1, 1.2, 1.3, 3.1, 3.2, 8.1


See you scraped information out of the Wikipedia article! :)


You will have to find out which class gives you the contents; in my case, the ‘.toclevel-2’ class contained these sub-headings.

So this is the little hidden puzzle you will have to solve every time you want to scrape something specific out!

Also, remember that class names can change whenever a site updates its markup. So you may well see a different class containing these sub-headings by the time you try it.
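To make the idea concrete, here is a sketch of the ‘.toclevel-2’ selection. The HTML below is a hand-written stand-in shaped roughly like Wikipedia’s table of contents; on the live page you would build `soup` from the fetched article HTML instead, and the class name may differ, as noted above:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for Wikipedia's table-of-contents markup
# (an assumption for illustration; the real page's structure can differ)
html = """
<ul>
  <li class="toclevel-1">1 Early life and education
    <ul>
      <li class="toclevel-2">1.1 Childhood</li>
      <li class="toclevel-2">1.2 Medical school</li>
    </ul>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Every element with class toclevel-2 is one of the sub-points
for item in soup.select('.toclevel-2'):
    print(item.get_text(strip=True))
```

Swapping the hard-coded string for the real page’s HTML (fetched the same way as in Part 2) gives you the actual sub-points.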


If you have performed everything till now and it all worked fine, then congratulations! You can now successfully scrape data from websites across the internet from your Python script!

I have not gone into too much explanation, so as not to overwhelm you and for the sake of simplicity! I am expecting that you know the art of googling! :)

By the Way,

You may have noticed master or main written like this in my terminal:

main written in brackets


This means my web scraping folder is a git-initialized repository. Don’t know what Git is? Let me know if you would like an easy explanation of it!

I promise I will explain it in one article! :)

Get to know all about Web Development in this short article:

Want to learn Web Development? Grasp these major Languages.

~Follow Harsh Gaurav for more Technical and Interestingly random content :)

