@developergaurav-exe · Jul 12, 2022 · 2 min read
Importing packages in Python Notebook.
Congratulations on completing Part 1 of this journey! What? You didn't go through Part 1 yet? Worry not, here's the link. Quickly go through it to install the necessary packages before proceeding with this article; it will take no more than 5 minutes:
Wow! You came back so early! :)
I know you were never gone :p
Anyway, let's start by importing packages in our Python Notebook!
Why use a notebook and not a plain Python script?
Because it gives you much more flexibility than a plain script: you can observe your output anytime and anywhere!
I usually prefer this, but it is just my personal opinion; you are free to use anything. It is pure Python code and it will work everywhere!
Scraping-friendly website used: https://example.com
All notebooks are available on my GitHub repository! Check them out if you feel stuck anywhere! :)
More scraping-friendly sites are:
Example.com website landing page:
Make sure you don't hit any firewall issues or errors while running the statements below. If your firewall blocks the request, temporarily disabling it can do the trick.
Let's quickly send a GET request to our scraping-friendly website, example.com.
import requests
# send a GET request to the page we want to scrape
result = requests.get('https://example.com')
# result is a requests.models.Response object
type(result)
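If the request fails instead (for example because a firewall blocks it, as mentioned above), requests raises an exception rather than returning a response. Here is a small, optional sketch using a timeout and a status-code check; it is just a sanity check and is not required for the rest of the article.

import requests

try:
    # a timeout avoids hanging forever if something blocks the request
    result = requests.get('https://example.com', timeout=10)
    # 200 means the page was fetched successfully
    print(result.status_code)
except requests.exceptions.RequestException as err:
    # connection errors, timeouts, etc. all end up here
    print('Request failed:', err)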
If your code ran smoothly, congratulations! Let's see what is stored in result! :)
result.text
# shows all the HTML content in kind of an ugly manner
This needs to be beautified!
Our newly installed bs4 package, a.k.a. Beautiful Soup, and the lxml package will now come into the picture!
It will be yummy I guarantee you! :)
import bs4
# parse the raw HTML text with the lxml parser
soup = bs4.BeautifulSoup(result.text, 'lxml')
# soup is a bs4.BeautifulSoup object
type(soup)
# display the parsed (much more readable) HTML
soup
BeautifulSoup is a class in the bs4 package, and we create a soup object of this class by giving it two values. The first is result.text (remember, this was our ugly, unreadable HTML text).
Sorry, result.text, but that was the truth; I cannot understand you.
The second is 'lxml', the parser (our beautifying engine) that turns the raw HTML into readable, structured data. It is passed as a string while making the soup object of the BeautifulSoup class.
Now, this is what soup looks like!
Looks yummy, Right? :)
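If the printed soup still feels cramped, prettify() indents the HTML nicely. And in case lxml gives you trouble on your machine, Python's built-in 'html.parser' can be used instead. A small optional sketch, reusing the result object from above:

import bs4

# 'html.parser' is a built-in fallback if lxml is not installed
soup = bs4.BeautifulSoup(result.text, 'html.parser')

# prettify() returns the parsed HTML as a nicely indented string
print(soup.prettify())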
If you have performed everything till now and it all worked fine, then congratulations! You can now successfully request and beautify HTML data from the internet in your Python script!
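To recap, here is a minimal end-to-end sketch of everything covered in this part; printing soup.title at the end is just a tiny teaser of the element grabbing we will do in the next part.

import requests
import bs4

# fetch the page
result = requests.get('https://example.com')

# parse (beautify) the raw HTML
soup = bs4.BeautifulSoup(result.text, 'lxml')

# a tiny peek ahead: the <title> tag of the page
print(soup.title)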
Following up next with Grabbing Elements from HTML data, to scrape the information we need:
This part contains core Web Scraping, So donât miss your chance to learn something new! :)
~Follow Harsh Gaurav for more Technical and Interestingly random content :)