Scraping Twitter Data with an API and Python


The Twitter API provides companies, developers, and users with programmatic access to Twitter’s vast amount of public data. Twitter defines its platform as “what’s happening in the world and what people are talking about right now.”

This article shows how to use Twitter’s API to capture tweet data with Python.

What is an API?

An API (Application Programming Interface) is a software interface that allows two applications to interact with each other without user intervention. An API provides a way for a developer to request services from an operating system or another application and to expose data within different contexts and across multiple platforms.

How does an API Work?

An API communicates through a set of rules that define how applications and computers communicate with each other. In other words, an API acts as a middleman between any two machines that want to connect with each other for a specific task.

For example, suppose you want to embed a map of your location on your website. Google’s platform provides an API that allows authorized users to request and retrieve maps from its servers. When your website loads, it connects to the Google platform, where it is authenticated, and a map is retrieved and sent back to your website, where it is displayed.
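Under the hood, a request like this is usually just a URL with parameters and an API key attached. The sketch below shows how such a request might be assembled in Python; the endpoint and parameter names are hypothetical stand-ins, not Google’s actual Maps API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only.
base = "https://maps.example.com/api/staticmap"
params = {"center": "40.7128,-74.0060", "zoom": 12, "key": "YOUR_API_KEY"}

# The API key authenticates the request; the other parameters describe
# what the caller wants back.
url = f"{base}?{urlencode(params)}"
print(url)
```

The server parses those parameters, checks the key, and returns the requested data, which is the request/response cycle described above.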

Twitter currently has 396.5 million users. 206 million of them access Twitter daily, and 75% of those daily users are based outside the United States.

If you would like a more detailed description of the Twitter API, go to this link.

Setup a Twitter Developer Account

You are required to set up a Twitter developer account prior to using their API.

Listed below are the steps to set up a Twitter developer account.

1. Create a personal Twitter account. If you already have a Twitter account, you can skip this step.

Image by Michael Galarnyk (source)

2. Go to the Developer Platform from this link and click the Sign up button.

3. Answer a few questions and click the Next button.

4. Review and accept the developer agreement and then click Submit Application.

It usually takes two or three days for your application to be reviewed and approved by the Twitter Developer Team. Twitter may request additional information about your intended use of the API prior to approval. Be very specific and detailed in your responses to Twitter’s inquiries.

Next, verify your email address.

5. Twitter sends you an email confirmation message. Click Confirm your email.

6. After confirming your email address, enter a name for your app and click Get Keys.

7. A screen will display your API Keys and Bearer Token (I have omitted them here for security purposes). You can generate your access tokens in the Developer Portal under the project app by clicking the key icon and then clicking Generate in the Access Token and Secret section. Copy these keys and store them in a secure location, as you will need them to access the API. If you forget to copy them, you can always generate new ones in the Developer Portal.

The Program

The remainder of this article presents a Python program that uses the Twitter API to capture and store Twitter data. The following steps will be performed.

Install the package and import the libraries.

# Install Tweepy from a terminal (not inside the Python script):
#   pip install tweepy

# Import the libraries.
import tweepy as tw
import pandas as pd

Assign your Twitter Developer API keys and access tokens to variables.

api_key = 'API Key Here'
api_secret = 'API Secret Key Here'
access_token = 'Access Token Here'
access_token_secret = 'Access Token Secret Here'
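Hardcoding credentials as shown above is convenient for a tutorial, but a safer pattern is to read them from environment variables so they never appear in your source code. The variable names below are my own convention, not a Twitter requirement; export them in your shell before running the script.

```python
import os

# Demo-only fallbacks so the snippet runs even when the variables are unset;
# in practice, export real values in your shell and remove these lines.
os.environ.setdefault("TWITTER_API_KEY", "API Key Here")
os.environ.setdefault("TWITTER_API_SECRET", "API Secret Key Here")

# Read the credentials from the environment instead of hardcoding them.
api_key = os.environ["TWITTER_API_KEY"]
api_secret = os.environ["TWITTER_API_SECRET"]
```

This keeps secrets out of version control and lets you rotate keys without touching the code.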

Authenticate with the API keys and access tokens.

auth = tw.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

Define the search words and start date as variables.

search_words = "LA rams"
date_since = "2022-01-01"

Collect the tweets using the search words and start date.

# Note: in Tweepy v4 the search endpoint is api.search_tweets (it was
# api.search in v3), and the old `since` parameter is no longer accepted,
# so the date filter moves into the query via the `since:` search operator.
tweets = tw.Cursor(api.search_tweets,
                   q=f"{search_words} since:{date_since}",
                   lang="en").items(5)

Define a function to get Twitter users, tweet times, and tweet text for tweets that match the search criteria and create a new data frame.

def get_related_tweets(key_word):
    twitter_users = []
    tweet_time = []
    tweet_string = []
    # The standard search endpoint returns at most 100 tweets per request;
    # the Cursor keeps paging until 1000 tweets have been seen.
    for tweet in tw.Cursor(api.search_tweets, q=key_word, count=100).items(1000):
        # Skip retweets so each tweet's text appears only once.
        if (not tweet.retweeted) and ('RT @' not in tweet.text):
            if tweet.lang == "en":
                twitter_users.append(tweet.user.name)
                tweet_time.append(tweet.created_at)
                tweet_string.append(tweet.text)

    df = pd.DataFrame({'name': twitter_users, 'time': tweet_time, 'tweet': tweet_string})
    df.to_csv(f"{key_word}.csv")
    return df

Call the function to get related tweets and display a few tweets.

df = get_related_tweets("LA Rams")
df.head(7)

Once you have the data in a structured format, you can easily use it in your machine learning projects and analysis. Twitter text data is ideal for Natural Language Processing and sentiment analysis on a wide variety of topics.
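As an illustration only, here is a toy keyword-based polarity score applied to a data frame shaped like the one `get_related_tweets` returns. A real project would use a proper NLP library; this sketch just shows how easily the structured tweet text plugs into an analysis step.

```python
import pandas as pd

# Tiny hand-picked word lists, purely for demonstration.
POSITIVE = {"win", "great", "love"}
NEGATIVE = {"lose", "bad", "hate"}

def simple_polarity(text):
    # Count positive words minus negative words in the tweet.
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

# A stand-in for the data frame produced by get_related_tweets.
df = pd.DataFrame({"tweet": ["Great win for the LA Rams",
                             "Bad game, hate to lose"]})
df["polarity"] = df["tweet"].apply(simple_polarity)
print(df)
```

Each tweet now carries a numeric score, which is the kind of feature a sentiment model or downstream analysis would consume.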

Putting it all together…

# Install Tweepy from a terminal (not inside the Python script):
#   pip install tweepy

# Import the libraries.
import tweepy as tw
import pandas as pd

# Assign your Twitter Developer API keys and access tokens to variables.
api_key = 'API Key Here'
api_secret = 'API Secret Key Here'
access_token = 'Access Token Here'
access_token_secret = 'Access Token Secret Here'

# Authenticate with the API keys and access tokens.
auth = tw.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

# Define the search words and start date as variables.
search_words = "LA rams"
date_since = "2022-01-01"

# Collect the tweets using the search words and start date.
# Note: in Tweepy v4 the search endpoint is api.search_tweets (api.search
# in v3), and the old `since` parameter is no longer accepted, so the date
# filter moves into the query via the `since:` search operator.
tweets = tw.Cursor(api.search_tweets,
                   q=f"{search_words} since:{date_since}",
                   lang="en").items(5)

# Define a function to get Twitter users, tweet times, and tweet text for
# tweets that match the search criteria and return them as a data frame.
def get_related_tweets(key_word):
    twitter_users = []
    tweet_time = []
    tweet_string = []
    # The standard search endpoint returns at most 100 tweets per request.
    for tweet in tw.Cursor(api.search_tweets, q=key_word, count=100).items(1000):
        # Skip retweets so each tweet's text appears only once.
        if (not tweet.retweeted) and ('RT @' not in tweet.text):
            if tweet.lang == "en":
                twitter_users.append(tweet.user.name)
                tweet_time.append(tweet.created_at)
                tweet_string.append(tweet.text)

    df = pd.DataFrame({'name': twitter_users, 'time': tweet_time, 'tweet': tweet_string})
    df.to_csv(f"{key_word}.csv")
    return df

# Call the function to get related tweets and display a few tweets.
df = get_related_tweets("LA Rams")
df.head(7)

Thanks so much for reading my article! If you have any comments or feedback please add them below.

If you enjoy reading stories like these and want to support me as a writer, consider signing up to become a Medium member. Membership gives you unlimited access to all articles on Medium. You can sign up using this link https://medium.com/@dniggl/membership

More content at PlainEnglish.io. Sign up for our free weekly newsletter. Follow us on Twitter and LinkedIn. Check out our Community Discord and join our Talent Collective.


Dennis Niggl

Consultant, Business Automation, Inc

Machine Learning / Predictive Analytics. Passionate about how machines learn and make accurate predictions about the future.