How to Create a Database on AWS RDS using Sagemaker with Mysql-connector in python

1_4l8lkFS83fqBfRrDw0Wegg.jpeg

In this post I’ll explain how you can create a RDS database using Sagemaker with python and mysql-connector library, and also show you how you can create a table and populate with records on the rds db using a sample csv.

Create a Database/Table on AWS RDS using Sagemaker with python mysql-connector library

In this post I’ll explain how you can create a RDS database using Sagemaker with python and mysql-connector library, and also show you how you can create a table and populate with records on the rds db using a sample csv.

First thing first, you need to create a database instance with a Mysql engine, for people new to AWS analytics tools, Amazon RDS is a managed relational database service for MySQL, PostgreSQL, MariaDB, Oracle BYOL, or SQL Server.

I wont be going into the step by step details of creating a RDS database you can visit here to see how you can create a RDS database using your desired engine, a sample snapshot of a new database with mysql engine can be seen below. Note the database identifier and the engine.

RDS database instance

RDS database instance

The next step is for us to copy the host (endpoint) of the database we created as we’ll be making use of it in our code. Click on the database identifier to show you the details of your db. An example can be seen below

RDS instance Endpoint

RDS instance Endpoint

Now lets lunch Sagemaker to access a notebook instance. A notebook instance is a machine learning (ML) compute instance that helps you run Jupyter Notebook App and run your python codes. You can learn how to create a notebook instance here. A snap shot of an already created instance can be seen below. Click on open jupyter to lunch your Jupyter notebook.

Sagemaker notebook instance

Sagemaker notebook instance

Now lets get to creating a database

To start with we need to install mysql connector library. MySQL-Connector enables Python programs to access MySQL databases. Click on one of the code cell block type the following code

                ##install library
! pip install mysql-connector
            

Next we import the library and insert the credentials. This is where we make use of the database host(endpoint) url we copied initially. Make sure it matches with your RDS instance URL, Username and passwords. Run the cell below to create the connection

                import mysql.connector, csv
## Make sure the Host matches your RDS instance
mydb = mysql.connector.connect(
  host="inventory.carmijfsdgsr.us-west-2.rds.amazonaws.com", 
  user="XXXXX", the password when creating your rds database
  passwd="XXXXXXX"
)
cursor = mydb.cursor()
            

Now let's create a database. Run the cursor.execute method which executes the given database operation (query or command)

                cursor.execute(“CREATE DATABASE IF NOT EXISTS inventory_database”)
            

We have successfully created the inventory_database

Now let create a table on the database using the create table statement and specifying the column datatypes and execute the command.

                mydb.database = “inventory_database”
cursor.execute(“CREATE TABLE IF NOT EXISTS product_table_qs (ProductName varchar(50),Description varchar(255),ProductSku bigint(12),QuantityInStock int(5),ReorderLevel int(5),WholeSaleCost decimal(8,2),RetailCost decimal(8,2));”)
            

We have successfully created a table product_table_qs on the inventory_database. Now we can populate the table with a sample record using a csv file. For this article I’ve uploaded a sample csv file to my jupyter notebook rootfolder called lab2_inventory

CSV files on Rootfolder

CSV files on Rootfolder

Run the code cell block below to populate the table product_table_qs with records from the csv file.

                print(“Database setup complete.”)
with open(‘./data_files/Lab2_Inventory.csv’) as datafile:
 csv_data = csv.reader(datafile)
 next(csv_data) 
 for row in csv_data:
 cursor.execute (“INSERT INTO product_table_qs (ProductName,Description,ProductSku,QuantityInStock,\
 ReorderLevel,WholeSaleCost,RetailCost) VALUES (%s, %s, %s, %s, %s, %s, %s)”,row)
 mydb.commit()
print(“Success! Data inserted”)
mydb.close()
            

Now you have successfully populated the table product_table_qs with some records

Now let's read data from the table created on the inventory_database and store into a pandas dataframe.

Run the code cell block below

                cursor = mydb.cursor()
mydb.database = “inventory_databse”
mySql_select_Query=”select * from product_table_qs “
result = cursor.execute(mySql_select_Query)
queryFetch = cursor.fetchall()
queryResults = pd.DataFrame.from_records(queryFetch, columns = [i[0] for i in cursor.description])
queryResults.head(10)
            

Your output should resemble below

dataframe

dataframe

Cool stuff right!!

In summary, this article has explained how you can can create an RDS database/table with sagemaker using mysql-connector, as well as populating the table with a sample records from a CSV file and reading the table into a pandas dataframe.

Happy reading. Don't forget to clap and share comments!!!

Connect with me on Twitter

Connect with me on linkedin


Only registered users can post comments. Please, login or signup.

Start blogging about your favorite technologies and get more readers

Join other developers and claim your FAUN account now!

Avatar

Favour Oyelami

Data Scientist

@oyelamifiyin
An experienced Data Scientist passionate about build solving impactful business problems with analytics and machine learning Email:oyelamifiyin@gmail.com
Stats
19

Influence

537

Total Hits

1

Posts