Join us

SQL for Beginners 2022 — A Practical SQL Guide

1_3pxeWvgRK4XZUyx_qjpv-A.jpeg

SQL is one of the most-sought after tech skills in 2022 for data analysis and data engineering.

In this SQL story, we will look at the SQL programming language, the SQL basics and fundamentals in a practical manner. We will start with a brief introduction to the SQL programming language followed by an Overview of Databases and the Relational Database Management System — RDBMS.

This story isn’t death by powerpoint. We will learn by doing. We will install Microsoft SQL Server in a Docker Container to start learning. This installation section will provide you with a brief introduction to Docker.

Docker is an open platform for developing, shipping, and running applications. It provides an isolated environment to run applications.

Once we’re done with the SQL Server installation in Docker, we will then dive into heart of this story — Write SQL queries to access and analyze data.

Using a Healthcare System use case, we will create a database, create tables, and finally insert some data into those tables. This is where the fun begins — performing SQL operations on the data to learn SQL in a practical manner. We will cover SQL syntax, and SQL DML operations used to access, modify or retrieve the data from the database.

At the end of this story, you will be able to speak eloquently about SQL to your colleagues, and during your next interview and perform SQL operations to retrieve, update and insert data into a database.

Why Should I learn SQL?

SQL is one of the most-sought after tech skills in 2022, and is in high demand for data analysis and data engineering. With the growth of Cloud computing and generation of petabytes of data, SQL has become one of the de-facto programming languages to access and manipulate data stored in the Cloud. In addition, Data Engineers are required to know SQL. And according to Glassdoor.com, the average base pay for a Data Engineer is $102,864 / yr.

Why Should You Listen To Me?

I suppose if you’ve read my bio, chances are that you may have an idea about my background. But, if you don’t, let me briefly summarize. I’m a Master Data Management Engineer and Healthcare IT Subject Matter Expert (SME) and have worked for fortune 500 companies such as GE and Caradigm for the last 10 years to design and build data products utilizing ETL, HL7 v2.x, eMPI, IHE on disparate RDBMS/OS platforms. I’ve worked with terabytes of healthcare data running in clustered environments in Amazon Web Services (AWS) and Microsoft Azure to enable secure, reliable and connected healthcare.

I’ve worked with complex datasets requiring building reliable patient matching applications for statewide data exchange. These large data applications require an understanding of all data characteristics and aspects. Prior to loading and processing large datasets (ETL), the data needs to be loaded into a database, and analyzed for consistency and accuracy and the SQL programming language is a go-to language to understand this data.

Rest assured, you’re in perfect hands

What is a Database?

Applications such as Facebook, Google, and Netflix store information about users and products in data stores know as relational databases. A relational database is made up of collections of objects or relations that store the data. This collection of related objects are stored in a database table. Put differently, a database is a collection of two-dimensional tables. The table is the basic storage structure of a RDBMS. Each table is composed of rows and columns. Data in these rows is manipulated using SQL. The rows represent a single transaction, record or data entry. The rows are column values in a table. The tables are connected to each other using relationships. And that is why they are called Relational Databases.

User → Relational Database Management System (RDBMS) → Tables (relational tables) → Data

In a database table, the columns represent the properties of that data and the column values make up the row(s) in a table. In the same context, a field is the intersection between a row and column. It may or may not contain data (is null or empty).

Relational Database Management Systems (RDBMS)

RDBMS stands for Relational Database Management System — the software that manages relational databases. RDBMS manages the SQL code execution between the database and the computer system application. There are different types of vendors that provide RDBMS software and each flavor of RDMS is slightly different but its implementation is the same.

A few examples of RDMS are :

  1. Oracle 10g
  2. Microsoft SQL Server
  3. Microsoft Access
  4. Amazon Redshift
  5. MySQL — MySQL is a widely used, open-source relational database management system (RDBMS).
  6. IBM DB2

For our purposes, we will use the SQL Server database provided by Microsoft. There are free editions available which I will include in the video description. The same SQL principles should apply to other RDMS software used because SQL is an ANSI standard. All major RDBMS support some flavor of SQL.

Types of SQL Databases

There are two main types of databases.

  1. Relational databases

A relational database is made up of collections of objects or relations that store the data. Related collection of objects are stored in a database table. Examples of relational databases are SQL Server and IBM DB2.

  1. NoSQL databases (or non-relational databases)

As the term implies, NoSQL databases store data as documents. They do not have relations. One example of a NoSQL database is Hadoop. The rise of Web 2.0 companies made NoSQL databases very popular due to the storage of unstructured data such as from IoT devices, entertainment and AI/ML. As datasets handled by Internet companies grew bigger in size, a new approach to designing databases came to the fore. The strict schema of a relational database was avoided in favor of a schema-less database. NoSQL databases come in different forms and address different use cases. The spectrum includes :

(1) Key-Value stores (Redis, Amazon Dynamo DB)
(2) Column stores (HBase, Cassandra)
(3) Document stores (MongoDB, Couchbase)
(4) Graph databases (Neo4J)
(5) Search engines (Solr, ElasticSearch, Splunk)

Difference between relational databases and NoSQL databases.

The primary distinction between NoSQL databases and SQL databases is the absence of a rigid schema in the former. Relational databases store structured data which conforms to a rigid schema. Whereas NoSQL databases store unstructured, and semi-structured that have a dynamic schema which fall under the purview of Big Data.

Healthcare System Database Use Case

Let’s take a look at a practical example of a hospital system. When a patient arrives at medical facility seeking medical treatment or for an appointment, a nurse practitioner or the help desk typically checks them in. The patient look up or new patient registration occurs in a healthcare application that stores patient data. When a patient registration or event happens, the Healthcare system assigns what is called a Medical Record Number — MRN. Each patient has a unique MRN in the registration application. A patient can have an insurance id number and an insurance group number. Thus, this establishes a relationship. A patient can have one or more medication(s). Thus, this establishes a relationship.

Healthcare System Tables

A Healthcare system database contains the following sample tables — a Patient table, a Medications table, or an Insurance table. These tables are related through certain columns — relations. A patient can have one insurance and a patient can have one or more medications.

In RDBMS software, the concept of Primary Key (PK) and Foreign Key (FK) denotes the relationship between entities or objects in a database.

The relationships are established through certain common columns. In the case below, a medical record number is a Primary Key (PK) in the Patient table. This PK is a Foreign Key (FK) in the Insurance and Medication table.

Insurance — Patient — Medication

Let’s take a look at a sample Patient table in a healthcare system.

Sample Patients Table (dummy data — not real)

The Patients table above has a primary key column. A Primary Key (denoted as PK) uniquely identifies each row in a table. It allows for fast access and data retrieval because each row is unique to that key. For example, when a patient is registered in a patient registration system, a medical record number often referred to as MRN is assigned. The medical_record_number in the Patient table is the PK. Each patient has a single (one) medical_record_number that uniquely identifies them — the PK. This same identifier column (PK) can be used in the Insurance or Medication table to identify the patients. However, it be a Foreign Key (FK) in these tables. Alternatively, a PK can be made up of more than one column. It can be a combination of two columns. The combination of the medical_record_number and personid columns could also be used to form a Primary Key (PK).

What is SQL?

SQL pronounced ‘SEQUEL’ stands for Structured Query Language. SQL was originally created by two IBM engineers working on querying data on an IBM relational database. SQL allows you to access and manipulate data stored in a database.

SQL is an ANSI standard language for accessing and manipulating data stored in a database. The American National Standards Institute (ANSI) is a private, non-profit organization that administers and coordinates the U.S. voluntary standards and conformity assessment system. Since SQL is an ANSI-standard language, it has been accepted as the standard language for accessing and manipulating data stored in a database.

How To Perform Data Access — SQL to the rescue

To access data in the database, you run SQL commands that interact with the relational database. The table in our example is the Patients table. That table has rows and columns. The columns represent the properties of a Patient such as name, age, date of birth (dob), gender, address and so forth. The Patient table is thus a collection of data representing the same entity — patient.

Our focus in this SQL story will be working with data utilizing SQL in the databases.

Structured Query Language Example

The SQL query below returns the number (count) of all male patients from the Patients table.

SQL Fundamentals

In this section, we will look at the SQL query building blocks and SQL command categories. This section gives you a foundation for writing basic SQL queries to filter, analyze and summarize your data.

The Basic SQL Query Block

A SQL query is made up four basic clauses.

  1. The SELECT identifies the what columns are accessed, or retrieved.
  2. FROM keyword identifies which tables are accessed.
  3. WHERE limit or restricts the SQL query to rows that meet a certain criteria.
  4. ORDER BY sorts the rows

SQL command categories

There are FOUR common categories of SQL query statements.

  1. DDL commands
  2. DML commands
  3. DQL commands
  4. DCL commands

DDL — Data Definition Language

DDL commands are used to :

  • Specify the database schema database structure
  • Create and or modify the structure of database objects

Examples

  • CREATE TABLE — used to build and create tables in a database
  • ALTER TABLE — used to alter the structure of a table in a database

These are a few examples. We will use the CREATE TABLE sql statement later in this story to build our sample table.

DML — Data Manipulation Language

DML (Data Manipulation Language) commands are used to modify the data in the database.

Examples

  • INSERT — used to insert data into a database
  • UPDATE — used to update or modify data in a database
  • DELETE — used to delete data from a database

DQL — Data Query Language

DQL commands are used to access and retrieve data in a database. A main focus of this story will be on the SELECT statement

Examples

  • SELECT statement — use to retrieve data from a database

DCL — Data Control Language

These commands offer privileges or access rights for database user to perform certain actions in a database based on their roles.

Examples

GRANT — gives a user access privileges to the database

REVOKE — removes user access privileges to the database

Now that we’ve learn about SQL and performed a database overview, let’s dive right in and learn by practice.

Setting up your Environment — Getting Started — Install SQL Server Database

This is a practical story. To make it more interactive, we will need to learn by doing — practicing in an environment.

This isn’t death by PowerPoint.

Depending on your Operating System (OS), you will need to install SQL server in different ways. I have included links to download the required software to install SQL Server on :

  • Choose Basic and Accept the License Terms and Privacy statement
  • You’ll need ~ 9 GB of disk space to proceed — click Install
  • Open my other medium article for the rest of the steps — with screenshots included — to Install SSMS
  • Next, click Install SSMS which will take you to the SSMS downloads page

For this story, I will use Microsoft SQL Server running in Docker on a Mac. All the material should work on a Windows and also Linux running SQL Server.

  • VirtualBox — virtualization software that allows you to create a Virtual Machine to run the Windows ISO Image in order to run SQL Server. This is the easier option if you are not familiar with Docker.

How To Install SQL Server on Docker

Refer to my medium article shown below on how to install SQL Server on Docker. It’s fairly straight forward and it should take you less than 10 minutes. https://faun.pub/how-to-install-sql-server-on-docker-cb6eb8e1c657

After a successful installation, you should see the following Azure Data Studio status page pointing to localhost.

Up to this point, we have successfully installed SQL Server on Docker, and installed Azure Data Studio (GUI) in order to interact with the SQL Server 2019 RDBMS. From this point onwards, you should be able to create databases, create tables, insert data into the tables, and query the data inside the tables.

Learn SQL Step by Step — Hands on Tutorial

Before you begin — Prerequisites

Download the attached SQL scripts Create the sample tables and insert data required to the database. Create a new database.

Create a new database on SQL Server Management Studio (SSMS)

  1. In Object Explorer, connect to an instance of the SQL Server Database Engine and then expand that instance.
  2. Right-click Databases, and then click New Database
  3. In New Database, enter a database name — HealthCareDB
  4. To create the database by accepting all default values, click OK; otherwise, continue with the following optional steps.

Create a new database on Azure Data Studio

Azure Data Studio (formerly SQL Operations Studio) is a free GUI management tool that you can use to manage SQL Server on your Mac or PC. You can use it to create and manage databases, write queries, backup and restore databases, and more.

A SQL Server GUI for your Mac — Azure Data Studio

The Azure Data Studio dashboard.

Open Azure Data Studio and Click File → New Query

Set your connection — Connect to the localhost and Execute the SQL query shown below to create a HealthCareDB

After the new database has been created, Go to the file menu and open a new SQL script and execute the following SQL statements in the order provided. The SQL statement will do the following :

After the new database has been created, open a new SQL script and run (execute) the following SQL statements in the order provided. The SQL statement will do the following :

Create the Patient table

Click New Query and Run the SQL query shown below to insert data into the dummy Patients table.

Insert new patient rows into table ‘Patients’

You have now successfully created a sample database, created a sample table and inserted data into that table.

SQL COMMANDS

Here comes the fun part. We will now run various SQL queries for data analysis in our sample database.

SQL SELECT statement

The SQL SELECT statement retrieves data from the database. The basic SELECT statement syntax is as follows :

SQL select — retrieve all records from the Patients table

SQL SELECT TOP

The SQL SELECT TOP limits the number of rows returned from the sql statement. This is very important for large tables where you want to limit the output result in order to avoid impacting the database performance. In MySQL, there is a feature to limit by 10, 50 etc. built into the MySQL Workbench. In SQL Server, its important to set this limit in order to avoid impacting the application performance and disrupt services to the end user. This is because a SELECT * call retrieves all records from a database table and this is a very expensive operative for the database server.

SQL WHERE clause

The WHERE clause limit or restricts the SQL query to rows that meet a certain criteria or condition. The filter criteria can be column names, expressions or comparison operators (more on these later on).

SQL SELECT Distinct

The Distinct keyword returns unique results in a data set and it applies to ALL columns in the SELECT. You can specify columns needed after the DISTINCT qualifier and note that the result does not show duplicate rows. You can select or include multiple columns to return the desired result — include one or more columns.

SQL Order by

Sort rows with the ORDER BY clause in either ASC — ascending order, or DESC — descending order. The ORDER BY clause comes last in the SELECT statement. The ordering by position — You can use a column alias or a number for the column position in the ORDER BY clause

SQL GROUP BY

The GROUP BY statement returns a set of rows to give one result per group. The column(s) groups appear in the SELECT statement.

If you’ve reached this point, you are a rock star!

Enroll In Lifetime Access for FREE to this complete Udemy Course — SQL for Beginners 2022 — Learn SQL Step by Step — for the first 100 students 😃

https://www.udemy.com/course/sql-for-beginners-2022-learn-sql-step-by-step/?couponCode=MEDIUMDATA365

USE THE COUPON — MEDIUMDATA365

Get Lifetime Access for $0 — $19.99 OFF

FREE COUPON - 100 Students❗️COUPON will EXPIRE in 30 days ❗️

Thank you for reading this article. You should now be able to speak eloquently about SQL to your colleagues, and or during your next technical interview and perform SQL operations to retrieve, update and insert data into a database. 🙏


Only registered users can post comments. Please, login or signup.

Start blogging about your favorite technologies, reach more readers and earn rewards!

Join other developers and claim your FAUN account now!

Avatar

Benjamin Sila

@data-engineer-365
Master Data Engineer with data integration, data analytics and data governance experience across major OS/RDBMS platforms
User Popularity
23

Influence

2k

Total Hits

1

Posts