SQL is one of the most-sought after tech skills in 2022, and is in high demand for data analysis and data engineering. With the growth of Cloud computing and generation of petabytes of data, SQL has become one of the de-facto programming languages to access and manipulate data stored in the Cloud. In addition, Data Engineers are required to know SQL. And according to Glassdoor.com, the average base pay for a Data Engineer is $102,864 / yr.
Why Should You Listen To Me?
I suppose if you’ve read my bio, chances are that you may have an idea about my background. But, if you don’t, let me briefly summarize. I’m a Master Data Management Engineer and Healthcare IT Subject Matter Expert (SME) and have worked for fortune 500 companies such as GE and Caradigm for the last 10 years to design and build data products utilizing ETL, HL7 v2.x, eMPI, IHE on disparate RDBMS/OS platforms. I’ve worked with terabytes of healthcare data running in clustered environments in Amazon Web Services (AWS) and Microsoft Azure to enable secure, reliable and connected healthcare.
I’ve worked with complex datasets requiring building reliable patient matching applications for statewide data exchange. These large data applications require an understanding of all data characteristics and aspects. Prior to loading and processing large datasets (ETL), the data needs to be loaded into a database, and analyzed for consistency and accuracy and the SQL programming language is a go-to language to understand this data.
Rest assured, you’re in perfect hands
What is a Database?
Applications such as Facebook, Google, and Netflix store information about users and products in data stores know as relational databases. A relational database is made up of collections of objects or relations that store the data. This collection of related objects are stored in a database table. Put differently, a database is a collection of two-dimensional tables. The table is the basic storage structure of a RDBMS. Each table is composed of rows and columns. Data in these rows is manipulated using SQL. The rows represent a single transaction, record or data entry. The rows are column values in a table. The tables are connected to each other using relationships. And that is why they are called Relational Databases.
User → Relational Database Management System (RDBMS) → Tables (relational tables) → Data
In a database table, the columns represent the properties of that data and the column values make up the row(s) in a table. In the same context, a field is the intersection between a row and column. It may or may not contain data (is null or empty).
Relational Database Management Systems (RDBMS)
RDBMS stands for Relational Database Management System — the software that manages relational databases. RDBMS manages the SQL code execution between the database and the computer system application. There are different types of vendors that provide RDBMS software and each flavor of RDMS is slightly different but its implementation is the same.
A few examples of RDMS are :
- Oracle 10g
- Microsoft SQL Server
- Microsoft Access
- Amazon Redshift
- MySQL — MySQL is a widely used, open-source relational database management system (RDBMS).
- IBM DB2
For our purposes, we will use the SQL Server database provided by Microsoft. There are free editions available which I will include in the video description. The same SQL principles should apply to other RDMS software used because SQL is an ANSI standard. All major RDBMS support some flavor of SQL.
Types of SQL Databases
There are two main types of databases.
- Relational databases
A relational database is made up of collections of objects or relations that store the data. Related collection of objects are stored in a database table. Examples of relational databases are SQL Server and IBM DB2.
- NoSQL databases (or non-relational databases)
As the term implies, NoSQL databases store data as documents. They do not have relations. One example of a NoSQL database is Hadoop. The rise of Web 2.0 companies made NoSQL databases very popular due to the storage of unstructured data such as from IoT devices, entertainment and AI/ML. As datasets handled by Internet companies grew bigger in size, a new approach to designing databases came to the fore. The strict schema of a relational database was avoided in favor of a schema-less database. NoSQL databases come in different forms and address different use cases. The spectrum includes :
(1) Key-Value stores (Redis, Amazon Dynamo DB)
(2) Column stores (HBase, Cassandra)
(3) Document stores (MongoDB, Couchbase)
(4) Graph databases (Neo4J)
(5) Search engines (Solr, ElasticSearch, Splunk)
Difference between relational databases and NoSQL databases.
The primary distinction between NoSQL databases and SQL databases is the absence of a rigid schema in the former. Relational databases store structured data which conforms to a rigid schema. Whereas NoSQL databases store unstructured, and semi-structured that have a dynamic schema which fall under the purview of Big Data.
Healthcare System Database Use Case
Let’s take a look at a practical example of a hospital system. When a patient arrives at medical facility seeking medical treatment or for an appointment, a nurse practitioner or the help desk typically checks them in. The patient look up or new patient registration occurs in a healthcare application that stores patient data. When a patient registration or event happens, the Healthcare system assigns what is called a Medical Record Number — MRN. Each patient has a unique MRN in the registration application. A patient can have an insurance id number and an insurance group number. Thus, this establishes a relationship. A patient can have one or more medication(s). Thus, this establishes a relationship.
Healthcare System Tables
A Healthcare system database contains the following sample tables — a Patient table, a Medications table, or an Insurance table. These tables are related through certain columns — relations. A patient can have one insurance and a patient can have one or more medications.
In RDBMS software, the concept of Primary Key (PK) and Foreign Key (FK) denotes the relationship between entities or objects in a database.
The relationships are established through certain common columns. In the case below, a medical record number is a Primary Key (PK) in the Patient table. This PK is a Foreign Key (FK) in the Insurance and Medication table.
Insurance — Patient — Medication
Let’s take a look at a sample Patient table in a healthcare system.