SQL Essentials
Introduction
Structured Query Language (SQL) empowers data analysts to communicate with their databases.
- While spreadsheets (e.g. Microsoft Excel and Google Sheets) offer data storage and management, databases excel at handling larger and more complex datasets.
Popular SQL programs include MySQL, Microsoft SQL Server and BigQuery.
SQL Statements
Database actions are primarily performed using SQL statements, which consist of easy-to-understand keywords like:
Database
- CREATE DATABASE - Creates a new database.
- BACKUP DATABASE - Creates a backup of an existing database
- DROP DATABASE - Deletes an existing database.
- CREATE TABLE - Creates a new table within a database.
- ALTER TABLE - Modifies an existing table.
- DROP TABLE - Deletes a table from a database.
- TRUNCATE TABLE - Delete the data inside a table, but not the table itself.
- SELECT - Extracts data from a database
- SELECT DISTINCT - Returns only distinct (different) values
- UPDATE - Updates existing data in a database.
- DELETE - Deletes data from a database.
- INSERT INTO - Inserts new data into a database.
The semicolon (;) at the end of each SQL statement terminates it.
- While part of the ANSI SQL-92 standard, not all SQL databases require it.
NOTE: SQL keywords are not case-sensitive (e.g., select is the same as SELECT).
Database
The CREATE DATABASE statement is used to create a new SQL database.
- CREATE DATABASE databasename;
The DROP DATABASE statement is used to delete an existing SQL database.
- DROP DATABASE databasename;
The BACKUP DATABASE statement is used in SQL Server to create a full backup of an existing SQL database.
- BACKUP DATABASE databasename
TO DISK = 'filepath';
Create Table
The CREATE TABLE statement is used to create a new table in a database.
- CREATE TABLE table_name (
column1 datatype constraint,
column2 datatype constraint,
column3 datatype constraint,
...
);
OR
- CREATE TABLE new_table_name AS
SELECT column1, column2,...
FROM existing_table_name
WHERE ....;
Key points
- The column parameters specify the names of the columns of the table.
- The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).
- Constraints (optional) are used to ensure the accuracy and reliability of the data by limiting the type of data that can be stored in a column or table. Common types of constraints include NOT NULL, UNIQUE, PRIMARY KEY and DEFAULT.
- As illustrated in the second example, you can also create a copy of an existing table, including the existing values from the old table.
Delete or Truncate Table
The DROP TABLE statement is used to drop an existing table in a database.
- DROP TABLE table_name;
The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.
- TRUNCATE TABLE table_name;
Alter Table
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
Add column
- ALTER TABLE table_name
ADD column_name datatype;
Delete column
- ALTER TABLE table_name
DROP COLUMN column_name;
Rename Column
- ALTER TABLE table_name
RENAME COLUMN old_name to new_name;
- ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
Insert Data
The INSERT INTO statement is used to add new rows of data into a database table.
- INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...),
(value4, value5, value6, ...),
(value7, value8, value9, ...);
- You can omit column names (use with caution), but the order of the values you provide must strictly match the order of the columns in the table.
- If you are inserting data into specific columns and not all of them, you can only specify the relevant column names in the query.
- Each rows of data is enclosed in parentheses and separated by commas.
Filtering and Updating Data
The UPDATE statement is used to modify existing data in a database table, but use it very carefully.
- UPDATE table_name
SET column1=value1, column2 = value 2, ...
WHERE condition;
Caution:
- If the WHERE clause is omitted, all rows in the table will be updated, which can cause significant and unintended data loss.
Deleting Data
The DELETE statement allows you to permanently remove existing records from a database table. Use it with extreme caution, as deleted data cannot be easily recovered.
- DELETE FROM table_name
WHERE condition;
Caution:
- Omitting the WHERE clause can accidentally delete your entire table, resulting in unrecoverable data loss.
- Back up your database regularly before running any DELETE statements on your main database.
Query
A query is a request for specific data or information from a database. Every SQL query follows the same basic syntax:
- Use SELECT to choose the columns you want to return.
- Separate each column with a comma.
- SELECT * means select all columns from a table
- Use FROM to specify the table(s) containing the desired columns.
- The dataset name (if applicable) is followed by a dot (.) and then the table name.
- Use WHERE (optional) to filter the results based on specific criteria.
- Text values in the WHERE clause require single quotes, while numeric values should not be quoted.
- The LIKE operator uses wildcards to match patterns.
- Percent sign (%) or asterisk (*) - Matches zero, one or more characters.
- Underscore sign (_) - Matches a single character.
- Square brackets ([]) - Matches a single character from a specified set of characters within the brackets (e.g. [abc] matches "a","b" or "c").
- SQL also supports logical operators like AND, OR, IN, BETWEEN, NOT, IS NULL and IS NOT NULL for complex filtering. You can use parentheses to group conditions and control the order of evaluation.
- Use ORDER BY to sort the query results in ascending or descending order.
- DESC keyword: Sorts records in descending order.
- Comma (,) allows sorting by multiple columns.
Aggregate Functions
Aggregate functions process multiple rows of data from a table and return a single value that summarizes the data. They are incredibly useful for gaining insights from large datasets.
- SELECT column_to_group_by, aggregate_function(column_name) AS alias_name
FROM table_name
GROUP BY column_to_group_by;
Key points
- Common aggregate functions include COUNT, SUM, AVG, MAX, MIN.
- COUNT(*) counts all rows in a table including those with null values. COUNT(column_name) counts only the rows where the specified column_name has a value.
- column_to_group_by is optional, but it allows you to summarize data for each group.
- Use the AS keyword (Alias) (optional) to give the result of the aggregate function a descriptive name, enhancing readability of your queries.
- The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.
SQL Joins
Most databases store data in multiple interrelated tables.
- This structured approach allows for efficient data organization and avoids redundancy.
- However, to gain valuable insights, you often need to combine data from these related tables.
There are various types of joins used for different scenarios.
- Inner Join - most common
- Left Join
- Right Join
- Full Join
Example: Combining Orders and Customers
Imagine you have an Orders table and a Customers table. To retrieve customer names along with their order details, you would use an INNER JOIN on the CustomerID field (common column) to combine data from both tables. This query retrieves the OrderID, CustomerName and OrderDate for each order.
- SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
Union Operator
The UNION operator is used to combine the result-set of two or more SELECT statements.
- SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Key points
- Every SELECT statement within UNION must have the same number of columns.
- To allow duplicate values, use UNION ALL.
Comments
Comments are text explanations embedded within SQL to improve readability.
- They are placed between certain characters, /* and */ (multiple line comments), or after two dashes (--) (single line comments).
Summary
SQL queries primarily use SELECT, FROM, and WHERE to specify the data to be retrieved.
- Capitalization, indentation, and semicolons enhance readability and comments can further explain your queries for others.
Comments
Post a Comment