SQL Essentials

Introduction

Structured Query Language (SQL) empowers data analysts to communicate with their databases.

  • While spreadsheets (e.g. Microsoft Excel and Google Sheets) offer data storage and management, databases excel at handling larger and more complex datasets.

Popular SQL programs include MySQL, Microsoft SQL Server and BigQuery.



SQL Statements

Database actions are primarily performed using SQL statements, which consist of easy-to-understand keywords like:

Database

  • CREATE DATABASE - Creates a new database.
  • BACKUP DATABASE - Creates a backup of an existing database
  • DROP DATABASE - Deletes an existing database.
Table
  • CREATE TABLE - Creates a new table within a database.
  • ALTER TABLE - Modifies an existing table.
  • DROP TABLE - Deletes a table from a database.
  • TRUNCATE TABLE - Delete the data inside a table, but not the table itself.
Data
  • SELECT - Extracts data from a database
  • SELECT DISTINCT - Returns only distinct (different) values
  • UPDATE - Updates existing data in a database.
  • DELETE - Deletes data from a database.
  • INSERT INTO - Inserts new data into a database.

The semicolon (;) at the end of each SQL statement terminates it.

  • While part of the ANSI SQL-92 standard, not all SQL databases require it.

NOTE: SQL keywords are not case-sensitive (e.g., select is the same as SELECT).



Database

The CREATE DATABASE statement is used to create a new SQL database.

  • CREATE DATABASE databasename;

The DROP DATABASE statement is used to delete an existing SQL database.

  • DROP DATABASE databasename;

The BACKUP DATABASE statement is used in SQL Server to create a full backup of an existing SQL database.

  • BACKUP DATABASE databasename
    TO DISK = 'filepath';



Create Table

The CREATE TABLE statement is used to create a new table in a database.

  • CREATE TABLE table_name (
    column1 datatype constraint,
    column2 datatype constraint,
    column3 datatype constraint,
    ...
    );
Create Table in SQL

OR

  • CREATE TABLE new_table_name AS
    SELECT column1, column2,...
    FROM existing_table_name
    WHERE ....;

Key points

  • The column parameters specify the names of the columns of the table.
  • The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).
  • Constraints (optional) are used to ensure the accuracy and reliability of the data by limiting the type of data that can be stored in a column or table. Common types of constraints include NOT NULL, UNIQUE, PRIMARY KEY and DEFAULT.
  • As illustrated in the second example, you can also create a copy of an existing table, including the existing values from the old table.



Delete or Truncate Table

The DROP TABLE statement is used to drop an existing table in a database.

  • DROP TABLE table_name;

The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.

  • TRUNCATE TABLE table_name;



Alter Table

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

Add column

  • ALTER TABLE table_name
    ADD column_name datatype;

Delete column

  • ALTER TABLE table_name
    DROP COLUMN column_name;

Rename Column

  • ALTER TABLE table_name
    RENAME COLUMN old_name to new_name;
Modify Datatype
  • ALTER TABLE table_name
    MODIFY COLUMN column_name datatype;



Insert Data

The INSERT INTO statement is used to add new rows of data into a database table.

  • INSERT INTO table_name (column1, column2, column3, ...)
    VALUES (value1, value2, value3, ...),
    (value4, value5, value6, ...),
    (value7, value8, value9, ...);

Key points
  • You can omit column names (use with caution), but the order of the values you provide must strictly match the order of the columns in the table.
  • If you are inserting data into specific columns and not all of them, you can only specify the relevant column names in the query.
  • Each rows of data is enclosed in parentheses and separated by commas.



Filtering and Updating Data

The UPDATE statement is used to modify existing data in a database table, but use it very carefully.

  • UPDATE table_name
    SET column1=value1, column2 = value 2, ...
    WHERE condition;

UPDATE in SQL

Caution:

  • If the WHERE clause is omitted, all rows in the table will be updated, which can cause significant and unintended data loss.



Deleting Data

The DELETE statement allows you to permanently remove existing records from a database table. Use it with extreme caution, as deleted data cannot be easily recovered.

  • DELETE FROM table_name
    WHERE condition;

Caution:

  • Omitting the WHERE clause can accidentally delete your entire table, resulting in unrecoverable data loss.
  • Back up your database regularly before running any DELETE statements on your main database.



Query

A query is a request for specific data or information from a database. Every SQL query follows the same basic syntax:

  • Use SELECT to choose the columns you want to return.
    • Separate each column with a comma.
    • SELECT * means select all columns from a table
  • Use FROM to specify the table(s) containing the desired columns.
    • The dataset name (if applicable) is followed by a dot (.) and then the table name.
  • Use WHERE (optional) to filter the results based on specific criteria.
    • Text values in the WHERE clause require single quotes, while numeric values should not be quoted.
    • The LIKE operator uses wildcards to match patterns.
      • Percent sign (%) or asterisk (*) - Matches zero, one or more characters.
      • Underscore sign (_) - Matches a single character.
      • Square brackets ([]) - Matches a single character from a specified set of characters within the brackets (e.g. [abc] matches "a","b" or "c").
    • SQL also supports logical operators like AND, OR, IN, BETWEEN, NOT, IS NULL and IS NOT NULL for complex filtering. You can use parentheses to group conditions and control the order of evaluation.

    Where Clause
    SQL Query
    • Use ORDER BY to sort the query results in ascending or descending order.
      • DESC keyword: Sorts records in descending order.
      • Comma (,) allows sorting by multiple columns.

    ORDER BY in SQL



    Aggregate Functions

    Aggregate functions process multiple rows of data from a table and return a single value that summarizes the data. They are incredibly useful for gaining insights from large datasets.

    • SELECT column_to_group_by, aggregate_function(column_name) AS alias_name
      FROM table_name
      GROUP BY column_to_group_by;

    Key points

    • Common aggregate functions include COUNT, SUM, AVG, MAX, MIN.
    • COUNT(*) counts all rows in a table including those with null values. COUNT(column_name) counts only the rows where the specified column_name has a value.
    • column_to_group_by is optional, but it allows you to summarize data for each group.
    • Use the AS keyword (Alias) (optional) to give the result of the aggregate function a descriptive name, enhancing readability of your queries.
    • The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.

    Agreggate Function in SQL



    SQL Joins

    Most databases store data in multiple interrelated tables.

    • This structured approach allows for efficient data organization and avoids redundancy.
    • However, to gain valuable insights, you often need to combine data from these related tables.
    Joins allow you to combine data from multiple tables based on a common column that establishes a relationship between them. This combined data is then presented in a single result set for further analysis.

    There are various types of joins used for different scenarios.

    • Inner Join - most common
    • Left Join
    • Right Join
    • Full Join

    Joins in SQL

    Example: Combining Orders and Customers

    Imagine you have an Orders table and a Customers table. To retrieve customer names along with their order details, you would use an INNER JOIN on the CustomerID field (common column) to combine data from both tables. This query retrieves the OrderID, CustomerName and OrderDate for each order.

    • SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
      FROM Orders
      INNER JOIN
      Customers
      ON Orders.CustomerID=Customers.CustomerID;



    Union Operator

    The UNION operator is used to combine the result-set of two or more SELECT statements.

    • SELECT column_name(s) FROM table1
      UNION
      SELECT column_name(s) FROM table2;

    Key points

    • Every SELECT statement within UNION must have the same number of columns.
    • To allow duplicate values, use UNION ALL.

    SQL UNION



    Comments

    Comments are text explanations embedded within SQL to improve readability.

    • They are placed between certain characters, /* and */ (multiple line comments), or after two dashes (--) (single line comments).

    SQL Query



    Summary

    SQL queries primarily use SELECT, FROM, and WHERE to specify the data to be retrieved.

    • Capitalization, indentation, and semicolons enhance readability and comments can further explain your queries for others.



    External Links

    Comments