Welcome to SQL 2: Selecting, Updating, and Deleting Data

Welcome to SQL 2: Selecting, Updating, and Deleting Data

Explore the many flavors of SQL data manipulation in part 2 of our series.

    Now that we've gotten the fundamentals of creating databases and tables out of the way, we can start getting into the meat and potatoes of SQL interactions: selecting, updating, and deleting data.

    We'll start with the basic structure of these queries and then break into the powerful operations with enough detail to make you dangerous.

    Selecting Data From a Table

    As mentioned previously, SQL operations have a rather strict order of operations which clauses have to respect in order to make a valid query. We'll begin by dissecting a common SELECT statement:

    SELECT
      column_name_1,
      column_name_2
    FROM
      schema_name.table_name
    WHERE
      column_name_1 = "Value";
    

    This is perhaps the most common structure of SELECT queries. First, we list the names of the columns we'd like to select separated by commas. To receive all columns, we can simply say SELECT *.

    These columns need to come from somewhere, so we specify the table we're referring to next. This either takes a form of FROM table_name (non-PostgreSQL), or FROM schema_name.table_name (PostgreSQL). In theory, a semicolon here would result in a valid query, but we usually want to select rows that meet certain criteria.

    This is where the WHERE clause comes in: only rows which return "true" for our WHERE conditional will be returned. In the above example, we're validating that a string matches exactly "Value".

    Selecting only Distinct Values

    Something that often comes in handy is selecting distinct values in a column. In other words, if a value exists in the same column in 100 rows, running DISTINCT query will only show us that value once. This is a good way of seeing the unique content of a column without yet diving into the distribution of said value. The effect is similar to the United States Senate, or the Electoral College: forget the masses, and prop up Wyoming 2020:

    SELECT DISTINCT column_name 
    FROM table_name;
    

    Offsetting and Limiting Results in our Queries

    When selecting data, the combination of OFFSET and LIMIT are critical at times. If we're selecting from a database with hundreds of thousands of rows, we would be wasting an obscene amount of system resources to fetch all rows at once; instead, we can have our application or API paginate the results.

    LIMIT is followed by an integer, which in essence says "return no more than X results."

    OFFSET is also followed by an integer, which denotes a numerical starting point for returned results, aka: "return all results which occur after the Xth result:"

    SELECT
     *
    FROM
     table_name
    LIMIT 50 OFFSET 0;
    

    The above returns the first 50 results. If we wanted to build paginated results on the application side, we could construct our query like this:

    from SQLAlchemy import engine, session
    
    # Set up a SQLAlchemy session
    Session = sessionmaker()
    engine = create_engine('sqlite:///example.db')
    Session.configure(bind=engine)
    sess = Session()
    
    # Appication variables
    page_number = 3
    page_size = 50
    results_subset = page_number * results limit
    
    # Query
    session.query(TableName).limit(page_size).offset(results_subset)
    

    Such an application could increment page_number by 1 each time the user clicks on to the next page, which would then appropriately modify our query to return the next page of results.

    Another use for OFFSET could be to pick up where a failed script left off. If we were to write an entire database to a CSV and experience a failure. We could pick up where the script left off by setting OFFSET equal to the number of rows in the CSV, to avoid running the entire script all over again.

    Sorting Results

    Last to consider for now is sorting our results by using the ORDER BY clause. We can sort our results by any specified column, and state whether we'd like the results to be ascending (ASC) or descending (DESC):

    SELECT
      *
    FROM
      schema_name.table_name
    WHERE
      column_name_1 = "Value"
    ORDER BY
      updated_date DESC
    LIMIT 50 OFFSET 10;
    

    Sophisticated SELECT Statements

    Of course, we can select rows with WHERE logic that goes much deeper than an exact match. One of the most versatile of these operations is LIKE.

    Using Regex with LIKE

    LIKE is perhaps the most powerful way to select columns with string values. With LIKE, we can leverage regular expressions to build highly complex logic. Let's start with some of my favorites:

    SELECT
      *
    FROM
      people
    WHERE
      name LIKE "%Wade%";
    

    Passing a string to LIKE with percentage signs on both sides is essentially a "contains" statement. % is equivalent to a wildcard, thus placing % on either side of our string will return true whether the person's first name, middle name, or last name is Wade. Check out other useful combinations for %:

    Finding Values which are NOT LIKE

    The opposite of LIKE is of course NOT LIKE, which runs the same conditional, but returns the opposite true/false value of LIKE:

    SELECT
      *
    FROM
      people
    WHERE
      name NOT LIKE "%Wade%";
    

    Conditionals With DateTime Columns

    DateTime columns are extremely useful for selecting data. Unlike plain strings, we can easily extract numerical values for month, day, and year from a DateTime by using MONTH(column_name), DAY(column_name), and YEAR(column_name) respectively. For example, using MONTH() on a column that contains a DateTime of 2019-01-26 05:42:34 would return 1, aka January. Because the values come back as integers, it is then trivial to find results within a date range:

    SELECT 
      * 
    FROM 
      posts 
    WHERE YEAR(created_at) < 2018;
    

    Finding Rows with NULL Values

    NULL is a special datatype which essentially denotes the "absence of something," therefore no conditional will never equal NULL. Instead, we find rows where a value IS NULL:

    SELECT 
      * 
    FROM 
      posts 
    WHERE author IS NULL;
    

    This should not come as a surprise to anybody familiar with validating datatypes.

    The reverse of this, of course, is NOT NULL:

    SELECT 
      * 
    FROM 
      posts 
    WHERE author IS NOT NULL;
    

    Inserting Data

    An INSERT query creates a new row, and is rather straightforward: we state the columns we'd like to insert data into, followed by the values to insert into said columns:

    INSERT INTO table_name (column_1, column_2, column_3)
    VALUES ("value1", "value2", "value3");
    

    Many things could result in a failed insert. For one, the number of values must match the number of columns we specify; if we don't we've either provided too few or too many values.

    Second, vales must respect a column's data type. If we try to insert an integer into a DateTime column, we'll receive an error.

    Finally, we must consider the keys and constraints of the table. If keys exist that specify certain columns must not be empty, or must be unique, those keys must too be respected.

    As a shorthand trick, if we're inserting values into all of a table's columns, we can skip the part where we explicitly list the column names:

    INSERT INTO table_name
    VALUES ("value1", "value2", "value3");
    

    Here's a quick example of an insert query with real data:

    INSERT INTO friends (id, name, birthday) 
    VALUES (1, 'Jane Doe', '1990-05-30');
    

    UPDATE Records: The Basics

    Updating rows is where things get interesting. There's so much we can do here, so let's work our way up:

    UPDATE table_name 
    SET column_name_1 = 'value' 
    WHERE column_name_2 = 'value';
    

    That's as simple as it gets: the value of a column, in a row that matches our conditional. Note that SET always comes before WHERE. Here's the same query with real data:

    UPDATE celebs 
    SET twitter_handle = '@taylorswift13' 
    WHERE id = 4;
    

    UPDATE Records: Useful Logic

    Joining Strings Using CONCAT

    You will find that it's common practice to update rows based on data which already exists in said rows: in other words, sanitizing or modifying data. A great string operator is CONCAT(). CONCAT("string_1", "string_2") will join all the strings passed to a single string.

    Below is a real-world example of using CONCAT() in conjunction with NOT LIKE to determine which post excerpts don't end in punctuation. If the excerpt does not end with a punctuation mark, we add a period to the end:

    UPDATE
      posts
    SET 
      custom_excerpt = CONCAT(custom_excerpt, '.')
    WHERE
      custom_excerpt NOT LIKE '%.'
      AND custom_excerpt NOT LIKE '%!'
      AND custom_excerpt NOT LIKE '%?';
    

    Using REPLACE

    REPLACE() works in SQL as it does in nearly every programming language. We pass REPLACE() three values:

    1. The string to be modified.
    2. The substring within the string which will be replaced.
    3. The value of the replacement.

    We can do plenty of clever things with REPLACE(). This is an example that changes the featured image of blog posts to contain the “retina image” suffix:

    UPDATE
      posts
    SET
      feature_image = REPLACE(feature_image, '.jpg', '@2x.jpg');
    

    Scenario: Folder Structure Based on Date

    I across a fun exercise the other day when dealing with a nightmare situation involving changing CDNs. It touches on everything we’ve reviewed thus far and serves a great illustration of what can be achieved in SQL alone.

    The challenge in moving hundreds of images for hundreds of posts came in the form of a file structure. Ghost likes to save images in a dated folder structure, like 2019/02/image.jpg. Our previous CDN did not abide by this at all, so had a dump of all images in a single folder. Not ideal.

    Thankfully, we can leverage the metadata of our posts to discern this file structure. Because images are added to posts when posts are created, we can use the created_at column from our posts table to figure out the right dated folder:

    UPDATE
      posts
    SET
      feature_image = CONCAT("https://cdn.example.com/posts/", 
    	YEAR(created_at),
    	"/", 
    	LPAD(MONTH(created_at), 2, '0'), 
    	"/",
    	SUBSTRING_INDEX(feature_image, '/', - 1)
      );
    

    Let's break down the contents in our CONCAT:

    The result for every image will now look like this:

    https://cdn.example.com/posts/2018/02/image.jpg
    

    DELETE Records

    Let's wrap up for today with our last type of query, deleting rows:

    DELETE FROM celebs 
    WHERE twitter_handle IS NULL;
    
    Todd Birchard's' avatar
    New York City Website
    Engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.

    Engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.