Databases in Python Made Easy with SQLAlchemy

When we think about software interacting with databases directly, many envision a similar pattern of managing connections, explicitly committing queries, and a similarly tiresome boilerplate. Python libraries like PyMySQL and Psycopg2 perfectly fit this paradigm and uphold the status quo of what we've come to accept. SQLAlchemy has become one of Python's most iconic libraries by abstracting the mundane and trivializing databases in software.

It is also home to the most piss poor documentation for a Python library I've ever had the misfortune of navigating.

SQLAlchemy's API offers a better, simpler, and faster way to work with relational databases. This is useless to those attempting to make sense of the esoteric jargon masquerading as SQLAlchemy's documentation that slaps you across the face and says, "I don't give a shit about you."

I'm here to say I do give a shit about you and would like nothing more than to express these feelings in the form of an SQLAlchemy tutorial.

SQL or ORM?

SQLAlchemy can serve two purposes: making SQL interactions easier and serving as a full-blown ORM. Even if you aren't interested in implementing an ORM (or don't know what an ORM is), SQLAlchemy is invaluable as a tool to simplify connections to SQL databases and perform raw queries. I thought that was an original statement as I wrote it, but it turns out SQLAlchemy describes itself this way:

SQLAlchemy consists of two distinct components, known as the Core and the ORM. The Core is itself a fully featured SQL abstraction toolkit, providing a smooth layer of abstraction over a wide variety of DBAPI implementations and behaviors, as well as a SQL Expression Language which allows expression of the SQL language via generative Python expressions.

Connecting to a Database

SQLAlchemy gives us a few options for managing database connections, but they all begin with the concept of an engine. An "engine" is a Python object representing a database. Engines are created one time upfront by passing in connection info for a target database.

Once an engine is created, we can swiftly work with our database at any time by interacting with the engine object. Passing a SQL statement to an engine directly lets SQLAlchemy handle opening a new connection, executing, and immediately closing the connection. In the case of sophisticated web apps sustaining a high volume of data transfer, it's preferable to persistent continuous database connections via a session. We'll get to that in the next post.

The syntax for creating an engine is refreshingly simple. create_engine() requires one positional argument, which is a string representing the connection information to connect to a database:

"""Create database connection."""
from sqlalchemy import create_engine


engine = create_engine(
    "mysql+pymysql://user:password@host:3306/database",
)

Initializing an SQLAlchemy database engine

That string being passed into create_engine() is a connection URI. Assuming you have a database to work with, we'll figure out what your URI should look like together.

Database Connection URIs

Dissecting a URI into its parts looks like this:

[DB_TYPE]+[DB_CONNECTOR]://[USERNAME]:[PASSWORD]@[HOST]:[PORT]/[DB_NAME]

Database connection URI structure

[DB_TYPE]: Specifies the kind (dialect) of database we're connecting to. SQLAlchemy can interface with all mainstream flavors of relational databases. Depending on which database you're connecting to, replace [DB_TYPE] with the matching dialect:
- MySQL: mysql
- PostgreSQL: postgresql
- SQLite: sqlite
- Oracle (ugh): oracle
- Microsoft SQL (slightly less exasperated "ugh"): mssql
[DB_CONNECTOR]: To manage your database connections, SQLAlchemy leverages whichever Python database connection library you chose to use. If you aren't sure what this means, here are the libraries I'd recommend per dialect:
- MySQL: pymysql, mysqldb
- PostgreSQL: psycopg2, pg8000
- SQLite: (none needed)
- Oracle: cx_oracle
- Microsoft SQL: pymssql, pyodbc

The variables that come after should all look familiar; these refer to your target database's URL, a database user, that user's password, etc.

With that out of the way, we can install SQLAlchemy along with your connector:

$ pip install sqlalchemy pymysql

Install SQLAlchemy & database connector libraries

Additional Engine Configuration

Besides a connection URI, engines can be configured to accommodate your database setup's specific needs (or preferences). I require connections to happen over SSL; this requires a PEM key to be provided for extra security, which we're able to pass via the connect_args keyword argument when creating our engine:

"""Create database connection."""
from sqlalchemy import create_engine
from config import SQLALCHEMY_DATABASE_PEM


engine = create_engine(
    "mysql+pymysql://user:password@host:3306/database",
    connect_args={"ssl": {"key": SQLALCHEMY_DATABASE_PEM}},
    echo=True,
)

SQLAlchemy database engine

If you're the curious type, you can also pass echo=True to your engine, which will print all SQL statements being executed to your console as they happen (including connections, table creation, etc.). This is great for getting insight into what's happening for newcomers, but it quickly gets annoying and spammy.

Executing Queries

We can run ad hoc queries on an engine object directly by using engine.execute("SELECT * FROM mytable"). When we call execute() on an engine, SQLAlchemy handles:

Opening a connection to our database.
Executing raw SQL on our database and returning the results.
Immediately close the database connection used to run this query to avoid hanging connections.

This way of interacting with a database is called explicit connectionless execution. This type of operation is "explicit" in the sense that queries are automatically committed upon execution (contrast this to conventional Python libraries, which abide by PEP-249, which do not run queries unless followed by a commit()).

This is great when we want to run ad hoc queries on demand (i.e.: nearly all cases that are not user-facing applications). We can fetch or mutate the data we want without worrying about the extra stuff.

SELECT Data

I've set up a database of dummy data for this example. To demonstrate how easy it is to work with SQLAlchemy, I'm going to start by fetching some data from a table called nyc_jobs:

"""Execute raw SQL queries against an SQLAlchemy engine."""
from typing import List, Optional
import json

from sqlalchemy import text
from sqlalchemy.engine.base import Engine
from sqlalchemy.exc import SQLAlchemyError

from logger import LOGGER


def fetch_job_listings(engine: Engine) -> Optional[List[dict]]:
    """
    Select rows from database and parse as list of dicts.

    :param Engine engine: Database engine to handle raw SQL queries.

    :return: Optional[List[dict]]
    """
    try:
        with engine.begin() as conn:
            result = conn.execute(
                text(
                    "SELECT job_id, agency, business_title, \
                    salary_range_from, salary_range_to \
                    FROM nyc_jobs ORDER BY RAND() LIMIT 10;"
                ),
            )
            results = result.fetchall()
            LOGGER.info(f"Selected rows: {results}"
    except SQLAlchemyError as e:
        LOGGER.error(f"SQLAlchemyError while fetching records: {e}")
    except Exception as e:
        LOGGER.error(f"Unexpected error while fetching records: {e}")

Execute a SELECT query

I've passed a simple SQL query conn.execute() , which SELECTs a few columns from my nyc_jobs table (returned in random order). Let's check the output of that:

<sqlalchemy.engine.result.Result object at 0x10c59bdc0>

Queries return a Result object

Oh God, what the hell is that?! What's a Result? Was this all too good to be true?!

As of SQLAlchemy 1.4.X, the object ResultProxy has been renamed as Result. Check your version of SQLAlchemy to know which object to except when fetching a result.

Parsing Query Results

Result (is a useful data structure summarizing the result of our query and wrapping the results themselves. The Result object makes it easy to get a high-level look at how our query succeeded by giving us access to attributes like rowcount:

...


with engine.begin() as conn:
    result = conn.execute(
        text(
            "SELECT job_id, agency, business_title, \
            salary_range_from, salary_range_to \
            FROM nyc_jobs ORDER BY RAND();"
       ),
    )
    results = result.fetchall()
    LOGGER.info(f"Selected {result.rowcount} rows.")

Number of rows returned by query

Which gives us...

Selected 3123 rows.

Output of results.rowcount

...But enough horsing around. Let's see what our data looks like by calling fetchall() on our Result:

...


with engine.begin() as conn:
    result = conn.execute(
        text(
            "SELECT job_id, agency, business_title, \
            salary_range_from, salary_range_to \
            FROM nyc_jobs ORDER BY RAND() LIMIT 5;"
       ),
    )
    results = result.fetchall()
    print(f"Selected {result.rowcount} rows.")
    for row in result.fetchall():
        print(row)

Print rows fetched from query

Now we're talking:

Selected 5 rows.
(399274, 'NYC HOUSING AUTHORITY', 'Vice President Portfolio Planning, Project Development and Asset Mgt', 86346, 217244)
(399276, 'BOROUGH PRESIDENT-QUEENS', 'Director of Graphic Design', 52524, 81535)
(399276, 'BOROUGH PRESIDENT-QUEENS', 'Director of Graphic Design', 52524, 81535)
(399300, 'NYC HOUSING AUTHORITY', 'VICE PRESIDENT TRANSACTIONS', 86346, 217244)
...

Query results

Hey, those look like rows containing job listings!

Most SQLAlchemy users would likely prefer fetched rows to be formatted as a different data type instead of an SQLAlchemy object. A common scenario is parsing rows as Python dictionaries. Luckily, the SQLAlchemy Row object has a built-in _asdict() method for this exact purpose:

...

results_dict = [row._asdict() for row in results]
LOGGER.info(
    f"Selected {result.rowcount} rows: {results_dict}"
)

Parse fetched data as a list of dicts

Output:

Selected 5 rows: [{'job_id': 312335, 'agency': 'DEPT OF ENVIRONMENT PROTECTION', 'business_title': 'Dynamic CRM Developer', 'salary_range_from': 85823, 'salary_range_to': 121363}, {'job_id': 385305, 'agency': 'POLICE DEPARTMENT', 'business_title': 'Case Management Nurse', 'salary_range_from': 81653, 'salary_range_to': 81653}, {'job_id': 384016, 'agency': 'DEPT OF INFO TECH & TELECOMM', 'business_title': '.NET Developer/Programmer Analyst', 'salary_range_from': 56990, 'salary_range_to': 110000}, {'job_id': 221045, 'agency': 'NYC EMPLOYEES RETIREMENT SYS', 'business_title': 'CERTIFIED IT ADMINISTRATOR (WAN)', 'salary_range_from': 89383, 'salary_range_to': 134914}, {'job_id': 397017, 'agency': 'DEPARTMENT OF BUILDINGS', 'business_title': 'Operations Associate', 'salary_range_from': 31893, 'salary_range_to': 36677}]

Pretty close but a bit hard to read. Using Python's built-in json library can help us here:

...
import json


results_dict = [row._asdict() for row in results]
LOGGER.info(
    f"Selected {result.rowcount} rows: {json.dumps(results_dict, indent=2)}"
)

Parse rows as a list of dictionaries

Now you have a format that will surely suit your needs regardless of what you're trying to achieve:

[
...
  {
    "job_id": 398955,
    "agency": "DEPT OF INFO TECH & TELECOMM",
    "business_title": "Cyber Threat Analyst",
    "salary_range_from": 76500,
    "salary_range_to": 95000
  },
  {
    "job_id": 380613,
    "agency": "DEPT OF INFO TECH & TELECOMM",
    "business_title": "Vulnerability Manager",
    "salary_range_from": 69940,
    "salary_range_to": 165000
  },
  {
    "job_id": 390870,
    "agency": "DEPT OF INFO TECH & TELECOMM",
    "business_title": "Cyber Threat Analyst",
    "salary_range_from": 75000,
    "salary_range_to": 95000
  },
  {
    "job_id": 356355,
    "agency": "DEPT OF ENVIRONMENT PROTECTION",
    "business_title": "Engineering Technician I",
    "salary_range_from": 36239,
    "salary_range_to": 41675
  },
...
]

UPDATE Rows

We can run UPDATE queries the same way we ran our SELECT query, with the caveat that there is such a thing as a bad UPDATE query. Let's make sure we avoid those.

I'm picking a random row from my table to update. I'm going to change row 229837, which looks like this in JSON form:

[
 {
   "job_id": 229837,
   "agency": "DEPT OF INFO TECH & TELECOMM",
   "business_title": "Senior Quality Oversight Analyst",
   "job_category": "Information Technology & Telecommunications",
   "salary_range_from": 58675,
   "salary_range_to": 125000,
   "salary_frequency": "Annual",
   "work_location": "75 Park Place New York Ny",
   "division": "General Counsel",
   "created_at": "2016-02-17T00:00:00.000",
   "updated_at": "2016-02-17T00:00:00.000"
 }
]

SELECT * FROM nyc_jobs WHERE job_id = 229837;

Let's write a potentially dangerous SQL query that introduces some problematic characters:

UPDATE
	nyc_jobs
SET
	business_title = 'Senior QA Scapegoat 🏆',
	job_category = 'Info <>!#%%Technology%%#^&%* & Telecom' 
WHERE
	job_id = 229837;

UPDATE query containing quotes and emojis

It's a bad idea to attempt an UPDATE where the value contains operators sure to break the operation. Let's see what happens:


with engine.begin() as conn:
    result = engine.execute(
        text(
            "UPDATE nyc_jobs SET business_title = 'Senior QA Scapegoat 🏆', \
            job_category = 'Information? <>!#%%Technology!%%#^&%* & Telecom' \
            WHERE job_id = 229837;"
        )
    )
    LOGGER.info(result.rowcount)

Executing a problematic UPDATE query

Running this pisses off everything:

2021-01-09 15:44:14,122 INFO sqlalchemy.engine.base.Engine UPDATE nyc_jobs SET business_title = 'Senior QA Scapegoat 🏆',         job_category = 'Information <>!#%%Technology%%#^&%* & Telecommunications'         WHERE job_id = 229837;
2021-01-09 15:44:14,123 INFO sqlalchemy.engine.base.Engine {}
2021-01-09 15:44:14,123 INFO sqlalchemy.engine.base.Engine ROLLBACK
Traceback (most recent call last):
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/main.py", line 4, in <module>
    init_script()
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/sqlalchemy_tutorial/__init__.py", line 17, in init_script
    rows_updated = update_job_listing(db)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/sqlalchemy_tutorial/queries.py", line 18, in update_job_listing
    result = db.execute(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2235, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1003, in execute
    return self._execute_text(object_, multiparams, params)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1172, in _execute_text
    ret = self._execute_context(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 609, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/pymysql/cursors.py", line 146, in execute
    query = self.mogrify(query, args)
  File "/Users/toddbirchard/projects/sqlalchemy-tutorial/.venv/lib/python3.8/site-packages/pymysql/cursors.py", line 125, in mogrify
    query = query % self._escape_args(args, conn)
TypeError: * wants int

Stack trace for bad SQL UPDATE query

Luckily for us, SQLAlchemy has a method called text() which escapes dangerous characters found in queries like these. Always wrap your queries in this method:

...
from sqlalchemy import text

with engine.begin() as conn:
    result = conn.execute(
        text(
            "UPDATE nyc_jobs SET business_title = 'Senior QA Scapegoat 🏆',\
            job_category = 'Information <>!#%%Technology%%#^&%* & Telecom' \
            WHERE job_id = 229837;"
         )
    )
    LOGGER.success(f"Updated {result.rowcount} row: {result}")

Wrap query in text()

Output:

08-31-2023 08:11:10 | SUCCESS: Updated 1 row: <sqlalchemy.engine.cursor.CursorResult object at 0x107021780>

We now receive no errors, and a count of 1 is returned to indicate that a single row was updated.

And So Much More

Unless you're a masochist, the SQLAlchemy workflow of running queries is objectively better than the boilerplate you'd be dealing with using a vanilla DB connector. I can't begin to quantify the amount of time I've personally saved from this simplified workflow, which makes the following assertion seem outlandish: There is still so much more to gain from SQLAlchemy.

Anyway, one step at a time. I compiled a working demo containing all the source code we covered in this tutorial on GitHub below. Join us next time!