When we think about software interacting with databases directly, many envision a similar pattern of managing connections, explicitly committing queries, and a similarly tiresome boilerplate. Python libraries like PyMySQL and Psycopg2 perfectly fit this paradigm and uphold the status quo of what we've come to accept. SQLAlchemy has become one of Python's most iconic libraries by abstracting the mundane and trivializing databases in software.
It is also home to the most piss poor documentation for a Python library I've ever had the misfortune of navigating.
SQLAlchemy's API offers a better, simpler, and faster way to work with relational databases. This is useless to those attempting to make sense of the esoteric jargon masquerading as SQLAlchemy's documentation that slaps you across the face and says, "I don't give a shit about you."
I'm here to say I do give a shit about you and would like nothing more than to express these feelings in the form of an SQLAlchemy tutorial.
SQL or ORM?
SQLAlchemy can serve two purposes: making SQL interactions easier and serving as a full-blown ORM. Even if you aren't interested in implementing an ORM (or don't know what an ORM is), SQLAlchemy is invaluable as a tool to simplify connections to SQL databases and perform raw queries. I thought that was an original statement as I wrote it, but it turns out SQLAlchemy describes itself this way:
SQLAlchemy consists of two distinct components, known as the Core and the ORM. The Core is itself a fully featured SQL abstraction toolkit, providing a smooth layer of abstraction over a wide variety of DBAPI implementations and behaviors, as well as a SQL Expression Language which allows expression of the SQL language via generative Python expressions.
Connecting to a Database
SQLAlchemy gives us a few options for managing database connections, but they all begin with the concept of an engine. An "engine" is a Python object representing a database. Engines are created one time upfront by passing in connection info for a target database.
Once an engine is created, we can swiftly work with our database at any time by interacting with the engine object. Passing a SQL statement to an engine directly lets SQLAlchemy handle opening a new connection, executing, and immediately closing the connection. In the case of sophisticated web apps sustaining a high volume of data transfer, it's preferable to persistent continuous database connections via a session. We'll get to that in the next post.
The syntax for creating an engine is refreshingly simple. create_engine()
requires one positional argument, which is a string representing the connection information to connect to a database:
That string being passed into create_engine()
is a connection URI. Assuming you have a database to work with, we'll figure out what your URI should look like together.
Database Connection URIs
Dissecting a URI into its parts looks like this:
- [DB_TYPE]: Specifies the kind (dialect) of database we're connecting to. SQLAlchemy can interface with all mainstream flavors of relational databases. Depending on which database you're connecting to, replace
[DB_TYPE]
with the matching dialect:- MySQL:
mysql
- PostgreSQL:
postgresql
- SQLite:
sqlite
- Oracle (ugh):
oracle
- Microsoft SQL (slightly less exasperated "ugh"):
mssql
- MySQL:
- [DB_CONNECTOR]: To manage your database connections, SQLAlchemy leverages whichever Python database connection library you chose to use. If you aren't sure what this means, here are the libraries I'd recommend per dialect:
The variables that come after should all look familiar; these refer to your target database's URL, a database user, that user's password, etc.
With that out of the way, we can install SQLAlchemy along with your connector:
Additional Engine Configuration
Besides a connection URI, engines can be configured to accommodate your database setup's specific needs (or preferences). I require connections to happen over SSL; this requires a PEM key to be provided for extra security, which we're able to pass via the connect_args
keyword argument when creating our engine:
If you're the curious type, you can also pass echo=True
to your engine, which will print all SQL statements being executed to your console as they happen (including connections, table creation, etc.). This is great for getting insight into what's happening for newcomers, but it quickly gets annoying and spammy.
Executing Queries
We can run ad hoc queries on an engine object directly by using engine.execute("SELECT * FROM mytable")
. When we call execute()
on an engine, SQLAlchemy handles:
- Opening a connection to our database.
- Executing raw SQL on our database and returning the results.
- Immediately close the database connection used to run this query to avoid hanging connections.
This way of interacting with a database is called explicit connectionless execution. This type of operation is "explicit" in the sense that queries are automatically committed upon execution (contrast this to conventional Python libraries, which abide by PEP-249, which do not run queries unless followed by a commit()
).
This is great when we want to run ad hoc queries on demand (i.e.: nearly all cases that are not user-facing applications). We can fetch or mutate the data we want without worrying about the extra stuff.
SELECT Data
I've set up a database of dummy data for this example. To demonstrate how easy it is to work with SQLAlchemy, I'm going to start by fetching some data from a table called nyc_jobs:
I've passed a simple SQL query conn.execute()
, which SELECT
s a few columns from my nyc_jobs table (returned in random order). Let's check the output of that:
Oh God, what the hell is that?! What's a Result
? Was this all too good to be true?!
As of SQLAlchemy 1.4.X, the object ResultProxy
has been renamed as Result
. Check your version of SQLAlchemy to know which object to except when fetching a result.
Parsing Query Results
Result
(is a useful data structure summarizing the result of our query and wrapping the results themselves. The Result
object makes it easy to get a high-level look at how our query succeeded by giving us access to attributes like rowcount
:
Which gives us...
...But enough horsing around. Let's see what our data looks like by calling fetchall()
on our Result
:
Now we're talking:
Hey, those look like rows containing job listings!
Most SQLAlchemy users would likely prefer fetched rows to be formatted as a different data type instead of an SQLAlchemy object. A common scenario is parsing rows as Python dictionaries. Luckily, the SQLAlchemy Row
object has a built-in _asdict()
method for this exact purpose:
Output:
Selected 5 rows: [{'job_id': 312335, 'agency': 'DEPT OF ENVIRONMENT PROTECTION', 'business_title': 'Dynamic CRM Developer', 'salary_range_from': 85823, 'salary_range_to': 121363}, {'job_id': 385305, 'agency': 'POLICE DEPARTMENT', 'business_title': 'Case Management Nurse', 'salary_range_from': 81653, 'salary_range_to': 81653}, {'job_id': 384016, 'agency': 'DEPT OF INFO TECH & TELECOMM', 'business_title': '.NET Developer/Programmer Analyst', 'salary_range_from': 56990, 'salary_range_to': 110000}, {'job_id': 221045, 'agency': 'NYC EMPLOYEES RETIREMENT SYS', 'business_title': 'CERTIFIED IT ADMINISTRATOR (WAN)', 'salary_range_from': 89383, 'salary_range_to': 134914}, {'job_id': 397017, 'agency': 'DEPARTMENT OF BUILDINGS', 'business_title': 'Operations Associate', 'salary_range_from': 31893, 'salary_range_to': 36677}]
Pretty close but a bit hard to read. Using Python's built-in json
library can help us here:
Now you have a format that will surely suit your needs regardless of what you're trying to achieve:
[
...
{
"job_id": 398955,
"agency": "DEPT OF INFO TECH & TELECOMM",
"business_title": "Cyber Threat Analyst",
"salary_range_from": 76500,
"salary_range_to": 95000
},
{
"job_id": 380613,
"agency": "DEPT OF INFO TECH & TELECOMM",
"business_title": "Vulnerability Manager",
"salary_range_from": 69940,
"salary_range_to": 165000
},
{
"job_id": 390870,
"agency": "DEPT OF INFO TECH & TELECOMM",
"business_title": "Cyber Threat Analyst",
"salary_range_from": 75000,
"salary_range_to": 95000
},
{
"job_id": 356355,
"agency": "DEPT OF ENVIRONMENT PROTECTION",
"business_title": "Engineering Technician I",
"salary_range_from": 36239,
"salary_range_to": 41675
},
...
]
UPDATE Rows
We can run UPDATE queries the same way we ran our SELECT query, with the caveat that there is such a thing as a bad UPDATE query. Let's make sure we avoid those.
I'm picking a random row from my table to update. I'm going to change row 229837, which looks like this in JSON form:
Let's write a potentially dangerous SQL query that introduces some problematic characters:
It's a bad idea to attempt an UPDATE where the value contains operators sure to break the operation. Let's see what happens:
Running this pisses off everything:
Luckily for us, SQLAlchemy has a method called text()
which escapes dangerous characters found in queries like these. Always wrap your queries in this method:
Output:
08-31-2023 08:11:10 | SUCCESS: Updated 1 row: <sqlalchemy.engine.cursor.CursorResult object at 0x107021780>
We now receive no errors, and a count of 1
is returned to indicate that a single row was updated.
And So Much More
Unless you're a masochist, the SQLAlchemy workflow of running queries is objectively better than the boilerplate you'd be dealing with using a vanilla DB connector. I can't begin to quantify the amount of time I've personally saved from this simplified workflow, which makes the following assertion seem outlandish: There is still so much more to gain from SQLAlchemy.
Anyway, one step at a time. I compiled a working demo containing all the source code we covered in this tutorial on GitHub below. Join us next time!