Last time we met, we joyfully shared a little tirade about missing out on functionality provided to us by libraries such as SQLAlchemy, and the advantages of interacting with databases where ORMs are involved. I stand by that sentiment, but I'll now directly contradict myself by sharing some tips on using vanilla Psycopg2 to interact with databases.
We never know when we'll find ourselves stranded on a desert island without access to SQLAlchemy when a lonesome Psycopg2 washes up onshore. Either that or perhaps you're part of a development team stuck in a certain way of doing things that don't include utilizing SQLAlchemy. Whatever the situation may be, we're here for you.
Installation & Configuration
Since this is a Pyscopg2 tutorial, you won't be surprised to learn that our only essential dependency is Pyscopg2. I've tossed in python-dotenv to load our config values from environment variables, and loguru for clean logging:
The first file I'm going to add to our project is config.py: a place for us to store the information we need to connect to our Postgres database. In the interest of keeping sensitive information out of repositories, I'm opting to load these values from environment variables:
Setting up Our Database Class
It's quite common to bundle generic logic regarding database connections and actions in a Python class. We set this class up one time, and then have the freedom to execute any database-related action at any point in our script by calling the convenient methods we define on it.
Each variable we set in our constructor is a piece of information we need to connect to our database, with the exception self.conn
. We set self.conn
on our Database class to simplify connection management, ensuring that only a single connection exists for our database at any given time.
Connecting to our Database
We now have enough here to connect to our database. We can handle this in a new function, aptly named connect()
:
Our connect()
function is safe enough to be called at any time in our application because we're checking to see if self.conn
already exists before attempting to create a connection. When a connection is established, we save said connection to self.conn
to check against later.
psycopg2.connect()
accepts the usual keyword arguments we'd expect. If you happen to come from using PyMySQL, there are some subtle differences in the name conventions: where PyMySQL uses passwd, Psycopg2 uses password. PyMySQL's db is equivalent to Psycopg2's dbname. We also have a host keyword, whereas PyMySQL doesn't.
Executing SELECT Queries
Let's add a function to execute some queries. Here's a function called select_rows()
we can use to fetch rows from a database via a SELECT query:
This returns the result of the query by returning the values of each row it managed to grab. To demonstrate this, I set up a database table on a Postgres database call customers, which looks like this:
id | first_name | last_name | department | address | created_at | |
---|---|---|---|---|---|---|
1 | Rickard | Paolo | rpaolo0@altervista.org | Services | 97139 Dennis Circle | 7/28/2017 |
2 | Erin | Maro | emaro1@homestead.com | Legal | 5 Derek Lane | 5/20/2016 |
3 | Quinlan | Haycroft | qhaycroft2@google.nl | Research and Development | 4 Lighthouse Bay Hill | 9/17/2017 |
4 | Hi | Saterthwait | hsaterthwait3@wunderground.com | Business Development | 162 Sachs Circle | 9/22/2015 |
5 | Kelley | Ritch | kritch4@oakley.com | Sales | 301 Cody Crossing | 2/13/2019 |
I pass a simple query to select_rows()
to grab the above table:
SELECT * FROM customers;
The result looks like this:
When we run fetchall()
on the cursor we've executed a query against, it returns a list of results formatted as such. Note that these results include our row's values, but not their column titles.
Bonus: Executing UPDATE Queries
We might want a separate function to update rows in a database. In this scenario, it makes more sense to return a message about many rows were affected:
Psycopg2 Extras
In addition to the vanilla functionality we'd expect, Psycopg2 has a lot of useful hidden features in a submodule called psycopg2.extras. We're going to dig into this toolbox and pull out something called DictCursor: a flavor of database cursor that will return rows as a list of key/value pairs as opposed to only values
Using DictCursor to Return More Useful Results
My personal favorite of these extras is the DictCursor
, which renders the rows being returned by our query as Python dictionaries as opposed to lists. When using a DictCursor, the key is always the column name, and the value is the value of that column in that particular row.
We're going to create a function similar to the select_rows()
function we created before, with a notable twist: this time, we'll be passing a keyword argument called cursor_factory into self.conn.cursor()
:
While our cursor is open, all rows returned by the query will be returned as dictionaries... kind of. Here's what the output of the above looks like:
That looks like the same old list of values of before! Not so fast - let's check the type of each row using logger.info(type(row))
in our loop instead:
Aha! These aren't lists of values, they're a new data structure unique to Psycopg2! While we see what appears to be a list of values, each value in each row has a key: the name of the column it belongs to. Check it out:
Gives us:
Awesome! With this new function we can return what looks to be the same as what we have before, with the power to access values based on which column they belong in. Alas, the final product:
def select_rows_dict_cursor(self, query):
"""Run SELECT query and return dictionaries."""
self.connect()
with self.conn.cursor(cursor_factory=DictCursor) as cur:
cur.execute(query)
records = cur.fetchall()
cur.close()
return records
Other Psycopg2 Extras
There's plenty more to the Psycopg2 library where that came from. Psycopg2 contains a submodule named extras, which contains various tools to make common database tasks easier - it's up to you to decide which are worth your while.
For example, another extra which might be of interest could be psycopg2.extras.LoggingConnection
, useful for debugging connection statuses and errors as you work through your program.
There's even a JSON Adaptation extra, which provides support for leveraging JSON data in building queries:
cur.execute("insert into mytable (jsondata) values (%s)",
[Json({'a': 100})])
I don't dwell too deep in Psycopg2 extras myself, but if you see any Godlike extras I'm missing, feel free to call them out in the COMMENTS BELOW! (Hah! I've always wanted to say that).
From CSV to Postgres Table
Something worth visiting is the ability to upload CSVs into Postgres to create tables. We can accomplish this via the built-in method copy_expert
.
To save a CSV to Postgres table, we need to begin with a basic SQL query saved in our project as a variable:
COPY %s FROM STDIN WITH
CSV
HEADER
DELIMITER AS ','
As should be familiar, %s
represents a value we can pass in later. With this raw query, we're only missing two more values:
- The path of our CSV file to be uploaded
- The name of the table we'd like to upload to in Postgres
Check out how we use copy_expert
here to put it all together:
sql = "COPY %s FROM STDIN WITH CSVHEADER DELIMITER AS ','"
file = open('files/myfile.csv', "r")
table = 'my_postgres_table'
with conn.cursor() as cur:
cur.execute("truncate " + table + ";")
cur.copy_expert(sql=sql % table, file=file)
conn.commit()
cur.close()
conn.close()
Notice that I opt to truncate the existing table before uploading the new data, as seen by cur.execute("truncate " + table + ";")
. Without doing this, we would be uploading the same CSV to the same table forever, creating duplicate rows over and over.
What if The Table Doesn't Exist?
Of course this would come up. The truth is (to the best of my knowledge), there aren't many native things Psycopg2 has to offer to make this process easy.
Recall that creating a table has a syntax similar to this:
It's not impossible to build this string yourself in Python. It just entails a lot of iterating over whichever dynamic data structure you have coming through, determining the correct data type per column, and then the unavoidable task of setting your Primary and Unique keys if applicable. This is where my patience ends and knee-jerk reaction of "would be easier in SQLAlchemy" kicks in. Hey, it's possible... I just don't feel like writing about it :).
Godspeed to You, Brave Warrior
For those about to Psycopg2, we salute you. That is unless the choice is self-inflicted. In that case, perhaps it's best we don't work together any time soon.
Source code for this tutorial can be found here: