Psycopg2: Postgres & Python the Old Fashioned Way

Managing Postgres Database connections in Python with Psycopg2.

Last time we met, we joyfully shared a little tirade about missing out on functionality provided to us by libraries such as SQLAlchemy, and the advantages of interacting with databases where ORMs are involved. I stand by that sentiment, but I’ll now directly contradict myself by sharing some tips on using vanilla Psycopg2 to interact with databases.

We never know when we’ll be stranded on a desert island without access to SQLAlchemy, with only a lonesome Psycopg2 washed up on shore. Or perhaps you’re part of a development team stuck in a certain way of doing things which doesn't include utilizing SQLAlchemy. Whatever the situation may be, we’re here for you.

The Quintessential Boilerplate

No matter the type of database or the library, the boilerplate code for connecting to databases remains mostly the same. To some extent, this even holds true across programming languages. Let's look at a barebones example while ignoring the library at hand:

import SomeDatabaseLibrary

class Database:
    """A Generic Database class."""

    def __init__(self, config):
        self.username = config.database('USERNAME')
        self.password = config.database('PASSWORD')
        self.host = config.database('HOST')
        self.port = config.database('PORT')
        self.db = config.database('DB')

    def run_query(self, query):
        """Open a connection, run a query, and return the resulting rows."""
        conn = None
        records = []
        try:
            conn = SomeDatabaseLibrary.connect(host=self.host,
                                               user=self.username,
                                               password=self.password,
                                               port=self.port,
                                               dbname=self.db)
            with conn.cursor() as cur:
                cur.execute(query)
                for row in cur.fetchall():
                    records.append(row)
                return records
        except (Exception, SomeDatabaseLibrary.DatabaseError) as error:
            print(error)
        finally:
            if conn is not None:
                conn.close()
                print('Database connection closed.')

In the above example, we could swap SomeDatabaseLibrary with either Psycopg2 or PyMySQL just the same. If we compare this to our example with PyMySQL, it's easy to see that the basics of utilizing connections, cursors, and the methods to close them transcend libraries. If you know the basics of one, you know them all.

If you'd like to keep your connection logic separate (as I do), we can cleanly break the logic of handling connections out to a separate function. This time, we'll replace SomeDatabaseLibrary with Psycopg2 to produce some working code:

import logging
import sys

import psycopg2

logger = logging.getLogger(__name__)


class Database:
    """A Generic Database class."""

    def __init__(self, config):
        self.username = config.database('USERNAME')
        self.password = config.database('PASSWORD')
        self.host = config.database('HOST')
        self.port = config.database('PORT')
        self.db = config.database('DB')
        self.conn = None

    def open_connection(self):
        """Encapsulated connection management logic."""
        try:
            # Only connect if we don't already have an open connection.
            if self.conn is None or self.conn.closed:
                self.conn = psycopg2.connect(host=self.host,
                                             user=self.username,
                                             password=self.password,
                                             port=self.port,
                                             dbname=self.db)
        except psycopg2.DatabaseError:
            logger.error("ERROR: Could not connect to Postgres.")
            sys.exit(1)

    def run_query(self, query):
        records = []
        try:
            self.open_connection()
            with self.conn.cursor() as cur:
                cur.execute(query)
                for row in cur.fetchall():
                    records.append(row)
                return records
        except (Exception, psycopg2.DatabaseError) as error:
            print(error)
        finally:
            if self.conn is not None:
                self.conn.close()
                self.conn = None
                print('Database connection closed.')

Psycopg2 Extras

Psycopg2 has many useful features via a library called psycopg2.extras. My personal favorite of these extras is the DictCursor, which renders the rows being returned by our query as Python dictionaries as opposed to lists.

Using DictCursor to Return More Useful Results

When using a DictCursor, the key is always the column name, and the value is the value of that column in that particular row.

To use extras, we import psycopg2.extras.

Then, we turn our attention to the following line:

with self.conn.cursor() as cur:

Within cursor(), we can pass an argument named cursor_factory and set it as such:

with self.conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:

While our cursor is open, all rows returned by the query will come back as dictionaries. To demonstrate, here's what a query on this exact post you're reading now looks like when returned as a dict:

{
    'title': "Psycopg2: Postgres & Python the Old Fashioned Way",
    'slug': "psycopg2-postgres-python-the-old-fashioned-way",
    'feature_image': "https://res-3.cloudinary.com/hackers-and-slackers/image/upload/f_auto,q_auto/v1/images/psycopg2.jpg",
    'status': "draft",
    'created_at': "2019-01-14 22:20:52",
    'custom_excerpt': "Managing Postgres Database connections with Psycopg2"
}

Compare this to what we would've seen had we not used DictCursor:

["Psycopg2: Postgres & Python the Old Fashioned Way",
"psycopg2-postgres-python-the-old-fashioned-way",
"https://res-3.cloudinary.com/hackers-and-slackers/image/upload/f_auto,q_auto/v1/images/psycopg2.jpg",
"draft",
"2019-01-14 22:20:52",
"Managing Postgres Database connections with Psycopg2"]

Yes, it's a list, and thereby much less useful. Even from a readability standpoint, I (the human user) have no idea what these values represent unless I compare them to the table schema. Even worse would be compiling CSVs or Pandas DataFrames this way. When building a table made of lists, you set your headers and hope that every row to come matches the number of header columns one-to-one. Otherwise, it's entirely unclear which value belongs to which column.
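To make the payoff concrete, here's a plain-Python sketch of roughly what DictCursor does for us: pairing each row value with its column name. In a real psycopg2 cursor, the names come from cursor.description; they're hard-coded here for illustration.

```python
# Column names and row values, hard-coded for illustration;
# a real cursor would supply both.
columns = ["title", "slug", "status"]
row = ["Psycopg2: Postgres & Python the Old Fashioned Way",
       "psycopg2-postgres-python-the-old-fashioned-way",
       "draft"]

# Zipping names with values yields the dict-style access DictCursor provides.
record = dict(zip(columns, row))
print(record["status"])  # → draft
```

Now each value is self-describing, no schema cross-referencing required.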

Other Psycopg2 Extras

There are plenty more Psycopg2 extras where that came from; it's mostly up to you to decide which are worth your while.

For example, another extra which might be of interest could be psycopg2.extras.LoggingConnection, useful for debugging connection statuses and errors as you work through your program.

There's even a JSON Adaptation extra, which provides support for leveraging JSON data in building queries:

from psycopg2.extras import Json

cur.execute("insert into mytable (jsondata) values (%s)",
            [Json({'a': 100})])

I don't dwell too deep in Psycopg2 extras myself, but if you see any Godlike extras I'm missing, feel free to call them out in the COMMENTS BELOW! (Hah! I've always wanted to say that).

A Few More Fundamental Useful Things

Something worth visiting is the ability to upload CSVs into Postgres to create tables. We can accomplish this via the built-in method copy_expert.

From CSV to Postgres Table

To save a CSV to Postgres table, we need to begin with a basic SQL query saved in our project as a variable:

COPY %s FROM STDIN WITH
    CSV
    HEADER
    DELIMITER AS ','

As should be familiar, %s represents a value we can pass in later. With this raw query, we're only missing two more values:

  • The path of our CSV file to be uploaded
  • The name of the table we'd like to upload to in Postgres
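Note that the %s here is filled in with plain Python string formatting, not a parameterized query value, so we can preview the final statement without touching a database (only ever format in table names you control, never user input):

```python
# Raw COPY statement with a placeholder for the table name.
sql = "COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ','"
table = 'my_postgres_table'

# Plain %-formatting fills in the table name before execution.
statement = sql % table
print(statement)
# → COPY my_postgres_table FROM STDIN WITH CSV HEADER DELIMITER AS ','
```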

Check out how we use copy_expert here to put it all together:

sql = "COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ','"
table = 'my_postgres_table'

with open('files/myfile.csv', 'r') as file:
    with conn.cursor() as cur:
        cur.execute("truncate " + table + ";")
        cur.copy_expert(sql=sql % table, file=file)
        conn.commit()
conn.close()

Notice that I opt to truncate the existing table before uploading the new data, as seen by cur.execute("truncate " + table + ";"). Without doing this, we would be uploading the same CSV to the same table forever, creating duplicate rows over and over.

What if The Table Doesn't Exist?

Ugh, of course this would come up. The truth is (to the best of my knowledge), there aren't many native things Psycopg2 has to offer to make this process easy.

Recall that creating a table has a syntax similar to this:

CREATE TABLE recommended_reads (
  id BIGSERIAL PRIMARY KEY,
  title VARCHAR(150) NOT NULL,
  content TEXT,
  url VARCHAR(150) NOT NULL,
  created INTEGER NOT NULL,
  unique_id INTEGER NOT NULL UNIQUE,
  image VARCHAR(150) DEFAULT NULL
);

It's not impossible to build this string yourself in Python. It just entails a lot of iterating over whichever dynamic data structure you have coming through, determining the correct data type per column, and then the unavoidable task of setting your Primary and Unique keys if applicable. This is where my patience ends and my knee-jerk "this would be easier in SQLAlchemy" reaction kicks in. Hey, it's possible! I just don't feel like writing about it. :)

Godspeed to You, Brave Warrior

For those about to Psycopg2, we salute you. That is unless the choice is self-inflicted. In that case, perhaps it's best we don't work together any time soon.
