Last time we met, we joyfully shared a little tirade about missing out on the functionality provided by libraries such as SQLAlchemy, and the advantages of interacting with databases via ORMs. I stand by that sentiment, but I’ll now directly contradict myself by sharing some tips on using vanilla Psycopg2 to interact with databases.

We never know when we’ll be stranded on a desert island without access to SQLAlchemy, but a lonesome Psycopg2 washes up onshore. Either that or perhaps you’re part of a development team stuck in a certain way of doing things which doesn't include utilizing SQLAlchemy. Whatever the situation may be, we’re here for you.

Setting up Our Database Class

Working with Psycopg2 isn't too different from working with PyMySQL, or any database connector for that matter. We're going to get started by creating a class called Database to manage connecting to our database and executing queries:

class Database:
    """PostgreSQL Database class."""

    def __init__(self, config):
        self.host = config.db_host
        self.username = config.db_user
        self.password = config.db_password
        self.port = config.db_port
        self.dbname = config.db_name
        self.conn = None
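
The config argument here is just any object exposing those attributes — an environment-driven settings module, for instance. A minimal stand-in (with hypothetical values) could be as simple as:

```python
from types import SimpleNamespace

# Hypothetical connection settings; any object with these attributes will do.
config = SimpleNamespace(db_host="localhost",
                         db_user="postgres",
                         db_password="secret",
                         db_port=5432,
                         db_name="blog")

print(config.db_host)  # attributes line up with what Database.__init__ reads
```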

With that established, let's add our first method called open_connection():

import sys
import logging
import psycopg2


class Database:
    """PostgreSQL Database class."""

    ...

    def open_connection(self):
        """Connect to a Postgres database."""
        try:
            if self.conn is None:
                self.conn = psycopg2.connect(host=self.host,
                                             user=self.username,
                                             password=self.password,
                                             port=self.port,
                                             dbname=self.dbname)
                logging.info('Connection opened successfully.')
        except psycopg2.DatabaseError as e:
            logging.error(e)
            sys.exit()

Our open_connection() method is safe to call at any time in our application, because we check whether self.conn is already open first (we could really call it open_database_connection_if_not_already_open(), but that's a bit wordy). Note that we log success inside the try block rather than in a finally clause — a finally would claim "Connection opened successfully" even when the connection failed.

psycopg2.connect() accepts the usual keyword arguments we'd expect, but be warned: some are named differently than in other libraries. Where PyMySQL uses passwd, Psycopg2 uses password, and db becomes dbname.

Executing Queries

Let's add a function to execute some queries. Here's a function called run_query() which checks to see if we're running a SELECT query or a mutation, and executes accordingly:

...


class Database:
    """PostgreSQL Database class."""

    ...

    def run_query(self, query):
        """Run a SQL query."""
        try:
            self.open_connection()
            with self.conn.cursor() as cur:
                if 'SELECT' in query:
                    cur.execute(query)
                    records = cur.fetchall()
                    return records
                else:
                    cur.execute(query)
                    self.conn.commit()
                    return f"{cur.rowcount} rows affected."
        except psycopg2.DatabaseError as e:
            logging.error(e)
        finally:
            if self.conn:
                self.conn.close()
                self.conn = None
                logging.info('Database connection closed.')
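
A few fixes worth noting over the naive version: cur.execute() always returns None (so there's nothing to assign from it), the with block closes the cursor for us, and we reset self.conn to None after closing so the next call to open_connection() actually reconnects. One remaining caveat: the 'SELECT' in query check also matches mutations that merely contain the word (e.g. INSERT ... SELECT). A slightly sturdier sketch checks only the statement's first keyword:

```python
def is_select(query: str) -> bool:
    """Return True when the statement's first keyword is SELECT."""
    return query.lstrip().upper().startswith('SELECT')

print(is_select("SELECT * FROM posts"))                  # True
print(is_select("INSERT INTO posts SELECT * FROM old"))  # False
```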

Psycopg2 Extras

In addition to the vanilla functionality we'd expect, Psycopg2 actually has a lot of useful features which are somewhat hidden in a submodule called psycopg2.extras.

Using DictCursor to Return More Useful Results

My personal favorite of these extras is the DictCursor, which renders the rows being returned by our query as Python dictionaries as opposed to lists. When using a DictCursor, the key is always the column name, and the value is the value of that column in that particular row.

To use extras, we import psycopg2.extras. Then we turn our attention to the following line:

self.conn.cursor() as cur:

Within cursor(), we can pass a keyword argument named cursor_factory and set it like so:

self.conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:

While our cursor is open, all rows returned by the query will come back as dictionaries. To demonstrate, here's what a query on this exact post you're reading now looks like when returned as a dict:

{
    "title": "Psycopg2: Postgres & Python the Old Fashioned Way",
    "slug": "psycopg2-postgres-python-the-old-fashioned-way",
    "feature_image": "https://res-3.cloudinary.com/hackers-and-slackers/image/upload/f_auto,q_auto/v1/images/psycopg2.jpg",
    "status": "draft",
    "created_at": "2019-01-14 22:20:52",
    "custom_excerpt": "Managing Postgres Database connections with Psycopg2"
}

Compare this to what we would've seen had we not used DictCursor:

["Psycopg2: Postgres & Python the Old Fashioned Way",
"psycopg2-postgres-python-the-old-fashioned-way",
"https://res-3.cloudinary.com/hackers-and-slackers/image/upload/f_auto,q_auto/v1/images/psycopg2.jpg",
"draft",
"2019-01-14 22:20:52",
"Managing Postgres Database connections with Psycopg2"]

Yes, it's a list, and thereby much less useful. Even from a readability standpoint, I (the human user) have no idea what these values represent without comparing them to the table schema. Even worse would be compiling CSVs or Pandas DataFrames this way: when building a table from lists, you set your headers and hope that every row to come matches the number of header columns one-to-one. Otherwise, it's entirely unclear which value belongs to which column.
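
The difference shows up as soon as we try to use a row. With a plain list we index by position; with a dict we index by column name. A quick stdlib-only sketch, using hypothetical values from the row above:

```python
# Hypothetical row: plain list (default cursor) vs. dict (DictCursor)
row_as_list = ["Psycopg2: Postgres & Python the Old Fashioned Way", "draft"]
row_as_dict = {"title": "Psycopg2: Postgres & Python the Old Fashioned Way",
               "status": "draft"}

print(row_as_list[1])         # positional: need the schema to know this is "status"
print(row_as_dict["status"])  # named: self-documenting
```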

Other Psycopg2 Extras

There are plenty more Psycopg2 extras where that came from; it's mostly up to you to decide which are worth your while.

For example, another extra which might be of interest could be psycopg2.extras.LoggingConnection, useful for debugging connection statuses and errors as you work through your program.

There's even a JSON Adaptation extra, which provides support for leveraging JSON data in building queries:

from psycopg2.extras import Json

cur.execute("insert into mytable (jsondata) values (%s)",
            [Json({'a': 100})])

I don't dwell too deep in Psycopg2 extras myself, but if you see any Godlike extras I'm missing, feel free to call them out in the COMMENTS BELOW! (Hah! I've always wanted to say that).

A Few More Fundamental Useful Things

Something worth visiting is the ability to load CSVs into Postgres tables. We can accomplish this via the cursor method copy_expert.

From CSV to Postgres Table

To load a CSV into a Postgres table, we need to begin with a basic SQL query saved in our project as a variable:

COPY %s FROM STDIN WITH
    CSV
    HEADER
    DELIMITER AS ','

As should be familiar, %s represents a value we can pass in later. With this raw query, we're only missing two more values:

  • The path of our CSV file to be uploaded
  • The name of the table we'd like to upload to in Postgres
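
Note that this %s is filled in by Python's own string formatting (sql % table) rather than by cur.execute() parameter binding — Postgres placeholders can't stand in for identifiers like table names. A quick sketch of the interpolation:

```python
sql_template = "COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ','"
table = 'my_postgres_table'  # must be a trusted name; never interpolate user input
sql = sql_template % table
print(sql)
```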

Check out how we use copy_expert here to put it all together:

sql = "COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ','"
file = open('files/myfile.csv', "r")
table = 'my_postgres_table'
with conn.cursor() as cur:
    cur.execute("truncate " + table + ";")
    cur.copy_expert(sql=sql % table, file=file)
    conn.commit()
file.close()
conn.close()

Notice that I opt to truncate the existing table before uploading the new data, as seen by cur.execute("truncate " + table + ";"). Without doing this, we would be uploading the same CSV to the same table forever, creating duplicate rows over and over.

What if The Table Doesn't Exist?

Of course this would come up. The truth is (to the best of my knowledge), there aren't many native things Psycopg2 has to offer to make this process easy.

Recall that creating a table has a syntax similar to this:

CREATE TABLE recommended_reads (
  id BIGSERIAL PRIMARY KEY,
  title VARCHAR(150) NOT NULL,
  content TEXT,
  url VARCHAR(150) NOT NULL,
  created INTEGER NOT NULL,
  unique_id INTEGER NOT NULL UNIQUE,
  image VARCHAR(150) DEFAULT NULL
);

It's not impossible to build this string yourself in Python. It just entails a lot of iterating over whichever dynamic data structure you have coming through, determining the correct data type per column, and then the unavoidable task of setting your Primary and Unique keys if applicable. This is where my patience ends and knee-jerk reaction of "would be easier in SQLAlchemy" kicks in. Hey, it's possible... I just don't feel like writing about it :).
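
That said, here's a rough sense of what it entails. A minimal, hypothetical sketch that assembles a CREATE TABLE string from a dict mapping column names to Postgres types — with the fiddly parts (type inference, keys) conveniently left as an exercise:

```python
def build_create_table(table_name, columns):
    """Assemble a CREATE TABLE statement from a {column: postgres_type} dict.

    Hypothetical helper: assumes column names and types are trusted strings
    (never interpolate untrusted input into DDL).
    """
    cols = ",\n  ".join(f"{name} {pg_type}" for name, pg_type in columns.items())
    return f"CREATE TABLE {table_name} (\n  {cols}\n);"

ddl = build_create_table("recommended_reads",
                         {"id": "BIGSERIAL PRIMARY KEY",
                          "title": "VARCHAR(150) NOT NULL",
                          "content": "TEXT"})
print(ddl)
```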

Godspeed to You, Brave Warrior

For those about to Psycopg2, we salute you. That is unless the choice is self-inflicted. In that case, perhaps it's best we don't work together any time soon.