Last time we met, we joyfully shared a little tirade about missing out on functionality provided to us by libraries such as SQLAlchemy, and the advantages of interacting with databases where ORMs are involved. I stand by that sentiment, but I'll now directly contradict myself by sharing some tips on using vanilla Psycopg2 to interact with databases.

We never know when we'll find ourselves stranded on a desert island without access to SQLAlchemy when a lonesome Psycopg2 washes up onshore. Either that or perhaps you're part of a development team stuck in a certain way of doing things that don't include utilizing SQLAlchemy. Whatever the situation may be, we're here for you.

Installation & Configuration

Since this is a Pyscopg2 tutorial, you won't be surprised to learn that our only essential dependency is Pyscopg2. I've tossed in python-dotenv to load our config values from environment variables, and loguru for clean logging:

$ pip3 install psycopg2 python-dotenv loguru
Install dependencies

The first file I'm going to add to our project is config.py: a place for us to store the information we need to connect to our Postgres database. In the interest of keeping sensitive information out of repositories, I'm opting to load these values from environment variables:

"""Load config from environment variables."""
from os import environ, path
from dotenv import load_dotenv


# Load variables from .env
basedir = path.abspath(path.dirname(__file__))
load_dotenv(path.join(basedir, '.env'))


# Database config
DATABASE_HOST = environ.get('DATABASE_HOST')
DATABASE_USERNAME = environ.get('DATABASE_USERNAME')
DATABASE_PASSWORD = environ.get('DATABASE_PASSWORD')
DATABASE_PORT = environ.get('DATABASE_PORT')
DATABASE_NAME = environ.get('DATABASE_NAME')

# SQL queries
SQL_QUERIES_FOLDER = 'sql'
config.py

Setting up Our Database Class

It's quite common to bundle generic logic regarding database connections and actions in a Python class. We set this class up one time, and then have the freedom to execute any database-related action at any point in our script by calling the convenient methods we define on it.

class Database:
    """PostgreSQL Database class."""

    def __init__(
            self,
            DATABASE_HOST,
            DATABASE_USERNAME,
            DATABASE_PASSWORD,
            DATABASE_PORT,
            DATABASE_NAME
        ):
        self.host = DATABASE_HOST
        self.username = DATABASE_USERNAME
        self.password = DATABASE_PASSWORD
        self.port = DATABASE_PORT
        self.dbname = DATABASE_NAME
        self.conn = None
db.py

Each variable we set in our constructor is a piece of information we need to connect to our database, with the exception self.conn. We set self.conn on our Database class to simplify connection management, ensuring that only a single connection exists for our database at any given time.

Connecting to our Database

We now have enough here to connect to our database. We can handle this in a new function, aptly named connect():

"""Database client."""
from loguru import logger
import psycopg2


class Database:
    """PostgreSQL Database class."""

    def __init__(self, config):
        self.host = config.DATABASE_HOST
        self.username = config.DATABASE_USERNAME
        self.password = config.DATABASE_PASSWORD
        self.port = config.DATABASE_PORT
        self.dbname = config.DATABASE_NAME
        self.conn = None

    def connect(self):
        """Connect to a Postgres database."""
        if self.conn is None:
            try:
                self.conn = psycopg2.connect(
                    host=self.host,
                    user=self.username,
                    password=self.password,
                    port=self.port,
                    dbname=self.dbname
                )
            except psycopg2.DatabaseError as e:
                LOGGER.error(e)
                raise e
            finally:
                LOGGER.info('Connection opened successfully.')
db.py

Our connect() function is safe enough to be called at any time in our application because we're checking to see if self.conn already exists before attempting to create a connection. When a connection is established, we save said connection to self.conn to check against later.

psycopg2.connect() accepts the usual keyword arguments we'd expect. If you happen to come from using PyMySQL, there are some subtle differences in the name conventions: where PyMySQL uses passwd, Psycopg2 uses password. PyMySQL's db is equivalent to Psycopg2's dbname. We also have a host keyword, whereas PyMySQL doesn't.

Executing SELECT Queries

Let's add a function to execute some queries. Here's a function called select_rows() we can use to fetch rows from a database via a SELECT query:

...

class Database:
   ...

   def select_rows(self, query):
        """Run a SQL query to select rows from table."""
        self.open_connection()
        with self.conn.cursor() as cur:
            cur.execute(query)
            records = [row for row in cur.fetchall()]
            cur.close()
            return records
A select_rows() function to grab rows from a database table.

This returns the result of the query by returning the values of each row it managed to grab. To demonstrate this, I set up a database table on a Postgres database call customers, which looks like this:

id first_name last_name email department address created_at
1 Rickard Paolo rpaolo0@altervista.org Services 97139 Dennis Circle 7/28/2017
2 Erin Maro emaro1@homestead.com Legal 5 Derek Lane 5/20/2016
3 Quinlan Haycroft qhaycroft2@google.nl Research and Development 4 Lighthouse Bay Hill 9/17/2017
4 Hi Saterthwait hsaterthwait3@wunderground.com Business Development 162 Sachs Circle 9/22/2015
5 Kelley Ritch kritch4@oakley.com Sales 301 Cody Crossing 2/13/2019

I pass a simple query to select_rows() to grab the above table:

SELECT * FROM customers;

The result looks like this:

[[(1,
   'Rickard',
   'Paolo',
   'rpaolo0@altervista.org',
   'Services',
   '97139 Dennis Circle',
   datetime.date(2017, 7, 28)),
  (2,
   'Erin',
   'Maro',
   'emaro1@homestead.com',
   'Legal',
   '5 Derek Lane',
   datetime.date(2016, 5, 20)),
  (3,
   'Quinlan',
   'Haycroft',
   'qhaycroft2@google.nl',
   'Research and Development',
   '4 Lighthouse Bay Hill',
   datetime.date(2017, 9, 17)),
  (4,
   'Hi',
   'Saterthwait',
   'hsaterthwait3@wunderground.com',
   'Business Development',
   '162 Sachs Circle',
   datetime.date(2015, 9, 22)),
  (5,
   'Kelley',
   'Ritch',
   'kritch4@oakley.com',
   'Sales',
   '301 Cody Crossing',
   datetime.date(2019, 2, 13))]]
Results from a fetchall() query.

When we run fetchall() on the cursor we've executed a query against, it returns a list of results formatted as such. Note that these results include our row's values, but not their column titles.

Bonus: Executing UPDATE Queries

We might want a separate function to update rows in a database. In this scenario, it makes more sense to return a message about many rows were affected:

...

class Database:
   ...

   def update_rows(self, query):
        """Run a SQL query to update rows in table."""
        self.open_connection()
        with self.conn.cursor() as cur:
            cur.execute(query)
            self.conn.commit()
            cur.close()
            return f"{cur.rowcount} rows affected."
An update_rows() function to update rows in a database table.

Psycopg2 Extras

In addition to the vanilla functionality we'd expect, Psycopg2 has a lot of useful hidden features in a submodule called psycopg2.extras. We're going to dig into this toolbox and pull out something called DictCursor: a flavor of database cursor that will return rows as a list of key/value pairs as opposed to only values

Using DictCursor to Return More Useful Results

My personal favorite of these extras is the DictCursor, which renders the rows being returned by our query as Python dictionaries as opposed to lists. When using a DictCursor, the key is always the column name, and the value is the value of that column in that particular row.

We're going to create a function similar to the select_rows() function we created before, with a notable twist: this time, we'll be passing a keyword argument called cursor_factory into self.conn.cursor():

from psycopg2.extras import DictCursor

    ...

    def select_rows_dict_cursor(self, query):
        """Run SELECT query and return list of dicts."""
        self.connect()
        with self.conn.cursor(cursor_factory=DictCursor) as cur:
            cur.execute(query)
            records = cur.fetchall()
        cur.close()
        return records
A new select_rows_dict_cursor() function.

While our cursor is open, all rows returned by the query will be returned as dictionaries... kind of. Here's what the output of the above looks like:

[1, 'Rickard', 'Paolo', 'rpaolo0@altervista.org', 'Services', '97139 Dennis Circle', datetime.date(2017, 7, 28)]
[2, 'Erin', 'Maro', 'emaro1@homestead.com', 'Legal', '5 Derek Lane', datetime.date(2016, 5, 20)]
[3, 'Quinlan', 'Haycroft', 'qhaycroft2@google.nl', 'Research and Development', '4 Lighthouse Bay Hill', datetime.date(2017, 9, 17)]
[4, 'Hi', 'Saterthwait', 'hsaterthwait3@wunderground.com', 'Business Development', '162 Sachs Circle', datetime.date(2015, 9, 22)]
[5, 'Kelley', 'Ritch', 'kritch4@oakley.com', 'Sales', '301 Cody Crossing', datetime.date(2019, 2, 13)]
Query results.

That looks like the same old list of values of before! Not so fast - let's check the type of each row using logger.info(type(row)) in our loop instead:

<class 'psycopg2.extras.DictRow'>
<class 'psycopg2.extras.DictRow'>
<class 'psycopg2.extras.DictRow'>
<class 'psycopg2.extras.DictRow'>
<class 'psycopg2.extras.DictRow'>
Data types of each returned row

Aha! These aren't lists of values, they're a new data structure unique to Psycopg2! While we see what appears to be a list of values, each value in each row has a key: the name of the column it belongs to. Check it out:

def select_rows_dict_cursor(self, query):
    """Run SELECT query and return dictionaries."""
        self.connect()
        with self.conn.cursor(cursor_factory=DictCursor) as cur:
            cur.execute(query)
            for row in cur.fetchall():
                print(row['first_name'])
Accessing values by column name

Gives us:

Rickard
Erin
Quinlan
Hi
Kelley
All values belonging to a single column.

Awesome! With this new function we can return what looks to be the same as what we have before, with the power to access values based on which column they belong in. Alas, the final product:

def select_rows_dict_cursor(self, query):
        """Run SELECT query and return dictionaries."""
        self.connect()
        with self.conn.cursor(cursor_factory=DictCursor) as cur:
            cur.execute(query)
            records = cur.fetchall()
        cur.close()
        return records

Other Psycopg2 Extras

There's plenty more to the Psycopg2 library where that came from. Psycopg2 contains a submodule named extras, which contains various tools to make common database tasks easier - it's up to you to decide which are worth your while.

For example, another extra which might be of interest could be psycopg2.extras.LoggingConnection, useful for debugging connection statuses and errors as you work through your program.

There's even a JSON Adaptation extra, which provides support for leveraging JSON data in building queries:

cur.execute("insert into mytable (jsondata) values (%s)",
    [Json({'a': 100})])

I don't dwell too deep in Psycopg2 extras myself, but if you see any Godlike extras I'm missing, feel free to call them out in the COMMENTS BELOW! (Hah! I've always wanted to say that).

From CSV to Postgres Table

Something worth visiting is the ability to upload CSVs into Postgres to create tables. We can accomplish this via the built-in method copy_expert.

To save a CSV to Postgres table, we need to begin with a basic SQL query saved in our project as a variable:

COPY %s FROM STDIN WITH
                    CSV
                    HEADER
                    DELIMITER AS ','

As should be familiar, %s represents a value we can pass in later. With this raw query, we're only missing two more values:

  • The path of our CSV file to be uploaded
  • The name of the table we'd like to upload to in Postgres

Check out how we use copy_expert here to put it all together:

sql = "COPY %s FROM STDIN WITH CSVHEADER DELIMITER AS ','"
file = open('files/myfile.csv', "r")
table = 'my_postgres_table'
with conn.cursor() as cur:
    cur.execute("truncate " + table + ";")
    cur.copy_expert(sql=sql % table, file=file)
    conn.commit()
    cur.close()
    conn.close()

Notice that I opt to truncate the existing table before uploading the new data, as seen by cur.execute("truncate " + table + ";"). Without doing this, we would be uploading the same CSV to the same table forever, creating duplicate rows over and over.

What if The Table Doesn't Exist?

Of course this would come up. The truth is (to the best of my knowledge), there aren't many native things Psycopg2 has to offer to make this process easy.

Recall that creating a table has a syntax similar to this:

CREATE TABLE `recommended_reads` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(150) NOT NULL,
  `content` text,
  `url` varchar(150) NOT NULL,
  `created` int(11) NOT NULL,
  `unique_ID` int(11) NOT NULL,
  `image` varchar(150) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`),
  UNIQUE KEY `uniqueid` (`unique_ID`) USING BTREE
)
Accompanying SQL

It's not impossible to build this string yourself in Python. It just entails a lot of iterating over whichever dynamic data structure you have coming through, determining the correct data type per column, and then the unavoidable task of setting your Primary and Unique keys if applicable. This is where my patience ends and knee-jerk reaction of "would be easier in SQLAlchemy" kicks in. Hey, it's possible... I just don't feel like writing about it :).

Godspeed to You, Brave Warrior

For those about to Psycopg2, we salute you. That is unless the choice is self-inflicted. In that case, perhaps it's best we don't work together any time soon.

Source code for this tutorial can be found here:

hackersandslackers/psycopg2-tutorial
Tutorial for connecting to a Postgres database in Python. - hackersandslackers/psycopg2-tutorial