Configuring Python Projects with INI, TOML, YAML, and ENV files

Someday, every one of us will die. Perhaps we'll go out in a blaze of glory, stamping our ticket after a life well-lived. Some of us may instead die on the inside as we draw the final straws of a dead-end career that cannot go any longer. Regardless of whether your death is physical or emotional, one thing is for sure: your employer and coworkers will consider you to be dead to them forever.

Office culture perpetuates strange idioms, my favorite of which is the timeless "hit by a bus" cliche. Every company has its fair share of veteran employees who have accumulated invaluable knowledge over the years. As companies rely on these contributors more and more, organizational gratitude begins to shift towards a sort of paranoia. None can help but wonder: "what if our best employee gets hit by a bus?"

I appreciate the poetic justice of an organization left helpless in the wake of exploiting employees. That said, there are other reasons to make sure the code you write is easily readable and workable by others. If you plan on building software that continues to live on, you're going to need to start by structuring your app logically. Let's start with square one: project configuration.

There are plenty of file types we could use to store and access essential variables throughout our project. File types like ini, yaml, and others all have unique ways of storing information within structured (or unstructured) hierarchies. Depending on your project's nature, each of these file structures could either serve you well or get in the way. We'll look at the advantages of each option and parse these configs with their appropriate Python libraries.

Meet The Contenders

There's more than one way to skin a cat, but there are even more ways to format configuration files in modern software. We're going to look at some of the most common file formats for handling project configurations (ini, toml, yaml, json, .env) and the Python libraries which parse them.

Configure from .ini

ini files are perhaps the most straightforward configuration files available to us. ini files are highly suitable for smaller projects, mostly because these files only support hierarchies 1-level deep. ini files are essentially flat files, with the exception that variables can belong to groups. The below example demonstrates how variables sharing a common theme may fall under a common title, such as [DATABASE] or [LOGS]:

[APP]
ENVIRONMENT = development
DEBUG = False

[DATABASE]
USERNAME = root
PASSWORD = p@ssw0rd
HOST = 127.0.0.1
PORT = 5432
DB = my_database

[LOGS]
ERRORS = logs/errors.log
INFO = data/info.log

[FILES]
STATIC_FOLDER = static
TEMPLATES_FOLDER = templates

config.ini

This structure surely makes things easier for humans to understand, but the practicality of this structure goes beyond aesthetics. Let's parse this file with Python's configparser library to see what's really happening. We get started by reading the contents of config.ini into a variable called config:

"""Load configuration from .ini file."""
import configparser


# Read local file `config.ini`.
config = configparser.ConfigParser()
config.read('settings/config.ini')

ini_config.py

Calling read() on an ini file does much more than store plain data; our config variable is now a unique data structure, allowing us various methods for reading and writing values to our config. Try running print(config) to see for yourself:

<configparser.ConfigParser object at 0x10e58c390>

Output

Config files exist for the simple purpose of extracting values. configparser allows us to do this in several ways. Each of the lines below returns the value 127.0.0.1:

"""Load configuration from .ini file."""
import configparser


# Read local `config.ini` file.
config = configparser.ConfigParser()
config.read('settings/config.ini')

# Get values from our .ini file
config.get('DATABASE', 'HOST')
config['DATABASE']['HOST']

ini_config.py
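
One quirk worth knowing before you lean on either syntax: by default, configparser keeps section names as written but lowercases option names, which is why HOST and host both work. Here's a minimal sketch of that behavior, using a trimmed inline copy of the config above (via read_string()) so it runs without a file on disk:

```python
import configparser

# Inline stand-in for config.ini so the example runs without a file on disk.
SAMPLE_INI = """
[APP]
ENVIRONMENT = development
DEBUG = False

[DATABASE]
HOST = 127.0.0.1
PORT = 5432
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)

# Section names keep their case; option names are lowercased by default.
section_names = config.sections()
database = dict(config["DATABASE"])

print(section_names)
print(database)
```

Note that every value in the resulting dictionary is still a string, including the port.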

For values where we expect a specific data type, configparser has several type-converting methods that return values as the type we're looking for. The command config.getboolean('APP', 'DEBUG') will correctly return a boolean value of False as opposed to a string reading "False", which would obviously be problematic for our app, since any non-empty string is truthy. If our value DEBUG were set to something that can't be read as a boolean (accepted values include true/false, yes/no, on/off, and 1/0), config.getboolean() would raise a ValueError. configparser has other methods of this kind, such as getint() and getfloat().
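
To make the difference concrete, here is a small sketch comparing the raw string with its converted counterparts. The inline config is a trimmed stand-in for config.ini, not a new file you need to create:

```python
import configparser

SAMPLE_INI = """
[APP]
DEBUG = False

[DATABASE]
PORT = 5432
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)

raw_debug = config.get("APP", "DEBUG")      # the string "False"
debug = config.getboolean("APP", "DEBUG")   # the boolean False
port = config.getint("DATABASE", "PORT")    # the integer 5432

# A non-empty string is truthy, which is why the raw value is dangerous.
print(bool(raw_debug), debug, port)

# Values that cannot be converted raise ValueError.
try:
    config.getboolean("DATABASE", "PORT")
except ValueError as error:
    print(f"Conversion failed: {error}")
```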

The features of configparser don't end there. We could go into detail about the library's ability to write new config values, check the existence of keys, and so forth, but let's not.
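
For anyone who does end up needing those features, here's the short version as a rough sketch. It checks for a key, falls back to a default, and writes the result out. The fallback path logs/info.log is made up for illustration, and the output goes to an in-memory buffer rather than a real file:

```python
import configparser
import io

config = configparser.ConfigParser()
config.read_string("[LOGS]\nERRORS = logs/errors.log\n")

# Check for a key before relying on it.
has_info = config.has_option("LOGS", "INFO")

# Or supply a fallback for keys that might be missing.
info_path = config.get("LOGS", "INFO", fallback="logs/info.log")

# Add a value and serialize the result; a real script would write to a file.
config.set("LOGS", "INFO", info_path)
buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```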

Configure from .toml

TOML files may seem to share some syntax similarities with ini files at first glance, but support a much wider variety of data types, as well as relationships between values themselves. TOML files also force us to be more explicit about data structures upfront, as opposed to determining them after parsing as configparser does.

Parsing TOML files in Python is handled by a library appropriately dubbed toml. Before we even go there, let's see what the TOML hype is about.

TOML Variable Types

TOML files define variables via key/value pairs in a similar manner to that of ini files. The name on the left side of each pair is referred to as a key. Unlike ini files, however, TOML expects the value of each key to be stored as the data type it's intended to be used as. Variables intended to be parsed as strings must be stored as values in quotes, whereas booleans must be stored as either raw true or false values. This removes a lot of ambiguity around our configurations: we have no need for methods such as getboolean() with TOML files.

TOML files support an impressive catalog of variable types, including DateTime, local time, arrays, floats, and even hexadecimal values. The updated key below, for instance, is stored as a DateTime:

[project]
name = "Faceback"
description = "Powerful AI which renders the back of somebody's head, based on their face."
version = "1.0.0"
updated = 1979-05-27T07:32:00Z
author = "Todd Birchard"

...

config.toml

TOML File Structures

The bracketed sections in TOML files are referred to as tables. Keys can live either inside or outside of tables, as we can see in the example below. You'll notice that these aren't the only two elements of TOML files, either:

# Keys
title = "My TOML Config"

# Tables
[project]
name = "Faceback"
description = "Powerful AI which renders the back of somebody's head, based on their face."
version = "1.0.0"
updated = 1979-05-27T07:32:00Z
author = "Todd Birchard"

[database]
host = "127.0.0.1"
password = "p@ssw0rd"
port = 5432
name = "my_database"
connection_max = 5000
enabled = true

# Nested `tables`
[environments]
  [environments.dev]
    ip = "10.0.0.1"
    dc = "eqdc10"
  [environments.staging]
    ip = "10.0.0.2"
    dc = "eqdc10"
  [environments.production]
    ip = "10.0.0.3"
    dc = "eqdc10"

# Array of Tables
[[testers]]
id = 1
username = "JohnCena"
password = "YouCantSeeMe69"

[[testers]]
id = 3
username = "TheRock"
password = "CantCook123"

config.toml

TOML supports the concept of "nested tables," as seen in the [environments] table, which is followed by multiple sub-tables. Dot-notation in a table's name (such as [environments.dev]) nests that table under its parent, so dev, staging, and production read as different instances of the same kind of element.

Equally impressive is the concept of "arrays of tables," which is what's happening with [[testers]]. Tables in double-brackets are automatically added to an array, where each item in the array is a table with the same name. The best way to visualize what's happening here is with the JSON equivalent:

{
  "testers": [
    { "id": 1, "username": "JohnCena", "password": "YouCantSeeMe69" },
    { "id": 3, "username": "TheRock", "password": "CantCook123" }
  ]
}

config.json

Parsing TOML

Enough about TOML as a standard, let's get our data:

"""Load configuration from .toml file."""
import toml


# Read local `config.toml` file.
config = toml.load('settings/config.toml')
print(config)

toml_config.py

Loading TOML files immediately returns a dictionary:

{'title': 'My TOML Config',
 'project': {'name': 'Faceback',
  'description': "Powerful AI which renders the back of somebody's head, based on their face.",
  'version': '1.0.0',
  'updated': datetime.datetime(1979, 5, 27, 7, 32, tzinfo=<toml.tz.TomlTz object at 0x107b82390>),
  'author': 'Todd Birchard'},
 'database': {'host': '127.0.0.1',
  'password': 'p@ssw0rd',
  'port': 5432,
  'name': 'my_database',
  'connection_max': 5000,
  'enabled': True},
 'environments': {'dev': {'ip': '10.0.0.1', 'dc': 'eqdc10'},
  'staging': {'ip': '10.0.0.2', 'dc': 'eqdc10'},
  'production': {'ip': '10.0.0.3', 'dc': 'eqdc10'}},
 'testers': [{'id': 1, 'username': 'JohnCena', 'password': 'YouCantSeeMe69'},
  {'id': 3, 'username': 'TheRock', 'password': 'CantCook123'}]}

Python dict parsed from config.toml

Grabbing values from config is as easy as working with any dictionary:

"""Load configuration from .toml file."""
import toml


# Read local `config.toml` file.
config = toml.load('settings/config.toml')
print(config)

# Retrieving a dictionary of values
config['project']
config.get('project')

# Retrieving a value
config['project']['author']
config.get('project').get('author')

toml_config.py

Configure from .yaml

YAML file formats have become a crowd favorite for configurations, presumably for their ease of readability. Those familiar with the YAML specification will tell you that YAML is far from an elegant file format, but this hasn't stopped anybody.

YAML files utilize white space to define variable hierarchies, which seems to have resonated with many developers. Check out what a sample YAML config might look like:

appName: appName
logLevel: WARN

AWS:
    Region: us-east-1
    Resources:
      EC2: 
        Type: "AWS::EC2::Instance"
        Properties: 
          ImageId: "ami-0ff8a91507f77f867"
          InstanceType: t2.micro
          KeyName: testkey
          BlockDeviceMappings:
            -
              DeviceName: /dev/sdm
              Ebs:
                VolumeType: io1
                Iops: 200
                DeleteOnTermination: false
                VolumeSize: 20
      Lambda:
          Type: "AWS::Lambda::Function"
          Properties: 
            Handler: "index.handler"
            Role: 
              Fn::GetAtt: 
                - "LambdaExecutionRole"
                - "Arn"
            Runtime: "python3.7"
            Timeout: 25
            TracingConfig:
              Mode: "Active"

routes:
  admin:
    url: /admin
    template: admin.html
    assets:
        templates: /templates
        static: /static
  dashboard:
    url: /dashboard
    template: dashboard.html
    assets:
        templates: /templates
        static: /static
  account:
    url: /account
    template: account.html
    assets:
        templates: /templates
        static: /static
        
databases:
  cassandra:
    host: example.cassandra.db
    username: user
    password: password
  redshift:
    jdbcURL: jdbc:redshift://<IP>:<PORT>/file?user=username&password=pass
    tempS3Dir: s3://path/to/redshift/temp/dir/ 
  redis:
    host: hostname
    port: port-number
    auth: authentication
    db: database

config.yaml

It should be immediately apparent that YAML configurations are easy to write and understand. The YAML file above can accomplish the same types of complex hierarchies we saw in our TOML file. However, we didn't need to explicitly set the variable data types, nor did we need to take a moment to understand concepts such as tables or arrays of tables. One could easily argue that YAML's ease-of-use doesn't justify the downsides. Don't spend too much time thinking about this: we're talking about config files here.

I think we can all agree that YAML sure beats the hell out of a JSON config. Here's the same config as above as a JSON file:

{
   "appName": "appName",
   "logLevel": "WARN",
   "AWS": {
      "Region": "us-east-1",
      "Resources": {
         "EC2": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
               "ImageId": "ami-0ff8a91507f77f867",
               "InstanceType": "t2.micro",
               "KeyName": "testkey",
               "BlockDeviceMappings": [
                  {
                     "DeviceName": "/dev/sdm",
                     "Ebs": {
                        "VolumeType": "io1",
                        "Iops": 200,
                        "DeleteOnTermination": false,
                        "VolumeSize": 20
                     }
                  }
               ]
            }
         },
         "Lambda": {
            "Type": "AWS::Lambda::Function",
            "Properties": {
               "Handler": "index.handler",
               "Role": {
                  "Fn::GetAtt": [
                     "LambdaExecutionRole",
                     "Arn"
                  ]
               },
               "Runtime": "python3.7",
               "Timeout": 25,
               "TracingConfig": {
                  "Mode": "Active"
               }
            }
         }
      }
   },
   "routes": {
      "admin": {
         "url": "/admin",
         "template": "admin.html",
         "assets": {
            "templates": "/templates",
            "static": "/static"
         }
      },
      "dashboard": {
         "url": "/dashboard",
         "template": "dashboard.html",
         "assets": {
            "templates": "/templates",
            "static": "/static"
         }
      },
      "account": {
         "url": "/account",
         "template": "account.html",
         "assets": {
            "templates": "/templates",
            "static": "/static"
         }
      }
   },
   "databases": {
      "cassandra": {
         "host": "example.cassandra.db",
         "username": "user",
         "password": "password"
      },
      "redshift": {
         "jdbcURL": "jdbc:redshift://<IP>:<PORT>/file?user=username&password=pass",
         "tempS3Dir": "s3://path/to/redshift/temp/dir/"
      },
      "redis": {
         "host": "hostname",
         "port": "port-number",
         "auth": "authentication",
         "db": "database"
      }
   }
}

config.json

Show me somebody who prefers JSON over YAML, and I'll show you a masochist in denial of their vendor-lock with AWS.
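
Masochism aside, JSON does have one thing going for it: Python parses it with the standard library, no extra packages required. A minimal sketch, using a trimmed inline copy of the config above rather than a real file:

```python
import json

# Trimmed stand-in for config.json.
SAMPLE_JSON = """
{
   "appName": "appName",
   "logLevel": "WARN",
   "databases": {
      "redis": {"host": "hostname", "port": "port-number"}
   }
}
"""

config = json.loads(SAMPLE_JSON)

# The result is a plain dictionary, just like our parsed TOML.
log_level = config["logLevel"]
redis_host = config["databases"]["redis"]["host"]

print(log_level, redis_host)
```

For a file on disk, json.load() takes an open file object instead of a string.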

Parsing YAML in Python

I recommend the Python Confuse library (a package name that's sure to raise some eyebrows from your company's information security team).

Confuse allows us to interact with YAML files in a way that is nearly identical to how we would with JSON. The exception to this is that with Confuse we need to call .get() on a key to extract its value, like so:

"""Load configuration from .yaml file."""
import confuse

config = confuse.Configuration('MyApp', __name__)
runtime = config['AWS']['Resources']['Lambda']['Properties']['Runtime'].get()
print(runtime)

yaml_config.py

.get() can accept a datatype such as int. Doing so ensures that the value we're getting is actually of the type we're expecting, which is a neat feature.

Validators

Confuse's documentation details additional validation methods for values we pull from YAML files. Methods like as_filename(), as_number(), and as_str_seq() do basically what you'd expect them to.

CLI Configuration

Confuse also gets into the realm of building CLIs, allowing us to use our YAML file to inform arguments which can be passed to a CLI and their potential values:

import argparse
import confuse

config = confuse.Configuration('myapp')
parser = argparse.ArgumentParser()

parser.add_argument('--foo', help='a parameter')

args = parser.parse_args()
config.set_args(args)

print(config['foo'].get())

cli_config.py

There are plenty of things you can go nuts with here.

Configure from .env

Environment variables are a great way of keeping sensitive information out of your project's codebase. We can store environment variables in numerous different ways, the easiest of which is via command line:

$ export MY_VARIABLE=AAAAtpl%2Bkvro%2BoQ9wRg77VUEpQv%2F

export MY_VARIABLE

Variables stored in this way will only last as long as your current terminal session is open, so this doesn't help us much outside of testing. If we wanted MY_VARIABLE to persist, we could add the above export line to our .bash_profile (or equivalent) to ensure MY_VARIABLE exists in every new shell session for our user.
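
Once a variable is exported, any Python process started from that shell can read it through os.environ. Here's a small sketch of the two common ways to do so. The setdefault() call only simulates the export so the example runs anywhere, and NOT_EXPORTED_ANYWHERE is a made-up name standing in for a variable nobody set:

```python
import os

# Stand-in for `export MY_VARIABLE=...` so the example runs anywhere.
os.environ.setdefault("MY_VARIABLE", "AAAAtpl%2Bkvro%2BoQ9wRg77VUEpQv%2F")

# .get() returns None (or a default) when the variable is missing.
my_variable = os.environ.get("MY_VARIABLE")
missing = os.environ.get("NOT_EXPORTED_ANYWHERE", "fallback-value")

# Indexing raises KeyError instead, which suits required settings.
try:
    os.environ["NOT_EXPORTED_ANYWHERE"]
except KeyError:
    print("NOT_EXPORTED_ANYWHERE is not set")

print(my_variable, missing)
```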

Project-specific variables are better suited for .env files living in our project's directory. FOR THE LOVE OF GOD, DON'T COMMIT THESE FILES TO GITHUB.

Let's say we have a .env file with project-related variables like so:

FLASK_ENV=development
FLASK_APP=wsgi.py
COMPRESSOR_DEBUG=True
STATIC_FOLDER=static
TEMPLATES_FOLDER=templates

.env

We can now load this file with python-dotenv's load_dotenv() and extract the values in Python using the built-in os.environ:

"""App configuration."""
from os import environ, path
from dotenv import load_dotenv


# Find .env file
basedir = path.abspath(path.dirname(__file__))
load_dotenv(path.join(basedir, '.env'))


# General Config
SECRET_KEY = environ.get('SECRET_KEY')
FLASK_APP = environ.get('FLASK_APP')
FLASK_ENV = environ.get('FLASK_ENV')

env_config.py
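
One catch with this approach: everything in os.environ is a string, so COMPRESSOR_DEBUG=True arrives as "True" rather than a boolean. Below is a rough sketch of how you might convert values yourself. The helpers env_bool() and env_int() are hypothetical names of my own (not part of python-dotenv), WORKER_COUNT is an invented variable, and the values are set inline to stand in for what load_dotenv() would provide:

```python
import os

# Stand-ins for values that load_dotenv() would place in the environment.
os.environ["COMPRESSOR_DEBUG"] = "True"
os.environ["WORKER_COUNT"] = "4"  # hypothetical variable, not in the .env above


def env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name: str, default: int) -> int:
    """Convert a variable to int, falling back to a default when unset."""
    value = os.environ.get(name)
    return int(value) if value is not None else default


compressor_debug = env_bool("COMPRESSOR_DEBUG")
worker_count = env_int("WORKER_COUNT", default=1)

print(compressor_debug, worker_count)
```

Which strings count as "true" is a design choice; the set used here is just one reasonable convention.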

Just Use What You Want

There are clearly plenty of ways to set environment and project variables in Python. We could spend all day dissecting the pros and cons of configuration file types. This is one aspect of life we surely don't want to overthink.

Besides, I need to reflect on my life. I just wrote two thousand words about the pros and cons of configuration files, which I'd rather forget before becoming aware of how meaningless my life is.