Someday, every one of us will die. Perhaps we'll go out in a blaze of glory, stamping our ticket after a life well-lived. Some of us may instead die on the inside as we draw the final straws of a dead-end career which cannot go on any longer. Regardless of whether your death is physical or emotional, one thing is for certain: your employer and coworkers will consider you to be dead to them forever.

Office culture perpetuates strange idioms, my favorite of which is the timeless "hit by a bus" cliche. Every company has its fair share of veteran employees who have accumulated invaluable knowledge over the years. As companies find themselves relying on these contributors more and more, organizational gratitude begins to shift towards a sort of paranoia. None can help but wonder: "what if our best employee gets hit by a bus?"

I appreciate the poetic justice of an organization left helpless in the wake of exploiting employees. That said, there are other reasons to make sure the code you write is easily readable and workable by others. If you plan on building software that continues to live on, you're going to need to start by structuring your app logically. Let's start with square one: project configuration.

There are plenty of file types we could use to store and access important variables throughout our project. File types like ini, YAML, or what-have-you all have unique ways of storing information within structured (or unstructured) hierarchies. Depending on the nature of your project, each of these file structures could either serve you well or get in the way. We'll be looking at the advantages of all these options, as well as how to parse these configs with their appropriate Python libraries.

Meet The Contenders

There's more than one way to skin a cat, but there are even more ways to format configuration files in modern software. We're going to look at some of the most common file formats for handling project configurations (ini, toml, yaml, conf, json, env) and the Python libraries which parse them.

INI Files

ini files are perhaps the most straightforward configuration files available to us. ini files are highly suitable for smaller projects, mostly because these files only support hierarchies one level deep. ini files are essentially flat files, with the exception that variables can belong to groups. The below example demonstrates how variables sharing a common theme may fall under a common title, such as [DATABASE] or [LOGS]:

[APP]
ENVIRONMENT = development
DEBUG = False

[DATABASE]
USERNAME: root
PASSWORD: p@ssw0rd
HOST: 127.0.0.1
PORT: 5432
DB: my_database

[LOGS]
ERRORS: logs/errors.log
INFO: data/info.log

[FILES]
STATIC_FOLDER: static
TEMPLATES_FOLDER: templates
config.ini

This structure surely makes things easier for humans to understand, but the practicality of this structure goes beyond aesthetics. Let's parse this file with Python's configparser library to see what's really happening. We get started by reading the contents of config.ini into a variable called config:

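import configparser
from pathlib import Path

config = configparser.ConfigParser()
# read() does not expand "~" and silently skips files it cannot find,
# so resolve the home directory before passing the path in
config.read(Path('~/Desktop/config.ini').expanduser())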
config.py

Calling read() on an ini file does much more than store plain data; our config variable is actually now its own unique data structure, allowing us various methods for reading and writing values to our config. Try running print(config) to see for yourself:

<configparser.ConfigParser object at 0x10e58c390>
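
To see what that object actually holds, here is a small sketch that lists the sections and dumps each one into a plain dictionary. It inlines a trimmed copy of config.ini through read_string() so it runs without a file on disk; INI_TEXT, section_names, and as_dict are names made up for this illustration:

```python
import configparser

# A trimmed copy of the config.ini shown earlier, inlined so the snippet runs on its own
INI_TEXT = """
[APP]
ENVIRONMENT = development
DEBUG = False

[DATABASE]
HOST: 127.0.0.1
PORT: 5432
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Section names keep their original case
section_names = config.sections()

# Option names are lowercased by default, so 'HOST' is stored as 'host'
as_dict = {name: dict(config[name]) for name in section_names}

print(section_names)
print(as_dict["DATABASE"])
```

Note that option names come back lowercased (configparser's default behavior), while lookups such as config['DATABASE']['HOST'] still work because they are case-insensitive.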

Config files exist for the simple purpose of extracting values. configparser allows us to do this in a number of ways. Each of the lines below returns the string 127.0.0.1:

config.get('DATABASE', 'HOST')
config['DATABASE']['HOST'] 

For values where we're expecting to receive a specific data type, configparser has a number of type-converting methods to retrieve values as the type we're looking for. The command config.getboolean('APP', 'DEBUG') will correctly return a boolean value of False as opposed to a string reading "False," which would obviously be problematic for our app, since any non-empty string is truthy in Python. If our value DEBUG were set to something configparser doesn't recognize as a boolean, config.getboolean() would raise a ValueError. configparser offers the same treatment for numbers via getint() and getfloat().
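
A quick sketch of those typed getters, again using an inlined, trimmed config.ini. The TIMEOUT option is hypothetical and deliberately absent from the file, purely to show the fallback argument:

```python
import configparser

INI_TEXT = """
[APP]
ENVIRONMENT = development
DEBUG = False

[DATABASE]
PORT: 5432
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

debug = config.getboolean("APP", "DEBUG")   # converted from the text "False"
port = config.getint("DATABASE", "PORT")    # converted from the text "5432"
raw_port = config.get("DATABASE", "PORT")   # plain get() always returns a string

# fallback= supplies a value when the option is missing (TIMEOUT is hypothetical)
timeout = config.getint("DATABASE", "TIMEOUT", fallback=30)

# A value that is not a recognized boolean raises ValueError
try:
    config.getboolean("APP", "ENVIRONMENT")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(debug, port, repr(raw_port), timeout, error_message)
```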

The features of configparser don't end there. We could go into detail about the library's ability to write new config values, check the existence of keys, and so forth, but let's not.

TOML Files

TOML files may seem to share some syntax similarities with ini files at first glance, but support a much wider variety of data types, as well as relationships between values themselves. TOML files also force us to be more explicit about data structures upfront, as opposed to determining them after parsing as configparser does.

Parsing TOML files in Python is handled by a library appropriately dubbed toml. Before we even go there, let's see what the TOML hype is about.

TOML Variable Types

TOML files define variables via key/value pairs in a similar manner to that of ini files. The name on the left side of each pair is referred to as a key. Unlike ini files, however, TOML expects the value of each key to be stored as the data type it's intended to be used as. Variables intended to be parsed as strings must be stored as values in quotes, whereas booleans must be stored as either raw true or false values. This removes a lot of ambiguity around our configurations: we have no need for methods such as getboolean() with TOML files.

TOML files can support an impressive catalog of variable types. Some of the more impressive types TOML supports include date-times, local times, arrays, floats, and even hexadecimal integers:

[project]
name = "Faceback"
description = "Powerful AI which renders the back of somebody's head, based on their face."
version = "1.0.0"
updated = 1979-05-27T07:32:00Z
author = "Todd Birchard"

...
config.toml

TOML File Structures

The bracketed sections in TOML files are referred to as tables. Keys can live either inside or outside of tables, as we can see in the example below. You'll notice that these aren't the only two elements of TOML files, either:

# Keys
title = "My TOML Config"


# Tables
[project]
name = "Faceback"
description = "Powerful AI which renders the back of somebody's head, based on their face."
version = "1.0.0"
updated = 1979-05-27T07:32:00Z
author = "Todd Birchard"

[database]
host = "127.0.0.1"
password = "p@ssw0rd"
port = 5432
name = "my_database"
connection_max = 5000
enabled = true


# Nested `tables`
[environments]
  [environments.dev]
  ip = "10.0.0.1"
  dc = "eqdc10"
  [environments.staging]
  ip = "10.0.0.2"
  dc = "eqdc10"
  [environments.production]
  ip = "10.0.0.3"
  dc = "eqdc10"

# Array of Tables
[[testers]]
id = 1
username = "JohnCena"
password = "YouCantSeeMe69"

[[testers]]
id = 3
username = "TheRock"
password = "CantCook123"
config.toml

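TOML supports the concept of "nested tables", as seen in the [environments] table, which is followed by multiple sub-tables. By using dot-notation in a table's name, we nest that table under a parent: [environments.dev], [environments.staging], and [environments.production] each become a key inside environments, which implies they're different instances of the same element.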

Equally interesting is the concept of "arrays of tables," which is what's happening with [[testers]]. Tables in double-brackets are automatically added to an array, where each item in the array is a table with the same name. The best way to visualize what's happening here is with the JSON equivalent:

{
  "testers": [
    { "id": 1, "username": "JohnCena", "password": "YouCantSeeMe69" },
    { "id": 3, "username": "TheRock", "password": "CantCook123" }
  ]
}

Parsing TOML

Enough about TOML as a standard, let's get our data:

import toml
config = toml.load('/Users/toddbirchard/Desktop/config.toml')
print(config)

Loading TOML files immediately returns a dictionary:

{'title': 'My TOML Config',
 'project': {'name': 'Faceback',
  'description': "Powerful AI which renders the back of somebody's head, based on their face.",
  'version': '1.0.0',
  'updated': datetime.datetime(1979, 5, 27, 7, 32, tzinfo=<toml.tz.TomlTz object at 0x107b82390>),
  'author': 'Todd Birchard'},
 'database': {'host': '127.0.0.1',
  'password': 'p@ssw0rd',
  'port': 5432,
  'name': 'my_database',
  'connection_max': 5000,
  'enabled': True},
 'environments': {'dev': {'ip': '10.0.0.1', 'dc': 'eqdc10'},
  'staging': {'ip': '10.0.0.2', 'dc': 'eqdc10'},
  'production': {'ip': '10.0.0.3', 'dc': 'eqdc10'}},
 'testers': [{'id': 1, 'username': 'JohnCena', 'password': 'YouCantSeeMe69'},
  {'id': 3, 'username': 'TheRock', 'password': 'CantCook123'}]}

Grabbing values from config is as easy as working with any dictionary:

# Retrieving a dictionary
config['project']
config.get('project')

# Retrieving a value
config['project']['author']
config.get('project').get('author')
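
One caveat with the chained .get() form: if the first key is missing, .get() returns None and the second call blows up. A small sketch of a safer lookup follows; dig is a hypothetical helper written for this post, not part of toml or the standard library, and the config dictionary is a trimmed copy of the parsed output above:

```python
config = {
    "project": {"name": "Faceback", "author": "Todd Birchard"},
    "database": {"host": "127.0.0.1", "port": 5432},
}


def dig(mapping, *keys, default=None):
    """Follow keys through nested dictionaries, returning default if any is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


author = dig(config, "project", "author")
cache_host = dig(config, "cache", "host", default="localhost")

# The chained form fails on a missing table, because the first .get() returns None
try:
    config.get("cache").get("host")
    chained_error = None
except AttributeError as exc:
    chained_error = type(exc).__name__

print(author, cache_host, chained_error)
```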

YAML Configurations

YAML file formats have become a crowd favorite for configurations, presumably for their ease of readability. Those familiar with the YAML specification will tell you that YAML is far from being an elegant file format, but this hasn't seemed to stop anybody.

YAML files utilize white space to define variable hierarchies, which seems to have resonated with many developers. Check out what a sample YAML config might look like:

appName: appName
logLevel: WARN

AWS:
    Region: us-east-1
    Resources:
      EC2: 
        Type: "AWS::EC2::Instance"
        Properties: 
          ImageId: "ami-0ff8a91507f77f867"
          InstanceType: t2.micro
          KeyName: testkey
          BlockDeviceMappings:
            -
              DeviceName: /dev/sdm
              Ebs:
                VolumeType: io1
                Iops: 200
                DeleteOnTermination: false
                VolumeSize: 20
      Lambda:
          Type: "AWS::Lambda::Function"
          Properties: 
            Handler: "index.handler"
            Role: 
              Fn::GetAtt: 
                - "LambdaExecutionRole"
                - "Arn"
            Runtime: "python3.7"
            Timeout: 25
            TracingConfig:
              Mode: "Active"

routes:
  admin:
    url: /admin
    template: admin.html
    assets:
        templates: /templates
        static: /static
  dashboard:
    url: /dashboard
    template: dashboard.html
    assets:
        templates: /templates
        static: /static
  account:
    url: /account
    template: account.html
    assets:
        templates: /templates
        static: /static
        
databases:
  cassandra:
    host: example.cassandra.db
    username: user
    password: password
  redshift:
    jdbcURL: jdbc:redshift://<IP>:<PORT>/file?user=username&password=pass
    tempS3Dir: s3://path/to/redshift/temp/dir/ 
  redis:
    host: hostname
    port: port-number
    auth: authentication
    db: database
config.yaml

It should be immediately apparent that YAML configurations are easy to write and understand. The YAML file above is able to accomplish the same types of complex hierarchies we saw in our TOML file. However, we didn't need to set the variable data types explicitly, nor did we need to take a moment to understand concepts such as tables or arrays of tables. One could easily argue that YAML's ease-of-use doesn't justify the downsides, such as significant whitespace and value types that are guessed by the parser rather than declared. Don't spend too much time thinking about this: we're talking about config files here.

Something I think we can all agree on is YAML sure beats the hell out of a JSON config. Here's the same config as above as a JSON file:

{
   "appName": "appName",
   "logLevel": "WARN",
   "AWS": {
      "Region": "us-east-1",
      "Resources": {
         "EC2": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
               "ImageId": "ami-0ff8a91507f77f867",
               "InstanceType": "t2.micro",
               "KeyName": "testkey",
               "BlockDeviceMappings": [
                  {
                     "DeviceName": "/dev/sdm",
                     "Ebs": {
                        "VolumeType": "io1",
                        "Iops": 200,
                        "DeleteOnTermination": false,
                        "VolumeSize": 20
                     }
                  }
               ]
            }
         },
         "Lambda": {
            "Type": "AWS::Lambda::Function",
            "Properties": {
               "Handler": "index.handler",
               "Role": {
                  "Fn::GetAtt": [
                     "LambdaExecutionRole",
                     "Arn"
                  ]
               },
               "Runtime": "python3.7",
               "Timeout": 25,
               "TracingConfig": {
                  "Mode": "Active"
               }
            }
         }
      }
   },
   "routes": {
      "admin": {
         "url": "/admin",
         "template": "admin.html",
         "assets": {
            "templates": "/templates",
            "static": "/static"
         }
      },
      "dashboard": {
         "url": "/dashboard",
         "template": "dashboard.html",
         "assets": {
            "templates": "/templates",
            "static": "/static"
         }
      },
      "account": {
         "url": "/account",
         "template": "account.html",
         "assets": {
            "templates": "/templates",
            "static": "/static"
         }
      }
   },
   "databases": {
      "cassandra": {
         "host": "example.cassandra.db",
         "username": "user",
         "password": "password"
      },
      "redshift": {
         "jdbcURL": "jdbc:redshift://<IP>:<PORT>/file?user=username&password=pass",
         "tempS3Dir": "s3://path/to/redshift/temp/dir/"
      },
      "redis": {
         "host": "hostname",
         "port": "port-number",
         "auth": "authentication",
         "db": "database"
      }
   }
}
config.json

Show me somebody who prefers JSON over YAML, and I'll show you a masochist in denial of their vendor-lock with AWS.
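
In JSON's defense, parsing it takes nothing beyond Python's standard library. A minimal sketch, using a cut-down fragment of the config above inlined as JSON_TEXT (a name made up for this example) rather than a file on disk:

```python
import json

JSON_TEXT = """
{
  "appName": "appName",
  "logLevel": "WARN",
  "AWS": {
    "Resources": {
      "Lambda": {
        "Properties": {"Runtime": "python3.7", "Timeout": 25}
      }
    }
  }
}
"""

# json.loads() returns plain dictionaries, lists, strings, numbers, and booleans
config = json.loads(JSON_TEXT)

lambda_props = config["AWS"]["Resources"]["Lambda"]["Properties"]
runtime = lambda_props["Runtime"]
timeout = lambda_props["Timeout"]

print(runtime, timeout)
```

For a real file, json.load() does the same thing given an open file object.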

Parsing YAML in Python

I recommend the Python Confuse library (a package name that's sure to raise some eyebrows on your company's information security team).

Confuse allows us to interact with YAML files almost identically to how we would with JSON, with the exception that we specify .get() at the end of walking through the tree hierarchy, like so:

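import confuse

config = confuse.Configuration('MyApp', __name__)

# Walk the same path the YAML hierarchy defines, then call .get() to resolve the value
config['AWS']['Resources']['Lambda']['Properties']['Runtime'].get()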

.get() can accept a datatype value such as int. Doing so ensures that the value we're getting is actually of the type we're expecting, which is a neat feature.

Validators

Confuse's documentation details additional validation methods for values we pull from YAML files. Methods like as_filename(), as_number(), and as_str_seq() do basically what you'd expect them to.

CLI Configuration

Confuse also gets into the realm of building CLIs, allowing us to use our YAML file to inform arguments which can be passed to a CLI and their potential values:

import argparse

import confuse

config = confuse.Configuration('myapp')
parser = argparse.ArgumentParser()
parser.add_argument('--foo', help='a parameter')
args = parser.parse_args()
# Layer the parsed command-line arguments on top of the YAML values
config.set_args(args)
print(config['foo'].get())

There are plenty of things you can go nuts with here.

.ENV Files

Environment variables are a great way of keeping sensitive information out of your project's codebase. We can store environment variables in numerous different ways, the easiest of which is via the command line:

$ export MY_VARIABLE=AAAAtpl%2Bkvro%2BoQ9wRg77VUEpQv%2F

Variables stored in this way will only last as long as your current terminal session is open, so this doesn't help us much outside of testing. If we wanted MY_VARIABLE to persist, we could add the above export line to our .bash_profile (or equivalent) to ensure MY_VARIABLE exists in every new login shell we open.

Project-specific variables are better suited for .env files living in our project's directory. FOR THE LOVE OF GOD, DON'T COMMIT THESE FILES TO GITHUB.

Let's say we have a .env file with project-related variables like so:

FLASK_ENV=development
FLASK_APP=wsgi.py
COMPRESSOR_DEBUG=True
STATIC_FOLDER=static
TEMPLATES_FOLDER=templates
.env
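
A .env file is just text, so something has to copy its pairs into the process environment before Python can see them. Below is a toy sketch of that step, assuming simple KEY=VALUE lines with no quoting or export prefixes. load_env_text is a hypothetical helper written for this post, not the implementation of any real library, and it writes into a plain dictionary standing in for os.environ so the demo has no side effects:

```python
ENV_TEXT = """
# Comments and blank lines are skipped
FLASK_ENV=development
FLASK_APP=wsgi.py
COMPRESSOR_DEBUG=True
"""


def load_env_text(text, environ):
    """Copy KEY=VALUE lines into environ without overwriting existing keys."""
    loaded = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Existing variables win, so a real environment can override the file
        environ.setdefault(key, value)
        loaded[key] = value
    return loaded


# A stand-in for os.environ where FLASK_ENV was already set by the shell
fake_environ = {"FLASK_ENV": "production"}
loaded = load_env_text(ENV_TEXT, fake_environ)

print(fake_environ)
```

In a real project you would pass os.environ, or more sensibly let an established package do the loading.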

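We can now extract these values in Python using the built-in os.environ. One catch: os.environ only sees variables that already exist in the process environment, so the .env file has to be loaded first. The python-dotenv package's load_dotenv() handles that (the flask command also loads .env on its own when python-dotenv is installed):

"""App configuration."""
from os import environ

from dotenv import load_dotenv

# Copy the variables defined in .env into the process environment
load_dotenv()


class Config:
    """Set configuration vars from .env file."""

    # General Config
    FLASK_APP = environ.get('FLASK_APP')
    FLASK_ENV = environ.get('FLASK_ENV')
    COMPRESSOR_DEBUG = environ.get('COMPRESSOR_DEBUG')

    # Static Assets
    STATIC_FOLDER = environ.get('STATIC_FOLDER')
    TEMPLATES_FOLDER = environ.get('TEMPLATES_FOLDER')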
config.py
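
Keep in mind that everything read from the environment is a string, so a value like COMPRESSOR_DEBUG=False is truthy until we convert it ourselves. Here is a rough sketch of two conversion helpers; env_bool, env_int, and TRUE_STRINGS are hypothetical names for this post, the PORT variable is invented, and the sample dictionary stands in for os.environ:

```python
import os

# Spellings treated as true; anything else counts as false (an assumption of this sketch)
TRUE_STRINGS = {"1", "true", "yes", "on"}


def env_bool(name, default=False, environ=os.environ):
    """Read an environment variable and interpret it as a boolean."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_STRINGS


def env_int(name, default, environ=os.environ):
    """Read an environment variable and convert it to an integer."""
    raw = environ.get(name)
    return default if raw is None else int(raw)


sample = {"COMPRESSOR_DEBUG": "False", "PORT": "5432"}

naive = bool(sample["COMPRESSOR_DEBUG"])  # a non-empty string, so this is True
parsed = env_bool("COMPRESSOR_DEBUG", environ=sample)
port = env_int("PORT", 8000, environ=sample)

print(naive, parsed, port)
```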

Just Use What You Want

Clearly, there are plenty of ways to set environment and project variables in Python. We could spend all day dissecting the pros and cons of configuration file types. This is one aspect of life we surely don't want to overthink.

Besides, I need to go reflect on my life. I just wrote two thousand words about the pros and cons of configuration files, which I'd rather forget before I'm made aware of how meaningless my life is.