Someday, every one of us will die. Perhaps we'll go out in a blaze of glory, stamping our ticket after a life well-lived. Some of us may instead die on the inside as we draw the last straws of a dead-end career that cannot go on any longer. Regardless of whether your death is physical or emotional, one thing is for sure: your employer and coworkers will consider you to be dead to them forever.
Office culture perpetuates strange idioms, my favorite of which is the timeless "hit by a bus" cliche. Every company has its fair share of veteran employees who have accumulated invaluable knowledge over the years. As companies rely on these contributors more and more, organizational gratitude begins to shift towards a sort of paranoia. None can help but wonder: "what if our best employee gets hit by a bus?"
I appreciate the poetic justice of an organization left helpless in the wake of exploiting employees. That said, there are other reasons to make sure the code you write is easily readable and workable by others. If you plan on building software that continues to live on, you're going to need to start by structuring your app logically. Let's start with square one: project configuration.
There are plenty of file types we could use to store and access essential variables throughout our project. File types like ini, yaml, and others all have unique ways of storing information within structured (or unstructured) hierarchies. Depending on your project's nature, each of these file structures could either serve you well or get in the way. We'll be looking at the advantages of each of these options and parsing these configs with their appropriate Python libraries.
Meet The Contenders
There's more than one way to skin a cat, but there are even more ways to format configuration files in modern software. We're going to look at some of the most common file formats for handling project configurations (ini, toml, yaml, json, .env) and the Python libraries which parse them.
Configure from .ini
ini files are perhaps the most straightforward configuration files available to us. They are highly suitable for smaller projects, mostly because they only support hierarchies one level deep. ini files are essentially flat files, with the exception that variables can belong to groups. Variables sharing a common theme fall under a common section title in square brackets, such as [APP].
This structure surely makes things easier for humans to understand, but its practicality goes beyond aesthetics. Let's parse this file with Python's configparser library to see what's really happening. We get started by reading the contents of test.ini into a variable called config.
Calling read() on an ini file does much more than store plain data; our config variable is now a unique data structure, giving us various methods for reading and writing values in our config. Try running print(config) to see for yourself: the output is a ConfigParser object rather than the raw text of the file.
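As a rough sketch of the steps so far, the snippet below parses a small ini config and inspects the result. The contents of TEST_INI are a hypothetical stand-in for test.ini (the section and key names are assumptions), and read_string() is used only so the example runs without a file on disk.

```python
import configparser

# Hypothetical stand-in for test.ini; section and key names are assumptions.
TEST_INI = """
[APP]
ENVIRONMENT = development
DEBUG = False

[DATABASE]
USERNAME = root
HOST = 127.0.0.1
PORT = 5432
"""

config = configparser.ConfigParser()
# config.read("test.ini") is the file-based equivalent of this call.
config.read_string(TEST_INI)

print(config)             # a ConfigParser object, not the raw text
print(config.sections())  # section names, in the order they appear
```

Section names come back exactly as written, so they are the natural first thing to check when a lookup fails.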
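Config files exist for the simple purpose of extracting values. configparser allows us to do this in several ways: we can call config.get() with a section and a key, or index config as if it were a nested dictionary. Either approach returns the same value, such as 127.0.0.1.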
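Here is a minimal sketch of those lookups, assuming a hypothetical [DATABASE] section with a HOST key. All three lines should produce the same string.

```python
import configparser

config = configparser.ConfigParser()
# Hypothetical config; the DATABASE section and HOST key are assumptions.
config.read_string("""
[DATABASE]
HOST = 127.0.0.1
""")

host_a = config.get("DATABASE", "HOST")     # method call: section, then key
host_b = config["DATABASE"]["HOST"]         # dictionary-style indexing
host_c = config["DATABASE"].get("HOST")     # section proxy with .get()

print(host_a, host_b, host_c)
```

Which style you pick is mostly taste; the section-proxy .get() is handy when a key may be absent, since it returns None instead of raising.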
For values where we're expecting to receive a specific data type, configparser has several type-converting methods that return values as the type we're looking for. The command config.getboolean('APP', 'DEBUG') will correctly return a boolean value of False as opposed to a string reading "False", which would obviously be problematic for our app (any non-empty string is truthy in Python). If DEBUG were set to something that can't be read as a boolean, config.getboolean() would raise a ValueError. configparser has other type-converting methods such as getint() and getfloat().
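The typed getters can be sketched as follows. The [APP] section here is hypothetical, and the WORKERS, TIMEOUT, and NAME keys are invented purely to exercise each method.

```python
import configparser

config = configparser.ConfigParser()
# Hypothetical values chosen to exercise each typed getter.
config.read_string("""
[APP]
DEBUG = False
WORKERS = 4
TIMEOUT = 2.5
NAME = myapp
""")

debug = config.getboolean("APP", "DEBUG")    # bool, not the string "False"
workers = config.getint("APP", "WORKERS")    # int
timeout = config.getfloat("APP", "TIMEOUT")  # float

try:
    # "myapp" is not a recognized boolean spelling, so this raises.
    config.getboolean("APP", "NAME")
except ValueError as err:
    print(f"Could not convert value: {err}")
```

Failing loudly at startup is usually preferable to discovering later that a flag was silently treated as true.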
The features of configparser don't end there. We could go into detail about the library's ability to write new config values, check the existence of keys, and so forth, but let's not.
Configure from .toml
TOML files may seem to share some syntax similarities with ini files at first glance, but support a much wider variety of data types, as well as relationships between values themselves. TOML files also force us to be more explicit about data structures upfront, as opposed to determining them after parsing as configparser does.
Parsing TOML files in Python is handled by a library appropriately dubbed toml. Before we even go there, let's see what the TOML hype is about.
TOML Variable Types
TOML files define variables via key/value pairs in a similar manner to ini files; the name on the left of each pair is referred to as its key. Unlike ini files, however, TOML expects the value of each key to be stored as the data type it's intended to be used as. Variables intended to be parsed as strings must be stored as values in quotes, whereas booleans must be stored as either raw true or false values. This removes a lot of ambiguity around our configurations: we have no need for methods such as getboolean() with TOML files.
TOML files can support an impressive catalog of variable types. Some of the more impressive variable types of TOML include DateTime, local time, arrays, floats, and even hexadecimal values:
TOML File Structures
The bracketed sections in TOML files are referred to as tables. Keys can live either inside or outside of tables, as we can see in the example below. You'll notice that these aren't the only two elements of TOML files, either:
TOML supports the concept of "nested tables," as seen in the [environments] table, which is followed by multiple sub-tables. Using dot notation in a table's name nests it under a parent table, which implies the sub-tables are different instances of the same kind of element.
Equally impressive is the concept of "arrays of tables," which is what's happening with [[testers]]. Tables in double brackets are automatically added to an array, where each item in the array is a table with the same name. The best way to visualize what's happening here is to look at the JSON equivalent.
Enough about TOML as a standard; let's get our data. Loading a TOML file immediately returns a dictionary, so grabbing values from config is as easy as working with any dictionary.
Configure from .yaml
YAML file formats have become a crowd favorite for configurations, presumably for their ease of readability. Those familiar with the YAML specification will tell you that YAML is far from an elegant file format, but this hasn't stopped anybody.
YAML files utilize white space to define variable hierarchies, which seems to have resonated with many developers. Check out what a sample YAML config might look like:
It should be immediately apparent that YAML configurations are easy to write and understand. The YAML file above can accomplish the same types of complex hierarchies we saw in our TOML file. However, we didn't need to explicitly set the variable data types, nor did we need to take a moment to understand concepts such as tables or arrays of tables. One could easily argue that YAML's ease-of-use doesn't justify the downsides. Don't spend too much time thinking about this: we're talking about config files here.
I think we can all agree that YAML sure beats the hell out of a JSON config. Here's the same config as above as a JSON file:
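For a rough sense of what a JSON config asks of you, here is a small hypothetical example parsed with Python's built-in json module. The values are invented for illustration; note that every quote, brace, and comma is mandatory, and JSON has no syntax for comments.

```python
import json

# Hypothetical config; all names and values are assumptions.
CONFIG_JSON = """
{
  "app": {"name": "demo", "debug": false},
  "environments": {
    "dev": {"ip": "10.0.0.1"},
    "prod": {"ip": "10.0.0.2"}
  },
  "testers": [
    {"id": 1, "username": "alice"},
    {"id": 2, "username": "bob"}
  ]
}
"""

config = json.loads(CONFIG_JSON)

print(config["app"]["debug"])              # JSON false becomes Python False
print(config["environments"]["dev"]["ip"])
```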
Show me somebody who prefers JSON over YAML, and I'll show you a masochist in denial of their vendor-lock with AWS.
Parsing YAML in Python
I recommend the Python Confuse library (a package name that's sure to raise some eyebrows among your company's information security team).
Confuse allows us to interact with YAML files in a way that is nearly identical to how we would with JSON. The exception is that with confuse we need to call .get() on a key to extract its value.
.get() can accept a data type such as int. Doing so ensures that the value we're getting is actually of the type we're expecting, which is a neat feature.
Confuse's documentation details additional validation methods for values we pull from YAML files. Methods like as_str_seq() do basically what you'd expect them to.
Confuse also gets into the realm of building CLIs, allowing us to use our YAML file to inform arguments which can be passed to a CLI and their potential values:
There's plenty of things you can go nuts with here.
Configure from .env
Environment variables are a great way of keeping sensitive information out of your project's codebase. We can store environment variables in numerous ways, the easiest of which is via the command line, using the shell's export command to set a variable such as MY_VARIABLE.
Variables stored in this way will only last as long as your current terminal session is open, so this doesn't help us much outside of testing. If we wanted MY_VARIABLE to persist, we could add the above export line to our .bash_profile (or equivalent) to ensure MY_VARIABLE exists in every new shell session we open.
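On the Python side, reading such a variable might look like the sketch below. The setdefault() call is only a stand-in for having exported MY_VARIABLE in the shell beforehand, and both the value and the second variable name are hypothetical.

```python
import os

# Stand-in for an `export MY_VARIABLE=...` done in the shell; the value is hypothetical.
os.environ.setdefault("MY_VARIABLE", "hello")

# .get() returns None (or a supplied default) instead of raising when a name is unset.
my_variable = os.environ.get("MY_VARIABLE")
fallback = os.environ.get("SOME_UNSET_VARIABLE_FOR_DEMO", "fallback")

print(my_variable, fallback)
```

Environment variables always arrive as strings, so anything numeric or boolean still needs converting by hand.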
Project-specific variables are better suited for .env files living in our project's directory. FOR THE LOVE OF GOD, DON'T COMMIT THESE FILES TO GITHUB.
Let's say we have a .env file with project-related variables like so:
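We can now extract these values in Python using the built-in os module, provided the variables in .env have first been loaded into the environment (the python-dotenv package is a common way to do this).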
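As a minimal sketch of the idea, the snippet below loads a hypothetical .env file into os.environ and reads the values back. The load_env_file() helper is an illustrative stand-in written for this example, not a real library function and not a complete .env parser; in a real project, python-dotenv would typically do this job. The variable names and values are invented.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path):
    """Hypothetical helper: load simple KEY=VALUE lines into os.environ."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that isn't KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values already present in the real environment are left untouched.
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical .env contents, written only so the example can run.
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# project settings\nDEMO_SECRET_KEY=not-a-real-secret\nDEMO_DEBUG=False\n",
        encoding="utf-8",
    )
    load_env_file(env_path)

secret_key = os.environ.get("DEMO_SECRET_KEY")
# Environment values are strings, so booleans need an explicit comparison.
debug = os.environ.get("DEMO_DEBUG", "False") == "True"

print(secret_key, debug)
```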
Just Use What You Want
There are clearly plenty of ways to set environment and project variables in Python. We could spend all day dissecting the pros and cons of configuration file types. This is one aspect of life we surely don't want to overthink.
Besides, I need to reflect on my life. I just wrote two thousand words about the pros and cons of configuration files, which I'd rather forget before becoming aware of how meaningless my life is.