Someday, every one of us will die. Perhaps we'll go out in a blaze of glory, stamping our ticket after a life well-lived. Some of us may instead die on the inside as we draw the final straws of a dead-end career which cannot go on any longer. Regardless of whether your death is physical or emotional, one thing is for certain: your employer and coworkers will consider you to be dead to them forever.
Office culture perpetuates strange idioms, my favorite of which is the timeless "hit by a bus" cliche. Every company has its fair share of veteran employees who have accumulated invaluable knowledge over the years. As companies find themselves relying on these contributors more and more, organizational gratitude begins to shift towards a sort of paranoia. None can help but wonder: "what if our best employee gets hit by a bus?"
I appreciate the poetic justice of an organization left helpless in the wake of exploiting employees. That said, there are other reasons to make sure the code you write is easily readable and workable by others. If you plan on building software that continues to live on, you're going to need to start by structuring your app logically. Let's start with square one: project configuration.
There are plenty of file types we could use to store and access important variables throughout our project. File types like ini, YAML, or what-have-you all have unique ways of storing information within structured (or unstructured) hierarchies. Depending on the nature of your project, each of these file structures could either serve you well or get in the way. We'll be looking at the advantages of all these options, as well as how to parse these configs with their appropriate Python libraries.
Meet The Contenders
There's more than one way to skin a cat, but there are even more ways to format configuration files in modern software. We're going to look at some of the most common file formats for handling project configurations (ini, toml, yaml, conf, json, env) and the Python libraries which parse them.
ini files are perhaps the most straightforward configuration files available to us. ini files are highly suitable for smaller projects, mostly because these files only support hierarchies 1-level deep. ini files are essentially flat files, with the exception that variables can belong to groups. The below example demonstrates how variables sharing a common theme may fall under a common title, such as [DATABASE] or [LOGS]:
This structure surely makes things easier for humans to understand, but the practicality of this structure goes beyond aesthetics. Let's parse this file with Python's configparser library to see what's really happening. We get started by saving the contents of test.ini to a variable called config:
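As a minimal sketch of that step: the contents of test.ini below are hypothetical (only the [APP], [DATABASE], and [LOGS] section names and the DEBUG key come from the text), and the file is written to a temporary directory purely so the snippet runs on its own.

```python
import configparser
import tempfile
from pathlib import Path

# Hypothetical contents for test.ini; keys and values are illustrative only.
SAMPLE_INI = """\
[APP]
ENVIRONMENT = development
DEBUG = False

[DATABASE]
USERNAME = root
HOST = 127.0.0.1
PORT = 5432

[LOGS]
ERRORS = logs/errors.log
INFO = logs/info.log
"""

# Write the sample to disk so there is a real file to read.
ini_path = Path(tempfile.mkdtemp()) / "test.ini"
ini_path.write_text(SAMPLE_INI)

config = configparser.ConfigParser()
files_read = config.read(str(ini_path))  # list of files that were parsed

print(config.sections())
```

Note that read() quietly skips files it cannot find, so checking its return value is a cheap way to confirm the config was actually loaded.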
read() on an ini file does much more than store plain data; our config variable is actually now its own unique data structure, allowing us various methods for reading and writing values to our config. Try running print(config) to see for yourself:
Config files exist for the simple purpose of extracting values. configparser allows us to do this in a number of ways. Each of the lines below returns the same value:
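Here is a hedged sketch of a few equivalent lookups. The [DATABASE] section and its HOST value are made up for illustration, and the config is read from a string so the snippet stands alone.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[DATABASE]
HOST = 127.0.0.1
PORT = 5432
""")

host_a = config.get("DATABASE", "HOST")   # explicit section and key
host_b = config["DATABASE"]["HOST"]       # dictionary-style access
host_c = config["DATABASE"].get("HOST")   # section proxy with .get()

# A fallback is returned when the key is absent, instead of raising an error.
timeout = config.get("DATABASE", "TIMEOUT", fallback="30")

print(host_a, host_b, host_c, timeout)
```

Every value comes back as a string here, including the fallback, which is where the typed getters come in.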
For values where we expect a specific data type, configparser has a number of type-converting methods that return values as the type we're looking for. The command config.getboolean('APP', 'DEBUG') will correctly return a boolean value of False as opposed to a string reading "False", which would obviously be problematic for our app. If DEBUG were set to something other than a boolean, config.getboolean() would raise a ValueError. configparser has other methods of this kind, such as getint() and getfloat().
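The typed getters can be sketched as follows. Apart from DEBUG under [APP], the keys here (WORKERS, TIMEOUT, MODE) are hypothetical and exist only to exercise each method.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[APP]
DEBUG = False
WORKERS = 4
TIMEOUT = 2.5
MODE = verbose
""")

debug = config.getboolean("APP", "DEBUG")  # a real bool, not the string "False"
workers = config.getint("APP", "WORKERS")
timeout = config.getfloat("APP", "TIMEOUT")

# Asking for a boolean where the value is not boolean-like raises ValueError.
bool_error = None
try:
    config.getboolean("APP", "MODE")
except ValueError as err:
    bool_error = str(err)

print(debug, workers, timeout, bool_error)
```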
The features of configparser don't end there. We could go into detail about the library's ability to write new config values, check the existence of keys, and so forth, but let's not.
TOML files may seem to share some syntax similarities with ini files at first glance, but support a much wider variety of data types, as well as relationships between values themselves. TOML files also force us to be more explicit about data structures upfront, as opposed to determining them after parsing as configparser does.
Parsing TOML files in Python is handled by a library appropriately dubbed toml. Before we even go there, let's see what the TOML hype is about.
TOML Variable Types
TOML files define variables via key/value pairs in a similar manner to that of ini files. The name on the left of each pair is referred to as its key. Unlike ini files, however, TOML expects the values of keys to be stored as the data type they're intended to be used as. Variables intended to be parsed as strings must be stored as values in quotes, whereas booleans must be stored as either raw true or false values. This removes a lot of ambiguity around our configurations: we have no need for methods such as getboolean() with TOML files.
TOML files can support an impressive catalog of variable types. Some of the more impressive types TOML supports include datetimes, local times, arrays, floats, and even hexadecimal values:
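To make those types concrete, here is a small sketch. All keys and values are hypothetical, and it swaps in the standard library's tomllib (Python 3.11+, with the tomli backport as a fallback) rather than the toml package named above, so the snippet needs no extra install on a recent interpreter.

```python
import datetime

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

# Hypothetical keys, one per TOML type mentioned above.
SAMPLE_TOML = """
name = "example"                 # string
debug = false                    # boolean
launched = 2021-05-27T07:32:00Z  # datetime with a UTC offset
alarm = 07:32:00                 # local time
ports = [8000, 8001, 8002]       # array
ratio = 0.75                     # float
mask = 0xFF                      # hexadecimal integer
"""

values = tomllib.loads(SAMPLE_TOML)

# Each value arrives as the matching Python type; no conversion calls needed.
for key, value in values.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```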
TOML File Structures
The bracketed sections in TOML files are referred to as tables. Keys can live either inside or outside of tables, as we can see in the example below. You'll notice that these aren't the only two elements of TOML files, either:
TOML supports the concept of "nested tables", as seen in the [environments] table, followed by multiple sub-tables. By using dot-notation, we're able to create associations of tables, which imply they're different instances of the same element.
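A hedged sketch of keys, tables, and nested tables follows. Only the [environments] table name comes from the text; the sub-table names and every key are assumptions, and the standard library's tomllib stands in for the toml package.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

SAMPLE_TOML = """
title = "My TOML Config"     # a key living outside of any table

[project]                    # a table
name = "example"
version = "1.0.0"

[environments]               # parent table
default = "development"

[environments.development]   # sub-table declared with dot-notation
debug = true

[environments.production]
debug = false
"""

config = tomllib.loads(SAMPLE_TOML)

# Sub-tables become dictionaries nested under their parent table.
print(config["environments"]["production"])
print(sorted(config["environments"]))
```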
Equally interesting is the concept of "arrays of tables," which is what's happening with [[testers]]. Tables in double-brackets are automatically added to an array, where each item in the array is a table with the same name. The best way to visualize what's happening here is with the JSON equivalent:
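One way to see that equivalence is to parse a couple of [[testers]] tables and dump the result as JSON. The id and username fields are hypothetical, and tomllib is again used in place of the toml package.

```python
import json

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

SAMPLE_TOML = """
[[testers]]
id = 1
username = "tester_one"

[[testers]]
id = 2
username = "tester_two"
"""

config = tomllib.loads(SAMPLE_TOML)

# Both [[testers]] tables land in a single list under the "testers" key.
print(json.dumps(config, indent=2))
```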
Enough about TOML as a standard, let's get our data:
Loading TOML files immediately returns a dictionary:
Grabbing values from config is as easy as working with any dictionary:
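Loading and reading can be sketched like this. The file name config.toml and its contents are assumptions, and the snippet uses the standard library's tomllib, which wants a file opened in binary mode; the third-party toml package exposes a similar load() function of its own.

```python
import tempfile
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

# Hypothetical file contents, written to a temp directory for the demo.
SAMPLE_TOML = """\
title = "My TOML Config"

[project]
name = "example"

[[testers]]
id = 1
username = "tester_one"
"""

toml_path = Path(tempfile.mkdtemp()) / "config.toml"
toml_path.write_text(SAMPLE_TOML)

# tomllib reads bytes, hence the "rb" mode.
with open(toml_path, "rb") as f:
    config = tomllib.load(f)

print(type(config).__name__)

project_name = config["project"]["name"]          # nested table lookup
first_tester = config["testers"][0]["username"]   # array of tables lookup
missing = config.get("does_not_exist", "default") # plain dict.get() fallback
```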
YAML file formats have become a crowd favorite for configurations, presumably for their ease of readability. Those familiar with the YAML specification will tell you that YAML is far from being an elegant file format, but this hasn't seemed to stop anybody.
YAML files utilize white space to define variable hierarchies, which seems to have resonated with many developers. Check out what a sample YAML config might look like:
It should be immediately apparent that YAML configurations are easy to write and understand. The YAML file above is able to accomplish the same types of complex hierarchies we saw in our TOML file. However, we didn't need to set the variable data types explicitly, nor did we need to take a moment to understand concepts such as tables or arrays of tables. One could easily argue that YAML's ease-of-use doesn't justify the downsides. Don't spend too much time thinking about this: we're talking about config files here.
Something I think we can all agree on is that YAML sure beats the hell out of a JSON config. Here's the same config as above as a JSON file:
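For comparison, here is a sketch of what reading a JSON config looks like with Python's built-in json module. The config itself is hypothetical, since the original listing isn't reproduced here.

```python
import json

# Hypothetical config; note the mandatory quotes, braces, and commas.
SAMPLE_JSON = """
{
  "app": {"environment": "development", "debug": false},
  "database": {"host": "127.0.0.1", "port": 5432},
  "testers": [
    {"id": 1, "username": "tester_one"},
    {"id": 2, "username": "tester_two"}
  ]
}
"""

config = json.loads(SAMPLE_JSON)

debug = config["app"]["debug"]               # JSON false becomes Python False
port = config["database"]["port"]            # numbers keep their type
usernames = [t["username"] for t in config["testers"]]

print(debug, port, usernames)
```

JSON also has no comment syntax, which is one more reason it tends to feel clumsy for hand-edited configs.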
Show me somebody who prefers JSON over YAML, and I'll show you a masochist in denial of their vendor lock-in with AWS.
Parsing YAML in Python
I recommend the Python Confuse library (a package name that's sure to raise some eyebrows among your company's information security team).
Confuse allows us to interact with YAML files almost identically to how we would with JSON, with the exception that we specify .get() at the end of walking through the tree hierarchy, like so:
.get() can accept a type such as int. Doing so ensures that the value we're getting is actually of the type we're expecting, which is a neat feature.
Confuse's documentation details additional validation methods for values we pull from YAML files. Methods like as_str_seq() do basically what you'd expect them to.
Confuse also gets into the realm of building CLIs, allowing us to use our YAML file to inform arguments which can be passed to a CLI and their potential values:
There are plenty of things you can go nuts with here.
Environment variables are a great way of keeping sensitive information out of your project's codebase. We can store environment variables in numerous different ways, the easiest of which is via command line:
Variables stored in this way will only last as long as your current terminal session is open, so this doesn't help us much outside of testing. If we wanted MY_VARIABLE to persist, we could add the above export line to our .bash_profile (or equivalent) to ensure MY_VARIABLE is set in every new shell session for our user.
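On the Python side, reading such a variable might look like the sketch below. The value assigned to MY_VARIABLE is hypothetical and is set from inside the script only so the snippet runs without a prior export; the second variable name is likewise made up.

```python
import os

# Stand-in for exporting MY_VARIABLE in the shell; the value is made up.
os.environ.setdefault("MY_VARIABLE", "example-value")

my_variable = os.environ.get("MY_VARIABLE")  # returns None if unset

# os.getenv() accepts a default for variables that were never exported.
missing = os.getenv("MY_UNSET_VARIABLE_EXAMPLE", "fallback")

print(my_variable, missing)
```

Environment variables are always strings, so numeric or boolean settings still need converting by hand.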
Project-specific variables are better suited for .env files living in our project's directory. FOR THE LOVE OF GOD, DON'T COMMIT THESE FILES TO GITHUB.
Let's say we have a .env file with project-related variables like so:
We can now extract these values in Python using the built-in os module, provided the file has been loaded into the environment first:
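Python does not read .env files on its own, so something has to load the file into the environment first. The python-dotenv package and its load_dotenv() function are the usual choice; the sketch below substitutes a deliberately tiny standard-library loader (the load_env_file helper is hypothetical, as are the variable names and values) to keep the example self-contained.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical project variables; none of these values are real.
SAMPLE_ENV = """\
# Project settings
FLASK_ENV=development
SECRET_KEY="not-a-real-secret"
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb
"""


def load_env_file(path):
    """Minimal .env reader: KEY=VALUE lines, '#' comments, optional quotes."""
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        loaded[key.strip()] = value.strip().strip("\"'")
    os.environ.update(loaded)  # overrides variables that are already set
    return loaded


env_path = Path(tempfile.mkdtemp()) / ".env"
env_path.write_text(SAMPLE_ENV)
load_env_file(env_path)

secret_key = os.environ.get("SECRET_KEY")
database_url = os.getenv("DATABASE_URL")

print(secret_key, database_url)
```

This loader ignores multi-line values, escapes, and variable expansion, which is exactly the sort of detail a dedicated package handles for you.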
Just Use What You Want
Clearly, there are plenty of ways to set environment and project variables in Python. We could spend all day dissecting the pros and cons of configuration file types. This is one aspect of life we surely don't want to overthink.
Besides, I need to go reflect on my life. I just wrote two thousand words about the pros and cons of configuration files, which I'd rather forget before I'm made aware of how meaningless my life is.