The Many Faces and Filetypes of Python Configs

Cleverly (or uncleverly) configure your Python project using .ini, .yaml, or .env files.

    As we cling harder and harder to Dockerfiles, Kubernetes, or any modern preconfigured app environment, our dependency on billable boilerplate grows. Whether or not that's a problem is a conversation in itself. The longer I keep my projects self-hosted, the more I'm consumed by the open-ended ways people manage their project configuration variables.

    Full disclosure: this post is probably about as boring as its heading suggests. Today, I'm here to talk about handling environment and general configuration variables in Python.

    Pick Your Poison

    Someday, each and every one of us will die. I'm referring, of course, to the part of us that slowly withers away as we're forced to keep maintaining projects we've already handed off. We can do our best to avoid these situations by isolating the variables most subject to change in separate, easy-to-edit files for Person Number 2 to pick up.

    Option 1: Project Config via .ini Files

    .ini files are simple, making them perfect for simple projects, especially those to be handled by others who may not have development backgrounds. These are configuration files with a single-level hierarchy of named sections holding key/value pairs:

    [GLOBAL]
    PROJECT: Fake Example Project
    REGION: us-east-1
    INPUT_FOLDER: data/zip/
    OUTPUT_FOLDER: data/output/
    TIMEOUT: 200
    MEMORY: 512
    
    [PROD]
    DATABASE = postgresql://loser:<PASSWORD>@<HOST>:5432/mydatabase
    ENDPOINT = https://production.endpoint.example.com
    USER = PROD_USERNAME
    
    [DEV]
    DATABASE = postgresql://loser:<PASSWORD>@<HOST>:5432/mydatabase
    ENDPOINT = https://dev.endpoint.example.com
    USER = DEV_USERNAME
    

    Another use is specifying AWS services:

    [S3]
    BUCKET_NAME: public-bucket
    BUCKET_FOLDER: /
    
    [RDS]
    NAME: rds/prod/sensitivedata
    ARN: arn:aws:rds:us-east-1:66574567568434896:secret:rds/prod/peopledata-ZvJ3Ys
    REGION: us-east-1
    
    [LAMBDA]
    FUNCTION_NAME: handler
    HANDLER: lambda.handler
    DESCRIPTION: Performs a task every now and then.
    RUNTIME: python3.7
    ROLE: lambda_role
    DIST_FOLDER: lambda/dist
    
    [SECRETS]
    SECRET_NAME: rds/prod/totallysecret
    SECRET_ARN: arn:aws:secretsmanager:us-east-1:769979969:secret:rds/prod/stupidproject-5647
    

    .ini files are handled in Python by the standard library's configparser module, which is how we do something useful with the essentially static text in these files. Since we're keeping variables separate from app source code, we now need a file and a class that exist merely to access these values.
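
    Before wrapping configparser in a class, here's a minimal sketch of what it does on its own. It uses a trimmed copy of the config above and read_string(), so it runs without a file on disk. Two behaviors are worth knowing. configparser accepts both ':' and '=' as delimiters. Every value also comes back as a string unless you ask for a typed getter such as getint().

```python
from configparser import ConfigParser

# A trimmed copy of the config.ini example, inlined so the sketch needs no file.
SAMPLE_INI = """
[GLOBAL]
PROJECT: Fake Example Project
TIMEOUT: 200

[DEV]
ENDPOINT = https://dev.endpoint.example.com
"""

parser = ConfigParser()
parser.read_string(SAMPLE_INI)

# Both ':' and '=' delimiters are accepted.
project = parser.get("GLOBAL", "PROJECT")
endpoint = parser.get("DEV", "ENDPOINT")

# Values are strings by default; typed getters convert on the way out.
timeout_raw = parser.get("GLOBAL", "TIMEOUT")
timeout = parser.getint("GLOBAL", "TIMEOUT")

print(parser.sections())  # ['GLOBAL', 'DEV']
```

    Option names are case-insensitive by default, so parser.get("GLOBAL", "project") finds the same value.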

    Creating a Python Class to Extract Variables

    Instead of explicitly hardcoding a dump of all variables, we're going to create a class that provides an easy syntax for accessing variables on demand. Check it out:

    # config_loader.py
    from configparser import ConfigParser
    import os
    
    
    class Config:
        """Interact with configuration variables."""
    
        # SafeConfigParser was deprecated in Python 3.2 and removed in 3.12;
        # ConfigParser is its direct replacement.
        configParser = ConfigParser()
        # Resolved against the current working directory, not this file's location.
        configFilePath = os.path.join(os.getcwd(), 'config.ini')
    
        @classmethod
        def initialize(cls):
            """Start config by reading config.ini (call once before reading values)."""
            cls.configParser.read(cls.configFilePath)
    
        @classmethod
        def prod(cls, key):
            """Get prod values from config.ini."""
            return cls.configParser.get('PROD', key)
    
        @classmethod
        def dev(cls, key):
            """Get dev values from config.ini."""
            return cls.configParser.get('DEV', key)
    

    This simple class goes a long way toward simplifying how we grab variables. The class never needs to be instantiated. We can import Config wherever we please, run Config.initialize once to read config.ini, and start pulling values.

    To separate variables by concern, each block in config.ini gets its own class method (here, PROD and DEV). Retrieving the right variable is then as simple as calling Config.prod('DATABASE'), which returns the URI for the production database. Easy to use, simple to understand.
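
    If adding one method per block gets tedious, one alternative is a single accessor that takes the section name and an optional fallback. The sketch below is hypothetical. The ConfigSketch name, its get() signature and the path argument are ours, not the class above. It writes a throwaway config.ini to a temporary directory so it can run standalone.

```python
import tempfile
from configparser import ConfigParser
from pathlib import Path
from typing import Optional


class ConfigSketch:
    """Hypothetical variant of Config: one accessor that works for every section."""

    parser = ConfigParser()

    @classmethod
    def initialize(cls, path):
        """Read the given .ini file into the shared, class-level parser."""
        cls.parser.read(path)

    @classmethod
    def get(cls, section: str, key: str, fallback: Optional[str] = None) -> Optional[str]:
        """Return a value from any section, or `fallback` if the section/key is missing."""
        return cls.parser.get(section, key, fallback=fallback)


# Usage: write a small config.ini to a temp directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    ini_path = Path(tmp) / "config.ini"
    ini_path.write_text("[S3]\nBUCKET_NAME: public-bucket\n\n[PROD]\nUSER = PROD_USERNAME\n")
    ConfigSketch.initialize(ini_path)

bucket = ConfigSketch.get("S3", "BUCKET_NAME")
prod_user = ConfigSketch.get("PROD", "USER")
missing = ConfigSketch.get("DEV", "USER", fallback="not-configured")
```

    The trade-off is that callers now spell out section names as strings. In exchange, adding a new block to config.ini no longer requires touching the class.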

    Option 2: Complex YAML Configurations

    Unless you've been developing apps in total isolation or under a dictatorship that blocks internet access, you already know that .yaml files are all the rage when it comes to storing static values in text files (wow, this really is an obscure topic for a post).

    YAML files offer plenty of upsides over alternative file types. Where .ini files are simply grouped variables, YAML provides a nested hierarchy. This makes YAML files much easier to understand and maintain for larger applications, since some variables only make sense as children of another variable.
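
    This small stdlib sketch shows why a single-level format struggles with nested data. It squeezes a nested mapping into configparser sections by joining parent keys with dots. The flatten() helper and the GLOBAL catch-all section are our own illustration, not part of any library. Every level of nesting ends up baked into a section name, which is exactly the bookkeeping YAML lets you skip.

```python
from configparser import ConfigParser

# A nested structure shaped like part of a YAML config.
nested = {
    "appName": "appName",
    "AWS": {
        "Region": "us-east-1",
        "Resources": {
            "Lambda": {"Properties": {"Runtime": "python3.7", "Timeout": 25}},
        },
    },
}


def flatten(tree, prefix=""):
    """Yield (section, key, value) triples; parent keys become dotted section names."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
        else:
            # Top-level scalars land in a GLOBAL section, like the .ini example.
            yield prefix or "GLOBAL", key, value


parser = ConfigParser()
parser.optionxform = str  # keep key case as written instead of lowercasing
for section, key, value in flatten(nested):
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, str(value))  # .ini values are always strings

print(parser.sections())
```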

    Check out what a sample YAML config might look like:

    appName: appName
    logLevel: WARN
    
    AWS:
        Region: us-east-1
        Resources:
          EC2: 
            Type: "AWS::EC2::Instance"
            Properties: 
              ImageId: "ami-0ff8a91507f77f867"
              InstanceType: t2.micro
              KeyName: testkey
              BlockDeviceMappings:
                -
                  DeviceName: /dev/sdm
                  Ebs:
                    VolumeType: io1
                    Iops: 200
                    DeleteOnTermination: false
                    VolumeSize: 20
          Lambda:
              Type: "AWS::Lambda::Function"
              Properties: 
                Handler: "index.handler"
                Role: 
                  Fn::GetAtt: 
                    - "LambdaExecutionRole"
                    - "Arn"
                Runtime: "python3.7"
                Timeout: 25
                TracingConfig:
                  Mode: "Active"
    
    routes:
      admin:
        url: /admin
        template: admin.html
        assets:
            templates: /templates
            static: /static
      dashboard:
        url: /dashboard
        template: dashboard.html
        assets:
            templates: /templates
            static: /static
      account:
        url: /account
        template: account.html
        assets:
            templates: /templates
            static: /static
            
    databases:
      cassandra:
        host: example.cassandra.db
        username: user
        password: password
      redshift:
        jdbcURL: jdbc:redshift://<IP>:<PORT>/file?user=username&password=pass
        tempS3Dir: s3://path/to/redshift/temp/dir/ 
      redis:
        host: hostname
        port: port-number
        auth: authentication
        db: database

    This would read horribly if we tried to cram it into an .ini file. A fairer comparison is JSON. JSON objects share YAML's hierarchy advantages, but JSON's strict syntax is easy to get wrong and its error messages are often unhelpful, thanks to its being a brainchild of Old Man JavaScript. YAML doesn't make you wrap everything in braces, double-quote every string, or hunt down a stray trailing comma. All of these stupid things are why I prefer Python.
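
    For a concrete taste of that strictness, Python's built-in json module rejects a trailing comma that most humans would never notice. This is a minimal sketch. The exact error wording differs between Python versions, so it only prints what the parser reports.

```python
import json

strict = '{"logLevel": "WARN", "timeout": 25}'
sloppy = '{"logLevel": "WARN", "timeout": 25,}'  # note the trailing comma

parsed = json.loads(strict)  # parses fine

try:
    json.loads(sloppy)
    rejected = False
except json.JSONDecodeError as err:
    rejected = True
    # err.msg wording varies by Python version; err.pos is the character
    # offset where the parser gave up.
    print(f"JSONDecodeError at char {err.pos}: {err.msg}")
```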

    Parsing YAML in Python

    I recommend the Python Confuse library (a package name that's sure to raise some eyebrows on your company's information security team).

    Confuse lets us interact with YAML files almost the same way we would with JSON, except that we call .get() once we've walked down to the value we want in the tree, like so:

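    import confuse
    
    # Assumes the YAML above is saved as config.yaml in the working directory.
    config = confuse.Configuration('MyApp', __name__)
    config.set_file('config.yaml')
    
    # Runtime lives under AWS > Resources > Lambda > Properties in the example.
    config['AWS']['Resources']['Lambda']['Properties']['Runtime'].get()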
    

    .get() can also take a type such as int. Confuse then checks the value against that type and raises an error if it doesn't match, which is a neat way to make sure a value has the schema we're expecting.
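
    Confuse does this for you, but the idea is easy to see in plain Python. Below is a hypothetical stdlib-only sketch. get_typed() and ConfigTypeMismatch are our own names, not Confuse's API. The helper walks a path through nested dicts, like a tree parsed from the YAML example, and raises if the leaf isn't the expected type. That is roughly what .get(int) buys you, although Confuse's own conversion rules may be more lenient.

```python
from typing import Any, Sequence


class ConfigTypeMismatch(TypeError):
    """Raised when a config value doesn't match the requested type (hypothetical)."""


def get_typed(tree: dict, path: Sequence[str], expected: type) -> Any:
    """Walk `path` through nested dicts and check the leaf's type."""
    node: Any = tree
    for key in path:
        node = node[key]  # KeyError if a segment is missing
    if not isinstance(node, expected):
        raise ConfigTypeMismatch(
            f"{'.'.join(path)}: expected {expected.__name__}, got {type(node).__name__}"
        )
    return node


# A slice of the YAML example as it would look once parsed into Python objects.
config = {"AWS": {"Resources": {"Lambda": {"Properties": {
    "Runtime": "python3.7", "Timeout": 25}}}}}

lambda_props = ("AWS", "Resources", "Lambda", "Properties")
timeout = get_typed(config, lambda_props + ("Timeout",), int)
runtime = get_typed(config, lambda_props + ("Runtime",), str)

try:
    get_typed(config, lambda_props + ("Runtime",), int)
except ConfigTypeMismatch as err:
    mismatch_message = str(err)
```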

    Validators

    Confuse's documentation details additional validation methods for values we pull from YAML files. Methods like as_filename(), as_number(), and as_str_seq() do basically what you'd expect them to.
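
    Here's a rough stdlib approximation of two of those validators, to make "what you'd expect" concrete. The function names are hypothetical stand-ins. Their behavior is our reading of what Confuse's versions are meant to do: accept an int or float for a number, and accept either a list of strings or a whitespace-separated string for a string sequence. Check the Confuse docs before relying on the details.

```python
from typing import List, Union


def as_number_like(value: object) -> Union[int, float]:
    """Accept ints and floats; reject everything else (including bools)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {type(value).__name__}")
    return value


def as_str_seq_like(value: object) -> List[str]:
    """Accept a list/tuple of strings, or a single string split on whitespace."""
    if isinstance(value, str):
        return value.split()
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError("expected a string or a sequence of strings")


iops = as_number_like(200)
assets = as_str_seq_like("/templates /static")
routes = as_str_seq_like(["admin", "dashboard", "account"])
```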

    CLI Configuration

    Confuse also ventures into CLI territory. Arguments parsed with argparse can be overlaid on the values from our YAML file, so a flag passed on the command line overrides the file's setting:

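    import argparse
    import confuse
    
    config = confuse.Configuration('myapp')
    parser = argparse.ArgumentParser()
    parser.add_argument('--foo', help='a parameter')
    args = parser.parse_args()
    # Overlay the parsed arguments on top of values read from config files.
    config.set_args(args)
    print(config['foo'].get())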
    

    There are plenty of things you can go nuts with here.
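
    The idea behind set_args() is layering. Values from the YAML file act as defaults, and any flag actually passed on the command line wins. The sketch below is a stdlib-only illustration of that precedence. merge_cli_over_config() is a hypothetical helper, not how Confuse implements it. It skips arguments left at None so that unset flags don't clobber file values.

```python
import argparse


def merge_cli_over_config(file_values: dict, args: argparse.Namespace) -> dict:
    """Return file values overridden by any CLI argument that was actually set."""
    merged = dict(file_values)
    for key, value in vars(args).items():
        if value is not None:  # argparse leaves options that weren't passed as None
            merged[key] = value
    return merged


parser = argparse.ArgumentParser()
parser.add_argument("--foo", help="a parameter")
parser.add_argument("--log-level", dest="logLevel", help="override logLevel")

# Pretend these came from the YAML file.
file_values = {"foo": "from-yaml", "logLevel": "WARN"}

# Passing an explicit list instead of reading sys.argv keeps the demo reproducible.
args = parser.parse_args(["--foo", "from-cli"])
merged = merge_cli_over_config(file_values, args)
print(merged)
```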

    Option 3: Using .env Config Files

    Lastly, we can leverage the well-known .env format to set variables. Working this way is pretty much equivalent to working with .ini files, but we're human beings, so we're stupid and keep building the same protocols over and over. In .env, we get to store beautiful values such as these:

    CONFIG_PATH=${HOME}/.config/foo
    DOMAIN=example.org
    ADMIN_EMAIL=admin@${DOMAIN}
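
    Values can reference other variables with ${NAME} syntax. The reference can point to a key defined earlier in the file (${DOMAIN}) or to the real environment (${HOME}). python-dotenv handles this expansion for you. To show the idea, here's a deliberately simplified parser sketch. parse_env() is hypothetical and ignores quoting and the other edge cases the real library handles. It resolves ${NAME} from earlier keys first, then from the environment. python-dotenv's own lookup order depends on its override setting, so treat the order here as an assumption.

```python
import os
import re
from typing import Dict, Mapping, Optional

# Matches ${NAME} references; NAME is letters, digits and underscores.
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_env(text: str, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Parse KEY=value lines, expanding ${NAME} from earlier keys, then `environ`."""
    environ = os.environ if environ is None else environ
    values: Dict[str, str] = {}

    def expand(match):
        name = match.group(1)
        # Earlier keys in the file take priority; unknown names expand to "".
        return values.get(name, environ.get(name, ""))

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = _REF.sub(expand, value.strip())
    return values


sample = """
CONFIG_PATH=${HOME}/.config/foo
DOMAIN=example.org
ADMIN_EMAIL=admin@${DOMAIN}
"""
# A fake environment keeps the result independent of the machine running it.
parsed = parse_env(sample, environ={"HOME": "/home/user"})
```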

    To read these values, we'll be using the python-dotenv library. This gets you started:

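    from pathlib import Path
    from dotenv import load_dotenv
    
    # Option A: let python-dotenv search for a .env file
    # (verbose=True warns if none is found).
    load_dotenv(verbose=True)
    
    # Option B: point at an explicit path instead.
    env_path = Path('.') / '.env'
    load_dotenv(dotenv_path=env_path)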
    

    After that, it's a matter of setting variables in Python to values you extract from .env:

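    import os
    
    # os.getenv returns None when a variable isn't set.
    ADMIN_EMAIL = os.getenv("ADMIN_EMAIL")
    CONFIG_PATH = os.getenv("CONFIG_PATH")
    DATABASE_PASSWORD = os.getenv("DATABASE_PASSWORD")  # add this key to .env to use it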
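
    os.getenv() quietly returns None for anything missing. It can therefore pay to fail fast on values the app can't run without, and to convert types explicitly. This is a minimal sketch with two hypothetical helpers. require_env() and env_int() are our names, not python-dotenv functions. The variable APP_MEMORY_MB is likewise made up for the demo.

```python
import os
from typing import Optional


def require_env(name: str) -> str:
    """Return an environment variable or raise immediately if it's missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def env_int(name: str, default: Optional[int] = None) -> Optional[int]:
    """Read an integer setting, falling back to `default` when it's unset."""
    raw = os.getenv(name)
    return default if raw is None else int(raw)


# Simulate values that load_dotenv() would normally have placed in os.environ.
os.environ["ADMIN_EMAIL"] = "admin@example.org"
os.environ["TIMEOUT"] = "200"

admin_email = require_env("ADMIN_EMAIL")
timeout = env_int("TIMEOUT", default=30)
memory = env_int("APP_MEMORY_MB", default=512)  # hypothetical, normally unset
```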
    

    So Yeah, Basically Just Use What You Want

    Clearly there are plenty of ways to set environment and project variables in Python. We could spend all day investigating the nuances of each and how their accompanying Python configuration class should be structured, but we've got apps to build.

    Besides, I need to go reflect on my life after writing a thousand words about loading variables in Python.

    Todd Birchard
    New York City
    Product manager turned engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.
