Using Yaml in a Python app

Introduction

I’d like to load a YAML file and read values from it without the plain Python dictionary syntax (which I don’t like much), but also without having to map the YAML structure to classes just to get property-style access (e.g. a.b.c). Finally, I’d also like to load different config files depending on the environment.

Using PyYaml and addict

The library chosen to load YAML was PyYaml. PyYaml is really straightforward: it loads a YAML file and returns a Python dictionary. The problem with Python dictionaries is that the syntax required to access a deep node in the tree is a little verbose.

deep node access
config['auth']['fields']['username']

On the other hand, I was looking more for something like:

deep node access
config.auth.fields.username

In order to achieve the required property-like syntax I found Addict.
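
To make the idea concrete, here is a minimal, standard-library-only sketch of what attribute-style access over nested dictionaries amounts to. The AttrDict class is hypothetical and is not how Addict is implemented; it only illustrates the lookup.

```python
class AttrDict(dict):
    """Hypothetical helper: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            # This sketch raises for missing keys; a library may choose another policy.
            raise AttributeError(name) from None
        # Wrap nested dictionaries on the way out so chained access keeps working.
        return AttrDict(value) if isinstance(value, dict) else value


config = AttrDict({"auth": {"fields": {"username": "john"}}})

# Both spellings reach the same node.
assert config["auth"]["fields"]["username"] == "john"
assert config.auth.fields.username == "john"
```

In this sketch, keys that collide with dict method names (items, keys and so on) still need the bracket syntax. Addict takes a different route for missing keys (to my knowledge it hands back an empty Dict rather than raising), so check its documentation before relying on either behaviour.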

YamlConfigLoader
import logging
import os

from addict import Dict
from yaml import load, Loader, YAMLError


log = logging.getLogger("config")


class YamlConfigLoader:
    def load(self, path):
        if path and os.path.exists(path):
            with open(path, 'r') as ymlfile:
                try:
                    yaml = load(ymlfile, Loader=Loader)  # (1)
                    conf = Dict(yaml)  # (2)

                    return conf
                except YAMLError as error:
                    log.error("config/error/yaml: {}".format(error))
        else:
            log.error("config/error/not_found: {}".format(path))
(1) loads the YAML with PyYaml to get a Python dictionary
(2) converts the Python dictionary to an Addict dictionary

Now let’s say I have my app config file, config.yml:

database:
  dialect: postgres+pg8000
  user: john
  password: supersecret
  host: localhost
  port: 5432

log:
  loggers:
    - name: security
      level: INFO
    - name: api
      level: DEBUG

We can access configuration properties using the property-dot syntax:

def test_simple():
    yaml = YamlConfigLoader().load("config.yml")

    assert yaml.database.user     == "john"
    assert yaml.database.password == "supersecret"
    assert yaml.database.dialect  == "postgres+pg8000"
    assert yaml.database.host     == "localhost"
    assert yaml.database.port     == 5432

    assert len(yaml.log.loggers) == 2

Environments

In many projects it makes sense to load different properties depending on the environment we’re deploying the application to. Let’s see what our code looks like when we add the environment as a parameter:

YamlConfigLoader
import logging
import ntpath
import os

from addict import Dict
from yaml import load, Loader, YAMLError


log = logging.getLogger("config")


class YamlConfigLoader:
    def load(self, env, path):
        path = self.resolve_name_with_environment(env, path)

        if path and os.path.exists(path):
            with open(path, 'r') as ymlfile:
                try:
                    # load the YAML with PyYaml, then wrap the dictionary with Addict
                    yaml = load(ymlfile, Loader=Loader)
                    conf = Dict(yaml)

                    return conf
                except YAMLError as error:
                    log.error("config/error/yaml: {}".format(error))
        else:
            log.error("config/error/not_found: {}".format(path))

    def resolve_name_with_environment(self, env, path):
        parent, filename = ntpath.split(path)
        # splitext keeps the leading dot in ext and copes with names
        # containing more than one dot (e.g. my.app.yml)
        name, ext = os.path.splitext(filename)

        if env:
            return os.path.join(parent, "{}-{}{}".format(name, env, ext))
        else:
            return os.path.join(parent, filename)
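
To see what the name resolution produces, here is a standalone sketch of the same rule as a plain function, so it can be tried without PyYaml or Addict. It uses os.path.splitext to separate the extension, so names containing several dots still work; the file names are made up for illustration.

```python
import os


def resolve_name_with_environment(env, path):
    """Return 'name-env.ext' next to the original file, or the original name if env is empty."""
    parent, filename = os.path.split(path)
    name, ext = os.path.splitext(filename)  # ext keeps its leading dot, e.g. ".yml"

    if env:
        return os.path.join(parent, "{}-{}{}".format(name, env, ext))
    return os.path.join(parent, filename)


print(resolve_name_with_environment("pro", "myapp.yml"))   # myapp-pro.yml
print(resolve_name_with_environment(None, "myapp.yml"))    # myapp.yml
# A name with two dots, inside a directory (separator depends on the platform):
print(resolve_name_with_environment("test", os.path.join("conf", "my.app.yml")))
```

Only the last dot is treated as the start of the extension, so my.app.yml becomes my.app-test.yml.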

Now the application can take the name of the environment it should run with, for instance from a system environment variable, and pass it to the loader:

using test environments
def test_simple():
    # given: a yaml loader
    loader = YamlConfigLoader()

    # when: loading a pro configuration file (myapp-pro.yml)
    yaml_pro = loader.load("pro", "myapp.yml")

    # then: we should get a value from pro environment
    assert yaml_pro.database.user == "john_from_pro"

    # when: loading a test configuration file (myapp-test.yml)
    yaml_test = loader.load("test", "myapp.yml")

    # then: we should get a value from test environment
    assert yaml_test.database.user == "john_from_test"
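
One way to wire this to the system environment is sketched below. The variable name APP_ENV and the helper environment_name are assumptions for illustration, not something the loader requires; any variable name would do.

```python
import os


def environment_name(environ=os.environ, variable="APP_ENV"):
    """Return the environment name from a system variable, or None if it is unset or blank."""
    value = environ.get(variable, "").strip()
    return value or None


# With APP_ENV=pro this yields "pro"; with nothing set it yields None,
# which the loader above treats as "use the plain file name".
env = environment_name()

# The result would then be handed to the loader, for example:
# conf = YamlConfigLoader().load(env, "myapp.yml")
```

Taking the mapping as a parameter keeps the helper easy to test, since a plain dictionary can stand in for os.environ.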

Using system environment variables

It’s more and more common to deploy applications as containers. In this kind of environment, some configuration properties are usually passed as system environment variables. Can we create our YAML file with some values taken from system environment variables?

PyYaml to the rescue! Here we have the config-env.yml file with that idea in mind:

config-env.yml
database:
  dialect: postgres+pg8000
  user: !env USERNAME
  password: !env PASSWORD
  host: !env HOST:localhost
  port: !env PORT

Thanks to PyYaml we can take the values marked with !env and resolve them from system environment variables. Moreover, look at the host entry: we can also provide a default value following the syntax:

key: !env VARIABLE:default_value

To make it work, we add a constructor, which is basically a processor for a custom YAML tag such as !env (what I call a directive here), responsible for transforming the value of any node marked with that tag. The constructor is registered globally in PyYaml, so you can add it anywhere in your code via the add_constructor function and then use the previous version of YamlConfigLoader:

adding constructor
import logging
import ntpath
import os

from addict import Dict
from yaml import add_constructor, load, Loader, YAMLError


log = logging.getLogger("config")


def process_env_directive(loader, node):
    log.info("processing !env: {}".format(node.value))

    node_val = node.value
    # split on the first ":" only, so a default may itself contain ":"
    splitted = node_val.split(":", 1)

    if len(splitted) == 2:  # (1)
        key, value = splitted

        return os.environ.get(key) or value  # (2)
    else:
        return os.environ.get(node_val)  # (3)


add_constructor(u'!env', process_env_directive)
(1) checks whether a default value was given
(2) if there is a default value, tries to resolve the env variable and falls back to the default
(3) if there is no default value, just tries to resolve the env variable
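
The lookup rule can also be exercised on its own, without PyYaml, by passing the environment in as a plain mapping. The function resolve_env_spec below is a hypothetical stand-in for the body of the constructor; it splits on the first colon only, an assumption on my part so that defaults which themselves contain a colon (a URL, for example) stay intact.

```python
def resolve_env_spec(spec, environ):
    """Resolve 'VARIABLE' or 'VARIABLE:default' against a mapping of variables."""
    # partition splits on the first ":" only; has_default is "" when no ":" was present.
    key, has_default, default = spec.partition(":")

    if has_default:
        # An unset *or empty* variable falls back to the default.
        return environ.get(key) or default
    return environ.get(key)


fake_env = {"USERNAME": "outsider", "EMPTY": ""}

assert resolve_env_spec("USERNAME", fake_env) == "outsider"
assert resolve_env_spec("HOST:localhost", fake_env) == "localhost"
assert resolve_env_spec("EMPTY:fallback", fake_env) == "fallback"
assert resolve_env_spec("URL:http://localhost:8080", fake_env) == "http://localhost:8080"
assert resolve_env_spec("PORT", fake_env) is None
```

Keep in mind that values coming from the environment are always strings, so a port resolved this way arrives as "5432" rather than 5432.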

This way we can test the whole thing with the previous config-env.yml:

using directives
def test_simple():
    # given: a yaml loader
    loader = YamlConfigLoader()

    # when: setting an environment variable
    os.environ["USERNAME"] = "outsider"

    # and: loading the env configuration file (config-env.yml)
    yaml_env = loader.load("env", "config.yml")

    # then: we should get a value from environment variable
    assert yaml_env.database.user == "outsider"

    # and: because we didn't set the HOST variable we get it
    # from the default value
    assert yaml_env.database.host == "localhost"
When calling yaml.load(…), make sure you’re using the yaml.Loader loader; otherwise directive processing won’t work.
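
A side note on the test above: assigning to os.environ directly leaves the variable set for every test that runs afterwards. A possible alternative, sketched here with unittest.mock.patch.dict from the standard library, restores the environment when the block ends. The helper username_from_env is hypothetical and only stands in for the code that reads the variable.

```python
import os
from unittest.mock import patch


def username_from_env():
    # Stand-in for whatever reads the variable, e.g. loading config-env.yml.
    return os.environ.get("USERNAME")


original = os.environ.get("USERNAME")

with patch.dict(os.environ, {"USERNAME": "outsider"}):
    # Inside the block the variable is visible to any code that reads os.environ.
    assert username_from_env() == "outsider"

# After the block the previous state (set or unset) is back.
assert os.environ.get("USERNAME") == original
```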

References