Import configuration from Hiera or a Git repository with YAML files into Terraform


De-duplication of configuration information is key when managing large environments that use different types of automation (Terraform, Jenkins, Ansible, scripts executed as systemd timers, Puppet, …). Although many different configuration stores exist (an RDBMS, Consul, …), one of the easiest to use is Hiera, or simply a plain Git repository with YAML files arranged in some hierarchical way (which is essentially what Hiera is).

The YAML configuration hierarchy could be defined as the following file structure:

  • common.yaml: Default settings, regardless of role or host. These can be overridden by all of the below.
  • my_environment01.yaml: Environment-specific configuration (examples: development, staging, production, amsterdam, az01, az04, …). These can be overridden by all of the below.
  • common/roles/some_server_role.yaml: A server role, or type definition, which contains role-specific configuration parameters. The roles can implement an extra hierarchy, for instance:
    • debian::databases::postgres
    • debian::databases::postgres::timescale
    • debian::databases::postgres::timescale::prometheus
    • debian::loadbalancer::internal
    • debian::application::request_processor

      The hierarchy steps are separated by :: in the above example, and each step inherits from its parent, with its own YAML file.
      These can be overridden by all of the below.
  • my_environment01/roles/some_server_role.yaml: Overrides role configuration parameters per environment.
    These can even be overridden at host level below.
  • my_environment01/hosts/my_hostname01.yaml: Host-specific configuration parameters. This file is always required and should contain at least the IP address of the node and the server role string.

Let’s take the following example: the host vmazdbprm01 has the role debian::databases::postgres::timescale::prometheus and is deployed in the environment my_cool_location01. The configuration management should search for parameters in the following file locations (verifying first whether each file exists):

  1. common.yaml
  2. my_cool_location01.yaml
  3. common/roles/debian.yaml
  4. common/roles/debian::databases.yaml
  5. common/roles/debian::databases::postgres.yaml
  6. common/roles/debian::databases::postgres::timescale.yaml
  7. common/roles/debian::databases::postgres::timescale::prometheus.yaml
  8. my_cool_location01/roles/debian.yaml
  9. my_cool_location01/roles/debian::databases.yaml
  10. my_cool_location01/roles/debian::databases::postgres.yaml
  11. my_cool_location01/roles/debian::databases::postgres::timescale.yaml
  12. my_cool_location01/roles/debian::databases::postgres::timescale::prometheus.yaml
  13. my_cool_location01/hosts/vmazdbprm01.yaml

This means that any code implementing the above configuration management needs to check whether each of the above 13 files exists, from top to bottom, and if so load its YAML content accordingly.
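Since the article stresses that any automation tool can implement this lookup, here is a minimal Python sketch of it; the hierarchy_paths helper is a hypothetical illustration, not part of the article's code:

```python
def hierarchy_paths(environment, node, server_role):
    """Build the ordered list of YAML files to probe for one node.

    Earlier files hold defaults; later files override them. Only the
    host file at the end of the list is mandatory.
    """
    parts = server_role.split("::")
    # All role prefixes: debian, debian::databases, ...
    roles = ["::".join(parts[:i + 1]) for i in range(len(parts))]

    paths = ["common.yaml", f"{environment}.yaml"]
    paths += [f"common/roles/{r}.yaml" for r in roles]
    paths += [f"{environment}/roles/{r}.yaml" for r in roles]
    paths.append(f"{environment}/hosts/{node}.yaml")
    return paths

# For the example host this yields exactly the 13 paths listed above.
paths = hierarchy_paths("my_cool_location01", "vmazdbprm01",
                        "debian::databases::postgres::timescale::prometheus")
```

A consumer would then walk the returned list in order, skip paths that do not exist, and merge each remaining file's decoded YAML over the accumulated result.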

Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services.

Implementing the above YAML hierarchy in Terraform could be done as follows:

locals {
  host_cfg             = yamldecode(fileexists("cfgmgmt/${var.environment}/hosts/${var.node}.yaml") ? file("cfgmgmt/${var.environment}/hosts/${var.node}.yaml") : "{server_role: debian}")

  roles_list           = split("::", local.host_cfg.server_role)
  all_roles_list       = [ for index in range(length(local.roles_list)): join("::",slice(local.roles_list, 0, index + 1))  ]

  common_cfg           = yamldecode(fileexists("cfgmgmt/common.yaml") ? file("cfgmgmt/common.yaml") : "{}")
  common_role_cfg_list = [ for file in local.all_roles_list:
      yamldecode(fileexists("cfgmgmt/common/roles/${file}.yaml") ? file("cfgmgmt/common/roles/${file}.yaml") : "{}" )]
  env_cfg              = yamldecode(fileexists("cfgmgmt/${var.environment}.yaml") ? file("cfgmgmt/${var.environment}.yaml") : "{}")
  env_role_cfg_list    = [ for file in local.all_roles_list:
      yamldecode(fileexists("cfgmgmt/${var.environment}/roles/${file}.yaml") ? file("cfgmgmt/${var.environment}/roles/${file}.yaml") : "{}") ]
  common_role_cfg_map  = merge(local.common_role_cfg_list...)
  env_role_cfg_map     = merge(local.env_role_cfg_list...)

  cfg                  = merge(local.common_cfg, local.env_cfg, local.common_role_cfg_map, local.env_role_cfg_map, local.host_cfg)
}

Let’s have a look at what actually happens in the above code.

All YAML files are stored in a Git/Hiera repository, accessible in the subdirectory cfgmgmt.

The code declares local values inside a locals block (note that locals is a block, not a resource).

The host_cfg expression checks whether a file called cfgmgmt/${var.environment}/hosts/${var.node}.yaml exists and, if so, decodes its YAML content as a map into the local value host_cfg. If the file doesn’t exist, a minimal default YAML document is decoded instead. In practice, each node/host must have such a file, as it should contain at least configuration data such as:

  • unique node host name
  • IP address
  • server role
  • (optionally) VLAN/ subnet configuration
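For the example host, a minimal host file might look like this (the address and VLAN values are made up for illustration):

```yaml
# my_cool_location01/hosts/vmazdbprm01.yaml (illustrative values)
server_role: "debian::databases::postgres::timescale::prometheus"
ip_address: "10.0.4.17"
vlan: 40
```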

roles_list splits the server role string, stored in local.host_cfg.server_role, into a list, which is used to build the server role hierarchy below.

all_roles_list expands this into the list of parent roles which need to be imported too. For example, if the server role is set to debian::databases::postgres::timescale::prometheus, the list all_roles_list will contain the following elements:

  • debian
  • debian::databases
  • debian::databases::postgres
  • debian::databases::postgres::timescale
  • debian::databases::postgres::timescale::prometheus
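The prefix expansion can be expressed in Python as well; this is merely a translation of the Terraform range/slice/join expression, for illustration:

```python
# Mirrors the Terraform expression:
# [ for index in range(length(roles_list)): join("::", slice(roles_list, 0, index + 1)) ]
server_role = "debian::databases::postgres::timescale::prometheus"
roles_list = server_role.split("::")
all_roles_list = ["::".join(roles_list[:i + 1]) for i in range(len(roles_list))]
```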

common_cfg loads the YAML content of common.yaml, if it exists.

common_role_cfg_list loops over the elements of all_roles_list and loads the YAML content of each role file under common/roles/ (if it exists) into a list element. The result is a list of maps called common_role_cfg_list.

env_cfg loads the general environment configuration YAML content (if it exists) into the local value env_cfg.

env_role_cfg_list does the same as common_role_cfg_list, but for environment-specific role files (for instance, when certain server roles have environment-specific configuration parameters).

common_role_cfg_map and env_role_cfg_map merge the elements of the two role lists (which are maps of decoded YAML data) into one big map each, in list order. This allows keys to be overridden by more specific roles. The ... expansion notation is explained at https://www.terraform.io/language/expressions/function-calls#expanding-function-arguments.
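The override behaviour of merge() can be illustrated with Python dicts, where later unpacked maps win in the same way; all keys and values here are made up for illustration:

```python
# Illustrative data only. Like Terraform's merge(), this is a shallow merge:
# nested maps are replaced wholesale, not deep-merged.
common_cfg = {"ntp_server": "pool.ntp.org", "postgres_version": 13}
env_cfg    = {"ntp_server": "ntp.internal.example"}
role_cfg   = {"postgres_version": 14}

# Later maps win, mirroring merge(common_cfg, env_cfg, role_cfg).
cfg = {**common_cfg, **env_cfg, **role_cfg}
# cfg == {"ntp_server": "ntp.internal.example", "postgres_version": 14}
```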

Finally, a local value called cfg is created, which merges, in order of increasing precedence, the values of:

  • local.common_cfg
  • local.env_cfg
  • local.common_role_cfg_map
  • local.env_role_cfg_map
  • local.host_cfg

By providing the environment name and the host/node name to the above code (as var.environment and var.node), all required configuration parameters can be loaded per node in Terraform. And since the data lives in a Git repository, the same information can be loaded by any other automation tool, provided each tool implements the same hierarchy logic.

1 comment
  1. Hey, have you read about the terraform hiera provider? It adds Hiera as a data source and can perform proper interpolation using Hiera (i.e. using the hierarchy you have defined in your hiera.yaml). This means that it can perform lookups on a key with context. For example, I have my hierarchy:

    – common/*
    – environment/%{env}.yaml
    – region/%{region}.yaml

    So I can define my values at the most appropriate level, and have default values in common overridden by more specific values in environment or region.

    To make this work, I typically encode the provider config (i.e. the env & region) into the workspace name. So a single codebase can be used to deploy into multiple regions and environments. All you have to do is `terraform workspace new prod.us-east-1`, and run plan + apply. Your tf code needs to use a local to parse `terraform.workspace` into env and region before passing them to the hiera provider.
