De-duplication of configuration information is key when managing large environments that use different types of automation (Terraform, Jenkins, Ansible, scripts executed as systemd timers, Puppet, …). Although many different configuration stores exist (an RDBMS, Consul, …), one of the easiest to use is Hiera, or simply a plain Git repository with YAML files organized in some hierarchical way (which is, in essence, what Hiera is).
The YAML configuration hierarchy could be defined as the following file structure:
common.yaml
: Default settings, regardless of role or host. These can be overridden by all the levels below.
my_environment01.yaml
: Environment-specific configuration (examples: development, staging, production, amsterdam, az01, az04, …). These can be overridden by all the levels below.
common/roles/some_server_role.yaml
: A server role, or type definition, which contains role-specific configuration parameters. The roles could implement an extra hierarchy, for instance:
debian::databases::postgres
debian::databases::postgres::timescale
debian::databases::postgres::timescale::prometheus
debian::loadbalancer::internal
debian::application::request_processor
The hierarchy steps in the above example are separated by ::, and are inherited accordingly, each level with its own YAML file. These can be overridden by all the levels below.
my_environment01/roles/some_server_role.yaml
: Overrides role configuration parameters per environment. These can even be overridden at the host level below.
my_environment01/hosts/my_hostname01.yaml
: Sets host-specific configuration parameters. This file is always required and should contain at least the IP address of the node and the server role string.
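For illustration, a minimal host file in this layout might look like the following (the key names and values are invented examples, not a fixed schema):

```yaml
# my_environment01/hosts/my_hostname01.yaml -- hypothetical example
ip_address: 10.20.30.40
server_role: debian::databases::postgres::timescale::prometheus
```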
Let’s take the following example: the host vmazdbprm01 has the role debian::databases::postgres::timescale::prometheus and is deployed in the environment my_cool_location01. The configuration management should search for parameters in the following file locations (first verifying that each file path exists):
common.yaml
my_cool_location01.yaml
common/roles/debian.yaml
common/roles/debian::databases.yaml
common/roles/debian::databases::postgres.yaml
common/roles/debian::databases::postgres::timescale.yaml
common/roles/debian::databases::postgres::timescale::prometheus.yaml
my_cool_location01/roles/debian.yaml
my_cool_location01/roles/debian::databases.yaml
my_cool_location01/roles/debian::databases::postgres.yaml
my_cool_location01/roles/debian::databases::postgres::timescale.yaml
my_cool_location01/roles/debian::databases::postgres::timescale::prometheus.yaml
my_cool_location01/hosts/vmazdbprm01.yaml
This means that any code implementing the above configuration management needs to check, from top to bottom, whether each of the above 13 files exists, and if so, load that YAML file accordingly.
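As a tool-agnostic sketch of this lookup order, the following Python snippet builds the list of 13 candidate files for the example above (the function name and the base directory argument are my own choices, not part of the article’s code):

```python
# Sketch of the lookup order described above, in Python as a
# tool-agnostic illustration.
def build_lookup_paths(environment, hostname, server_role, base="cfgmgmt"):
    """Return the candidate YAML files, least to most specific."""
    # Expand "a::b::c" into ["a", "a::b", "a::b::c"].
    parts = server_role.split("::")
    role_chain = ["::".join(parts[:i + 1]) for i in range(len(parts))]

    paths = [f"{base}/common.yaml", f"{base}/{environment}.yaml"]
    paths += [f"{base}/common/roles/{role}.yaml" for role in role_chain]
    paths += [f"{base}/{environment}/roles/{role}.yaml" for role in role_chain]
    paths.append(f"{base}/{environment}/hosts/{hostname}.yaml")
    return paths

paths = build_lookup_paths(
    "my_cool_location01", "vmazdbprm01",
    "debian::databases::postgres::timescale::prometheus",
)
# paths now holds the 13 candidate files, from common.yaml
# down to my_cool_location01/hosts/vmazdbprm01.yaml.
```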
Terraform is an open-source infrastructure-as-code tool that provides a consistent CLI workflow to manage hundreds of cloud services. The above YAML hierarchy could be implemented in Terraform as follows:
```hcl
locals {
  host_cfg = yamldecode(fileexists("cfgmgmt/${var.environment}/hosts/${var.node}.yaml") ? file("cfgmgmt/${var.environment}/hosts/${var.node}.yaml") : "{server_role: debian}")

  roles_list     = split("::", local.host_cfg.server_role)
  all_roles_list = [for index in range(length(local.roles_list)) : join("::", slice(local.roles_list, 0, index + 1))]

  common_cfg = yamldecode(fileexists("cfgmgmt/common.yaml") ? file("cfgmgmt/common.yaml") : "{}")
  common_role_cfg_list = [for file in local.all_roles_list :
    yamldecode(fileexists("cfgmgmt/common/roles/${file}.yaml") ?
  file("cfgmgmt/common/roles/${file}.yaml") : "{}")]
  env_cfg           = yamldecode(fileexists("cfgmgmt/${var.environment}.yaml") ? file("cfgmgmt/${var.environment}.yaml") : "{}")
  env_role_cfg_list = [for file in local.all_roles_list :
    yamldecode(fileexists("cfgmgmt/${var.environment}/roles/${file}.yaml") ?
  file("cfgmgmt/${var.environment}/roles/${file}.yaml") : "{}")]
  common_role_cfg_map = merge(local.common_role_cfg_list...)
  env_role_cfg_map    = merge(local.env_role_cfg_list...)

  cfg = merge(local.common_cfg, local.env_cfg, local.common_role_cfg_map, local.env_role_cfg_map, local.host_cfg)
}
```
Let’s have a look at what actually happens in the above code.
All YAML files are stored in a Git/Hiera repository, accessible in the sub-directory cfgmgmt.
The code declares “local” variables in a “locals” block, starting on line 1.
Line 2 checks whether a file called cfgmgmt/${var.environment}/hosts/${var.node}.yaml exists and, if so, loads the YAML content as a map into the local variable host_cfg. If the file doesn’t exist, a default YAML snippet is loaded instead. In theory, each node/host must have such a file defined, as it should contain at least configuration data such as:
- unique node host name
- IP address
- server role
- (optionally) VLAN/subnet configuration
- …
Line 4 splits the server role string, stored in local.host_cfg.server_role, into a list, which is used to build the server role hierarchy further below.
Line 5 creates a list of the parent server roles, which need to be imported too. For example, if the server role is set to debian::databases::postgres::timescale::prometheus, the list all_roles_list will contain the following elements:
debian
debian::databases
debian::databases::postgres
debian::databases::postgres::timescale
debian::databases::postgres::timescale::prometheus
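A rough Python equivalent of lines 4 and 5 (a split followed by cumulative joins) shows how this prefix list comes about:

```python
# Python equivalent of lines 4 and 5: split the role string, then
# join ever-longer prefixes back together.
server_role = "debian::databases::postgres::timescale::prometheus"
roles_list = server_role.split("::")
all_roles_list = ["::".join(roles_list[:i + 1]) for i in range(len(roles_list))]
# all_roles_list == ["debian", "debian::databases", ...,
#                    "debian::databases::postgres::timescale::prometheus"]
```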
Line 7 loads the YAML content of common.yaml, if it exists.
Line 8 loops over the all_roles_list elements created on line 5 and loads the YAML content of each server role file (if it exists) into a list element. The result is a list called common_role_cfg_list.
Line 11 loads the general environment configuration YAML content (if it exists) into the local variable env_cfg.
Line 12 does the same as line 8, but for environment-specific roles (for instance, when certain server roles have environment-specific configuration parameters).
Lines 15 and 16 merge the elements of the server role lists (which are maps of YAML data) into one big map each, in list order, so that later, more specific roles can override keys set by earlier ones. The expansion notation (...) is explained at https://www.terraform.io/language/expressions/function-calls#expanding-function-arguments.
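Terraform’s merge() keeps the last value seen for a duplicate key, so the order of the list decides precedence. A small Python sketch with invented keys and role comments illustrates the effect:

```python
# merge() keeps the last value for duplicate keys; a dict update loop
# behaves the same way. The role names and keys below are invented.
common_role_cfg_list = [
    {"postgres_version": 13, "backup": True},  # debian::databases::postgres
    {"postgres_version": 14},                  # ...::timescale
    {"scrape_interval": "15s"},                # ...::timescale::prometheus
]
common_role_cfg_map = {}
for role_cfg in common_role_cfg_list:          # like merge(list...)
    common_role_cfg_map.update(role_cfg)
# The more specific role wins: postgres_version ends up as 14,
# while backup and scrape_interval are kept.
```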
Finally, on line 18, a local variable called cfg is created, which merges the values of:
local.common_cfg
local.env_cfg
local.common_role_cfg_map
local.env_role_cfg_map
local.host_cfg
By providing the environment name and the host/node name to the above code (as var.environment and var.node), all required configuration parameters can be loaded per node in Terraform. And since we’ve used a Git repository, this information can be loaded into any kind of automation tool (provided, of course, that each tool implements the same hierarchy logic).
Hey, have you read about using the Terraform Hiera provider? It adds Hiera as a data source and can perform proper interpolation using Hiera (i.e. using the hierarchy you have defined in your hiera.yaml). This means that it can perform lookups on a key with context. For example, I have my hierarchy:
– common/*
– environment/%{env}.yaml
– region/%{region}.yaml
So I can define my values at the most appropriate level, and have default values in common overridden by more specific values in environment or region.
To make this work, I typically encode the provider config (i.e. the env & region) into the workspace name, so a single codebase can be used to deploy into multiple regions and environments. All you have to do is `terraform workspace new prod.us-east-1`, and run plan + apply. Your tf code needs to use a local to parse `${terraform.workspace}` into env and region before passing them to the hiera provider.
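The workspace-name convention described in this comment (env and region encoded as `<env>.<region>`) amounts to a single split on the first dot; in Terraform that would be a local built with split(".", terraform.workspace), shown here as a Python equivalent:

```python
# Python equivalent of the workspace-name convention above:
# "<env>.<region>", split on the first dot.
workspace = "prod.us-east-1"
env, region = workspace.split(".", 1)
# env == "prod", region == "us-east-1"
```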