Introduction
In our previous tutorials we did a couple of things: first we introduced Terraform and explained a few of its key concepts, and then we talked about Terraform variables. We are slowly gaining more knowledge around Terraform, experimenting with a few features that will help us build the right foundations so we can use it in more advanced cases. One key feature that we haven't covered yet is Terraform modules, and we aim to do that in this article. Without further ado, let's get started.
If you missed our previous articles, check this and this.
Definition
Modules are containers for multiple resources that are used together. A module consists of a collection of .tf and/or .tf.json files kept together in a directory.
The idea of a Terraform module doesn't deviate from the idea of a module in any traditional programming language. In all languages, we aim to export some logic and reuse it in different parts of our application, and Terraform is no exception. In Terraform, we configure multiple resources within files and export that configuration to parent modules that consume it. In this way, we create abstractions to share functionality and make good use of the DRY (Don't Repeat Yourself) principle.
Those coming from a language like JavaScript may need a bit of time to wrap their head around a Terraform module, mainly because a Terraform module includes all files within a directory, which conflicts with the idea of an ES module that refers to a single file. Nonetheless, this subtle difference doesn't affect the general idea of a module.
Root vs Child Modules
A root module is the top-level module that imports other modules and orchestrates their invocation. All Terraform setups have at least one root module, with the simplest scenario being a single main.tf file in the top-level directory. In this case, the top-level directory is the root module and the setup doesn't have any child modules.
While a root module gathers functionality from other modules, a child module exports configuration for other modules to consume. As such, a child module is not called by itself; another module is responsible for calling it. In addition, there isn't any limitation on how many times a parent module can call a child module, nor on who may call it. As long as we are satisfied with the infrastructure we build, Terraform doesn't complain.
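To visualise the relationship, here is a hypothetical layout (the directory names are illustrative): the top-level directory is the root module, and each directory under modules is a child module it can call.

main.tf              # root module, calls the child module below
modules/
  network/           # child module
    main.tf
    variables.tf
    outputs.tf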
Module Structure
The recommended structure of a Terraform module is as follows:
- main.tf: includes the resources and configuration we want to export.
- outputs.tf: includes the output values that a parent module may consume. Outputs are usually identifiers of our resources that are distributed to other modules as well.
- variables.tf: includes the inputs that the module accepts. The inputs can alter the behavior of a module and tune it the way we want.
As you may have noticed, each file has its own purpose. While we could have a single file that includes the whole logic, we choose to separate our setup into three files and follow the SoC (Separation of Concerns) principle. This gives us more readability and makes maintenance much easier.
If you can't remember the module structure, memorise this: a module is like a function that accepts inputs and returns outputs. In variables.tf we declare the inputs we pass to the function, main.tf contains the main block of logic, which in our case is the resources we declare to build the infrastructure, and within outputs.tf we set the values that are returned back to the parent module.
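To make the analogy concrete, here is a minimal sketch of a hypothetical module (the bucket example is illustrative, not part of this article's code):

# variables.tf: the function's parameters
variable "bucket_name" {
  description = "name of the S3 bucket to create"
  type        = string
}

# main.tf: the function's body
resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
}

# outputs.tf: the function's return values
output "bucket_arn" {
  description = "ARN of the created bucket"
  value       = aws_s3_bucket.this.arn
}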
How do you import a module?
Let's assume we have a child module Y and a parent module X. X can import Y using the module block.

module "Y" {
  source = "./path/to/child/module/Y"
  # version = "1.0.0" (only valid when source points to a module registry)

  arg1 = "arg1"
  arg2 = "arg2"
}
There are a few things to note here:
- We use the source option to set the location of the child module.
- We use the version option to pin the version of the module. Note that version only applies to modules installed from a registry; a local path like the one above always uses the code currently on disk. If it is not specified, the latest version is used. Nonetheless, it's good practice to always set one.
- As we discussed a moment ago, a module is like a function that accepts inputs and returns outputs. Therefore, if the module requires any arguments, we pass those arguments after the source and version options. In our case we assume that the module requires arg1 and arg2, so we pass them down to it. To access the outputs of a module, you can use the syntax module.[Name].[Output], as sketched below.
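For instance, if the child module Y declared an output named instance_id (a hypothetical name for illustration), the parent could reference it in its own configuration or re-export it:

output "y_instance_id" {
  value = module.Y.instance_id
}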
Implementing a Terraform module
We have talked a lot, but we haven't shown any code yet. In this part we are going to build a Terraform module that launches EC2 instances in AWS. We also assume that the module will target different environments, such as stage and prod, which is very common in large organisations. The code is available on GitHub and you can find it here.
Firstly, let's examine the folder structure:
- modules: includes all the child modules we create.
- stage: contains the configuration we use to run the stage environment.
- prod: contains the configuration we use to run the prod environment.
- .gitignore: git file to avoid committing files such as .terraform, terraform.tfstate etc.
- README.md: a readme file that contains some information about the repository.
If you navigate within the infrastructure module, you can see that I follow the same structure we discussed a moment ago. I split the module into three files, main.tf, outputs.tf and variables.tf, and also included a readme file to describe the module.
variable "environment" {
description = "target environment"
type = string
default = "stage"
}
variable "org" {
description = "organization name"
type = string
default = "mariossimou"
}
variable "cidr_block" {
description = "cidr block of the vpc"
type = string
}
variable "subnets" {
description = "a map of subnets and their options"
type = map
}
variable "acl_rules" {
description = "list of acl rules"
type = list(object({
rule_number = number
egress = bool
protocol = string
rule_action = string
from_port = number
to_port = number
cidr_block = string
}))
}
variable "security_groups" {
description = "a list of security groups"
type = map
}
variable "instances" {
description = "a map of ec2 instances and options"
type = map
}
The variables.tf file includes all the inputs the module accepts. All variables include a description and a type, mainly to make it clear to the reader what values need to be passed. In addition, the environment and org variables include a default value, so if I don't specify them, the module will fall back to the stage and mariossimou values, respectively.
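For illustration, this is how a parent could override those defaults when calling the module; the remaining required arguments are omitted here for brevity:

module "infrastructure" {
  source      = "../modules/infrastructure"
  environment = "prod" # overrides the "stage" default
  org         = "acme" # hypothetical value, overrides the "mariossimou" default
  # cidr_block, subnets, acl_rules, security_groups and instances must still be passed
}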
In main.tf we include the configuration of the module: we declare a few resources and build the right structures to pass as inputs to those resources.
We create an aws_vpc and enable DNS support and hostnames. This means that all instances within the VPC will be assigned a hostname and can be resolved from Route53. In addition, we set the environment and Name tags so we can easily find it. By default, a VPC comes with its own route table and network ACL, so we won't need to create them later.
We have also created an internet gateway and attached it to the VPC. Attaching it isn't enough to route traffic through it, though; we will still need to link the route table to the internet gateway at a later stage.
resource "aws_vpc" "vpc" {
cidr_block = var.cidr_block
enable_dns_support = true
enable_dns_hostnames = true
tags = {
environment = var.environment
Name = format("%s-%s-vpc", var.org, var.environment )
}
}
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.vpc.id
tags = {
environment = var.environment
Name = format("%s-%s-igw", var.org, var.environment )
}
}
Below, we make sure the default route table is the main table of the VPC and then add a route record. The record allows traffic in the VPC to reach the internet gateway. We also create a few subnets, which are later associated with the route table. If you are wondering what the for_each keyword is, it's a meta-argument that allows you to iterate and create multiple resources at once. In this case, it expects a map from var.subnets and makes those values available through the each.key and each.value keywords.
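To make each.key and each.value concrete, take the stage configuration shown later in this article, where var.subnets is:

subnets = {
  primary = {
    cidr_block = "10.0.1.0/24"
  }
  secondary = {
    cidr_block = "10.0.2.0/24"
  }
}

The aws_subnet resource below is therefore created twice: once with each.key = "primary" and each.value.cidr_block = "10.0.1.0/24", and once with each.key = "secondary" and each.value.cidr_block = "10.0.2.0/24".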
resource "aws_main_route_table_association" "main_rt" {
vpc_id = aws_vpc.vpc.id
route_table_id = aws_vpc.vpc.default_route_table_id
}
resource "aws_route" "rt_all_route" {
route_table_id = aws_vpc.vpc.default_route_table_id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}
resource "aws_subnet" "subnets" {
for_each = var.subnets
vpc_id = aws_vpc.vpc.id
cidr_block = each.value.cidr_block
tags = {
environment = var.environment
Name = format("%s-%s-%s", var.org, var.environment, each.key)
}
}
resource "aws_route_table_association" "rt_subnets_association" {
count = length(local.subnets_ids)
route_table_id = aws_vpc.vpc.main_route_table_id
subnet_id = element(local.subnets_ids, count.index)
}
To improve security we do a couple of things:
- we set a few ACL rules in the network ACL.
- we create a few security groups and rules to bind to the EC2 instances.
In the locals block we do some processing and build a few structures that are used in the resource blocks. For example, the aws_security_group_rule resource depends on local.security_groups_rules, which is an array of rules, each including the id of its security group.
The count keyword is also a meta-argument and allows us to create multiple instances of a resource. We use the count.index keyword to access the rule at a certain position; note that element(list, count.index) is equivalent to list[count.index] for in-range indices.
locals {
  # the ids of all subnets created above
  subnets_ids = [for subnet in aws_subnet.subnets : subnet.id]

  # maps each security group name to the id of the created security group
  security_groups_id_name_map = zipmap(
    [for sgName, options in var.security_groups : sgName],
    [for sg in aws_security_group.sgs : sg.id]
  )

  # flattens the per-group lists of rules into a single list,
  # merging each rule with the id of the security group it belongs to
  security_groups_rules = flatten(
    [for sgName, rules in var.security_groups :
      [for rule in rules : merge(rule, { id = lookup(local.security_groups_id_name_map, sgName) })]
    ]
  )
}
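To see what these locals produce, take the stage configuration shown later in this article, where security_groups defines a single web group with four rules; the security group id below is an illustrative placeholder:

# security_groups_id_name_map
# => { web = "sg-0123456789abcdef0" }

# security_groups_rules
# => a flat list of the four web rules, each merged with { id = "sg-0123456789abcdef0" }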
resource "aws_network_acl_rule" "network_acl_rules" {
count = length(var.acl_rules)
network_acl_id = aws_vpc.vpc.default_network_acl_id
rule_number = element(var.acl_rules, count.index).rule_number
egress = element(var.acl_rules, count.index).egress
protocol = element(var.acl_rules, count.index).protocol
rule_action = element(var.acl_rules, count.index).rule_action
from_port = element(var.acl_rules, count.index).from_port
to_port = element(var.acl_rules, count.index).to_port
cidr_block = element(var.acl_rules, count.index).cidr_block
}
resource "aws_security_group" "sgs" {
for_each = var.security_groups
vpc_id = aws_vpc.vpc.id
name = format("%s-%s-%s-sg", var.org, var.environment, each.key)
tags = {
environment = var.environment
Name = format("%s-%s-%s-sg", var.org, var.environment, each.key)
}
}
resource "aws_security_group_rule" "sgs_rules" {
count = length(local.security_groups_rules)
security_group_id = element(local.security_groups_rules, count.index).id
type = element(local.security_groups_rules, count.index).type
from_port = element(local.security_groups_rules, count.index).from_port
to_port = element(local.security_groups_rules, count.index).to_port
protocol = element(local.security_groups_rules, count.index).protocol
cidr_blocks = element(local.security_groups_rules, count.index).cidr_blocks
}
Last but not least, we generate a few elastic IPs to associate with the EC2 instances. This allows us to assign a static IP to each instance and use it to forward traffic to the instance. It also ensures that the IP will always be the same, even after a restart, so we won't need to update any DNS records for the instance to remain reachable.
Something else I want to mention is the aws_ami data block. A data block fetches information about an existing resource, which in this case is the Ubuntu AMI. All our EC2 instances are based on this image.
locals {
  # names of the instances that should be publicly reachable
  public_servers_names = [for serverName, options in var.instances : serverName if options.public]

  # maps each instance name to the id of the created instance
  servers_name_ids_map = zipmap(
    [for instanceName, options in var.instances : instanceName],
    [for server in aws_instance.servers : server.id]
  )
}
resource "aws_eip" "eips" {
count = length(local.public_servers_names)
instance = lookup(
local.servers_name_ids_map,
element(local.public_servers_names, count.index)
)
vpc = true
}
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
}
}
resource "aws_instance" "servers" {
for_each = var.instances
ami = data.aws_ami.ubuntu.id
instance_type = each.value.instance_type
key_name = each.value.key_name
subnet_id = local.subnets_ids[0]
vpc_security_group_ids = [ lookup(local.security_groups_id_name_map, each.key) ]
tags = {
environment = var.environment
Name = format("%s-%s-%s-server", var.org, var.environment, each.key)
}
}
Moving on, the outputs.tf file includes all the values we want to return back to the parent module. It includes the vpc_id, subnets_ids, rt_id, igw_id and many others.
output "vpc_id" {
description = "vpc id"
value = aws_vpc.vpc.id
}
output "subnets_ids" {
description = "subnets ids"
value = local.subnets_ids
}
output "rt_id" {
description = "route table id"
value = aws_vpc.vpc.default_route_table_id
}
output "igw_id" {
description = "internet gateway id"
value = aws_internet_gateway.igw.id
}
output "security_groups_ids" {
description = "a map of the name and the id of a securit group"
value = local.security_groups_id_name_map
}
output "instances_id" {
description = "a map of instances names and ids"
value = local.servers_name_ids_map
}
output "instances_public_ips" {
description = "a map of instances names and public ips"
value = zipmap(
[for instanceName, options in var.instances: instanceName],
[for server in aws_instance.servers: server.public_ip]
)
}
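This is where the module.[Name].[Output] syntax comes into play. Here is a sketch of how a parent could re-export one of these outputs, assuming the module is instantiated under the name infrastructure, as in the stage environment below:

output "instances_public_ips" {
  description = "a map of instances names and public ips"
  value       = module.infrastructure.instances_public_ips
}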
If you move within the stage directory, you will see the following structure:
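main.tf
outputs.tf
terraform.tfvars
variables.tf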
The only addition is the terraform.tfvars file, which includes the values we want to load into our input variables. We include the region, org, environment and web_key_name within it.
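A sketch of what terraform.tfvars might contain; the region and key pair name are illustrative values, and web_key_name must match a key pair that exists in your AWS account:

region       = "eu-west-1"
org          = "mariossimou"
environment  = "stage"
web_key_name = "web"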
I won't pay too much attention to variables.tf and outputs.tf, mainly because they are very similar to what we have already discussed; however, I want us to check the content of the main.tf file.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region = var.region
}
module "infrastructure" {
source = "../modules/infrastructure"
environment = var.environment
org = var.org
cidr_block = "10.0.0.0/16"
subnets = {
primary = {
cidr_block = "10.0.1.0/24"
}
secondary = {
cidr_block = "10.0.2.0/24"
}
}
acl_rules = [
{
rule_number = 20
egress = false
protocol = "tcp"
rule_action = "allow"
from_port = 443
to_port = 443
cidr_block = "0.0.0.0/0"
},
{
rule_number = 40
egress = false
protocol = "tcp"
rule_action = "allow"
from_port = 80
to_port = 80
cidr_block = "0.0.0.0/0"
},
{
rule_number = 60
egress = false
protocol = "tcp"
rule_action = "allow"
from_port = 22
to_port = 22
cidr_block = "0.0.0.0/0"
},
{
rule_number = 70
egress = false
protocol = "tcp"
rule_action = "allow"
from_port = 1024
to_port = 65535
cidr_block = "0.0.0.0/0"
},
{
rule_number = 90
egress = false
protocol = "-1"
rule_action = "deny"
from_port = -1
to_port = -1
cidr_block = "0.0.0.0/0"
}
]
security_groups = {
web = [
{
type = "ingress"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
},
{
type = "ingress"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
},
{
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
},
{
type = "egress"
from_port = -1
to_port = -1
protocol= "-1"
cidr_blocks = ["0.0.0.0/0"]
}
]
}
instances = {
web = {
public = true
instance_type = "t2.micro"
key_name = var.web_key_name
}
}
}
At the top, we configure the terraform block and then set the AWS region in the provider block. Then, we import the infrastructure module, set the source option and pass the right arguments to it.
In brief, we set the CIDR block of the VPC to 10.0.0.0/16 and create a primary and a secondary subnet within it. The former includes the 10.0.1.0/24 CIDR block and the latter 10.0.2.0/24.
We also set a few ACL rules. Those rules correspond to HTTPS, HTTP and SSH incoming traffic, as well as any traffic between ports 1024 and 65535. Any other traffic is disallowed.
Later, we set a security group for the web EC2 instance. The security group allows HTTPS, HTTP and SSH incoming traffic. In addition, all outbound traffic is allowed.
At the bottom, we set the configuration of the EC2 instances, which includes a public web instance of t2.micro type. We also declare an SSH key so we can access the instance whenever we want.
If you want to run it, cd into the stage directory and run terraform init && terraform apply -auto-approve. I assume your AWS credentials are configured and you have created a key pair whose name you pass as web_key_name for the EC2 instance. The final output should list the values from outputs.tf, such as the vpc_id, the subnets_ids and the instance's public IP.
The prod directory contains the same configuration as stage, mainly because I want to keep the setups separated and let both environments evolve independently. Now developers can submit PRs to update the configuration of each environment, without having to touch anything in the infrastructure module. We have optimised the way we build infrastructure, modularised our codebase, and simplified the way developers interact with it.
Summary
In summary, we covered the following points:
- A Terraform module is a container of files that configure resources and share that configuration with other modules.
- A module can either be a child or a parent. A child module includes the configuration of resources, and the parent module is responsible for consuming it.
- A child module can be used by other modules as many times as you want.
- The recommended structure of a Terraform module includes the main.tf, variables.tf and outputs.tf files. Each file has its own responsibility and follows the SoC principle.