How to optimize Terraform for maximum scalability
Learn how Terragrunt helps to address key issues with Terraform workspaces
June 12, 2020 | By: Yasa Vaividh
Here’s a glaring truth: An infrastructure with thousands of Amazon Web Services® (AWS®) instances can easily require up to thousands of AWS® services like Amazon VPC, Amazon RDS, security groups and so on. Yes, thousands. The challenge? How to safely and efficiently scale across your infrastructure with minimal resources. Enter Terraform®. Terraform® by HashiCorp offers modules, workspaces, definition files and templates that can be used to build and manage infrastructure as code (IaC) across a variety of cloud providers.
Terragrunt helps scale and simplify IaC
Terraform® is powerful, but it often requires manual manipulation. Workspaces alone are not a suitable tool for system decomposition, because each subsystem should have its own separate configuration and back end, and will thus have its own distinct set of workspaces. Scaling efficiently with Terraform® requires repeatable modules that instantiate common elements into a variety of different back ends, and this is where new tools like Terragrunt come into play. Terragrunt organizes Terraform® code in an effective, consistent and scalable way: it automates functions and discourages hard-coded values, which reduces risk and helps you ship your IaC faster. With Terragrunt, you can manage infrastructure across your enterprise with fewer resources.
Don’t repeat yourself
The DRY (don’t repeat yourself) principle of coding states that every piece of knowledge must have a single, unambiguous and authoritative representation within a system. This means that no code is to be rewritten or repeated, not even a variable. When discussing DRY Terraform® code, it’s easy to focus on the Terraform® modules that define resources and parameters and to ignore what goes into the Terraform® variables (.tfvars) files.
Let’s take a look:
Suppose you need to create 1,000 individual AWS® EC2 instances in the same region. Using Terraform® in the traditional way, you create an Amazon® EC2 module, define a provider in the main.tf file and create a region variable in the provider section.
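As a minimal sketch of that provider section, the module's main.tf could look something like the following. The variable name aws_region is taken from the .tfvars example later in this article; the file location and the description text are assumptions for illustration only.

```hcl
# modules/ec2/main.tf (hypothetical location)

# Region is supplied by the caller instead of being hard-coded.
variable "aws_region" {
  description = "AWS region in which the EC2 instance is created"
  type        = string
}

# The provider reads its region from the variable above.
provider "aws" {
  region = var.aws_region
}
```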
After creating the module using different Terraform® resources, create the EC2 instances. Because each instance is related to a different application, you must create 1,000 folders and import the EC2 module into all of them. Next, create a .tfvars file in each folder to pass values to the variables that were defined in the module. For example, if the EC2 instances are created in the us-east-1 region, the value aws_region = "us-east-1" is passed 1,000 times. Note: this example describes one variable and one environment. In reality, there will be multiple variables common across different resources at the account, region, Availability Zone (AZ), virtual private cloud (VPC), application and environment levels.
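To make the repetition concrete, here is a rough sketch of what one of those 1,000 folders might contain. The folder name app1, the relative module path and the file names are assumptions; only the aws_region value comes from the example above. The two files are shown in one listing and separated by comments.

```hcl
# --- file: app1/main.tf (hypothetical folder) ---

# Each application folder redeclares the variable it forwards to the module.
variable "aws_region" {
  type = string
}

# Import the shared EC2 module and pass the region through.
module "ec2" {
  source     = "../modules/ec2" # assumed relative path to the module
  aws_region = var.aws_region
}

# --- file: app1/terraform.tfvars ---

# This same line would be repeated in every one of the 1,000 folders.
aws_region = "us-east-1"
```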
The above example doesn’t look particularly DRY. However, Terragrunt can get you very close to essential complexity, the point where there’s no room for additional simplification.
To the left, you see a root directory called “Terraform®”—there are two main directories called “modules” and “live.” The modules directory is where we create modules for individual AWS® services. The live directory is where we provision resources inside AWS® using the respective modules. This is where we run commands like:
- terragrunt init
- terragrunt plan
- terragrunt apply
- terragrunt destroy
Again, below you’ll see the structure with all directories expanded. It looks very complicated at first glance. As you spend some time observing the organization, it will start making sense. Consider an application called app1. Regardless of the environment, account or region where it’s deployed, the application doesn’t change.
Why this directory structure?
If you are deploying an application in the dev environment, it will be in a VPC. A VPC resides in a region, such as “us-east-1,” inside an account. This structure helps us in two ways:
- It generates highly DRY code
- It creates a “key” path in the back-end S3 bucket automatically
When we create a resource in the dev environment by running terragrunt apply, it stores the terraform.tfstate file in the back-end S3 bucket here: ec2/app1/account1/us-east1/dev/terraform.tfstate
If you create the same application in the “account2” account in the “us-east-1” region of the prod environment, its state file will be stored inside the back-end S3 bucket at: ec2/app1/account2/us-east1/prod/terraform.tfstate
With this design, you no longer have to manually manage the S3 “key” path to store terraform.tfstate files. Terragrunt automatically creates it based on the directory structure.
For a final example, if you are deploying “app2” in “account1” in “us-east-1” in the QA environment, its state file would be saved inside the back-end S3 bucket at: ec2/app2/account1/us-east1/qa/terraform.tfstate
Inside the magic
The two files that make this happen are terragrunt.hcl and the account-based .hcl files inside the EC2 directory. Here’s what’s inside:
[Figure: terragrunt.hcl]
In the above picture, “arguments” is how we direct the required .tfvars files to pass values to the variables defined in the module. This helps us declare the values once and call them when required.
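For readers without the picture, a terragrunt.hcl along these lines would behave as described. This is a sketch under stated assumptions, not the author's exact file: the module path, the block label common_vars and the account.tfvars and region.tfvars file names are hypothetical, while resources.tfvars and account1.hcl are named in this article.

```hcl
# live/ec2/app1/account1/us-east1/dev/terragrunt.hcl (assumed location)

# Search the parent folders for account1.hcl and inherit its settings.
include {
  path = find_in_parent_folders("account1.hcl")
}

terraform {
  # Hypothetical path to the EC2 module, resolved from the folder that holds account1.hcl.
  source = "${get_parent_terragrunt_dir()}/../../modules/ec2"

  extra_arguments "common_vars" {
    # Attach the files below to every Terraform command that accepts variables.
    commands = get_terraform_commands_that_need_vars()

    # Terraform lets later -var-file entries override earlier ones,
    # so the most specific file is listed last.
    arguments = [
      "-var-file=${get_terragrunt_dir()}/../../account.tfvars", # hypothetical file
      "-var-file=${get_terragrunt_dir()}/../region.tfvars",     # hypothetical file
      "-var-file=${get_terragrunt_dir()}/resources.tfvars",
    ]
  }
}
```

The ordering of the list is a design choice: with shared files first and resources.tfvars last, a value declared once at a higher level applies everywhere unless a specific resource redeclares it.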
Values unique to every resource being created must be declared in the resources.tfvars file. If there is a variable already declared at a higher-level .tfvars file, we can overwrite it by declaring it inside resources.tfvars with a different value.
[Figure: account1.hcl]
Notice the “include” block inside terragrunt.hcl and the “key” value inside account1.hcl. First, “include” searches up the chain of parent directories for the account1.hcl file and tracks the directory structure it went through. It then replaces ${path_relative_to_include()} with the relative directory path it followed. That’s how the “key” paths inside the back-end S3 bucket are automatically created.
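A sketch of how account1.hcl could produce the keys shown earlier follows. It assumes the file sits directly inside the ec2 directory, so the "ec2/" prefix is written explicitly; the bucket name is hypothetical, and the author's actual file may differ.

```hcl
# live/ec2/account1.hcl (assumed location)

remote_state {
  backend = "s3"

  config = {
    bucket  = "example-terraform-state" # hypothetical bucket name
    region  = "us-east-1"
    encrypt = true

    # path_relative_to_include() expands to the path from this file's folder
    # to the folder of the including terragrunt.hcl, for example
    # "app1/account1/us-east1/dev" for the dev deployment of app1.
    key = "ec2/${path_relative_to_include()}/terraform.tfstate"
  }
}
```

For this to take effect, the Terraform module itself still needs an empty backend "s3" {} declaration inside its terraform block, which Terragrunt fills in at init time.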
Constant experimentation
It takes time and frequent testing to find the optimal directory structure to address your organization’s requirements. Consider refactoring code when you encounter a new challenge. Terragrunt is an important new tool that helps address key issues with Terraform® workspaces.
Yasa Vaividh is a practice architect with TEKsystems Global Services. Over the last seven years, he has worked in a number of technical and leadership roles in the cloud and DevOps field. He has expertise in cloud architecture, DevOps, IaC, CI/CD, containerized environments, monitoring and automation.