Coherent load balancing of jobs across EC2 instances
Technical white paper
April 5, 2018 | By Mayank Rajput and Tatwika Kashyap
In today’s world of big data, we need distributed computing to scale and deliver viable data applications. Newer tools and technologies come with built-in support for distributed processing and storage, addressing scalability and availability natively. However, not all tools used to develop data solutions support distributed computing. Traditionally, the scalability and availability problems of these tools are solved by introducing a load-balancing architecture in the application layer. In this architecture pattern, applications performing the same job run on multiple servers to provide scalability and high availability.
Load balancing applications across servers aims to optimize resource use by distributing the load across those servers. In this scenario, the application architecture must take care of synchronizing processes between the load-balanced resources.
For our use case, we have traditional batch jobs that move data from databases into Amazon S3 buckets and need to be load balanced across multiple EC2 instances. Most of these jobs read and write metadata and output files on the local hard disk. When we load balance by starting a job on any instance from the available EC2 pool, we face the challenge of synchronizing the metadata files and the input/output files created by the jobs across that pool.
A simple, classic example of this synchronization problem:
Day 1: A job running on EC2 instance 1 reads and writes its metadata files.
Day 2: Because of load balancing, the same job runs on EC2 instance 2 and reads the metadata files stored on instance 2. These files are outdated, because the job on instance 1 has since modified its own copy.
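A minimal Python sketch of this failure mode, assuming a hypothetical incremental job (run_job) that stores a watermark, the highest record ID already copied, in a metadata file (the name watermark.txt is also illustrative). Temporary directories stand in for each instance's local disk and, at the end, for a single shared mount.

```python
import os
import tempfile


def run_job(metadata_dir, available_ids):
    """Hypothetical incremental job: copy every record newer than the saved watermark."""
    watermark_path = os.path.join(metadata_dir, "watermark.txt")
    last_id = 0
    if os.path.exists(watermark_path):
        with open(watermark_path) as f:
            last_id = int(f.read())
    to_copy = [i for i in available_ids if i > last_id]
    if to_copy:
        # Remember the highest ID copied so the next run starts after it
        with open(watermark_path, "w") as f:
            f.write(str(max(to_copy)))
    return to_copy


# Local disks: each "instance" keeps its own copy of the metadata
instance1_disk = tempfile.mkdtemp()
instance2_disk = tempfile.mkdtemp()
day1 = run_job(instance1_disk, [1, 2, 3])        # Day 1 runs on instance 1
day2 = run_job(instance2_disk, [1, 2, 3, 4, 5])  # Day 2 runs on instance 2, which never saw Day 1's watermark

# One shared directory (standing in for the EFS mount): Day 2 resumes where Day 1 stopped
shared_disk = tempfile.mkdtemp()
shared_day1 = run_job(shared_disk, [1, 2, 3])
shared_day2 = run_job(shared_disk, [1, 2, 3, 4, 5])
```

On the local disks, day2 comes back as [1, 2, 3, 4, 5], so records 1 to 3 are copied a second time; with the shared directory, shared_day2 is [4, 5].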
To avoid such problems, the application architecture must treat metadata synchronization as part of the solution.
Solution
Amazon EFS can serve as a central file storage system. A launch script mounts the EFS automatically on each EC2 instance as it starts. Any data written by a job running on one instance is therefore stored on the EFS, where jobs running on any other instance can access it.
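As a sketch of how a job might keep its metadata on the shared file system, the helpers below (hypothetical names save_metadata and load_metadata) store one JSON file per job. Each write goes to a temporary file in the same directory and is then renamed over the old file, so that a job starting on another instance should see either the previous or the new metadata rather than a partially written file. This assumes rename behaves atomically on the mount, as it does on local POSIX file systems.

```python
import json
import os
import tempfile


def save_metadata(metadata_dir, job_name, metadata):
    """Write a job's metadata as JSON into the shared directory via temp file + rename."""
    os.makedirs(metadata_dir, exist_ok=True)
    final_path = os.path.join(metadata_dir, job_name + ".json")
    # Write the new content to a temporary file in the same directory first...
    fd, tmp_path = tempfile.mkstemp(dir=metadata_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(metadata, f)
    # ...then swap it into place in a single rename
    os.replace(tmp_path, final_path)
    return final_path


def load_metadata(metadata_dir, job_name):
    """Return the job's metadata, or an empty dict if no run has saved any yet."""
    path = os.path.join(metadata_dir, job_name + ".json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

On an instance, metadata_dir would be a directory under the EFS mount point, for example a hypothetical /efs-mount-point/metadata.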
Advantages, shortcomings and a workaround
Advantages: EFS automatically manages file storage configuration and infrastructure.
Shortcomings: The solution above provides a coherent way to share a file system among instances. However, it does not show which EC2 instances have the EFS mounted. The number of connections can be viewed in Amazon CloudWatch (the ClientConnections metric), but AWS does not provide a way to directly check which instances are mounting a file system.
Workaround: To address this problem, we built a custom, event-based solution that tracks and stores instance details as EC2 instances are launched.
Implementation of Solution
An Amazon EC2 instance can be configured to mount an Amazon EFS file system automatically at launch, using a script that works with cloud-init. The script is added in the Launch Instance wizard of the EC2 management console. The automation script handles the following activities:
- Installs the NFS client
- Captures the details of the EC2 instance being launched and of the EFS to be mounted (DNS name and directory structure)
- Mounts the EFS whenever an EC2 instance is launched, making the shared metadata available
- Installs and updates the dependencies needed to run the script
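The sample script below captures the instance's private IP by resolving its hostname. As an alternative sketch (Python 3, hypothetical function name get_private_ip), the address can be read from the EC2 instance metadata service at 169.254.169.254, falling back to the hostname lookup if the service cannot be reached. This assumes the instance accepts plain GET requests to the metadata service (IMDSv1); instances configured to require IMDSv2 need a session token first, which this sketch does not request.

```python
import socket
import urllib.request

# EC2 instance metadata path for the primary private IPv4 address
IMDS_LOCAL_IPV4_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def get_private_ip(metadata_url=IMDS_LOCAL_IPV4_URL, timeout=2, fallback=None):
    """Return this instance's private IPv4 address.

    Tries a tokenless (IMDSv1-style) GET against the instance metadata service
    and, if that fails, calls fallback() or resolves the local hostname.
    """
    try:
        with urllib.request.urlopen(metadata_url, timeout=timeout) as response:
            return response.read().decode("ascii").strip()
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        if fallback is not None:
            return fallback()
        return socket.gethostbyname(socket.gethostname())
```

The fallback parameter lets a caller substitute a different lookup; by default the function behaves like the sample script when the metadata service is unavailable.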
Note: Before proceeding with the steps below, create the following Python file on the EFS. Set the appropriate file names for ConnectionsWithEFS.txt and InstancesMountingEFS.txt in the script. The script also writes logs to CloudWatch (us-east-1 by default).
Sample File: python-file-name.py
```python
############ Getting private IPs of the EC2 instances attached to EFS ############
### Date: March 3, 2018 ###
### Author: Mayank Rajput ###
import logging
import logging.handlers
import socket

# File names on the EFS (relative to the mount point, which is the working directory)
CONNECTIONS_FILE = "ConnectionsWithEFS.txt"  # one line appended per launch
INSTANCES_FILE = "InstancesMountingEFS.txt"  # unique IPs, in first-seen order

##### Writing CloudWatch logs in us-east-1 #####
##### Log Group: /var/log/messages #####
# Records go to the local syslog socket so they land in /var/log/messages,
# which the awslogs agent ships to CloudWatch (assumes rsyslog listens on /dev/log).
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))

try:
    ### Getting the private IP of the machine ###
    private_ip = socket.gethostbyname(socket.gethostname())
    ### Logging the IP of the machine to CloudWatch ###
    logger.info("EFS mounted on ip {}".format(private_ip))

    ### Appending the IP of the newly added instance to the .txt file ###
    with open(CONNECTIONS_FILE, "a") as connections:
        connections.write(private_ip + "\n")

    ### Writing unique IPs to a second .txt file ###
    ### The append-only log stays untouched; the unique list is rebuilt each run ###
    seen = set()
    with open(CONNECTIONS_FILE) as connections, open(INSTANCES_FILE, "w") as output_file:
        for line in connections:
            ip = line.strip()
            if ip and ip not in seen:  # skip blank lines and repeats
                seen.add(ip)
                output_file.write(ip + "\n")
except Exception:
    # Record the full traceback rather than a generic message
    logger.exception("Failed to record the EFS connection")
```
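The script above appends to ConnectionsWithEFS.txt and rewrites InstancesMountingEFS.txt without coordination, so two instances launched at the same moment could interleave their updates. One way to serialize that step is sketched below (Python 3, hypothetical function name record_ip): it holds an exclusive POSIX advisory lock, via fcntl.lockf, on the connections file while appending and rebuilding the unique list. This assumes every writer goes through the same locking code and that the NFSv4.1 mount honors byte-range locks, which Amazon EFS is documented to support.

```python
import fcntl
import os
import tempfile


def record_ip(connections_path, instances_path, ip):
    """Append ip to the connections log and rebuild the unique-IP list under one lock."""
    with open(connections_path, "a+") as connections:
        # Exclusive advisory lock: other writers using this function wait here
        fcntl.lockf(connections, fcntl.LOCK_EX)
        try:
            connections.write(ip + "\n")
            connections.flush()
            connections.seek(0)  # re-read the whole log from the start
            seen, unique_ips = set(), []
            for line in connections:
                entry = line.strip()
                if entry and entry not in seen:
                    seen.add(entry)
                    unique_ips.append(entry)
            with open(instances_path, "w") as output_file:
                output_file.write("\n".join(unique_ips) + "\n")
        finally:
            fcntl.lockf(connections, fcntl.LOCK_UN)
    return unique_ips
```

Readers of InstancesMountingEFS.txt are not locked in this sketch; if they must never see a partly written list, the temp-file-and-rename pattern can be applied to that file as well.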
In the Configure Instance Details step of the Launch Instance wizard, expand the Advanced Details section and paste the following script into User data. Update the script with the appropriate values for file-system-id, aws-region, efs-mount-point, and python-file-name.py.
```yaml
#cloud-config
package_upgrade: true
packages:
- nfs-utils
runcmd:
- mkdir -p /efs-mount-point
- chown ec2-user:ec2-user /efs-mount-point
- sudo yum update -y
- sudo yum install -y awslogs
- sudo service awslogs start
- sudo chkconfig awslogs on
- echo "file-system-id.efs.aws-region.amazonaws.com:/ /efs-mount-point nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 0 0" >> /etc/fstab
- mount -a -t nfs4
- cd /efs-mount-point
- python python-file-name.py
```
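Because small typos in the user data, such as a stray space in a path, silently break the mount, it can help to generate the user data rather than edit it by hand. A minimal Python sketch, where render_user_data is a hypothetical helper that fills in the four placeholders and emits the commands from the configuration above, with the mount point normalized to an absolute path and each command as its own runcmd entry:

```python
# Mount options taken from the fstab line in the user data above
NFS_OPTIONS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"


def render_user_data(file_system_id, aws_region, mount_point, script_name):
    """Return #cloud-config user data that mounts the EFS and runs the tracking script."""
    # Normalize to an absolute path with no stray spaces or trailing slash
    mount_point = "/" + mount_point.strip().strip("/")
    dns_name = "{}.efs.{}.amazonaws.com".format(file_system_id, aws_region)
    fstab_entry = "{}:/ {} nfs4 {} 0 0".format(dns_name, mount_point, NFS_OPTIONS)
    runcmd = [
        "mkdir -p " + mount_point,
        "chown ec2-user:ec2-user " + mount_point,
        "sudo yum update -y",
        "sudo yum install -y awslogs",
        "sudo service awslogs start",
        "sudo chkconfig awslogs on",
        'echo "{}" >> /etc/fstab'.format(fstab_entry),
        "mount -a -t nfs4",  # kept as a separate command, not appended to the echo
        "cd " + mount_point,
        "python " + script_name,
    ]
    lines = ["#cloud-config", "package_upgrade: true", "packages:", "- nfs-utils", "runcmd:"]
    lines += ["- " + cmd for cmd in runcmd]
    return "\n".join(lines) + "\n"
```

For example, render_user_data("fs-12345678", "us-east-1", "efs-mount-point", "python-file-name.py") returns text that can be pasted into User data; the file system ID in this example is made up.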
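After the steps above complete successfully, the private IPs of all newly launched instances are listed in the InstancesMountingEFS.txt file on the EFS. Because the script only appends, the list includes every instance that has mounted the file system at launch, including instances that have since been terminated.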
Long-term focus: Design an API that performs everything, from launching instances to finding client connections, on the user’s behalf.
Result: Data is synchronized among instances, and the connections made to the EFS are logged.
Authors:
Mayank Rajput at marajput @ teksystems.com
Tatwika Kashyap at tkashyap @ teksystems.com