Wednesday, February 8, 2023

Amazon EMR on EKS gets up to 19% performance boost running on AWS Graviton3 Processors vs. Graviton2

Amazon EMR on EKS is a deployment option that enables you to run Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS) easily. It allows you to innovate faster with the latest Apache Spark on Kubernetes architecture while benefiting from the performance-optimized Spark runtime powered by Amazon EMR. This deployment option uses Amazon EKS as its underlying compute to orchestrate containerized Spark applications with better price performance.

AWS continually innovates to provide choice and better price-performance for our customers, and the third-generation Graviton processor is the next step in that journey. Amazon EMR on EKS now supports Amazon Elastic Compute Cloud (Amazon EC2) C7g, the latest AWS Graviton3 instance family. On a single EKS cluster, we measured EMR runtime for Apache Spark performance by comparing the C7g and C6g families across selected instance sizes of 4XL, 8XL, and 12XL. We observed a maximum 19% performance gain over the sixth-generation C6g Graviton2 instances, which leads to a 15% cost reduction.

In this post, we discuss the performance test results that we observed while running the same EMR Spark runtime on different Graviton-based EC2 instance types.

For some use cases, such as a benchmark test, a data pipeline that requires a mix of CPU types for granular-level cost efficiency, or the migration of an existing application from Intel to Graviton-based instances, we usually spin up different clusters that host separate types of processors, such as x86_64 vs. arm64. Amazon EMR on EKS makes this easier. In this post, we also provide guidance on running Spark with multiple CPU architectures in a common EKS cluster, which saves the significant time and effort of setting up a separate cluster to isolate the workloads.

Infrastructure innovation

AWS Graviton3 is the latest generation of AWS-designed Arm-based processors, and C7g is the first Graviton3 instance in AWS. The C family is designed for compute-intensive workloads, including batch processing, distributed analytics, data transformations, log analysis, and more. Additionally, C7g instances are the first in the cloud to feature DDR5 memory, which provides 50% higher memory bandwidth compared to DDR4 memory, to enable high-speed access to data in memory. All these innovations are well suited for big data workloads, especially the in-memory processing framework Apache Spark.

The following table summarizes the technical specifications for the tested instance types:

| Instance Name | vCPUs | Memory (GiB) | EBS-Optimized Bandwidth (Gbps) | Network Bandwidth (Gbps) | On-Demand Hourly Rate |
|---|---|---|---|---|---|
| c6g.4xlarge | 16 | 32 | 4.75 | Up to 10 | $0.544 |
| c7g.4xlarge | 16 | 32 | Up to 10 | Up to 15 | $0.58 |
| c6g.8xlarge | 32 | 64 | 9 | 12 | $1.088 |
| c7g.8xlarge | 32 | 64 | 10 | 15 | $1.16 |
| c6g.12xlarge | 48 | 96 | 13.5 | 20 | $1.632 |
| c7g.12xlarge | 48 | 96 | 15 | 22.5 | $1.74 |

These instances are all built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations. The Nitro System offloads the CPU virtualization, storage, and networking functions to dedicated hardware and software, delivering performance that is nearly indistinguishable from bare metal. In particular, C7g instances include support for Elastic Fabric Adapter (EFA), which becomes the standard on this instance family. It allows our applications to communicate directly with network interface cards, providing lower and more consistent latency. Additionally, these are all Amazon EBS-optimized instances, and C7g provides higher dedicated bandwidth for EBS volumes, which can result in better I/O performance and quicker read/write operations in Spark.

Performance test results

To quantify performance, we ran TPC-DS benchmark queries for Spark at a 3 TB scale. These queries are derived from TPC-DS standard SQL scripts, and the test results are not comparable to other published TPC-DS benchmark results. Apart from the benchmark standards, a single Amazon EMR 6.6 Spark runtime (compatible with Apache Spark version 3.2.0) was used as the data processing engine across six different managed node groups on an EKS cluster: C6g_4, C7g_4, C6g_8, C7g_8, C6g_12, C7g_12. These groups are named after instance type to distinguish the underlying compute resources. Each group can automatically scale between 1 and 30 nodes within its corresponding instance type. With the EKS cluster architected this way, we can run and compare our experiments in parallel, each hosted in a single node group, that is, an isolated compute environment on a common EKS cluster. It also makes it possible to run an application with multiple CPU architectures on the single cluster. Check out the sample EKS cluster configuration and benchmark job examples for more details.

We measure the Graviton performance and cost improvements using two calculations: total query runtime and geometric mean of the total runtime. The following table shows the results for equivalent-sized C6g and C7g instances and the same Spark configurations.

| Benchmark Attributes | 12XL | 8XL | 4XL |
|---|---|---|---|
| Job parallelism (spark.executor.cores * spark.executor.instances) | 188 cores (4*47) | 188 cores (4*47) | 188 cores (4*47) |
| spark.executor.memory | 6 GB | 6 GB | 6 GB |
| Number of EC2 instances | 5 | 7 | 16 |
| EBS volume | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk |
| Provisioned IOPS per volume | 6400 | 6400 | 6400 |
| Total query runtime on C6g (sec) | 2099 | 2098 | 2042 |
| Total query runtime on C7g (sec) | 1728 | 1738 | 1660 |
| Total runtime improvement with C7g | 18% | 17% | 19% |
| Geometric mean query time on C6g (sec) | 9.74 | 9.88 | 9.77 |
| Geometric mean query time on C7g (sec) | 8.40 | 8.32 | 8.08 |
| Geometric mean improvement with C7g | 13.8% | 15.8% | 17.3% |
| EMR on EKS memory usage cost on C6g (per run) | $0.28 | $0.28 | $0.28 |
| EMR on EKS vCPU usage cost on C6g (per run) | $1.26 | $1.25 | $1.24 |
| Total cost per benchmark run on C6g (EC2 + EKS cluster + EMR price) | $6.36 | $6.02 | $6.52 |
| EMR on EKS memory usage cost on C7g (per run) | $0.23 | $0.23 | $0.22 |
| EMR on EKS vCPU usage cost on C7g (per run) | $1.04 | $1.03 | $0.99 |
| Total cost per benchmark run on C7g (EC2 + EKS cluster + EMR price) | $5.49 | $5.23 | $5.54 |
| Estimated cost reduction with C7g | 13.7% | 13.2% | 15% |
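
As a side note on the second metric, a geometric mean is the exponential of the average of the logarithms of the individual values, which keeps a few long-running queries from dominating the result. The following is a minimal sketch, not the benchmark's own tooling: `geomean` is a hypothetical helper name, and the input values are made up for illustration rather than taken from the test.

```shell
# geomean (hypothetical helper): reads one runtime in seconds per line
# on stdin and prints the geometric mean, computed as exp(mean(log(x))).
geomean() {
  awk '{ sum += log($1); n++ } END { if (n > 0) printf "%.2f\n", exp(sum / n) }'
}

# Made-up per-query runtimes, for illustration only (not benchmark data).
printf '%s\n' 2 8 | geomean   # prints 4.00
```

The arithmetic mean of the same two values would be 5, so the geometric mean gives less weight to the slower query.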

The total number of cores and memory are identical across all benchmarked instances, and four provisioned IOPS SSD disks were attached to each EBS-optimized instance for optimal disk I/O performance. To allow for comparison, these configurations were intentionally chosen to match the settings in other EMR on EKS benchmarks. Check out the previous benchmark blog post Amazon EMR on Amazon EKS provides up to 61% lower costs and up to 68% performance improvement for Spark workloads for C5 instances based on x86_64 Intel CPU.

The table indicates that C7g instances deliver a consistent performance improvement compared to equivalent C6g Graviton2 instances. Our test results showed a 17–19% improvement in total query runtime for the selected instance sizes, and a 13.8–17.3% improvement in geometric mean. On cost, we observed a 13.2–15% cost reduction on C7g performance tests compared to C6g while running the 104 TPC-DS benchmark queries.
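
The improvement figures can be cross-checked from the table as (C6g value - C7g value) / C6g value. The following is a minimal sketch of that arithmetic; `improvement_pct` is a hypothetical helper name, and the inputs are the total query runtimes from the table.

```shell
# improvement_pct BASELINE CURRENT (hypothetical helper): prints the
# relative reduction from BASELINE to CURRENT as a percentage with one
# decimal place.
improvement_pct() {
  awk -v baseline="$1" -v current="$2" \
    'BEGIN { printf "%.1f\n", (baseline - current) / baseline * 100 }'
}

improvement_pct 2099 1728   # 12XL total runtime, C6g vs. C7g: prints 17.7
improvement_pct 2098 1738   # 8XL: prints 17.2
improvement_pct 2042 1660   # 4XL: prints 18.7
```

Rounded to whole numbers, these agree with the 18%, 17%, and 19% reported above. The same function applies to the geometric mean and cost rows, although recomputing from the rounded values shown in the table can differ from the published percentages in the last digit.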

Data shuffle in a Spark workload

Generally, big data frameworks schedule computation tasks for different nodes in parallel to achieve optimal performance. To proceed with its computation, a node must have the results of computations from upstream. This requires moving intermediate data from multiple servers to the nodes where the data is required, which is termed shuffling data. In many Spark workloads, data shuffle is an inevitable operation, so it plays an important role in performance assessments. This operation may involve a high rate of disk I/O and network data transmission, and could burn a significant amount of CPU cycles.

If your workload is I/O bound or bottlenecked by current data shuffle performance, one recommendation is to benchmark on improved hardware. Overall, C7g offers better EBS and network bandwidth compared to equivalent C6g instance types, which may help you optimize performance. Therefore, in the same benchmark test, we captured the following extra information, which is broken down into per-instance-type network and I/O improvements.

Based on the TPC-DS query test results, this graph illustrates the percentage increases of data shuffle operations in four categories: maximum disk read and write, and maximum network received and transmitted. In comparison to C6g instances, the disk read performance improved by 25–45%, whereas the disk write performance increased by 34–47%. On the network throughput comparison, we observed an increase of 21–36%.

Run an Amazon EMR on EKS job with multiple CPU architectures

If you're evaluating a migration to Graviton instances for Amazon EMR on EKS workloads, we recommend testing the Spark workloads based on your real-world use cases. If you need to run workloads across multiple processor architectures, for example to test the performance of Intel and Arm CPUs, follow the walkthrough in this section to get started with some concrete ideas.

Build a single multi-arch Docker image

To build a single multi-arch Docker image (x86_64 and arm64), complete the following steps:

  1. Get the Docker Buildx CLI extension. Docker Buildx is a CLI plugin that extends the Docker command to support the multi-architecture feature. Upgrade to the latest Docker Desktop or manually download the CLI binary. For more details, check out Working with Buildx.
  2. Validate the version after the installation:
    docker buildx version

  3. Create a new builder that gives access to the new multi-architecture features (you only have to perform this task once):
    docker buildx create --name mybuilder --use

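  4. Log in to your own Amazon ECR registry:
    AWS_REGION=us-east-1  # assumption: replace with the Region of your own registry
    ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    ECR_URL=$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
    aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_URL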

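  5. Get the EMR Spark base image from AWS:
    # SRC_ECR_URL must be set to the EMR on EKS base image registry for your Region.
    # Log in first, because the pull requires authentication.
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $SRC_ECR_URL
    docker pull $SRC_ECR_URL/spark/emr-6.6.0:latest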

  6. Build and push a custom Docker image.

In this case, we build a single Spark benchmark utility Docker image on top of Amazon EMR 6.6. It supports both Intel and Arm processor architectures:

  • linux/amd64 – x86_64 (also known as AMD64 or Intel 64)
  • linux/arm64 – Arm

docker buildx build \
--platform linux/amd64,linux/arm64 \
-t $ECR_URL/eks-spark-benchmark:emr6.6 \
-f docker/benchmark-util/Dockerfile \
--build-arg SPARK_BASE_IMAGE=$SRC_ECR_URL/spark/emr-6.6.0:latest \
--push .

Submit Amazon EMR on EKS jobs with and without Graviton

For our first example, we submit a benchmark job to the Graviton3 node group that spins up c7g.4xlarge instances.

The following is not a complete script. Check out the full version of the example on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c7-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "": "C7g_4"

In the following example, we run the same job on non-Graviton C5 instances with Intel 64 CPU. The full version of the script is available on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c5-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "": "C5_4"

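Both excerpts end on a property whose key did not survive in this copy of the post, while its value names the target node group. One plausible way to pin a Spark job to a managed node group is a Spark node selector on the `eks.amazonaws.com/nodegroup` label that EKS applies to managed node groups. The sketch below assumes that this is the intended property; it is not confirmed by the text above, and `node_group_overrides` and the image URI are hypothetical names used only for illustration.

```shell
# node_group_overrides IMAGE NODE_GROUP (hypothetical helper): prints a
# JSON document for --configuration-overrides that runs the job's pods
# only on nodes of the given managed node group.
# Assumption: the elided property is a Spark node selector on the
# eks.amazonaws.com/nodegroup label.
node_group_overrides() {
  image="$1"       # container image URI
  node_group="$2"  # managed node group name, for example C7g_4 or C5_4
  cat <<EOF
{
  "applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
      "spark.kubernetes.container.image": "${image}",
      "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": "${node_group}"
    }
  }]
}
EOF
}

# Placeholder image URI, for illustration only.
node_group_overrides "example.invalid/eks-spark-benchmark:emr6.6" "C7g_4"
```

Because the image is multi-arch, the same image URI can be passed for both the C7g_4 and C5_4 node groups; only the node group name changes between the two job submissions.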

In May 2022, the Graviton3 instance family was made available to Amazon EMR on EKS. After running the performance-optimized EMR Spark runtime on the selected latest Arm-based Graviton3 instances, we observed up to a 19% performance increase and up to 15% cost savings compared to C6g Graviton2 instances. Because Amazon EMR on EKS offers 100% API compatibility with open-source Apache Spark, you can quickly step into the evaluation process with no application changes.

If you're wondering how much performance gain you can achieve with your use case, try out the benchmark solution or the EMR on EKS Workshop. You can also contact your AWS Solutions Architects, who can be of assistance along your innovation journey.

About the author

Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interest are open-source frameworks and automation, data engineering, and DataOps.


