EMR Cluster using boto3 - Service Role Insufficient Permissions EMR Cluster Creation Troubleshooting Service Role Insufficient Permissions with Boto3 Problem You're trying to create an EMR cluster using Boto3 in Python but e 3 min read 04-10-2024 9
AWS EMR - Output from print statement in pyspark job not present in log files Understanding AWS EMR Why Print Statements in PySpark Jobs May Not Show Up in Log Files When working with AWS EMR (Elastic MapReduce) and PySpark many develope 3 min read 29-09-2024 4
Spark UI not showing details in EMR 7 Troubleshooting Spark UI Not Showing Details in EMR 7 Problem Overview If you're utilizing Amazon EMR (Elastic MapReduce) 7 and find that the Spark UI is not dis 2 min read 26-09-2024 15
Spark aggregate on multiple columns or a hash Understanding Spark Aggregate on Multiple Columns or a Hash Apache Spark is a powerful open-source engine for big data processing known for its speed and ease o 3 min read 24-09-2024 15
AWS EMR Jupyterhub notebook run fails with error: Session isn't active Troubleshooting AWS EMR JupyterHub Notebook Session Isn't Active Error When working with JupyterHub notebooks on AWS EMR (Elastic MapReduce) users may encounter 3 min read 18-09-2024 14
Spark EMR long running transformation job GC is taking more time Optimizing Spark EMR Long Running Transformation Jobs Dealing with GC Overhead When working with Amazon EMR (Elastic MapReduce) for big data processing Spark is 3 min read 16-09-2024 17
How to enable "Use for Hive table metadata" in "AWS Glue Data Catalog settings" using Terraform? How to Enable Use for Hive Table Metadata in AWS Glue Data Catalog Settings Using Terraform When working with AWS Glue Data Catalog one of the important setting 3 min read 15-09-2024 22
How to read large zip files in pyspark Unzipping and Processing Large Zip Files in PySpark A Practical Guide Working with large zip files in PySpark can be a challenge especially when dealing with 3 min read 05-09-2024 15
SPARK on EMR Container from a bad node Troubleshooting Spark on EMR Container Errors A Case Study This article delves into a common challenge encountered when running Spark applications on AWS EMR cl 2 min read 05-09-2024 28
Hive "Show Tables" Fails with MetaException Troubleshooting Hive Show Tables Errors A Deep Dive Encountering a MetaException when attempting to execute show tables in Hive can be frustrating This error o 2 min read 04-09-2024 17
Apache Crunch Job On AWS EMR using Oozie Running Apache Crunch Jobs on AWS EMR with Oozie Troubleshooting Write Issues This article explores a common issue encountered when running Apache Crunch jobs w 3 min read 02-09-2024 17
How to use EMR studio notebooks with EMR serverless Mastering EMR Studio Notebooks with EMR Serverless A Guide to Kernel Selection and Permissions EMR Serverless offers a powerful and cost-effective way to run bi 2 min read 02-09-2024 16
DBT Spark on EMR using AWS Glue Data Catalog Leveraging DBT Spark with AWS Glue Data Catalog Building a Modern Lakehouse on EMR The world of data warehousing is rapidly shifting towards lakehouse architect 2 min read 02-09-2024 29
Spark emr jobs: Is the number of task defined by AQE (adaptive.enabled)? Understanding Spark AQE and Task Count on EMR When working with Spark jobs on Amazon EMR understanding how Spark's Adaptive Query Execution (AQE) impacts task exec 2 min read 02-09-2024 15
ClassCastException in Spark SQL Incremental Load with DBT Troubleshooting ClassCastException in Spark SQL Incremental Load with DBT This article explores a common issue encountered when implementing incremental loads 3 min read 01-09-2024 17
What does retry in SparkUI means? Understanding Retry in Spark UI A Deep Dive into Task Failures and Adaptive Query Execution When analyzing your Spark application's performance in the Spark UI y 3 min read 01-09-2024 18
EMR-Spark Job creating max 1000 partitions/task when AQE is enabled EMR Spark Jobs Understanding Partition Limits and Adaptive Query Execution (AQE) Adaptive Query Execution (AQE) is a powerful feature in Spark that optimizes query 3 min read 01-09-2024 14
Troubleshooting Kafka Integration with Spark Streaming on Amazon EMR Serverless Troubleshooting Kafka Integration with Spark Streaming on Amazon EMR Serverless This article will dive into the common challenges faced when integrating Kafka w 3 min read 31-08-2024 15
Spark EMR Shuffle Read Fetch Wait Time is in 4hrs Decoding the Spark EMR Shuffle Read Fetch Wait Time Nightmare A 4 Hour Delay Solved Have you ever encountered a Spark job that inexplicably takes hours to compl 3 min read 31-08-2024 19
Spark Repartition/shuffle optimization Optimizing Spark Repartition and Shuffle Operations A Deep Dive Repartitioning data in Apache Spark is a crucial step for parallel processing and efficient data 2 min read 30-08-2024 17
Airflow error while creating EMR cluster via DAG Troubleshooting Invalid Instance Profile Error When Creating EMR Clusters with Airflow Creating EMR clusters within your workflow is a powerful capability but s 3 min read 29-08-2024 16
AWS EMR - reading multiple "zip" files from S3 bucket returns Your key is too long Your key is too long Debugging S3 File Access Issues in AWS EMR When working with large datasets in AWS EMR reading data from S3 buckets is a common operation H 2 min read 28-08-2024 13
Apache oozie JA008 error - job state changed from SUCCEDED to FAILED Decoding the Apache Oozie JA008 Error Why Your Successful Job Suddenly Fails Apache Oozie is a powerful workflow engine for managing complex data processing pi 3 min read 28-08-2024 15
Spark-Scala vs Pyspark Dag is different? Spark Scala vs PySpark DAG Differences and Performance Variations This article delves into the differences between Spark Scala and PySpark DAGs Directed Acycl 2 min read 27-08-2024 24
Does spark shuffle/exchange converts compress data to uncompress form? Does Spark Shuffle Exchange Convert Compressed Data to Uncompressed Form This article will explore the relationship between data compression in Apache Spa 4 min read 27-08-2024 18