New AWS Glue 4.0 – New and Updated Engines, More Data Formats, and More

November 28, 2022

Jeff Barr

AWS Glue is a scalable, serverless tool that helps you to accelerate the development and execution of your data integration and ETL workloads. Today we are launching Glue 4.0, with updated engines, support for additional data formats, Ray support, and a lot more.

Before I dive in, just a word about versioning. Unlike most AWS services, where the service team owns and has full control over the APIs, Glue includes a collection of libraries, engines, and tools developed by the open source community. Some of these components do not maintain strict backward compatibility, often in pursuit of efficiency. In order to make sure that changes to the components do not impact your Glue jobs, you must select a particular Glue version when you create the job.

Each version of Glue includes performance and reliability benefits in addition to the added features, and you should plan to upgrade your jobs over time to take advantage of all that Glue has to offer.
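As a rough sketch of what pinning a version looks like in practice, the snippet below builds the arguments for a job definition with the Glue version set explicitly. The job name, role ARN, script location, and worker settings are placeholders invented for illustration. The dictionary is shaped to match the keyword arguments of the boto3 Glue client's `create_job` call, but the snippet itself does not contact AWS.

```python
def build_job_definition(name, role_arn, script_location, glue_version="4.0"):
    """Return keyword arguments for a Glue job pinned to one Glue version."""
    return {
        "Name": name,
        "Role": role_arn,
        # The version is chosen per job, so upgrading is an explicit change here.
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Illustrative capacity settings, not a recommendation.
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


# All three values below are hypothetical placeholders.
job = build_job_definition(
    name="example-etl-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/etl.py",
)
print(job["GlueVersion"])
```

With boto3 installed and credentials configured, the same dictionary could be passed along as `boto3.client("glue").create_job(**job)`.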

Dive in to Glue
Let’s take a look at what’s new in Glue 4.0:

Updated Engines – This version of Glue includes Python 3.10 and Apache Spark 3.3.0. Both engines include bug fixes and performance enhancements; Spark also adds new features such as row-level runtime filtering, improved error messages, and additional built-in functions. Glue and Amazon EMR use the same Spark runtime, which has been optimized to run in the AWS cloud and can be 2-3 times faster than the basic open source version.

New Engine Plugins – Glue 4.0 adds native support for the Cloud Shuffle Storage Plugin for Apache Spark to help you scale your disk usage, and Adaptive Query Execution to dynamically optimize your queries as they run.
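For readers who want to see what Adaptive Query Execution looks like at the configuration level, here is a minimal sketch. The three keys are standard Spark SQL settings; whether a given Glue job needs to set them explicitly is not something this post states, so treat the values as an assumption. The `apply_settings` helper and the dictionary standing in for a live session are hypothetical, used only so the example runs without Spark.

```python
# Standard Spark SQL configuration keys for Adaptive Query Execution.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_settings(set_conf, settings):
    """Apply each key/value pair through a two-argument setter."""
    for key, value in settings.items():
        set_conf(key, value)


# Stand-in for a live Spark session: record the settings in a plain dict.
applied = {}
apply_settings(applied.__setitem__, AQE_SETTINGS)
print(applied)
```

Inside a job script with an active `SparkSession` named `spark`, the setter would be `spark.conf.set` in place of the dictionary.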

Pandas Support – Pandas is an open source data analysis and manipulation tool that is built on top of Python. It is easy to learn and includes all kinds of interesting and useful data manipulation functions.

New Data Formats – Whether you are building a data lake or a data warehouse, Glue 4.0 now handles new open source data formats for sources and targets, with support for Apache Hudi, Apache Iceberg, and Delta Lake. To learn more about these new options and formats, read Get Started with Apache Hudi using AWS Glue by Implementing Key Design Concepts.
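One way these formats are typically switched on is through the `--datalake-formats` job parameter, which takes one or more of `hudi`, `iceberg`, and `delta` separated by commas. The sketch below builds that argument; the `datalake_arguments` helper is a hypothetical name introduced here, and the format-specific Spark settings that a real job may also need are left out.

```python
SUPPORTED_FORMATS = ("hudi", "iceberg", "delta")


def datalake_arguments(*formats):
    """Build the job argument that enables the requested table formats."""
    if not formats:
        raise ValueError("at least one format is required")
    unknown = [f for f in formats if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    # Multiple formats are passed as a single comma-separated value.
    return {"--datalake-formats": ",".join(formats)}


args = datalake_arguments("hudi", "iceberg")
print(args)
```

The resulting dictionary is meant to be merged into a job's default arguments alongside any other parameters the job uses.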

Everything Else – In addition to the above items, Glue 4.0 also includes the Parquet vectorized reader, with support for additional data types and encodings. Glue 4.0 has also been upgraded to use log4j 2 and no longer depends on log4j 1.

Available Now
Glue 4.0 is available today in the US East (Ohio, N. Virginia), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Jakarta, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), Middle East (Bahrain), and South America (Sao Paulo) AWS Regions.

— Jeff;
