AWS analytics services streamline user access to data, permissions setting, and auditing

May 29, 2024

Sébastien Stormacq

I am pleased to announce a new use case based on trusted identity propagation, a recently introduced capability of AWS IAM Identity Center.

Tableau, a commonly used business intelligence (BI) application, can now propagate end-user identity down to Amazon Redshift. This has a triple benefit. It simplifies the sign-in experience for end users. It allows data owners to define access based on real end-user identity. It allows auditors to verify data access by users.

Trusted identity propagation allows applications that consume data (such as Tableau, Amazon QuickSight, Amazon Redshift Query Editor, Amazon EMR Studio, and others) to propagate the user’s identity and group memberships to the services that store and manage access to the data, such as Amazon Redshift, Amazon Athena, Amazon Simple Storage Service (Amazon S3), Amazon EMR, and others. Trusted identity propagation is a capability of IAM Identity Center that improves the sign-in experience across multiple analytics applications, simplifies data access management, and simplifies auditing. End users benefit from single sign-on and do not have to specify the IAM roles they want to assume to connect to the system.

Before diving into more details, let’s agree on terminology.

I use the term “identity providers” to refer to the systems that hold user identities and group memberships. These are the systems that prompt the user for credentials and perform the authentication, for example, Microsoft Entra ID (formerly Azure Active Directory), Okta, Ping Identity, and more. Check the full list of identity providers we support.

I use the term “user-facing applications” to designate the applications that consume data, such as Tableau, Microsoft Power BI, QuickSight, Amazon Redshift Query Editor, and others.

And finally, when I write “downstream services”, I refer to the analytics engines and storage services that process, store, or manage access to your data: Amazon Redshift, Athena, S3, EMR, and others.

To understand the benefit of trusted identity propagation, let’s briefly talk about how data access was granted until today. When a user-facing application accesses data from a downstream service, the application either uses generic credentials (such as “tableau_user”) or assumes an IAM role to authenticate against the downstream service. This is the source of two challenges.

First, it makes it difficult for the downstream service administrator to define access policies that are fine-tuned for the actual user making the request. As seen from the downstream service, all requests originate from that common user or IAM role. If Jeff and Jane are both mapped to the BusinessAnalytics IAM role, then it is not possible to give them different levels of access, for example, read-only and read-write. Furthermore, if Jeff is also in the Finance group, he needs to choose a role in which to operate; he cannot access data from both groups in the same session.

Secondly, the task of associating a data-access event to an end user involves some undifferentiated heavy lifting. If the request originates from an IAM role called BusinessAnalytics, then additional work is required to figure out which user was behind that action.

Well, this particular example might look very simple, but in real life, organizations have hundreds of users and thousands of groups to match to hundreds of datasets. There was an opportunity for us to Invent and Simplify.

Once configured, the new trusted identity propagation provides a technical mechanism for user-facing applications to access data on behalf of the actual user behind the keyboard. Knowing the actual user identity offers three main advantages.

First, it allows downstream service administrators to create and manage access policies based on actual user identities, the groups they belong to, or a combination of the two. Downstream service administrators can now assign access in terms of users, groups, and datasets. This is the way most of our customers naturally think about access to data. Intermediate mappings to IAM roles are no longer necessary to achieve these patterns.

Second, auditors now have access to the original user identity in system logs and can verify that policies are implemented correctly and follow all requirements of the company or industry-level policies.

Third, users of BI applications can benefit from single sign-on between applications. Your end-users no longer need to understand your company’s AWS accounts and IAM roles. Instead, they can sign in to EMR Studio (for example) using their corporate single sign-on that they’re used to for so many other things they do at work.
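To make the first advantage concrete, here is a minimal sketch of access evaluation based on user identity and group membership. It is a toy model, not how any AWS service implements authorization: the `Identity` class, the `GRANTS` table, the dataset names, and the `permissions` function are all hypothetical. It shows that Jeff and Jane can share a group yet hold different permissions, and that Jeff can use his BusinessAnalytics and Finance memberships in the same session.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    """An end user as seen by a downstream service (hypothetical model)."""
    user: str
    groups: frozenset = field(default_factory=frozenset)


# Hypothetical grants: principal -> {dataset: permissions}.
# The "user:" and "group:" prefixes keep the two kinds of principal apart.
GRANTS = {
    "group:BusinessAnalytics": {"sales": {"read"}},
    "group:Finance": {"ledger": {"read"}},
    "user:jane": {"sales": {"read", "write"}},
}


def permissions(identity: Identity, dataset: str) -> set:
    """Union of the user's own grants and the grants of every group they belong to."""
    principals = [f"user:{identity.user}"]
    principals += [f"group:{name}" for name in sorted(identity.groups)]
    allowed = set()
    for principal in principals:
        allowed |= GRANTS.get(principal, {}).get(dataset, set())
    return allowed


jeff = Identity("jeff", frozenset({"BusinessAnalytics", "Finance"}))
jane = Identity("jane", frozenset({"BusinessAnalytics"}))

print(sorted(permissions(jeff, "sales")))   # ['read']
print(sorted(permissions(jeff, "ledger")))  # ['read']
print(sorted(permissions(jane, "sales")))   # ['read', 'write']
```

With a single shared IAM role, the downstream service would only see the role, so none of these distinctions could be expressed.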

How does trusted identity propagation work?
Trusted identity propagation relies on standard mechanisms from our industry: OAuth2 and JWT. OAuth2 is an open standard for access delegation that allows users to grant third-party user-facing applications access to data on other services (downstream services) without exposing their credentials. JWT (JSON Web Token) is a compact, URL-safe means of representing identities and claims to be transferred between two parties. JWTs are signed, which means their integrity and authenticity can be verified.
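To illustrate what “signed” means in practice, here is a minimal sketch that builds and verifies a JWT using only the Python standard library. It assumes the symmetric HS256 algorithm and a made-up shared secret to stay self-contained; identity providers typically sign with asymmetric keys instead, and production code should rely on a maintained JWT library. The claim values, `SECRET`, and the helper names are assumptions for this example.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Return header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Return the claims if the signature is valid, otherwise raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


SECRET = b"demo-secret"  # hypothetical key, for illustration only
claims = {"sub": "jeff@example.com", "aud": "analytics-app"}
token = sign_jwt(claims, SECRET)
print(verify_jwt(token, SECRET) == claims)  # True
```

Changing any character of the header or payload invalidates the signature, which is what lets a receiving service trust the identity and claims carried in the token.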

How to configure trusted identity propagation
Configuring trusted identity propagation requires setup in IAM Identity Center, at the user-facing application, and at the downstream service because each of these needs to be told to work with end-user identities. Although the particulars will be different for each application, they will all follow this pattern:

1. Configure an identity source in AWS IAM Identity Center. AWS recommends enabling automated provisioning if your identity provider supports it, as most do. Automated provisioning works through the SCIM synchronization standard to synchronize your directory users and groups into IAM Identity Center. You probably have configured this already if you currently use IAM Identity Center to federate your workforce into the AWS Management Console. This is a one-time configuration, and you don’t have to repeat this step for each user-facing application.

2. Configure your user-facing application to authenticate its users with your identity provider. For example, configure Tableau to use Okta.

3. Configure the connection between the user-facing application and the downstream service. For example, configure Tableau to access Amazon Redshift. In some cases, it requires using the ODBC or JDBC driver for Redshift.

Then comes the configuration specific to trusted identity propagation. For example, imagine your organization has developed a user-facing web application that authenticates the users with your identity provider, and that you want to access data in AWS on behalf of the current authenticated user. For this use case, you would create a trusted token issuer in IAM Identity Center. This powerful new construct gives you a way to map your application’s authenticated users to the users in your IAM Identity Center directory so that it can make use of trusted identity propagation. My colleague Becky wrote a blog post to show you how to develop such an application. This additional configuration is required only when using third-party applications, such as Tableau, or a customer-developed application, that authenticate outside of AWS. When using user-facing applications managed by AWS, such as Amazon QuickSight, no further setup is required.

Finally, downstream service administrators must configure the access policies based on the user identity and group memberships. The exact configuration varies from one downstream service to the other. If the application reads or writes data in Amazon S3, the data owner may use S3 Access Grants in the Amazon S3 console to grant access for users and groups to prefixes in Amazon S3. If the application makes queries to an Amazon Redshift data warehouse, the data owner must configure an IAM Identity Center trusted connection in the Amazon Redshift console and match the audience claim (aud) from the identity provider.
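The idea of matching the audience claim can be sketched as a simple check on an already verified token payload. This is an illustration of the concept rather than the logic AWS runs: `TRUSTED_ISSUER`, `EXPECTED_AUDIENCE`, and `validate_claims` are hypothetical names and values. The `iss`, `aud`, and `exp` claims themselves are standard JWT claims, and `aud` may be either a single string or a list of strings.

```python
import time
from typing import Optional

# Hypothetical configuration values, made up for this sketch.
TRUSTED_ISSUER = "https://idp.example.com/oauth2/default"
EXPECTED_AUDIENCE = "redshift-analytics"


def validate_claims(claims: dict, now: Optional[float] = None) -> None:
    """Raise ValueError unless the issuer, audience, and expiry are acceptable."""
    now = time.time() if now is None else now
    if claims.get("iss") != TRUSTED_ISSUER:
        raise ValueError("token was not issued by the trusted issuer")
    aud = claims.get("aud")
    # Normalize the audience to a list: JWT allows a string or an array.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if EXPECTED_AUDIENCE not in audiences:
        raise ValueError("audience claim does not match the expected value")
    if claims.get("exp", 0) <= now:
        raise ValueError("token has expired")


validate_claims(
    {"iss": TRUSTED_ISSUER, "aud": EXPECTED_AUDIENCE, "exp": time.time() + 300}
)
```

A token minted for a different application carries a different `aud` value and is rejected, which prevents a token obtained for one purpose from being replayed against another service.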

Now that you have a high-level overview of the configuration, let’s dive into the most important part: the user experience.

The end-user experience
Although the precise experience of the end user will obviously be different for different applications, in all cases, it will be simpler and more familiar to workforce users than before. The user interaction will begin with a redirect-based authentication single sign-on flow that takes the user to their identity provider, where they can sign in with credentials, multi-factor authentication, and so on.

Let’s look at the details of how an end user might interact with Okta and Tableau when trusted identity propagation has been configured.

Here is an illustration of the flow and the main interactions between systems and services.

Here’s how it goes.

1. As a user, I attempt to sign in to Tableau.

2. Tableau initiates a browser-based flow and redirects to the Okta sign-in page where I can enter my sign-in credentials. On successful authentication, Okta issues authentication tokens (an ID token and an access token) to Tableau.

3. Tableau initiates a JDBC connection with Amazon Redshift and includes the access token in the connection request. The Amazon Redshift JDBC driver makes a call to Amazon Redshift. Because your Amazon Redshift administrator enabled IAM Identity Center, Amazon Redshift forwards the access token to IAM Identity Center.

4. IAM Identity Center verifies and validates the access token and exchanges the access token for an Identity Center issued token.

5. Amazon Redshift will resolve the Identity Center token to determine the corresponding Identity Center user and authorize access to the resource. Upon successful authorization, I can connect from Tableau to Amazon Redshift.
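The hand-offs in steps 2 to 5 can be modeled in a few lines. The sketch below is a simplified simulation under stated assumptions, not the real service APIs: the `IdpAccessToken`, `IdentityCenterToken`, `IdentityCenter`, and `Warehouse` classes, the issuer URL, and the user ID are all invented for illustration, and token signatures are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdpAccessToken:
    """What the identity provider hands to the application in step 2."""
    issuer: str
    email: str


@dataclass(frozen=True)
class IdentityCenterToken:
    """What the token exchange in step 4 produces."""
    user_id: str
    email: str


class IdentityCenter:
    """Stand-in for the token exchange (hypothetical, not the real API)."""

    def __init__(self, trusted_issuer: str, directory: dict):
        self.trusted_issuer = trusted_issuer
        self.directory = directory  # email -> Identity Center user ID

    def exchange(self, token: IdpAccessToken) -> IdentityCenterToken:
        if token.issuer != self.trusted_issuer:
            raise PermissionError("untrusted issuer")
        if token.email not in self.directory:
            raise PermissionError("no matching Identity Center user")
        return IdentityCenterToken(self.directory[token.email], token.email)


class Warehouse:
    """Stand-in for the downstream service in steps 3 and 5."""

    def __init__(self, identity_center: IdentityCenter, authorized_user_ids: set):
        self.identity_center = identity_center
        self.authorized_user_ids = authorized_user_ids
        self.query_log = []

    def connect_and_query(self, token: IdpAccessToken, sql: str) -> str:
        idc_token = self.identity_center.exchange(token)  # steps 3 and 4
        if idc_token.user_id not in self.authorized_user_ids:  # step 5
            raise PermissionError("user is not authorized")
        principal = f"awsidc:{idc_token.email}"
        self.query_log.append((principal, sql))  # the end user, not a shared role
        return principal


idc = IdentityCenter("https://idp.example.com", {"jeff@example.com": "user-0001"})
warehouse = Warehouse(idc, {"user-0001"})
token = IdpAccessToken("https://idp.example.com", "jeff@example.com")
print(warehouse.connect_and_query(token, "SELECT 1"))  # awsidc:jeff@example.com
```

The point of the model is the last step: the downstream service records the individual behind the request, so the audit trail no longer stops at a generic account or role.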

Once authenticated, I can start to use Tableau as usual.

And when I connect to Amazon Redshift Query Editor, I can query the sys_query_history table to check which user ran each query. It correctly reports awsidc:<email address>, the Okta email address I used when I connected from Tableau.

You can read Tableau’s documentation for more details about this configuration.

Pricing and availability
Trusted identity propagation is provided at no additional cost in the 26 AWS Regions where AWS IAM Identity Center is available today.

Here are more details about trusted identity propagation and downstream service configurations.

Happy reading!

With trusted identity propagation, you can now configure analytics systems to propagate the actual user identity, group membership, and attributes to AWS services such as Amazon Redshift, Amazon Athena, or Amazon S3. It simplifies the management of access policies on these services. It also allows auditors to verify your organization’s compliance posture because they know the real identity of the users accessing data.

Get started now and configure your Tableau integration with Amazon Redshift.

-- seb

PS: Writing a blog post at AWS is always a team effort, even when you see only one name under the post title. In this case, I want to thank Eva Mineva, Laura Reith, and Roberto Migli for their much-appreciated help in understanding the many subtleties and technical details of trusted identity propagation.
