Databricks Community

I'm seeing an access denied error from the Spark cluster while reading an S3 file into a notebook.

Running on personal single-user compute with Databricks Runtime 13.3 LTS ML.

The config setup looks like this:

```python
spark.conf.set("spark.hadoop.fs.s3a.access.key", access_id)
spark.conf.set("spark.hadoop.fs.s3a.secret.key", access_key)
spark.conf.set("spark.hadoop.fs.s3a.session.token", session_token)
spark.conf.set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
spark.conf.set("spark.hadoop.fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
```
The code block looks like this:
```python
file_location = "s3://bucket_name/"
file_type = "parquet"

df = spark.read.format(file_type).load(file_location)
display(df.head())
```


The error I'm getting:
```
java.nio.file.AccessDeniedException: s3://bucket_name/xxx.parquet: getFileStatus on s3://bucket_name/xxx.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://bucket_name.parquet {} Hadoop 3.3.4, aws-sdk-java/1.12.390 Linux/5.15.0-1045-aws OpenJDK_64-Bit_Server_VM/25.372-b07 java/1.8.0_372 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: RD3ZAB9V0G6C4W7B, Extended Request ID: 7BDXsMzY0O6RwMdKfFLlGuHlw2AkKj0+O2U6vL2UnF1nXzu9sDsVtPVH4qXv5sYzLf8vV65sNdU=, Cloud Provider: AWS, Instance ID: i-06f065a5b0db0e707 credentials-provider: com.amazonaws.auth.AnonymousAWSCredentials credential-header: no-credential-header signature-present: false (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: RD3ZAB9V0G6C4W7B; S3 Extended Request ID: 7BDXsMzY0O6RwMdKfFLlGuHlw2AkKj0+O2U6vL2UnF1nXzu9sDsVtPVH4qXv5sYzLf8vV65sNdU=; Proxy: null), S3 Extended Request ID: 7BDXsMzY0O6RwMdKfFLlGuHlw2AkKj0+O2U6vL2UnF1nXzu9sDsVtPVH4qXv5sYzLf8vV65sNdU=:403 Forbidden
```
Please help.

Hi @Monika_Bagyal, the "Access Denied" error you are seeing is likely due to insufficient permissions to read the S3 bucket.

The configurations you've set up are correct for accessing S3 with temporary AWS credentials, but the credentials themselves, or the permissions attached to them, may not grant access to the bucket.

Here are some possible solutions:

1. **Check your AWS credentials**: Ensure that the `access_id`, `access_key`, and `session_token` you are using are correct and have not expired (see the STS sketch after this list).

2. **Check your AWS permissions**: The credentials need permission to read the bucket. The failing call in your stack trace is a HEAD (GetObjectMetadata) request, which requires `s3:GetObject`, and Spark's directory listing also needs `s3:ListBucket`; you may need to adjust your IAM policies accordingly (see the boto3 sketch after this list).

3. **Check your bucket policy**: The bucket's own policy must also allow your principal to read; an explicit deny in the bucket policy overrides any IAM allow. Adjust the bucket policy if needed.

4. **Check your endpoint**: Make sure the endpoint you set (`s3.us-east-1.amazonaws.com`) matches the region where the bucket actually lives (the second sketch below prints the bucket's region).
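
For point 1, you can rule out expired or malformed credentials before touching Spark by asking STS who they belong to. A minimal sketch, assuming `boto3` is available on the cluster and reusing the `access_id`, `access_key`, and `session_token` variables from your config cell:

```python
import boto3

# Build an STS client from the same temporary credentials passed to Spark.
sts = boto3.client(
    "sts",
    aws_access_key_id=access_id,
    aws_secret_access_key=access_key,
    aws_session_token=session_token,
)

# Raises ExpiredToken / InvalidClientTokenId if the credentials are expired
# or malformed; otherwise prints the ARN of the principal they belong to.
print(sts.get_caller_identity()["Arn"])
```

If this succeeds, the problem is authorization (points 2 and 3) rather than authentication.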
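
For points 2 through 4, you can separate a permissions problem from a Spark configuration problem by reproducing the failing request directly: the stack trace shows a HEAD (GetObjectMetadata) call returning 403, which maps to `head_object` in boto3. A sketch under the same assumptions as above, with `bucket_name` and `xxx.parquet` standing in for your actual bucket and key as in the question:

```python
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=access_id,
    aws_secret_access_key=access_key,
    aws_session_token=session_token,
)

# Same request the Spark job makes; a 403 here confirms the credentials lack
# s3:GetObject / s3:ListBucket on the bucket, independent of Spark settings.
s3.head_object(Bucket="bucket_name", Key="xxx.parquet")

# For point 4: confirm the bucket's region so fs.s3a.endpoint matches it.
# get_bucket_location reports None as the LocationConstraint for us-east-1.
region = s3.get_bucket_location(Bucket="bucket_name")["LocationConstraint"]
print(region or "us-east-1")
```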

If you've checked all of these and you're still having issues, it may be something specific to your setup; in that case, file a support ticket with Databricks or contact AWS support for further assistance.
