import boto3
import requests

user = 'someusername'
password = 'somepassword'
url = 'https://archive.podaac.earthdata.nasa.gov/s3credentials'
url = requests.get(url, allow_redirects=False).headers['Location']
creds = requests.get(url, auth=(user, password)).json()
# creds contains sessionToken, secretAccessKey and accessKeyId
aws_access_key_id = creds['accessKeyId']
aws_secret_access_key = creds['secretAccessKey']
aws_session_token = creds['sessionToken']
region = 'us-west-2'
# pass the retrieved session details to the boto3 session
session = boto3.Session(aws_access_key_id=aws_access_key_id,
                        aws_secret_access_key=aws_secret_access_key,
                        aws_session_token=aws_session_token,
                        region_name=region)
client = session.client('s3', verify=False)
bucket = 'podaac-ops-cumulus-protected'
prefix = ''
delimiter = '/'
key = "OSTIA-UKMO-L4-GLOB-v2.0/20230626120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.0-fv02.0.nc"
filename = '2023062612-GHRSST-OSTIA.nc'
print('downloading file starting....')
client.download_file(Bucket=bucket, Key=key, Filename=filename)
print('downloading file complete....')
This code executes up to the client.download_file line, which throws:
Traceback (most recent call last):
  File "test.py", line 46, in <module>
    client.download_file(Bucket=bucket, Key=key, Filename=filename)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
I have checked the AWS region that the code is running in, and it is set to us-west-1.
I have seen other questions about this same issue, but none with a resolution for downloading the data directly from the S3 bucket. All the threads I have seen talk about a misconfigured S3 bucket.
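Since a 403 on HeadObject says little about what went wrong, one cheap check is to confirm the credentials response actually carries every field before it reaches boto3. The following is only a sketch: the helper name `session_kwargs` is hypothetical, and it assumes the response contains exactly the three fields used in the script above.

```python
from typing import Dict, Mapping

# Field names taken from the script above; assumed to be the full required set.
REQUIRED_FIELDS = ("accessKeyId", "secretAccessKey", "sessionToken")


def session_kwargs(creds: Mapping[str, str], region: str = "us-west-2") -> Dict[str, str]:
    """Map an s3credentials response to boto3.Session keyword arguments.

    Raises ValueError naming any field that is absent or empty, so a bad
    credentials response fails here rather than as a later 403.
    """
    missing = [name for name in REQUIRED_FIELDS if not creds.get(name)]
    if missing:
        raise ValueError("credentials response is missing: " + ", ".join(missing))
    return {
        "aws_access_key_id": creds["accessKeyId"],
        "aws_secret_access_key": creds["secretAccessKey"],
        "aws_session_token": creds["sessionToken"],
        "region_name": region,
    }
```

With boto3 installed, the result can be passed straight through as `boto3.Session(**session_kwargs(creds))`. This only rules out a malformed response; it does not explain a 403 returned for well-formed credentials.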
jmcnelis, I get the same response as with the previous attempts:
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
Which versions of Python and boto3 are expected? I am currently running Python 3.6.9 with boto3==1.16.56 and botocore==1.19.63.
Thanks Jack. I have tried that with the same resulting error, though the stack trace now includes the following additional information:
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
The above exception was the direct cause of the following exception:
PermissionError: Forbidden
Hi nerdherdwa,
Please issue the following command in your shell and report back with the output:
aws configure get region
Thanks,
The output from the console is us-west-2, which is set both in the code and in the AWS CLI configuration.
session = boto3.Session(
    aws_access_key_id=creds["accessKeyId"],
    aws_secret_access_key=creds["secretAccessKey"],
    aws_session_token=creds["sessionToken"],
    region_name='us-west-2'
)
boto3 should only fall back to the region configured in the AWS CLI when one isn't provided as part of the session, and the code above provides one. At this point the boto3 S3 route doesn't look viable, so I have reverted to an HTTPS request for the GHRSST-OSTIA files, passing headers in with the authenticated session. My preference would be an S3 download, but I don't think the configuration or permissions of the S3 buckets have been set up correctly to allow this type of download, or else there is an issue with how boto3 passes in the session information. My code works with S3 buckets that I have in my own AWS environment.
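For anyone following along, the HTTPS fallback mentioned above might look roughly like the sketch below. It is not the poster's actual code. It assumes the HTTPS archive mirrors the bucket/key layout under archive.podaac.earthdata.nasa.gov, and that Earthdata Login credentials are in a .netrc file so that requests can answer the login redirect. The names `https_url` and `download_https` are hypothetical.

```python
from urllib.parse import quote

# Assumption: the HTTPS archive exposes objects as <host>/<bucket>/<key>.
ARCHIVE_HOST = "https://archive.podaac.earthdata.nasa.gov"


def https_url(bucket: str, key: str) -> str:
    """Build the HTTPS download URL for an object, percent-encoding unsafe characters."""
    return f"{ARCHIVE_HOST}/{quote(bucket)}/{quote(key)}"


def download_https(url: str, filename: str, chunk_size: int = 1 << 20) -> None:
    """Stream a file to disk over HTTPS.

    No auth argument is passed: requests looks up ~/.netrc for each host it is
    redirected to, which is how the Earthdata Login step is assumed to succeed.
    """
    import requests  # third-party; imported here so https_url has no dependencies

    with requests.Session() as session:
        with session.get(url, stream=True, timeout=60) as response:
            response.raise_for_status()  # surface 401/403 instead of saving an error page
            with open(filename, "wb") as out:
                for chunk in response.iter_content(chunk_size=chunk_size):
                    out.write(chunk)
```

Usage would be along the lines of `download_https(https_url(bucket, key), filename)` with the same `bucket`, `key` and `filename` values as in the boto3 script.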
Thanks nerdherdwa. I can't offer any explanation for why access through boto3 is not working from your EC2 instance running in us-west-2.
The s3fs approach works for me against this same collection. We are looking into it; can you use s3fs instead in the meantime? Here's my working code:
import os
import s3fs
import requests
import pandas as pd
import xarray as xr

def begin_s3_direct_access(url: str = "https://archive.podaac.earthdata.nasa.gov/s3credentials"):
    # Request temporary S3 credentials; requests reads the Earthdata Login from .netrc
    response = requests.get(url).json()
    return s3fs.S3FileSystem(key=response['accessKeyId'],
                             secret=response['secretAccessKey'],
                             token=response['sessionToken'],
                             client_kwargs={'region_name': 'us-west-2'})

fs = begin_s3_direct_access()
short_name = "OSTIA-UKMO-L4-GLOB-v2.0"
# List every netCDF granule in the collection, sorted by file name
files = pd.Series(sorted(fs.glob(os.path.join("podaac-ops-cumulus-protected/", short_name, "*.nc"))))
This code assumes you have Earthdata Login credentials set up properly inside your .netrc file.
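As a possible next step after the listing above, the sketch below picks the granules for a single day and opens one with xarray. It assumes the file names start with a YYYYMMDD date stamp, as in the key quoted earlier in the thread, and that the h5netcdf engine is installed so xarray can read from a file-like object. The helper names `granules_for_date` and `open_granule` are hypothetical.

```python
import posixpath
from typing import Iterable, List


def granules_for_date(paths: Iterable[str], yyyymmdd: str) -> List[str]:
    """Return, sorted, the paths whose file name starts with the given date stamp."""
    return sorted(p for p in paths if posixpath.basename(p).startswith(yyyymmdd))


def open_granule(fs, path: str):
    """Open one granule from an s3fs filesystem as an xarray Dataset.

    Assumption: h5netcdf is available; it can read from the file-like
    object returned by fs.open().
    """
    import xarray as xr  # third-party; imported here so the filter stands alone

    return xr.open_dataset(fs.open(path, mode="rb"), engine="h5netcdf")
```

For example, `granules_for_date(files, "20230626")` should narrow the listing to that day's file, which can then be handed to `open_granule(fs, ...)`.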
Thanks for bringing the issue to our attention.