The below code executes a 'get' api method to retrieve objects from s3 and write to the data lake.
The problem arises when I use dbutils.secrets.get to get the keys required to establish the connection to s3
my_dataframe.rdd.foreachPartition(partition => {
val AccessKey = dbutils.secrets.get(scope = "ADB_Scope", key = "AccessKey-ID")
val SecretKey = dbutils.secrets.get(scope = "ADB_Scope", key = "AccessKey-Secret")
val creds = new BasicAWSCredentials(AccessKey, SecretKey)
val clientRegion: Regions = Regions.US_EAST_1
val s3client = AmazonS3ClientBuilder.standard()
.withRegion(clientRegion)
.withCredentials(new AWSStaticCredentialsProvider(creds))
.build()
partition.foreach(x => {
val objectKey = x.getString(0)
val i = s3client.getObject(s3bucketName, objectKey).getObjectContent
val inputS3String = IOUtils.toString(i, "UTF-8")
val filePath = s"${data_lake_get_path}"
val file = new File(filePath)
val fileWriter = new FileWriter(file)
val bw = new BufferedWriter(fileWriter)
bw.write(inputS3String)
bw.close()
fileWriter.close()
})
at com.databricks.dbutils_v1.impl.SecretUtilsImpl.getSecretManagerClient(SecretUtilsImpl.scala:36)
at com.databricks.dbutils_v1.impl.SecretUtilsImpl.getBytesInternal(SecretUtilsImpl.scala:46)
at com.databricks.dbutils_v1.impl.SecretUtilsImpl.get(SecretUtilsImpl.scala:61)
When the actual secret scope values for AccessKey and SecretKey are passed the above code works fine.
How can this work using dbutils.secrets.get so that keys are not exposed in the code?
Hi @Sandesh Puligundla , You just need to move the following two lines:
val AccessKey = dbutils.secrets.get(scope = "ADB_Scope", key = "AccessKey-ID")
val SecretKey = dbutils.secrets.get(scope = "ADB_Scope", key = "AccessKey-Secret")
Outside of the foreachpartition block, so these functions will be executed in the context of the driver and sent to the worker nodes.
Hi @Sandesh Puligundla , You just need to move the following two lines:
val AccessKey = dbutils.secrets.get(scope = "ADB_Scope", key = "AccessKey-ID")
val SecretKey = dbutils.secrets.get(scope = "ADB_Scope", key = "AccessKey-Secret")
Outside of the foreachpartition block, so these functions will be executed in the context of the driver and sent to the worker nodes.
Welcome to Databricks Community: Lets learn, network and celebrate together
Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections.