Hello all,

I am trying to run a Spark job using spark-submit with a Docker image on YARN.

I followed the instructions in the blog post provided by Cloudera at the following link:

https://blog.cloudera.com/introducing-apache-spark-on-docker-on-top-of-apache-yarn-with-cdp-datacent...

and I ran into an error that I couldn't find an answer to.

Note: I had already done all the configuration required by the post.

I ran this command:

spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=faresdev8/python3:v5 \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro,/opt/cloudera/parcels/:/opt/cloudera/parcels/:ro,/data1/opt/cloudera/parcels/:/data1/opt/cloudera/parcels/:ro" \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=faresdev8/python3:v5 \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro,/opt/cloudera/parcels/:/opt/cloudera/parcels/:ro,/data1/opt/cloudera/parcels/:/data1/opt/cloudera/parcels/:ro" \
ols.py

And this is the error I get:

[screenshot of the error output]

Sometimes it gives me exit code 29. I don't understand what the problem is, especially since I followed the instructions properly.

Okay, so I solved this problem.

If anyone runs into something similar, check these three things:

1- Make sure the Spark version matches the Python version: Spark 2 doesn't support Python higher than 3.7 (see the sketch after this list).

2- Make sure your Python code starts a Spark session; I had forgotten that I removed it while experimenting (the sketch below shows this too).

3- Make sure there are no problems in the code itself, and test it on another machine to verify it works properly.
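To make points 1 and 2 concrete, here is a minimal sketch of what the submitted script (ols.py in the command above) needs before any of the actual OLS logic. The version bound assumes you are on Spark 2, and the app name is just a placeholder:

import sys

from pyspark.sql import SparkSession

# Point 1: Spark 2 supports Python only up to 3.7, so fail fast with a
# readable message instead of an opaque container exit code.
if sys.version_info >= (3, 8):
    sys.exit("Python %d.%d is too new for Spark 2; use 3.7 or lower."
             % sys.version_info[:2])

# Point 2: the script must create its own SparkSession, otherwise the
# containers start but there is no Spark application to run.
spark = SparkSession.builder.appName("ols-docker-on-yarn").getOrCreate()

# Trivial action to confirm the executors come up inside the Docker image.
print(spark.range(10).count())

spark.stop()

If a bare-bones script like this runs under the same spark-submit command, the Docker/YARN setup is fine and the problem is in the job code itself (point 3).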

