When do EMR bootstrap actions run?
Answer a question
I am creating an AWS cluster and I have a bootstrap action to change spark-defaults.conf.
The server keeps getting terminated with the error:
can't read /etc/spark/conf/spark-defaults.conf: No such file or directory
However, if I skip the bootstrap action and check on the server, the file does exist, so I assume the order of operations is wrong. I am using Spark 1.6.1 as provided by EMR 4.5, so it should be installed by default.
Any clues?
Thanks!
Answers
You should not change Spark configuration in a bootstrap action. Bootstrap actions run before EMR installs the applications on the node, so /etc/spark/conf/spark-defaults.conf does not exist yet when your script runs. Instead, specify your changes to spark-defaults in a JSON configuration file that you pass when launching the cluster. If you launch with the CLI, the command should look something like this:
aws --profile MY_PROFILE emr create-cluster \
    --release-label emr-4.6.0 \
    --applications Name=Spark Name=Ganglia Name=Zeppelin-Sandbox \
    --name "Name of my cluster" \
    --configurations file:///path/to/my/emr-configuration.json \
    ... \
    --bootstrap-actions ... \
    --steps ...
In the emr-configuration.json file you then set your changes to spark-defaults. An example could be:
[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  },
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true",
      "spark.executor.cores": "7"
    }
  }
]
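If you would rather generate the configuration file than edit it by hand, something like the following Python sketch could work. It only uses the standard library and does not call AWS. The helper names `spark_defaults_entry` and `render` are made up for illustration, and the script assumes that EMR wants every property value as a string, as in the example above.

```python
import json


def spark_defaults_entry(settings):
    """Build one 'spark-defaults' classification entry (hypothetical helper)."""
    properties = {}
    for key, value in settings.items():
        # Assumption: property values must be strings, so booleans become
        # "true"/"false" and numbers become their decimal text.
        if isinstance(value, bool):
            properties[key] = str(value).lower()
        else:
            properties[key] = str(value)
    return {"Classification": "spark-defaults", "Properties": properties}


def render(configurations):
    """Check the basic shape of each entry and return the JSON text."""
    for entry in configurations:
        if not isinstance(entry.get("Classification"), str):
            raise ValueError("each entry needs a string 'Classification'")
        for value in entry.get("Properties", {}).values():
            if not isinstance(value, str):
                raise ValueError("property values must be strings")
    return json.dumps(configurations, indent=2)


configurations = [
    spark_defaults_entry({
        "spark.dynamicAllocation.enabled": True,
        "spark.executor.cores": 7,
    })
]
rendered = render(configurations)
print(rendered)
```

Redirect the printed output into the file you pass to --configurations, for example emr-configuration.json.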