Question

I am creating an AWS EMR cluster and I have a bootstrap action that changes spark-defaults.conf.

The cluster keeps getting terminated with:

can't read /etc/spark/conf/spark-defaults.conf: No such file or directory

However, if I skip the bootstrap action and check on the server, the file does exist, so I assume the order of operations is not right. I am using Spark 1.6.1 as provided by EMR 4.5, so Spark should be installed by default.
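
The bootstrap action is essentially doing something like this (a simplified sketch, not the exact script; the path is the one from the error message and the setting is just an example):

 #!/bin/bash
 # Assumed sketch of the bootstrap action, not the actual script:
 # append a property to spark-defaults.conf on the node.
 echo "spark.executor.memory 4g" | sudo tee -a /etc/spark/conf/spark-defaults.conf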

Any clues?

Thanks!

Answers

You should not change Spark configuration in a bootstrap action: bootstrap actions run before applications such as Spark are installed, which is why /etc/spark/conf/spark-defaults.conf does not exist yet at that point. Instead, specify any changes to spark-defaults in a JSON configuration file that you pass when launching the cluster. If you use the CLI to launch, the command should look something like this:

 aws --profile MY_PROFILE emr create-cluster \
 --release-label emr-4.6.0 \
 --applications Name=Spark Name=Ganglia Name=Zeppelin-Sandbox \
 --name "Name of my cluster" \
 --configurations file:///path/to/my/emr-configuration.json \
 ... \
 --bootstrap-actions ... \
 --steps ...

In the emr-configuration.json file you then set your changes to spark-defaults. An example could be:

[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  },
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true",
      "spark.executor.cores":"7"
    }
  }
]
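
Once the cluster is running, you can verify that the spark-defaults settings were applied by SSHing to the master node and checking the same file the error message refers to (just a sanity check, not required):

 # On the master node, after the cluster has started:
 grep spark.executor.cores /etc/spark/conf/spark-defaults.conf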