VPC errors during Amazon EMR cluster operations

The following errors are common to VPC configuration in Amazon EMR.

Topics

Invalid subnet configuration
Missing DHCP Options Set
Permissions errors
Errors that result in START_FAILED
Cluster Terminated with errors and NameNode fails to start

Invalid subnet configuration

On the Cluster Details page, in the Status field, you see an error similar to the following:

The subnet configuration was invalid: Cannot find route to InternetGateway in main RouteTable rtb-id for vpc vpc-id.

To solve this problem, you must create an internet gateway, attach it to your VPC, and make sure the main route table of the VPC has a route to it. For more information, see Adding an internet gateway to your VPC.

Alternatively, verify that your VPC has both Enable DNS resolution and Enable DNS hostname support turned on. For more information, see Using DNS with your VPC.
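As a rough illustration of the route check, the sketch below looks through a route table description for a route that targets an internet gateway. It assumes the dictionary shape returned by the EC2 DescribeRouteTables API (a `Routes` list whose entries may carry a `GatewayId` and a `State`). The function name `has_internet_gateway_route` and the sample IDs are hypothetical.

```python
def has_internet_gateway_route(route_table):
    """Return True if any active route targets an internet gateway (igw-...)."""
    for route in route_table.get("Routes", []):
        gateway_id = route.get("GatewayId", "")
        # Routes without an explicit State are treated as active (assumption).
        if gateway_id.startswith("igw-") and route.get("State", "active") == "active":
            return True
    return False


# Hypothetical main route table that only has the local VPC route.
sample_table = {
    "RouteTableId": "rtb-0example",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    ],
}

print(has_internet_gateway_route(sample_table))  # False: no igw- route yet
```

A table that fails this check is the kind of configuration that would be expected to produce the error above. Adding a default route (for example `0.0.0.0/0`) that points at the attached internet gateway makes the check pass.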

Missing DHCP Options Set

You see a step failure in the cluster system log (syslog) with an error similar to the following:

ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_id' doesn't exist in RM.

ERROR org.apache.hadoop.streaming.StreamJob (main): Error Launching job : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_id' doesn't exist in RM.

To solve this problem, you must configure a VPC that includes a DHCP Options Set whose parameters are set to the following values:

Note

If you use the AWS GovCloud (US-West) region, set domain-name to us-gov-west-1.compute.internal instead of the value used in the following example.

domain-name = ec2.internal

Use ec2.internal if your region is US East (N. Virginia). For other regions, use region-name.compute.internal. For example, in us-west-2, use domain-name = us-west-2.compute.internal.

domain-name-servers = AmazonProvidedDNS

For more information, see DHCP Options Sets .
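The naming rule above can be sketched as a small helper. This is only an illustration of the rule as stated in this section: the function name `expected_dhcp_options` is hypothetical, and `us-east-1` is the region code for US East (N. Virginia).

```python
def expected_dhcp_options(region):
    """Return the DHCP option values this section recommends for a region."""
    if region == "us-east-1":
        # US East (N. Virginia) uses the special-case domain name.
        domain_name = "ec2.internal"
    else:
        # All other regions, including us-gov-west-1, follow region-name.compute.internal.
        domain_name = f"{region}.compute.internal"
    return {
        "domain-name": domain_name,
        "domain-name-servers": "AmazonProvidedDNS",
    }


for region in ("us-east-1", "us-west-2", "us-gov-west-1"):
    print(region, expected_dhcp_options(region))
```

Comparing these values with the DHCP Options Set attached to your VPC is one way to spot a mismatch before launching a cluster.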

Permissions errors

A failure in the stderr log for a step indicates that an Amazon S3 resource does not have the appropriate permissions. This is a 403 error, and it looks similar to the following:


Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: REQUEST_ID

If ActionOnFailure is set to TERMINATE_JOB_FLOW, the cluster terminates with the state SHUTDOWN_COMPLETED_WITH_ERRORS.
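To find this error quickly in a long stderr log, a scan along the following lines may help. It is a minimal sketch that assumes the message format shown above. The names `S3_ERROR` and `find_s3_access_denied` are hypothetical, and the sample text simply repeats the example from this section.

```python
import re

# Matches the "message (Service: Amazon S3; Status Code: N; Error Code: X" portion.
S3_ERROR = re.compile(
    r"AmazonS3Exception: (?P<message>.+?) \(Service: Amazon S3; "
    r"Status Code: (?P<status>\d+); Error Code: (?P<code>\w+)"
)


def find_s3_access_denied(stderr_text):
    """Return (message, error_code) pairs for every S3 403 error in the text."""
    hits = []
    for line in stderr_text.splitlines():
        match = S3_ERROR.search(line)
        if match and match.group("status") == "403":
            hits.append((match.group("message"), match.group("code")))
    return hits


sample_stderr = (
    'Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: '
    "Access Denied (Service: Amazon S3; Status Code: 403; "
    "Error Code: AccessDenied; Request ID: REQUEST_ID"
)

print(find_s3_access_denied(sample_stderr))  # [('Access Denied', 'AccessDenied')]
```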

A few ways to troubleshoot this problem include:

If you are using an Amazon S3 bucket policy within a VPC, make sure to give access to all buckets by creating a VPC endpoint and selecting Allow all under the Policy option when creating the endpoint.

Make sure that any policies associated with S3 resources include the VPC in which you launch the cluster.

Try running the following command from your cluster to verify that you can access the bucket:


hadoop fs -copyToLocal s3://path-to-bucket /tmp/

You can get more specific debugging information by setting the log4j.logger.org.apache.http.wire parameter to DEBUG in the /home/hadoop/conf/log4j.properties file on the cluster. After you try to access the bucket from the cluster, check the stderr log file, which then provides more detailed information:


Access denied for getting the prefix for bucket - us-west-2.elasticmapreduce with path samples/wordcount/input/
15/03/25 23:46:20 DEBUG http.wire: >> "GET /?prefix=samples%2Fwordcount%2Finput%2F&delimiter=%2F&max-keys=1 HTTP/1.1[\r][\n]"
15/03/25 23:46:20 DEBUG http.wire: >> "Host: us-west-2.elasticmapreduce.s3.amazonaws.com[\r][\n]"

Errors that result in `START_FAILED`

Before AMI 3.7.0, for VPCs where a hostname is specified, Amazon EMR maps the internal hostnames of the subnet with custom domain addresses as follows: ip-X.X.X.X.customdomain.com.tld. For example, if the hostname was ip-10.0.0.10 and the VPC has the domain name option set to customdomain.com, the resulting hostname mapped by Amazon EMR would be ip-10.0.0.10.customdomain.com. An entry is added in /etc/hosts to resolve the hostname to 10.0.0.10. This behavior changed with AMI 3.7.0: Amazon EMR now honors the DHCP configuration of the VPC entirely. Previously, customers could also use a bootstrap action to specify a hostname mapping.

If you would like to preserve this behavior, you must provide the DNS and forward resolution setup you require for the custom domain.

Cluster `Terminated with errors` and NameNode fails to start

When you launch an EMR cluster in a VPC that uses a custom DNS domain name, your cluster may fail with the following error message in the console:


Terminated with errors On the master instance(instance-id), bootstrap action 1 returned a non-zero return code

The failure occurs because the NameNode cannot start. The following error appears in the NameNode logs, whose Amazon S3 URI is of the form s3://mybucket/logs/cluster-id/daemons/master-instance-id/hadoop-hadoop-namenode-master-node-hostname.log.gz:


2015-07-23 20:17:06,266 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem (main): Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
      at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:212)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1020)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:739)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:537)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:596)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:765)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:749)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)

This is due to a potential issue where an EC2 instance can have multiple sets of fully qualified domain names when launching EMR clusters in a VPC, which makes use of both an