I am struggling to enable YARN log aggregation for my Amazon EMR cluster. I am following this documentation for the configuration:
Under the section titled: "To aggregate logs in Amazon S3 using the AWS CLI".
I've verified that the hadoop-config bootstrap action puts the following in yarn-site.xml:
<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value></property>
<property><name>yarn.log-aggregation.retain-check-interval-seconds</name><value>3000</value></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>s3://mybucket/logs</value></property>
I can run a sample job (pi from hadoop-examples.jar) and see that it completed successfully in the ResourceManager's GUI.
It even creates a folder under s3://mybucket/logs named with the application ID. But the folder is empty, and if I run yarn logs -applicationId <applicationId>, I get a stack trace:
14/10/20 23:02:15 INFO client.RMProxy: Connecting to ResourceManager at /10.XXX.XXX.XXX:9022
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:322)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:112)
at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137)
at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199)
This doesn't make any sense to me; I can run hdfs dfs -ls s3://mybucket/ and it lists the contents just fine. The machines get their credentials from AWS IAM roles, and I've tried adding fs.s3n.awsAccessKeyId and the like to core-site.xml with no change in behavior.
Any advice is much appreciated.
Hadoop provides two fs interfaces: FileSystem and AbstractFileSystem. Most of the time, we work with FileSystem and use configuration options like fs.s3.impl to provide custom adapters. yarn logs, however, uses the AbstractFileSystem interface.
If you can find an implementation of that for S3, you can specify it using fs.AbstractFileSystem.s3.impl. See core-default.xml for examples such as fs.AbstractFileSystem.hdfs.impl.
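For illustration, here is a minimal sketch of what such an implementation could look like, built on Hadoop's DelegateToFileSystem helper, which exposes an ordinary FileSystem through the AbstractFileSystem interface. The class name S3Fs is made up, and NativeS3FileSystem is only an assumption about which FileSystem backs s3:// on your cluster; on EMR it may be a different class (check fs.s3.impl) that you would wrap instead.

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.s3native.NativeS3FileSystem;

// Hypothetical AbstractFileSystem adapter for the s3:// scheme.
public class S3Fs extends DelegateToFileSystem {

  // AbstractFileSystem implementations are instantiated reflectively
  // through a (URI, Configuration) constructor, so this exact signature
  // is required.
  public S3Fs(URI theUri, Configuration conf)
      throws IOException, URISyntaxException {
    // Wrap the delegate FileSystem (assumed here to be NativeS3FileSystem);
    // "s3" is the scheme this adapter serves, and the final flag indicates
    // that an authority is not required in the URI.
    super(theUri, new NativeS3FileSystem(), conf, "s3", false);
  }
}

You would then register it in core-site.xml by setting fs.AbstractFileSystem.s3.impl to the fully qualified class name and put the jar on the client/YARN classpath. Later Hadoop releases ship a ready-made delegate of this kind for the s3a scheme (org.apache.hadoop.fs.s3a.S3A, registered via fs.AbstractFileSystem.s3a.impl).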