How to fix the "resource changed on src filesystem" issue

Posted 2020-03-30 16:08

I'm trying to use Hive on MR to execute SQL, and the job fails halfway with the error below:

Application application_1570514228864_0001 failed 2 times due to AM Container for appattempt_1570514228864_0001_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2019-10-08 13:57:49.272]Failed to download resource { { s3a://tpcds/tmp/hadoop-yarn/staging/root/.staging/job_1570514228864_0001/libjars, 1570514262820, FILE, null },pending,[(container_1570514228864_0001_02_000001)],1132444167207544,DOWNLOADING} java.io.IOException: Resource s3a://tpcds/tmp/hadoop-yarn/staging/root/.staging/job_1570514228864_0001/libjars changed on src filesystem (expected 1570514262820, was 1570514269265

From my perspective, the key part of the error log is "libjars changed on src filesystem (expected 1570514262820, was 1570514269265". There are several threads about this issue on SO, but none of them has been answered yet, like thread1 and thread2.
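The two numbers in that message are epoch milliseconds. A minimal stdlib-only sketch (the class `DecodeTimestamps` and its `driftMillis` helper are made up for illustration) decodes them and shows how far the reported modification time moved between job submission and localization:

```java
import java.time.Duration;
import java.time.Instant;

public class DecodeTimestamps {
    // Milliseconds between the timestamp recorded at submission and the
    // modification time observed when the container localized the resource.
    static long driftMillis(long expected, long actual) {
        return Duration.between(Instant.ofEpochMilli(expected),
                                Instant.ofEpochMilli(actual)).toMillis();
    }

    public static void main(String[] args) {
        // Values copied from the "changed on src filesystem" message above.
        long expected = 1570514262820L;
        long actual = 1570514269265L;

        System.out.println("expected: " + Instant.ofEpochMilli(expected)); // 2019-10-08T05:57:42.820Z
        System.out.println("was:      " + Instant.ofEpochMilli(actual));   // 2019-10-08T05:57:49.265Z
        System.out.println("drift ms: " + driftMillis(expected, actual));  // 6445
    }
}
```

If the log clock is UTC+8 (an assumption), the "was" value corresponds to 13:57:49.265, only a few milliseconds before the diagnostic timestamp [2019-10-08 13:57:49.272]. In other words, the source filesystem reported a modification time of roughly "now" at the moment the check ran.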

I found some useful information in Apache JIRA and Red Hat Bugzilla, and I synced the clocks of all the related nodes with NTP, but the same issue still occurs.

Any comments are welcome. Thanks.

1 Answer
乱世女痞
Reply #2 · 2020-03-30 16:32

I still don't know why the timestamps of the resource file are inconsistent, and AFAIK there is no way to fix it through configuration.

However, I found a workaround that skips the issue. Let me summarize it here for anyone who runs into the same problem.

Searching the Hadoop source code for the error message traces the issue to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java.

Just disable the statement that throws the exception in verifyAndCopy:

  private void verifyAndCopy(Path destination)
      throws IOException, YarnException {
    final Path sCopy;
    try {
      sCopy = resource.getResource().toPath();
    } catch (URISyntaxException e) {
      throw new IOException("Invalid resource", e);
    }
    FileSystem sourceFs = sCopy.getFileSystem(conf);
    FileStatus sStat = sourceFs.getFileStatus(sCopy);
    if (sStat.getModificationTime() != resource.getTimestamp()) {
      // Original behavior (disabled by this patch): fail localization when the
      // source file's modification time differs from the recorded timestamp.
      // throw new IOException("Resource " + sCopy +
      //     " changed on src filesystem (expected " + resource.getTimestamp() +
      //     ", was " + sStat.getModificationTime());
      LOG.debug("[Gearon][Info] The timestamp is not consistent among resource files.\n" +
          "Stop throwing exception. It doesn't affect other modules.");
    }
    // PUBLIC resources must still be world-readable on the source filesystem.
    if (resource.getVisibility() == LocalResourceVisibility.PUBLIC) {
      if (!isPublic(sourceFs, sCopy, sStat, statCache)) {
        throw new IOException("Resource " + sCopy +
            " is not publicly accessible and as such cannot be part of the" +
            " public cache.");
      }
    }

    downloadAndUnpack(sCopy, destination);
  }
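The patch turns a hard failure into a log line. A stdlib-only sketch of that trade-off, assuming a local model of the comparison rather than Hadoop's real classes (the class `TimestampCheck` and the `strict` flag are hypothetical, not Hadoop APIs or configuration keys), shows both behaviors side by side:

```java
import java.io.IOException;
import java.util.logging.Logger;

public class TimestampCheck {
    private static final Logger LOG = Logger.getLogger(TimestampCheck.class.getName());

    /**
     * Models the comparison in FSDownload#verifyAndCopy (hypothetical helper).
     * strict == true  : original behavior, throw on mismatch.
     * strict == false : patched behavior, log the mismatch and continue.
     * Returns true when the timestamps match.
     */
    static boolean verifyTimestamp(String resource, long expected, long actual, boolean strict)
            throws IOException {
        if (actual == expected) {
            return true;
        }
        String msg = "Resource " + resource + " changed on src filesystem (expected "
                + expected + ", was " + actual + ")";
        if (strict) {
            throw new IOException(msg);
        }
        // Relaxed mode: keep a trace of the mismatch instead of failing the container.
        LOG.warning(msg + "; continuing because the check is relaxed");
        return false;
    }

    public static void main(String[] args) {
        String res = "s3a://tpcds/tmp/hadoop-yarn/staging/root/.staging/job_1570514228864_0001/libjars";
        try {
            System.out.println("relaxed: " + verifyTimestamp(res, 1570514262820L, 1570514269265L, false));
            verifyTimestamp(res, 1570514262820L, 1570514269265L, true);
        } catch (IOException e) {
            System.out.println("strict:  " + e.getMessage());
        }
    }
}
```

Logging the mismatch at warning level, rather than at debug level as in the patch above, keeps every skipped check visible in the logs. That is worth having, because with the check relaxed, a resource that genuinely changed after submission will be localized without complaint.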

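Then build hadoop-yarn-project and copy `hadoop-yarn-common-x.x.x.jar` to `$HADOOP_HOME/share/hadoop/yarn` (overwriting the stock jar) on every node that runs a NodeManager, and restart the NodeManagers so the patched class is loaded.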

I'll leave this here. Thanks in advance for any further explanation of how to fix this without changing the Hadoop source.
