# of failed Map Tasks exceeded allowed limit

Posted 2019-10-16 18:42

I am trying my hand at Hadoop streaming using Python. I have written simple map and reduce scripts with help from here.

The map script is as follows:

#!/usr/bin/env python

import sys, urllib, re

title_re = re.compile("<title>(.*?)</title>", re.MULTILINE | re.DOTALL | re.IGNORECASE)

for line in sys.stdin:
    url = line.strip()
    match = title_re.search(urllib.urlopen(url).read())
    if match :
        print url, "\t", match.group(1).strip()

The reduce script is as follows:

#!/usr/bin/env python

from operator import itemgetter
import sys

for line in sys.stdin :
    line = line.strip()
    print line

After running these scripts with the Hadoop streaming jar, the map tasks finish and I can see that they are 100% complete, but the reduce job gets stuck at 22%. After a long time it fails with the error: ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

I am unable to find the exact reason behind this.

My terminal window looks as follows:

shekhar@ubuntu:/host/Shekhar/Softwares/hadoop-1.0.0$ hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -mapper /host/Shekhar/HadoopWorld/MultiFetch.py -reducer /host/Shekhar/HadoopWorld/reducer.py -input /host/Shekhar/HadoopWorld/urls/* -output /host/Shekhar/HadoopWorld/titles3
Warning: $HADOOP_HOME is deprecated.

packageJobJar: [/tmp/hadoop-shekhar/hadoop-unjar2709939812732871143/] [] /tmp/streamjob1176812134999992997.jar tmpDir=null
12/05/27 11:27:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/27 11:27:46 INFO mapred.FileInputFormat: Total input paths to process : 3
12/05/27 11:27:46 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-shekhar/mapred/local]
12/05/27 11:27:46 INFO streaming.StreamJob: Running job: job_201205271050_0006
12/05/27 11:27:46 INFO streaming.StreamJob: To kill this job, run:
12/05/27 11:27:46 INFO streaming.StreamJob: /host/Shekhar/Softwares/hadoop-1.0.0/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201205271050_0006
12/05/27 11:27:46 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201205271050_0006
12/05/27 11:27:47 INFO streaming.StreamJob:  map 0%  reduce 0%
12/05/27 11:28:07 INFO streaming.StreamJob:  map 67%  reduce 0%
12/05/27 11:28:37 INFO streaming.StreamJob:  map 100%  reduce 0%
12/05/27 11:28:40 INFO streaming.StreamJob:  map 100%  reduce 11%
12/05/27 11:28:49 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:31:35 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:31:44 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:34:52 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:35:01 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:38:11 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:38:20 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:41:29 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:41:35 INFO streaming.StreamJob:  map 100%  reduce 100%
12/05/27 11:41:35 INFO streaming.StreamJob: To kill this job, run:
12/05/27 11:41:35 INFO streaming.StreamJob: /host/Shekhar/Softwares/hadoop-1.0.0/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201205271050_0006
12/05/27 11:41:35 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201205271050_0006
12/05/27 11:41:35 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201205271050_0006_m_000001
12/05/27 11:41:35 INFO streaming.StreamJob: killJob...
Streaming Job Failed!

Can anyone please help me?

EDIT: the job tracker details are as follows:

Hadoop job_201205271050_0006 on localhost

User: shekhar
Job Name: streamjob1176812134999992997.jar
Job File: file:/tmp/hadoop-shekhar/mapred/staging/shekhar/.staging/job_201205271050_0006/job.xml
Submit Host: ubuntu
Submit Host Address: 127.0.1.1
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Failed
Failure Info:# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201205271050_0006_m_000001
Started at: Sun May 27 11:27:46 IST 2012
Failed at: Sun May 27 11:41:35 IST 2012
Failed in: 13mins, 48sec
Job Cleanup: Successful
Black-listed TaskTrackers: 1
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      3           0         0         2          1        4 / 0
reduce   100.00%      1           0         0         0          1        0 / 1

Answer 1:

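This is only a generic error: it says that too many Map tasks failed to complete:

# of failed Map Tasks exceeded allowed limit

You can use the EMR console to navigate to the logs of the individual Map/Reduce tasks. There you should be able to see what the actual problem is.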
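To make the task logs more useful, the mapper can report each failing record on stderr instead of dying on the first exception. The following is a minimal sketch in Python 3 (the scripts in the question are Python 2). It assumes, without confirmation from the thread, that a failing URL fetch is what kills the map task. The names map_titles and fetch_with_timeout are hypothetical.

```python
import re
import sys
from urllib.request import urlopen

# Same pattern as the map script in the question.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.MULTILINE | re.DOTALL | re.IGNORECASE)


def fetch_with_timeout(url, timeout=10):
    """Hypothetical helper: fetch a page, giving up after `timeout` seconds."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def map_titles(lines, fetch, out=sys.stdout, err=sys.stderr):
    """Write 'url<TAB>title' for each URL; log failures to `err` and keep going.

    Returns the number of URLs that could not be fetched.
    """
    failed = 0
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank input lines
        try:
            html = fetch(url)
        except Exception as exc:
            # The message ends up in the task's stderr log.
            failed += 1
            err.write("failed to fetch %s: %r\n" % (url, exc))
            continue
        match = TITLE_RE.search(html)
        if match:
            out.write("%s\t%s\n" % (url, match.group(1).strip()))
    return failed
```

In a real mapper script this would be called as map_titles(sys.stdin, fetch_with_timeout). Passing the fetch function as a parameter keeps the loop testable without network access.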

In my case, I got this error when I made small mistakes, such as setting an incorrect path to the map script.
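Mistakes of this kind can be caught before submitting the job. The sketch below is a hypothetical local pre-flight check, not part of Hadoop. It assumes a POSIX system and checks only that the script exists, is executable, and starts with a clean shebang line.

```python
import os


def check_streaming_script(path):
    """Return a list of problems found with a mapper or reducer script."""
    if not os.path.isfile(path):
        return ["file does not exist: " + path]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append("file is not executable (try chmod +x)")
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        problems.append("missing shebang line")
    elif first_line.endswith(b"\r\n"):
        problems.append("shebang line ends with CRLF")
    return problems
```

An empty list means none of these basic problems were found. It does not guarantee that the script will run correctly on the cluster nodes.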

Steps to view the task logs:

http://antipatterns.blogspot.nl/2013/03/amazon-emr-map-reduce-error-of-failed.html



Answer 2:

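I just had the same error show up. In my case it turned out to be a parsing error: there were "unexpected" newlines in the standard input, which split records in the wrong places. I suggest checking your data file. Once I removed the column that contained these newlines, it worked fine.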
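One way to check a data file for this is to count the fields on every line and flag the lines that do not match. This is only a sketch: the function name find_suspect_lines is hypothetical, and the tab delimiter and expected field count are assumptions that depend on your data.

```python
def find_suspect_lines(lines, expected_fields, delimiter="\t"):
    """Return the 1-based numbers of lines that are empty or have the wrong field count."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        if not record or len(record.split(delimiter)) != expected_fields:
            suspects.append(number)
    return suspects
```

A record broken by a stray newline usually appears as two consecutive suspect lines, each with too few fields.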



Answer 3:

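First check your stderr. The information you have given is not enough to tell what the actual error is. The stderr file is at: {your hadoop temp directory here}/mapred/local/userlogs/{your job id}/{your attempt id}/stderr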
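The path layout above can be assembled and read with a few lines of Python. This is a sketch; the helper names task_stderr_path and tail are hypothetical, and you must supply the temp directory, job id and attempt id from your own job.

```python
import os


def task_stderr_path(hadoop_tmp_dir, job_id, attempt_id):
    """Build the stderr log path using the layout described above."""
    return os.path.join(
        hadoop_tmp_dir, "mapred", "local", "userlogs", job_id, attempt_id, "stderr"
    )


def tail(path, max_lines=20):
    """Return the last `max_lines` lines of a log file."""
    with open(path, "r", errors="replace") as f:
        return f.readlines()[-max_lines:]
```

For the job in the question, the call might look like task_stderr_path("/tmp/hadoop-shekhar", "job_201205271050_0006", attempt_id), where attempt_id is the id of the failed attempt shown on the job tracker page.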

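Sean's answer covers most cases when you are using Hadoop for the first time, so I guess you may be getting an "env: python\r: No such file or directory" error. If so, simply convert your CR LF line endings to LF to fix it. Just run a script that replaces \r\n with \n.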
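Such a conversion script can be very small. The sketch below is one possible version with hypothetical function names. It works on bytes so that it changes nothing in the file except the line endings.

```python
def crlf_to_lf(data):
    """Replace Windows (CR LF) line endings with Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def fix_line_endings(path):
    """Rewrite the file in place if needed; return True if it was changed."""
    with open(path, "rb") as f:
        original = f.read()
    fixed = crlf_to_lf(original)
    if fixed != original:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != original
```

Run fix_line_endings on both the mapper and the reducer script before submitting the job.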



Answer 4:

Add the following line at the beginning of your mapper and reducer:

#!/usr/bin/python


Source: # of failed Map Tasks exceeded allowed limit