I have been working on a project that includes a Hive query:
INSERT OVERWRITE TABLE .... SELECT TRANSFORM (....) USING 'python script.py' FROM .... LEFT OUTER JOIN ... LEFT OUTER JOIN ... LEFT OUTER JOIN ...
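For context, the full query has roughly this shape (table, column, and output names below are placeholders, not the real ones, and the real query has more joins):

ADD FILE script.py;

INSERT OVERWRITE TABLE output_table
SELECT TRANSFORM (a.id, b.val, c.val)
USING 'python script.py'
AS (id STRING, result STRING)
FROM table_a a
LEFT OUTER JOIN table_b b ON (a.id = b.id)
LEFT OUTER JOIN table_c c ON (a.id = c.id);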
At the beginning everything worked fine, until we loaded a large amount of dummy data (we simply wrote the same records many times, with small variations in some fields). Since then, running the query fails with a Broken pipe error and not much more information: there is no log about the error, just the IOException: Broken pipe. (My understanding is that a broken pipe means the script stopped reading its input before Hive finished streaming rows to it, for example because the script died, but nothing in the logs says why.)
To simplify the script and isolate the error, we reduced it to a plain pass-through:
import sys

for line in sys.stdin.readlines():
    sys.stdout.write(line)    # pass each row through unchanged
just to rule out any error at that level. We still get the same error.
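A way to sanity-check the script independently of the joins is to stream a single source table through it (placeholder names again, and the target table is assumed to exist):

-- Run the same TRANSFORM with no joins involved
INSERT OVERWRITE TABLE debug_output
SELECT TRANSFORM (id, val)
USING 'python script.py'
AS (id STRING, val STRING)
FROM table_a;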
Another workaround is to take the TRANSFORM out of this query: first insert the joined data into an intermediate table, then run the transformation in a second query over that table. I'm not 100% sure why this helps, since the script itself is correct; my guess is that the amount of data streamed through the script is just too large because of the many joins.
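In other words, something like this (placeholder names once more):

-- Step 1: materialize the joins, with no TRANSFORM involved
INSERT OVERWRITE TABLE joined_staging
SELECT a.id, b.val AS val_b, c.val AS val_c
FROM table_a a
LEFT OUTER JOIN table_b b ON (a.id = b.id)
LEFT OUTER JOIN table_c c ON (a.id = c.id);

-- Step 2: stream only the pre-joined rows through the script
INSERT OVERWRITE TABLE output_table
SELECT TRANSFORM (id, val_b, val_c)
USING 'python script.py'
AS (id STRING, result STRING)
FROM joined_staging;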
The problem seems to be solved by splitting the joins across several queries with intermediate tables, and then adding a final query with one last join that brings all the previous results together. As I understand it, this means there is no error at the script level; there is just too much data for Hive to handle in a single query.
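Roughly like this (still with placeholder names):

-- One join per query, each result materialized in an intermediate table
INSERT OVERWRITE TABLE step1
SELECT a.id, b.val AS val_b
FROM table_a a
LEFT OUTER JOIN table_b b ON (a.id = b.id);

INSERT OVERWRITE TABLE step2
SELECT s.id, s.val_b, c.val AS val_c
FROM step1 s
LEFT OUTER JOIN table_c c ON (s.id = c.id);

-- Final query: the last join, then the TRANSFORM over the combined result
INSERT OVERWRITE TABLE output_table
SELECT TRANSFORM (s.id, s.val_b, s.val_c, d.val)
USING 'python script.py'
AS (id STRING, result STRING)
FROM step2 s
LEFT OUTER JOIN table_d d ON (s.id = d.id);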