Issues with Python packages on a Hadoop distributed system

Posted 2019-04-17 11:48

I use Python for Hadoop streaming on an AWS Hadoop cluster that has one master node and four slave nodes. Whenever a job needs a Python package, I have to install that package on every node of the cluster for the job to work. In my case, however, even after installing the package on every node, the Hadoop streaming job still fails. I am wondering why. Thank you!
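One common remedy worth trying is to ship the dependencies with the job instead of relying on node-local installs. The sketch below uses the standard Hadoop streaming `-archives` and `-files` options; the paths, file names, and the `numpy` package are illustrative assumptions, and note that a virtualenv built this way is not always relocatable across nodes with different layouts:

```shell
# Hedged sketch: bundle a virtualenv and submit it with the streaming job.
# All paths and names here are assumptions; adjust for your cluster.

# 1. Build the env on the master and zip it up:
virtualenv venv
venv/bin/pip install numpy          # the package the mapper imports
cd venv && zip -qr ../venv.zip . && cd ..

# 2. Submit the job; '#venv' makes Hadoop symlink the unpacked archive
#    as 'venv' in each task's working directory, so the mapper can use
#    the interpreter shipped inside it rather than the node's Python:
hadoop jar /usr/lib/hadoop/hadoop-streaming.jar \
  -archives venv.zip#venv \
  -files mapper.py,reducer.py \
  -mapper "venv/bin/python mapper.py" \
  -reducer "venv/bin/python reducer.py" \
  -input /user/hadoop/input \
  -output /user/hadoop/output
```

This sidesteps the question of which Python each node's task runner picks up, which is often the real cause when a package is "installed everywhere" yet the job cannot import it.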

more details:

I use Python Hadoop streaming for MapReduce jobs on an AWS Hadoop cluster with one master node and four slave nodes. Sometimes a job needs an extra Python package. Since it is a distributed system, in theory I need to install the package on every node for the streaming job to work. But even after installing the needed packages on every node, the streaming job still fails. If I delete the `import some-package` line from the streaming script and adjust the code accordingly, the job runs fine, so the failure is clearly tied to the newly added package. I am confused: why does the streaming job still fail even though the package is installed on the Python of every node?
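A frequent cause of this symptom is that the Hadoop task launches a different Python interpreter (or runs as a different user) than the one the package was installed into. One way to check is a throwaway diagnostic mapper like the sketch below, which reports the interpreter path to stderr (visible in the task logs) and whether the package imports; `numpy` here is only a stand-in for whatever package the real job needs:

```python
#!/usr/bin/env python
# Diagnostic mapper: report which Python each Hadoop task actually runs
# and whether the required package can be imported there.
import sys

def check_import(name):
    """Try to import a module by name; return its version string or None."""
    try:
        mod = __import__(name)
        return getattr(mod, "__version__", "unknown")
    except ImportError:
        return None

def main():
    # These lines end up in the task's stderr log on each node.
    sys.stderr.write("interpreter: %s\n" % sys.executable)
    status = check_import("numpy")  # replace 'numpy' with your package
    sys.stderr.write("numpy: %s\n" % (status or "IMPORT FAILED"))
    # Pass input through unchanged so the streaming job still emits output.
    for line in sys.stdin:
        sys.stdout.write(line)

if __name__ == "__main__":
    main()
```

If the logged `interpreter:` path differs from the Python you installed into (for example a system `/usr/bin/python` versus a user install), that mismatch would explain the failure despite the package being "installed everywhere".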
