Outputting to a file in HDFS using a subprocess

Posted 2019-10-19 13:33

I have a script that reads through the lines of a text file, modifies each line slightly, and then outputs the lines to a file. I can read the text from the file fine; the problem is that I can't output the text. Here is my code.

cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"], stdout=subprocess.PIPE)
for line in cat.stdout:
    line = line+"Blah";
    subprocess.Popen(["hadoop", "fs", "-put", "/user/test/moddedfile.txt"], stdin=line)

This is the error I get.

AttributeError: 'str' object has no attribute 'fileno'
cat: Unable to write to output stream.

Answer 1:

The quick and direct way to make your code work:

import subprocess
from tempfile import NamedTemporaryFile

# Stream the source file out of HDFS.
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
                       stdout=subprocess.PIPE)

with NamedTemporaryFile() as f:
    # Write the modified lines to a local temporary file.
    # cat.stdout yields bytes, so append a bytes literal.
    for line in cat.stdout:
        f.write(line + b"Blah")

    f.flush()
    f.seek(0)

    cat.wait()

    # Upload the temporary file to HDFS under the new name.
    put = subprocess.Popen(["hadoop", "fs", "-put", f.name, "/user/test/moddedfile.txt"],
                           stdin=f)
    put.wait()

But I would suggest you look at an HDFS/WebHDFS Python library instead.

For example, pywebhdfs; a rough sketch follows below.
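
A minimal sketch using pywebhdfs might look like this. The host, port, and user_name values are placeholders for your cluster's WebHDFS endpoint, and the method names (read_file, create_file) should be checked against the version of the library you install:

from pywebhdfs.webhdfs import PyWebHdfsClient

# Placeholder connection details -- point these at your NameNode's WebHDFS endpoint.
hdfs = PyWebHdfsClient(host='namenode.example.com', port='50070', user_name='test')

# Read the original file, append "Blah" to each line, and write the result back.
original = hdfs.read_file('user/test/myfile.txt')
modified = b''.join(line + b'Blah' for line in original.splitlines(keepends=True))
hdfs.create_file('user/test/moddedfile.txt', modified)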



Answer 2:

The stdin argument does not accept a string. It should be PIPE, None, an existing file object (something with a valid .fileno()), or an integer file descriptor.

from subprocess import Popen, PIPE

# Stream the source file out of HDFS.
cat = Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
            stdout=PIPE, bufsize=-1)
# "-" tells `hadoop fs -put` to read the file contents from stdin.
put = Popen(["hadoop", "fs", "-put", "-", "/user/test/moddedfile.txt"],
            stdin=PIPE, bufsize=-1)
for line in cat.stdout:
    line += b"Blah"          # cat.stdout yields bytes, so append bytes
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()            # closing stdin signals EOF to the put process
put.wait()
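
If you would rather work with str than bytes, a variant of the same pipeline can open both pipes in text mode. This is only a sketch, assuming Python 3 (universal_newlines=True is the older spelling of text=True):

from subprocess import Popen, PIPE

# Text-mode pipes: iteration over cat.stdout yields str, and str can be
# written directly to put.stdin.
cat = Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
            stdout=PIPE, universal_newlines=True)
put = Popen(["hadoop", "fs", "-put", "-", "/user/test/moddedfile.txt"],
            stdin=PIPE, universal_newlines=True)

for line in cat.stdout:
    put.stdin.write(line + "Blah")

cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()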

