Outputting to a file in HDFS using a subprocess

Posted 2019-10-19 13:33

I have a script that reads through the lines of a text file, modifies each line slightly, and then outputs the lines to a file. I can read the text from the file fine; the problem is that I can't output the text. Here is my code.

cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"], stdout=subprocess.PIPE)
for line in cat.stdout:
    line = line+"Blah";
    subprocess.Popen(["hadoop", "fs", "-put", "/user/test/moddedfile.txt"], stdin=line)

This is the error I get.

AttributeError: 'str' object has no attribute 'fileno'
cat: Unable to write to output stream.

Answer 1:

The quick and direct way to make your code work:

import subprocess
from tempfile import NamedTemporaryFile

# Stream the source file out of HDFS.
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
                       stdout=subprocess.PIPE)

with NamedTemporaryFile() as f:
    # Write the modified lines to a local temporary file.
    # cat.stdout yields bytes, so append a bytes literal.
    for line in cat.stdout:
        f.write(line + b"Blah")

    f.flush()
    f.seek(0)

    cat.wait()

    # Upload the temporary file to HDFS under the new name.
    put = subprocess.Popen(["hadoop", "fs", "-put", f.name, "/user/test/moddedfile.txt"],
                           stdin=f)
    put.wait()

But I would suggest you look at an HDFS/WebHDFS Python library instead.

For example, pywebhdfs; a rough sketch follows below.
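
A minimal sketch using pywebhdfs might look like this. The host, port, and user_name values are placeholders for your cluster's WebHDFS endpoint, and the method names (read_file, create_file) should be checked against the version of the library you install:

from pywebhdfs.webhdfs import PyWebHdfsClient

# Placeholder connection details -- point these at your NameNode's WebHDFS endpoint.
hdfs = PyWebHdfsClient(host='namenode.example.com', port='50070', user_name='test')

# Read the original file, append "Blah" to each line, and write the result back.
original = hdfs.read_file('user/test/myfile.txt')
modified = b''.join(line + b'Blah' for line in original.splitlines(keepends=True))
hdfs.create_file('user/test/moddedfile.txt', modified)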



Answer 2:

The stdin argument does not accept a string. It should be PIPE, None, an existing file object (something with a valid .fileno()), or an integer file descriptor.

from subprocess import Popen, PIPE

# Stream the source file out of HDFS.
cat = Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
            stdout=PIPE, bufsize=-1)
# "-" tells `hadoop fs -put` to read the file contents from stdin.
put = Popen(["hadoop", "fs", "-put", "-", "/user/test/moddedfile.txt"],
            stdin=PIPE, bufsize=-1)
for line in cat.stdout:
    line += b"Blah"          # cat.stdout yields bytes, so append bytes
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()            # closing stdin signals EOF to the put process
put.wait()
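
If you would rather work with str than bytes, a variant of the same pipeline can open both pipes in text mode. This is only a sketch, assuming Python 3 (universal_newlines=True is the older spelling of text=True):

from subprocess import Popen, PIPE

# Text-mode pipes: iteration over cat.stdout yields str, and str can be
# written directly to put.stdin.
cat = Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
            stdout=PIPE, universal_newlines=True)
put = Popen(["hadoop", "fs", "-put", "-", "/user/test/moddedfile.txt"],
            stdin=PIPE, universal_newlines=True)

for line in cat.stdout:
    put.stdin.write(line + "Blah")

cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()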

