In Hive, you can stream a table through a script to transform the data:
ADD FILE replace-nan-with-zeros.py;
SELECT
TRANSFORM (...)
USING 'python replace-nan-with-zeros.py'
AS (...)
FROM some_table;
I have a simple Python script:
#!/usr/bin/env python
import sys

kFirstColumns = 7

def main(argv):
    for line in sys.stdin:
        line = line.strip()
        inputs = line.split('\t')
        # replace NaNs with zeros
        outputs = []
        columnIndex = 1
        for value in inputs:
            newValue = value
            if columnIndex > kFirstColumns:
                newValue = value.replace('NaN', '0.0')
            outputs.append(newValue)
            columnIndex = columnIndex + 1
        print('\t'.join(outputs))

if __name__ == "__main__":
    main(sys.argv[1:])
How can I make kFirstColumns a command-line parameter (or some other kind of parameter) to this Python script?
Thank you!
The solution is trivial. Use
ADD FILE replace-nan-with-zeros.py;
SELECT
TRANSFORM (...)
USING 'python replace-nan-with-zeros.py 7'
AS (...)
FROM some_table;
instead of just
...
USING 'python replace-nan-with-zeros.py'
...
It works fine for me.
The Python script should be changed to read:
kFirstColumns = int(sys.argv[1])
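Putting it together, a complete version of the script with the threshold taken from the command line might look like this (a sketch; the transform logic is pulled into a helper function for clarity, but matches the original behavior):

```python
#!/usr/bin/env python
import sys

def transform(line, first_columns):
    """Replace NaN with 0.0 in every column after the first `first_columns`."""
    inputs = line.strip().split('\t')
    outputs = []
    for index, value in enumerate(inputs, start=1):
        if index > first_columns:
            value = value.replace('NaN', '0.0')
        outputs.append(value)
    return '\t'.join(outputs)

def main(first_columns):
    for line in sys.stdin:
        print(transform(line, first_columns))

if __name__ == "__main__":
    main(int(sys.argv[1]))
```

Invoked as `python replace-nan-with-zeros.py 7`, this leaves the first 7 columns untouched and replaces NaN everywhere else.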
Well, you are already sort of doing it. You are grabbing sys.argv[1:] and passing it to main, but you never use the arguments. The easiest change would be to modify your script as follows:
def main(kFirstColumns):
    ...

if __name__ == "__main__":
    main(int(sys.argv[1]))
Then run your script like
$ python myScript.py 7
Then, you can look at argparse when you want to do more complicated command line options.
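For instance, a minimal argparse version might look like this (the argument and help text are illustrative, not from the original script):

```python
import argparse

def parse_args(argv=None):
    # argv=None makes argparse read sys.argv[1:] when run as a script,
    # while still allowing an explicit list for testing.
    parser = argparse.ArgumentParser(
        description='Replace NaN with 0.0 after the first N columns.')
    parser.add_argument('first_columns', type=int,
                        help='number of leading columns to leave untouched')
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    kFirstColumns = args.first_columns
```

Running `python replace-nan-with-zeros.py 7` then sets args.first_columns to 7, and argparse gives you usage/error messages for free.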
A bit of a hack, but you could pass the parameter by including it as an additional column in your query.
SELECT
TRANSFORM (...)
USING 'python replace-nan-with-zeros.py'
AS (...)
FROM (SELECT 7 AS kFirstColumns, * FROM some_table);
Then, when you parse the row in your script, the first column value will be the parameter you are looking for. Simply pop it into your local variable to remove it from the list of column values.
line = line.strip()
inputs = line.split('\t')
kFirstColumns = int(inputs.pop(0))
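A sketch of the full per-line handling under this approach (the helper name is hypothetical), where the injected parameter column is popped off before the usual NaN replacement:

```python
import sys

def process_line(line):
    """First column carries the parameter; pop it before transforming."""
    inputs = line.strip().split('\t')
    first_columns = int(inputs.pop(0))  # the injected kFirstColumns value
    outputs = []
    for index, value in enumerate(inputs, start=1):
        if index > first_columns:
            value = value.replace('NaN', '0.0')
        outputs.append(value)
    return '\t'.join(outputs)

if __name__ == "__main__":
    for line in sys.stdin:
        print(process_line(line))
```

Note that the parameter column is consumed and not emitted, so the AS (...) column list in the Hive query should not include it.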
Hope that helps.