Splitting out the output of ps using Python

On Linux, the command ps aux outputs a list of processes, with one column per statistic, e.g.:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
...
postfix  22611  0.0  0.2  54136  2544 ?        S    15:26   0:00 pickup -l -t fifo -u
apache   22920  0.0  1.5 198340 16588 ?        S    09:58   0:05 /usr/sbin/httpd

I want to be able to read this in using Python and split out each row and then each column so they can be used as values.

For the most part, this is not a problem:

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
processes = ps.split('\n')

I can now loop through processes to get each row and split it out by spaces, for example

sep = re.compile(r'\s+')
for row in processes:
    print sep.split(row)

However, the last column, the command, sometimes contains spaces. In the example above this can be seen in the command

pickup -l -t fifo -u

which would be split out as

['postfix', '22611', '0.0', '0.2', '54136', '2544', '?', 'S', '15:26', '0:00', 'pickup', '-l', '-t', 'fifo', '-u']

but I really want it as:

['postfix', '22611', '0.0', '0.2', '54136', '2544', '?', 'S', '15:26', '0:00', 'pickup -l -t fifo -u']

So my question is, how can I split out the columns but when it comes to the command column, keep the whole string as one list element rather than split out by spaces?

Tags: python regex linux

5 Answers

啃猪蹄的小仙女

#2 · 2019-01-09 07:54

The optional maxsplit argument to the split method might help you. ps aux prints 11 columns, so 10 splits leave the whole command in the last element:

sep.split(row, maxsplit=10)
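
As a minimal sketch of that idea in Python 3: the sample row is copied from the question, and the count of 10 is an assumption that holds only for the standard 11-column ps aux layout.

```python
import re

# Sample row copied from the question's ps aux output.
row = "postfix  22611  0.0  0.2  54136  2544 ?        S    15:26   0:00 pickup -l -t fifo -u"

sep = re.compile(r"\s+")
# Assumes the usual 11 columns of `ps aux`: 10 splits leave the command intact.
fields = sep.split(row, maxsplit=10)
print(fields[-1])  # pickup -l -t fifo -u
```

Note that a regex split yields an empty first element if the row starts with whitespace; str.split(None, 10) skips leading whitespace and may be the simpler choice.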


疯言疯语

#3 · 2019-01-09 07:58

Use the second parameter to split, which specifies the maximum number of splits to perform. You can find that number by counting the fields in the first line (the column titles) and subtracting one.

import subprocess

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
processes = ps.split('\n')
# nfields is the number of splits to perform, so each split line
# will have (nfields+1) elements
nfields = len(processes[0].split()) - 1
for row in processes[1:]:
    print row.split(None, nfields)
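
On Python 3, where communicate() returns bytes and print is a function, the same approach might look like the sketch below. parse_ps and read_ps are hypothetical helper names, not part of any library, and the sketch assumes the header row names every column and that only the last column can contain spaces.

```python
import subprocess

def parse_ps(text):
    """Parse `ps aux`-style text into a list of dicts keyed by column title."""
    lines = text.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    # One split fewer than the number of columns keeps the command whole.
    nsplits = len(header) - 1
    rows = []
    for line in lines[1:]:
        fields = line.split(None, nsplits)
        if len(fields) == len(header):  # skip blank or malformed lines
            rows.append(dict(zip(header, fields)))
    return rows

def read_ps():
    # text=True (Python 3.7+) decodes the output to str.
    result = subprocess.run(["ps", "aux"], stdout=subprocess.PIPE,
                            text=True, check=True)
    return parse_ps(result.stdout)

# Sample text copied from the question, so the parser can be tried without ps.
sample = """USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postfix  22611  0.0  0.2  54136  2544 ?        S    15:26   0:00 pickup -l -t fifo -u
apache   22920  0.0  1.5 198340 16588 ?        S    09:58   0:05 /usr/sbin/httpd
"""
procs = parse_ps(sample)
print(procs[0]["COMMAND"])  # pickup -l -t fifo -u
```

Keying each row by column title avoids hard-coding indexes such as row[10]; calling read_ps() would apply the same parsing to live ps aux output.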


霸刀☆藐视天下

#4 · 2019-01-09 08:01

Why don't you use PSI instead? PSI provides process information on Linux and other Unix variants.

import psi.process
for p in psi.process.ProcessTable().values(): …


放荡不羁爱自由

#5 · 2019-01-09 08:02

Here's a routine, with example usage, to get you going:

import subprocess

def getProcessData():
    ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
    processes = ps.split('\n')
    # nfields is the number of splits to perform, so each split line
    # will have (nfields+1) elements
    nfields = len(processes[0].split()) - 1
    retval = []
    for row in processes[1:]:
        retval.append(row.split(None, nfields))
    return retval

# contents is assumed to be defined elsewhere; its first element is the PID to look up
wantpid = int(contents[0])
pstats = getProcessData()
# ten fixed-width columns followed by the full command
fmt = "%-10.10s" + " %10.10s" * 9 + "  %s"
for ps in pstats:
    if len(ps) < 11: continue  # skip blank or short rows
    if int(ps[1]) == wantpid:
        print "process data:"
        print fmt % ("USER", "PID", "%CPU", "%MEM", "VSZ", "RSS", "TTY", "STAT", "START", "TIME", "COMMAND")
        print fmt % tuple(ps)


乱世女痞

#6 · 2019-01-09 08:04

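Check out the psutil package.

psutil.process_iter returns a generator which you can use to iterate over all processes. p.cmdline() returns the list of the Process object's command-line arguments, separated just the way you want. (In older psutil releases cmdline was a plain attribute, and the executable path was exposed as p.path rather than p.exe().)

You can create a dictionary mapping each pid to its (pid, cmdline, path) in a single expression and then use it any way you want.

import psutil

# may raise psutil.AccessDenied or psutil.NoSuchProcess for some processes
pid_dict = dict([(p.pid, dict([('pid', p.pid), ('cmdline', p.cmdline()), ('path', p.exe())]))
                 for p in psutil.process_iter()])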

