python saving output from a for iteration and subp

Posted 2019-08-01 07:23

Question:

The purpose of this script is to compute the MD5 checksum of each file in a source directory and then (I'm still working on this part) run the same script on the destination to validate that the files were copied correctly.

#!/usr/bin/env python

import os
from sys import *
import subprocess


script, path = argv

destination = "./new_directorio/"
archivo = "cksum.txt"


def checa_sum(x):
        ck = "md5 %s" % x
        p = subprocess.Popen(ck, stdout=subprocess.PIPE, shell=True)
        (output, err) = p.communicate()

        out = open(archivo,'w')
        out.write("%s" % (output))
        out.close()

files = [f for f in os.listdir(path) if os.path.isfile(f)]
for i in files:
        if not "~" in i:
                checa_sum(i)

What I get is a file called "cksum.txt", but with only one result inside it.

bash-3.2$ more cksum.txt
MD5 (victor) = 4703ee63236a6975abab75664759dc29
bash-3.2$ 

Another attempt, replacing the open/write/close structure with a "with" block:

def checa_sum(x):
        ck = "md5 %s" % x
        p = subprocess.Popen(ck, stdout=subprocess.PIPE, shell=True)
        (output, err) = p.communicate()

        with open(archivo,'w') as outfile:
                outfile.write(output)

Why does it write only one result, when I expect the following in the file?

MD5 (pysysinfo.py) = 61a532c898e6f461ef029cee9d1b63dd

MD5 (pysysinfo_func.py) = ac7a1c1c43b2c5e20ceced5ffdecee86

MD5 (pysysinfo_new.py) = 38b06bac21af3d08662d00fd30f6c329

MD5 (test) = b2b0c958ece30c119bd99837720ffde1

MD5 (test_2.py) = 694fb14d86c573fabda678b9d770e51a

MD5 (uno.txt) = 466c9f9d6a879873688b000f7cbe758d

MD5 (victor) = 4703ee63236a6975abab75664759dc29

I also don't know how to handle the space between each iteration's output; I'm looking into that too.

Once I have this, I'm going to compare each item to verify integrity after the files are copied to the destination.

Answer 1:

Ah, someone asked for alternatives. There are some, of course :)

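import hashlib
import logging
import os

outfile = "hash.log"
indir = "/Users/daniel/Sites/work"
logging.basicConfig(filename=outfile, filemode="w", format='%(message)s', level=logging.DEBUG)
# os.listdir() returns bare names, so join them with indir before testing or opening.
paths = (os.path.join(indir, name) for name in os.listdir(indir) if not name.endswith("~"))
for filename in (p for p in paths if os.path.isfile(p)):
    # Binary mode: hashlib.md5() needs bytes, and text mode could alter the data.
    with open(filename, "rb") as checkfile:
        logging.info(hashlib.md5(checkfile.read()).hexdigest())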

I've used something like this before.

What I like about the logging module is that it scales: I don't have to keep a file open, or keep reopening it. The logger is highly configurable, but for generating something like what's needed here, the simple setup is a one-liner.

Here I am not parsing any console output, because I use Python's hashlib to compute each file's MD5. One could argue this slows things down, but for the file sizes I usually encounter I have had no problems so far.

It would be interesting to test on larger files. Otherwise, the logging mechanism could also be used in your case. I only preferred hashlib back then because I did not fancy parsing console output.
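
For larger files, one option is to feed hashlib in chunks instead of reading the whole file at once. This is only a sketch: the helper name md5_of_file and the 1 MiB default chunk size are my own choices, not something from the code above.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling handle.read(chunk_size)
        # until it returns b"", which marks end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest is the same as hashing the whole content in one call; only the peak memory use changes.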



Answer 2:

You keep opening the file with "w", which overwrites it on every call; open it with "a" to append.
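
A small sketch of the difference between the two modes. The file names demo_w.txt and demo_a.txt and the three lines are placeholders for illustration, not real checksums:

```python
import os
import tempfile

# Placeholder lines standing in for the md5 output of three files.
lines = ["MD5 (a) = 1\n", "MD5 (b) = 2\n", "MD5 (c) = 3\n"]

workdir = tempfile.mkdtemp()
w_path = os.path.join(workdir, "demo_w.txt")  # hypothetical file names
a_path = os.path.join(workdir, "demo_a.txt")

for line in lines:
    # "w" truncates the file on every open, so earlier lines are lost.
    with open(w_path, "w") as out:
        out.write(line)
    # "a" keeps what is already there and writes at the end.
    with open(a_path, "a") as out:
        out.write(line)

with open(w_path) as f:
    w_content = f.read()
with open(a_path) as f:
    a_content = f.read()

print(repr(w_content))  # only the last line survives
print(repr(a_content))  # all three lines
```

Note that "a" also keeps results from earlier runs of the script, so delete or truncate the file once before the loop if you want a fresh list each time.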

The best way is to simply redirect stdout to a file object, something like:

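from subprocess import CalledProcessError, check_call

def checa_sum(x):
    # "a" appends, so each call adds a line instead of replacing the file.
    with open(archivo,'a') as outfile:
        check_call(["md5",x], stdout=outfile)

check_call raises a CalledProcessError on a non-zero exit status, which you should handle accordingly.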

To catch the exception:

try:
    check_call(["md5", x], stdout=outfile)  # on Linux the equivalent tool is md5sum
except CalledProcessError as e:
    print("Exception for {}".format(e.cmd))

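Use a generator expression to get the files, and if you want to ignore backup copies use not f.endswith("~"):

directory = "/home/padraic"
# os.listdir() returns bare names, so join them with the directory before testing or hashing.
files = (os.path.join(directory, f) for f in os.listdir(directory) if not f.endswith("~"))
for i in files:
    if os.path.isfile(i):
        checa_sum(i)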
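
For the follow-up step mentioned in the question (checking that the copy matches the source), something along these lines might work. It is a sketch under assumptions: it uses hashlib rather than the md5 command so there is no output to parse, and the helper names checksums and compare_dirs are made up for this example.

```python
import hashlib
import os


def md5_of_file(path):
    """Hex MD5 digest of one file (read in one go; fine for small files)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def checksums(directory):
    """Map each regular file name in directory to its MD5, skipping backups (~)."""
    result = {}
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path) and not name.endswith("~"):
            result[name] = md5_of_file(full_path)
    return result


def compare_dirs(source, destination):
    """Return (missing, changed): source files absent from or different in destination."""
    src = checksums(source)
    dst = checksums(destination)
    missing = sorted(name for name in src if name not in dst)
    changed = sorted(name for name in src if name in dst and src[name] != dst[name])
    return missing, changed
```

With the names from the question, the call would look like missing, changed = compare_dirs(path, destination); two empty lists mean every source file has an identical copy in the destination.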