Python file iterator over a binary file with newer idiom

Published 2019-01-11 09:57

Question:

In Python, for a binary file, I can write this:

buf_size = 1024 * 64           # this is an important size...
with open(file, "rb") as f:
    while True:
        data = f.read(buf_size)
        if not data:
            break
        # deal with the data....
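(On Python 3.8 and later, the assignment expression `:=` collapses this read-and-test loop into a single condition. A minimal sketch, using `io.BytesIO` as a stand-in for a real binary file:)

```python
import io

buf_size = 1024 * 64

# io.BytesIO stands in for a file opened with open(path, "rb").
with io.BytesIO(b"x" * (buf_size * 2 + 10)) as f:
    total = 0
    # Python 3.8+: read and test for EOF in one expression.
    while data := f.read(buf_size):
        total += len(data)

print(total)  # 131082
```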

With a text file that I want to read line-by-line, I can write this:

with open(file, "r") as file:
    for line in file:
        # deal with each line....

Which is shorthand for:

with open(file, "r") as file:
    for line in iter(file.readline, ""):
        # deal with each line....

This idiom is documented in PEP 234 but I have failed to locate a similar idiom for binary files.

I have tried this:

>>> with open('dups.txt','rb') as f:
...    for chunk in iter(f.read,''):
...       i+=1

>>> i
1                # 30 MB file, i==1 means read in one go...

I tried iter(f.read(buf_size),'') but that fails: the parentheses after the callable mean f.read is called immediately, so iter() receives the returned data instead of a callable.

I know I could write a function, but is there a way, with the default for chunk in file: idiom, to use a buffer size rather than line-oriented reads?

Thanks for putting up with the Python newbie trying to write his first non-trivial and idiomatic Python script.

Answer 1:

I don't know of any built-in way to do this, but a wrapper function is easy enough to write:

def read_in_chunks(infile, chunk_size=1024*64):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return

Then at the interactive prompt:

>>> from chunks import read_in_chunks
>>> infile = open('quicklisp.lisp', 'rb')
>>> for chunk in read_in_chunks(infile):
...     print(chunk)
... 
<contents of quicklisp.lisp in chunks>

Of course, you can easily adapt this to use a with block:

with open('quicklisp.lisp', 'rb') as infile:
    for chunk in read_in_chunks(infile):
        print(chunk)

And you can eliminate the if statement like this:

def read_in_chunks(infile, chunk_size=1024*64):
    chunk = infile.read(chunk_size)
    while chunk:
        yield chunk
        chunk = infile.read(chunk_size)


Answer 2:

Try:

>>> with open('dups.txt', 'rb') as f:
...    for chunk in iter((lambda: f.read(how_many_bytes_you_want_each_time)), b''):
...       i += 1

In this two-argument form, iter needs a callable that takes zero arguments. Note also that a file opened in 'rb' mode returns bytes, so on Python 3 the sentinel must be b'' rather than ''.

  • a plain f.read would read the whole file, since the size argument is missing;
  • f.read(1024) would call the function immediately and pass its return value (the data read from the file) to iter, so iter would not receive a function at all;
  • (lambda: f.read(1234)) is a function that takes zero arguments (nothing between lambda and the colon) and calls f.read(1234).
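Put together, the pattern looks like this (a sketch using io.BytesIO as a stand-in file and b'' as the sentinel, since binary reads return bytes; the buffer size is illustrative):

```python
import io

chunk_size = 1024  # illustrative buffer size

# io.BytesIO stands in for a file opened with open(path, "rb").
with io.BytesIO(b"a" * 2500) as f:
    # iter() calls the lambda repeatedly until it returns b'' (EOF).
    chunks = list(iter(lambda: f.read(chunk_size), b""))

print([len(c) for c in chunks])  # [1024, 1024, 452]
```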

The following two forms are equivalent:

somefunction = (lambda: f.read(how_many_bytes_you_want_each_time))

and

def somefunction(): return f.read(how_many_bytes_you_want_each_time)

With either definition in place before your loop, you can simply write iter(somefunction, b'').

Technically you can skip the parentheses around the lambda; Python's grammar accepts that.
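As an alternative to the lambda, functools.partial can bind the size argument to f.read; some find this reads more explicitly. A sketch, again with io.BytesIO standing in for a real binary file:

```python
import io
from functools import partial

chunk_size = 1024  # illustrative buffer size

with io.BytesIO(b"b" * 3000) as f:
    # partial(f.read, chunk_size)() is equivalent to f.read(chunk_size).
    for i, chunk in enumerate(iter(partial(f.read, chunk_size), b"")):
        print(i, len(chunk))
# 0 1024
# 1 1024
# 2 952
```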