How do I split one file into two files using Python?

Posted 2020-04-10 01:25

Question:


Closed 6 years ago.

I have a file containing some lines of code followed by a string pattern. I need to write everything before the line containing the string pattern to file one and everything after that line to file two:

e.g. (file-content)

  • codeline 1
  • codeline 2
  • string pattern
  • codeline 3

The output should be file one with codeline 1, codeline 2 and file two with codeline 3.

I am familiar with writing files, but unfortunately I do not know how to determine the content before and after the string pattern.

Answer 1:

If the input file fits into memory, the easiest solution is to use str.partition():

with open("inputfile") as f:
    contents1, sentinel, contents2 = f.read().partition("Sentinel text\n")
with open("outputfile1", "w") as f:
    f.write(contents1)
with open("outputfile2", "w") as f:
    f.write(contents2)

This assumes that you know the exact text of the line separating the two parts.
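If the separator might be missing, the middle element returned by str.partition() can be checked: partition() returns the whole string followed by two empty strings when the separator is not found. A minimal sketch of that check, where the function name split_at_sentinel and its parameters are made up for illustration:

```python
def split_at_sentinel(src, dst1, dst2, sentinel="Sentinel text\n"):
    """Split src around the first occurrence of sentinel.

    Returns True if the sentinel was found, False otherwise.
    (Hypothetical helper, not part of the answer above.)
    """
    with open(src) as f:
        before, found, after = f.read().partition(sentinel)
    if not found:
        # partition() gives (whole_string, "", "") when the separator is
        # missing, so nothing is written in that case.
        return False
    with open(dst1, "w") as f:
        f.write(before)
    with open(dst2, "w") as f:
        f.write(after)
    return True
```

Note that partition() matches the text anywhere in the file, so a line that merely ends with the sentinel text would also split the file.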



Answer 2:

This approach is similar to Lev's but uses itertools because it's fun.

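import itertools

# True for every line except the separator line
dont_break = lambda l: l.strip() != 'string_pattern'

with open('input') as source:
    with open('out_1', 'w') as out1:
        # takewhile() also consumes the separator line, so it is dropped
        out1.writelines(itertools.takewhile(dont_break, source))
    with open('out_2', 'w') as out2:
        out2.writelines(source)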

You could replace the dont_break function with a regular expression or anything else if necessary.
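As a rough sketch of the regular-expression variant, the predicate can wrap a compiled pattern. The separator used here (a line made only of three or more dashes) and the function name split_on_regex are assumptions chosen for illustration:

```python
import itertools
import re

# Hypothetical separator: a line consisting only of three or more dashes.
SEPARATOR = re.compile(r"^-{3,}\s*$")


def split_on_regex(src, dst1, dst2, pattern=SEPARATOR):
    """Copy lines before the first matching line to dst1, the rest to dst2."""
    def dont_break(line):
        return pattern.search(line) is None

    with open(src) as source:
        with open(dst1, "w") as out1:
            # takewhile() consumes the first matching line, so the
            # separator itself ends up in neither output file.
            out1.writelines(itertools.takewhile(dont_break, source))
        with open(dst2, "w") as out2:
            out2.writelines(source)
```

If no line matches, everything is written to the first file and the second file is left empty.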



Answer 3:

with open('data.txt') as inf, open('out1.txt','w') as of1, open('out2.txt','w') as of2:
    outf = of1
    for line in inf:
        if 'string pattern' in line:
            outf = of2
            continue  # prevent output of the line with "string pattern" 
        outf.write(line)

This works with large files because it processes the input line by line. It assumes the string pattern occurs only once in the input file. If the whole file fits into memory (which may well be the case), I like the str.partition() approach best.

Using with ensures the files are automatically closed when you are done, or an exception is encountered.



Answer 4:

A more memory-efficient answer that handles large files while using only a limited amount of memory:

inp = open('inputfile')
out = open('outfile1', 'w')
for line in inp:
  if line == "Sentinel text\n":
    out.close()
    out = open('outfile2', 'w')
  else:
    out.write(line)
out.close()
inp.close()


Answer 5:

A naive example (that doesn't load the file into memory like Sven's):

with open('file', 'r') as r:
    with open('file1', 'w') as f:
        for line in r:
            if line == 'string pattern\n':
                break
            f.write(line)
    with open('file2', 'w') as f:
        for line in r:
            f.write(line)

This assumes that 'string pattern' occurs once in the input file.

If the pattern isn't a fixed string, you can use the re module.
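A sketch of how the loop might look with the re module, assuming a hypothetical pattern ("string pattern" with flexible whitespace, in any letter case) and a made-up function name split_on_pattern. It also reports whether the pattern was found at all:

```python
import re

# Hypothetical pattern: "string pattern" with flexible whitespace, any case.
PATTERN = re.compile(r"string\s+pattern", re.IGNORECASE)


def split_on_pattern(src, dst1, dst2, pattern=PATTERN):
    """Return True if a line matching pattern was found, else False."""
    found = False
    with open(src) as r:
        with open(dst1, "w") as f:
            for line in r:
                if pattern.search(line):
                    found = True
                    break  # the matching line is written to neither file
                f.write(line)
        with open(dst2, "w") as f:
            # Remaining lines are copied as-is, even if they match again.
            f.writelines(r)
    return found
```

Only the first matching line acts as the separator, which mirrors the assumption that the pattern occurs once.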



Answer 6:

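No more than three lines, assuming sentinel holds the exact separator line including its trailing newline (for example 'string pattern\n'). Be aware that if the separator never occurs, iter(fp.readline, sentinel) never terminates, because readline() keeps returning '' at end of file: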

with open('infile') as fp, open('of1','w') as of1, open('of2','w') as of2:
    of1.writelines(iter(fp.readline, sentinel))
    of2.writelines(fp)
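The two-argument form iter(fp.readline, sentinel) keeps calling fp.readline until it returns exactly the sentinel value. A sketch of a variant that also stops at end of file, using a small hypothetical generator lines_until (the function names and default sentinel are illustrative only):

```python
def lines_until(fp, sentinel):
    """Yield lines from fp until the sentinel line or end of file."""
    # readline() returns "" at end of file, so "" is used as the stop value.
    for line in iter(fp.readline, ""):
        if line == sentinel:
            return
        yield line


def split_with_guard(src, dst1, dst2, sentinel="string pattern\n"):
    with open(src) as fp, open(dst1, "w") as of1, open(dst2, "w") as of2:
        of1.writelines(lines_until(fp, sentinel))
        # Whatever is left after the sentinel (possibly nothing) goes here.
        of2.writelines(fp)
```

With this guard, a file that lacks the separator is copied entirely to the first output and the second output stays empty.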


Answer 7:

You need something like:

def test_pattern(x):
    # Replace this with an exact test for your divider line
    return x.startswith('abc')

found = False
out = open('outfile1', 'w')
with open('inputfile') as inp:
    for line in inp:
        if not found and test_pattern(line):
            # First divider line: switch to the second output file
            found = True
            out.close()
            out = open('outfile2', 'w')
            continue  # do not copy the divider line itself
        out.write(line)
out.close()

Replace the startswith line with a test that works for your pattern (using pattern matching from re if necessary, but anything that finds the divider line will do).