Python - Writing to a new file from another file

Posted 2019-06-04 05:36

Question:

I want to take the first field (Username) from File1 and the second field (Password) from File2 and write them to a third file, which is created by the function, but I am unable to do it. :(

The format of the files will always be the same:

File 1:

Username:DOB:Firstname:Lastname:::

File2:

Lastname:Password

My current code:

def merge(f1,f2,f3):
   with open(f3, "a") as outputFile:
      with open(f1) as usernameFile:
         for line in usernameFile:
            line = line[:-3]
            username = line.split(':')
            outputFile.write(username[0])
      with open(f2) as passwordFile:
         for line in passwordFile:
            password = line.split(':')
            outputFile.write(password[1])

merge('file1.txt', 'file2.txt', 'output.txt')

I want the Username from File1 and the Password from File2 to write to File3 with the layout:

Username:Password
Username:Password
Username:Password

Any help would be appreciated. :)

Answer 1:

This is the minimum change you would need to make to your code for it to work:

def merge(f1,f2,f3):
  with open(f3, "a") as outputFile:

     with open(f1) as usernameFile:
        for line in usernameFile:
           if line.strip() == '': continue
           username = line.split(':')[0]
           lastname = line.split(':')[3]

           # Scan the password file for this user's lastname
           with open(f2) as passwordFile:
              for pwLine in passwordFile:
                 if pwLine.strip() == '': continue
                 lN, password = pwLine.strip().split(':')
                 if lN == lastname:
                    outputFile.write(username + ':' + password + '\n')

merge('file1.txt', 'file2.txt', 'output.txt')

However, this method isn't very good because it re-reads the second file once for every line of the first. I would instead build a dictionary from the second file, with the lastname as the key. Dictionaries are very helpful in these situations. The dictionary can be built up front as follows:

def makeDict(f2):
  dOut = {}
  with open(f2) as f:
     for l in f:
        if l.strip() == '': continue
        # strip() drops the trailing newline so it does not end up in the password
        lastname, password = l.strip().split(':')
        dOut[lastname] = password

  return dOut


def merge(f1,f2,f3):

  pwd = makeDict(f2)
  with open(f3, "a") as outputFile:

     with open(f1) as usernameFile:
        for line in usernameFile:
           if line.strip() == '': continue
           username = line.split(':')[0]
           lastname = line.split(':')[3]
           if lastname in pwd:
              outputFile.write(username + ':' + pwd[lastname] + '\n')


merge('f1.txt', 'f2.txt', 'f3.txt')

I just ran the program above using the files:

f1.txt

Username0:DOB:Firstname:Lastname0:::
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::

f2.txt

Lastname0:Password0
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3

and got the output:

Username0:Password0
Username1:Password1
Username2:Password2
Username3:Password3

I did add the last line, merge(...), and another line that skips blank lines in the input text, but otherwise everything should be fine. There won't be any output if the merge(...) function isn't called.



Answer 2:

If the files are identically sorted (i.e. the users appear in the same order in both files), use the tip in this answer to iterate over both files at the same time rather than one after the other as in your example.

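# f1, f2 and f3 are the file paths.
# On Python 3 the built-in zip() is lazy; on Python 2 use itertools.izip instead.
with open(f1) as file1, open(f2) as file2, open(f3, "a") as outputFile:
  for line_from_f1, line_from_f2 in zip(file1, file2):
    username = line_from_f1.split(':')[0]
    password = line_from_f2.strip().split(':')[1]
    outputFile.write("%s:%s\n" % (username, password))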

If the files are not identically sorted, first create a dictionary with keys lastname and values username from file1. Then create a second dictionary with keys lastname and values password from file2. Then iterate over the keys of either dict and print both values.
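The dictionary-based approach described above could be sketched roughly as follows. This is only an illustration based on the file formats given in the question: the function name merge_by_lastname is made up for the example, and it assumes that every lastname is unique and that users without a matching password can simply be left out.

```python
def merge_by_lastname(f1, f2, f3):
    """Write Username:Password lines to f3, matching records on Lastname.

    merge_by_lastname is a hypothetical name used only for this sketch.
    """
    usernames = {}  # Lastname -> Username, from file 1
    with open(f1) as user_file:
        for line in user_file:
            fields = line.strip().split(':')
            if len(fields) >= 4:  # skips blank or short lines
                usernames[fields[3]] = fields[0]

    passwords = {}  # Lastname -> Password, from file 2
    with open(f2) as password_file:
        for line in password_file:
            fields = line.strip().split(':')
            if len(fields) >= 2:
                passwords[fields[0]] = fields[1]

    with open(f3, "w") as output_file:
        for lastname in usernames:
            if lastname in passwords:  # assumption: drop users with no password
                output_file.write("%s:%s\n" % (usernames[lastname], passwords[lastname]))
```

Calling merge_by_lastname('file1.txt', 'file2.txt', 'output.txt') would then work regardless of how the two input files are sorted.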



Answer 3:

Separate the data extraction from the file I/O; then you can reuse merge() with different extraction functions.

from operator import itemgetter
from contextlib import contextmanager

def extract(foo):
    """Extract username and password, compose and return output string

    foo is a tuple or list of two lines, one from each file
    returns str
    """
    username = itemgetter(0)
    password = itemgetter(1)
    formatstring = '{}:{}\n'
    item1, item2 = foo
    item1 = item1.strip().split(':')
    item2 = item2.strip().split(':')
    return formatstring.format(username(item1), password(item2))

@contextmanager
def files_iterator(files):
    """Yields an iterator that produces lines synchronously from each file

    Intended to be used with contextlib.contextmanager decorator.
    yields a zip object (on Python 2, use itertools.izip to stay lazy)

    files is a list or tuple of file paths - str
    """
    # Build a real list: on Python 3, map() is a one-shot iterator and would
    # already be exhausted by the time the finally block tries to close the files.
    files = [open(path) for path in files]
    try:
        yield zip(*files)
    finally:
        for file in files:
            file.close()


def merge(in_files,out_file, extract):
    """Create a new file with data extracted from multiple files.

    Data is extracted from the same/equivalent line of each file:
        i.e. File1Line1, File2Line1, File3Line1
             File1Line2, File2Line2, File3Line2

    in_files --> list or tuple of str, file paths
    out_file --> str, filepath
    extract --> function that takes a tuple of lines and returns the output str

    returns None
    """
    with files_iterator(in_files) as files, open(out_file, 'w') as out:
        out.writelines(map(extract, files))
##        out.writelines(extract(lines) for lines in files)

merge(['file1.txt', 'file2.txt'], 'file3.txt', extract)

files_iterator is a with-statement context manager that allows synchronous iteration over multiple files and ensures the files will be closed. A good starting point for reading is Understanding Python's "with" statement.
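As a point of comparison, the standard library's contextlib.ExitStack (Python 3.3+) can play the same role as the hand-written context manager. The sketch below is an alternative, not the code from this answer: merge_with_exitstack and join_fields are names invented here, and it assumes the two files line up row by row.

```python
from contextlib import ExitStack


def join_fields(lines):
    """Build 'Username:Password' from one line of each input file."""
    user_line, password_line = lines
    username = user_line.strip().split(':')[0]
    password = password_line.strip().split(':')[1]
    return '{}:{}\n'.format(username, password)


def merge_with_exitstack(in_files, out_file, extract):
    """Write extract(lines) for each group of same-numbered lines."""
    with ExitStack() as stack:
        # enter_context registers each file so that all of them are closed
        # when the with block exits, even if an error occurs part-way.
        files = [stack.enter_context(open(path)) for path in in_files]
        out = stack.enter_context(open(out_file, 'w'))
        out.writelines(extract(lines) for lines in zip(*files))
```

A call such as merge_with_exitstack(['file1.txt', 'file2.txt'], 'file3.txt', join_fields) mirrors the merge() call above; the trade-off is less custom code in exchange for requiring Python 3.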



Answer 4:

I would recommend building two dictionaries to represent the data in each file, then writing File3 from those structures:

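d1 = {}
with open("File1.txt", 'r') as f:
    for line in f:
        fields = line.strip().split(':')
        d1[fields[3]] = fields[0]  # Lastname -> Username

d2 = {}
with open("File2.txt", 'r') as f:
    for line in f:
        # strip() removes the trailing newline so it is not stored in the password
        fields = line.strip().split(':')
        d2[fields[0]] = fields[1]  # Lastname -> Password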

This will give you two dictionaries that look like this:

d1 = {Lastname: Username}
d2 = {Lastname: Password}

To then write this to File 3, simply run through the keys of either dictionary:

with open("File3.txt", 'w') as f:
    for key in d1:
        f.write("{}:{}\n".format(d1[key], d2[key]))

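Some things to note:

  • If the files don't have all the same values, you'll need to add some handling for that; as written, a lastname missing from File2 raises a KeyError (let me know if this is the case and I can toss a few ideas your way)

  • On Python versions before 3.7, where dictionaries are unordered, this approach does not preserve the order the files were in; from 3.7 on, the output follows the order of File1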

  • The code assumes that all lines are of the same format. A more complicated file will need some code to handle "odd" lines
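One way the first and third notes might be handled is sketched below. It is an illustration under stated assumptions rather than part of this answer: load_pairs and merge_reporting_missing are hypothetical helpers, the code targets Python 3 (it relies on set operations on dictionary key views), malformed lines are silently skipped, and lastnames that appear in only one file are returned to the caller instead of raising a KeyError.

```python
def load_pairs(path, key_index, value_index):
    """Return {key: value} from a colon-separated file, skipping odd lines."""
    pairs = {}
    with open(path) as f:
        for line in f:
            fields = line.strip().split(':')
            # Ignore blank lines and lines with too few fields.
            if len(fields) > max(key_index, value_index) and fields[key_index]:
                pairs[fields[key_index]] = fields[value_index]
    return pairs


def merge_reporting_missing(f1, f2, f3):
    """Write matched Username:Password lines; return unmatched lastnames."""
    d1 = load_pairs(f1, 3, 0)  # Lastname -> Username
    d2 = load_pairs(f2, 0, 1)  # Lastname -> Password
    with open(f3, 'w') as out:
        # Intersection: only lastnames present in both files are written,
        # sorted so the output order is deterministic.
        for lastname in sorted(d1.keys() & d2.keys()):
            out.write("{}:{}\n".format(d1[lastname], d2[lastname]))
    # Symmetric difference: lastnames present in only one of the files.
    return sorted(d1.keys() ^ d2.keys())
```

The returned list lets the caller decide what to do about incomplete records, for example logging them or aborting.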



Answer 5:

It's fine to avoid pandas if the rows are identically sorted in each file. But if it gets any more complicated than that, you should be using pandas for this. With pandas you can essentially do a join, so this will work no matter how the rows are ordered in each file. It's also very concise.

import pandas as pd

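# f1, f2 and f3 are the file paths.
# .ix was removed in pandas 1.0; .iloc selects columns by position instead.
df1 = pd.read_csv(f1, sep=':', header=None).iloc[:, [0, 3]]
df1.columns = ['username', 'lastname']
df2 = pd.read_csv(f2, sep=':', header=None)
df2.columns = ['lastname', 'password']
# merge() joins on the shared 'lastname' column by default
df3 = pd.merge(df1, df2)[['username', 'password']]
df3.to_csv(f3, header=False, index=False, sep=':')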

Note that you will also have the option to do outer joins. This is useful if, for some reason, there are usernames without passwords or vice versa in your files.



Answer 6:

This is pretty close. Make sure there is no blank line at the end of the input files, or add code to skip blank lines when you read them.

#!/usr/bin/env python
"""
File 1:
Username:DOB:Firstname:Lastname:::

File2:
Lastname:Password

File3:
Username:Password

"""

def merge(f1,f2,f3):
   username_lastname = {}
   with open(f3, "a") as outputFile:
      with open(f1) as usernameFile:
         for line in usernameFile:
            user = line.strip().split(':')
            print(user)
            username_lastname[user[3]] = user[0] # dict with Lastname as key, Username as value

      print(username_lastname)
      with open(f2) as passwordFile:
         for line in passwordFile:
            lastname_password = line.strip().split(':')
            print(lastname_password)
            password = lastname_password[1]
            username = username_lastname[lastname_password[0]]
            print(username, password)
            out_line = "%s:%s\n" % (username, password)
            outputFile.write(out_line)
      # No explicit close() is needed: the with block closes outputFile.

merge('f1.txt', 'f2.txt', 'output.txt')

f1:
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::

f2:
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3

f3: 
Username1:Password1
Username2:Password2
Username3:Password3