Question:
I want to write the first field (Username) from File1 and the second field (Password) from File2 to a third file, which is created during the function, but I am unable to do it. :(
The format of the files will always be the same:
File 1:
Username:DOB:Firstname:Lastname:::
File2:
Lastname:Password
My current code:
def merge(f1,f2,f3):
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                line = line[:-3]
                username = line.split(':')
                outputFile.write(username[0])
        with open(f2) as passwordFile:
            for line in passwordFile:
                password = line.split(':')
                outputFile.write(password[1])
merge('file1.txt', 'file2.txt', 'output.txt')
I want the Username from File1 and the Password from File2 to be written to File3 with the layout:
Username:Password
Username:Password
Username:Password
Any help would be appreciated. :)
Answer 1:
This is the minimum change you would need to make to your code for it to work:
def merge(f1,f2,f3):
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                fields = line.split(':')
                username = fields[0]
                lastname = fields[3]
                # Re-scan the password file for the matching last name.
                with open(f2) as passwordFile:
                    for pwLine in passwordFile:
                        lN, password = pwLine.strip().split(':')
                        if lN == lastname:
                            outputFile.write(username + ':' + password + '\n')
merge('file1.txt', 'file2.txt', 'output.txt')
However, this method is inefficient because it re-reads the second file once for every line of the first. I would instead build a dictionary from the second file, with the last name as the key. Dictionaries are very helpful in these situations. The dictionary can be built beforehand as follows:
def makeDict(f2):
    dOut = {}
    with open(f2) as f:
        for l in f:
            if l.strip() == '': continue
            # strip() drops the trailing newline so it does not end up in the password
            lastname, password = l.strip().split(':', 1)
            dOut[lastname] = password
    return dOut
def merge(f1,f2,f3):
    pwd = makeDict(f2)
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                if line.strip() == '': continue
                username = line.split(':')[0]
                lastname = line.split(':')[3]
                if lastname in pwd:
                    outputFile.write(username + ':' + pwd[lastname] + '\n')
merge('f1.txt', 'f2.txt', 'f3.txt' )
I just ran the program above using these files:
f1.txt
Username0:DOB:Firstname:Lastname0:::
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::
f2.txt
Lastname0:Password0
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3
and got the output:
Username0:Password0
Username1:Password1
Username2:Password2
Username3:Password3
I added the final merge(...) call and a line that skips blank lines in the input text; otherwise, everything should be the same. There won't be any output if the merge(...) function isn't called.
Answer 2:
If the files are identically sorted (i.e. the users appear in the same order in both files), use the tip in this answer to iterate over both files at the same time, rather than one after the other as in your example.
with open(f1) as file1, open(f2) as file2, open(f3, "a") as outputFile:
    # zip pairs line N of f1 with line N of f2 (use itertools.izip on Python 2)
    for line_from_f1, line_from_f2 in zip(file1, file2):
        username = line_from_f1.split(':')[0]
        password = line_from_f2.strip().split(':')[1]
        outputFile.write("%s:%s\n" % (username, password))
If the files are not identically sorted, first create a dictionary with keys lastname and values username from file1. Then create a second dictionary with keys lastname and values password from file2. Then iterate over the keys of either dict and print both values.
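A minimal sketch of the unsorted case described above, assuming the line formats from the question. The function name merge_unsorted is made up for illustration, and it takes any iterables of lines (open file objects work too) rather than file paths, so it is easy to try out.

```python
def merge_unsorted(user_lines, password_lines):
    """Return 'Username:Password' strings matched on last name.

    Hypothetical helper: both arguments are iterables of lines
    (open file objects or lists of strings).
    """
    # lastname -> username, from lines like Username:DOB:Firstname:Lastname:::
    usernames = {}
    for line in user_lines:
        fields = line.strip().split(':')
        if len(fields) > 3:
            usernames[fields[3]] = fields[0]

    # lastname -> password, from lines like Lastname:Password
    passwords = {}
    for line in password_lines:
        fields = line.strip().split(':')
        if len(fields) > 1:
            passwords[fields[0]] = fields[1]

    # Iterate over the keys of one dict and look each one up in the other.
    return ['%s:%s' % (usernames[k], passwords[k])
            for k in usernames if k in passwords]


# Made-up sample data; note the two inputs are in different orders.
users = ['alice:1990:Alice:Smith:::\n', 'bob:1985:Bob:Jones:::\n']
pwds = ['Jones:hunter2\n', 'Smith:s3cret\n']
print(merge_unsorted(users, pwds))
```

With real files, the same function could be called as merge_unsorted(open(f1), open(f2)) inside a with block, and the result written out with one write per entry.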
Answer 3:
Abstract the data extraction from the file I/O; then you can reuse merge() with different extraction functions.
from operator import itemgetter
from contextlib import contextmanager
def extract(foo):
    """Extract username and password, compose and return output string

    foo is a tuple or list of exactly two lines, one from each file
    returns str
    """
    username = itemgetter(0)
    password = itemgetter(1)
    formatstring = '{}:{}\n'
    item1, item2 = foo
    item1 = item1.strip().split(':')
    item2 = item2.strip().split(':')
    return formatstring.format(username(item1), password(item2))
@contextmanager
def files_iterator(files):
    """Yields an iterator that produces lines synchronously from each file

    Intended to be used with contextlib.contextmanager decorator.
    yields a zip object (itertools.izip on Python 2)
    files is a list or tuple of file paths - str
    """
    # Build a real list: on Python 3, map() is lazy and would be exhausted
    # before the cleanup loop below could close anything.
    files = [open(path) for path in files]
    try:
        yield zip(*files)
    finally:
        for file in files:
            file.close()
def merge(in_files,out_file, extract):
    """Create a new file with data extracted from multiple files.

    Data is extracted from the same/equivalent line of each file:
        i.e. File1Line1, File2Line1, File3Line1
             File1Line2, File2Line2, File3Line2
    in_files --> list or tuple of str, file paths
    out_file --> str, filepath
    extract --> function that returns the output line (str) for one tuple of input lines
    returns None
    """
    with files_iterator(in_files) as files, open(out_file, 'w') as out:
        out.writelines(map(extract, files))
##        out.writelines(extract(lines) for lines in files)
merge(['file1.txt', 'file2.txt'], 'file3.txt', extract)
files_iterator is a with-statement context manager that allows synchronous iteration over multiple files and ensures the files will be closed. A good place to start reading is "Understanding Python's "with" statement".
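As a sketch of the same idea using only the standard library, contextlib.ExitStack can hold any number of open files and close them all when the block exits. This is an alternative to the files_iterator above, not part of the original answer; the name paired_lines and the temporary files are made up for the demo.

```python
import os
import tempfile
from contextlib import ExitStack


def paired_lines(paths):
    """Hypothetical helper: return a list of tuples, one line from each file."""
    with ExitStack() as stack:
        # enter_context registers each file so it is closed when the block exits
        files = [stack.enter_context(open(p)) for p in paths]
        # Materialise the result before the files are closed.
        return [tuple(line.strip() for line in lines) for lines in zip(*files)]


# Demo with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, 'file1.txt')
    p2 = os.path.join(tmp, 'file2.txt')
    with open(p1, 'w') as f:
        f.write('u1:DOB:First:L1:::\nu2:DOB:First:L2:::\n')
    with open(p2, 'w') as f:
        f.write('L1:pw1\nL2:pw2\n')
    pairs = paired_lines([p1, p2])

print(pairs)
```

Like zip itself, this stops at the end of the shortest file, so it only suits inputs that line up row for row.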
Answer 4:
I would recommend building two dictionaries to represent the data in each file, then writing File3 based on that structure:
d1 = {}
with open("File1.txt", 'r') as f:
    for line in f:
        fields = line.strip().split(':')
        d1[fields[3]] = fields[0]
d2 = {}
with open("File2.txt", 'r') as f:
    for line in f:
        # strip() removes the trailing newline from the password
        fields = line.strip().split(':')
        d2[fields[0]] = fields[1]
This will give you two dictionaries that look like this:
d1 = {Lastname: Username}
d2 = {Lastname: Password}
To then write this to File 3, simply run through the keys of either dictionary:
with open("File3.txt", 'w') as f:
    for key in d1:
        f.write("{}:{}\n".format(d1[key], d2[key]))
Some things to note:
If the files don't have all the same values, you'll need to throw in some handling for that (let me know if this is the case and I can toss a few ideas your way).
On Python versions before 3.7, this approach does not preserve any order the files were in (dicts only guarantee insertion order from 3.7 onwards).
The code assumes that all lines are of the same format. A more complicated file will need some code to handle "odd" lines.
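One possible way to address the notes above, as a sketch rather than a definitive fix: keep only the passwords in a dictionary, walk the user lines in their original order, and collect usernames that have no matching password instead of raising KeyError. The name merge_in_order and the sample data are hypothetical.

```python
def merge_in_order(user_lines, password_lines):
    """Hypothetical helper: return (matched, missing).

    matched: 'Username:Password' strings in the order of user_lines
    missing: usernames whose last name has no password entry
    """
    passwords = {}
    for line in password_lines:
        lastname, sep, password = line.strip().partition(':')
        if sep:  # skip blank or malformed lines without a colon
            passwords[lastname] = password

    matched, missing = [], []
    for line in user_lines:
        fields = line.strip().split(':')
        if len(fields) < 4:
            continue  # "odd" line: not enough fields
        username, lastname = fields[0], fields[3]
        if lastname in passwords:
            matched.append('%s:%s' % (username, passwords[lastname]))
        else:
            missing.append(username)
    return matched, missing


# Made-up sample data: one blank line, and bob has no password entry.
users = ['carol:DOB:Carol:Young:::', 'alice:DOB:Alice:Smith:::', '',
         'bob:DOB:Bob:Jones:::']
pwds = ['Smith:s3cret', 'Young:pa55']
matched, missing = merge_in_order(users, pwds)
print(matched)
print(missing)
```

Because the output follows the user lines rather than dictionary iteration, the order is preserved on any Python version.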
Answer 5:
It's fine to avoid pandas if the rows are identically sorted in each file. But if it gets any more complicated than that, you should be using pandas for this. With pandas, you can essentially do a join, so this will work no matter how the rows are ordered in each file. It's also very concise.
import pandas as pd
# .ix was removed in pandas 1.0; use .iloc for positions and [[...]] for labels
df1 = pd.read_csv(f1, sep=':', header=None).iloc[:, [0, 3]]
df1.columns = ['username', 'lastname']
df2 = pd.read_csv(f2, sep=':', header=None)
df2.columns = ['lastname', 'password']
df3 = pd.merge(df1, df2)[['username', 'password']]
df3.to_csv(f3, header=False, index=False, sep=':')
Note that you also have the option to do outer joins. This is useful if, for some reason, there are usernames without passwords or vice versa in your files.
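A small sketch of the outer join mentioned above, assuming pandas is installed and using made-up in-memory data in place of the real files. Passing how='outer' keeps unmatched rows from both sides, and indicator=True adds a _merge column that says which side each row came from.

```python
import io
import pandas as pd

# Made-up sample data: Lastname2 has no password, Lastname9 has no user.
f1_text = ("Username0:DOB:Firstname:Lastname0:::\n"
           "Username2:DOB:Firstname:Lastname2:::\n")
f2_text = "Lastname0:Password0\nLastname9:Password9\n"

df1 = pd.read_csv(io.StringIO(f1_text), sep=':', header=None).iloc[:, [0, 3]]
df1.columns = ['username', 'lastname']
df2 = pd.read_csv(io.StringIO(f2_text), sep=':', header=None)
df2.columns = ['lastname', 'password']

# Outer join on lastname; _merge is 'both', 'left_only' or 'right_only'.
joined = pd.merge(df1, df2, on='lastname', how='outer', indicator=True)

no_password = joined.loc[joined['_merge'] == 'left_only', 'username'].tolist()
no_user = joined.loc[joined['_merge'] == 'right_only', 'lastname'].tolist()
print(no_password)
print(no_user)
```

Rows where _merge is 'both' are the ones the plain pd.merge call would have returned.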
Answer 6:
This is pretty close. Make sure there is no blank line at the end of the input files, or add code to skip blank lines when you read.
#!/usr/bin/env python
"""
File 1:
Username:DOB:Firstname:Lastname:::
File2:
Lastname:Password
File3:
Username:Password
"""
def merge(f1,f2,f3):
    username_lastname = {}
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                user = line.strip().split(':')
                print(user)
                username_lastname[user[3]] = user[0] # dict with Lastname as key, Username as value
        print(username_lastname)
        with open(f2) as passwordFile:
            for line in passwordFile:
                lastname_password = line.strip().split(':')
                print(lastname_password)
                password = lastname_password[1]
                username = username_lastname[lastname_password[0]]
                print(username, password)
                out_line = "%s:%s\n" % (username, password)
                outputFile.write(out_line)
    # no explicit close() needed: the with block closes outputFile
merge('f1.txt', 'f2.txt', 'output.txt')
f1:
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::
f2:
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3
f3:
Username1:Password1
Username2:Password2
Username3:Password3
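The blank-line caveat at the top of this answer could be handled with a small generator wrapped around the file object. This is only a sketch; nonblank_lines is a made-up name, not something from the original code.

```python
def nonblank_lines(lines):
    """Hypothetical helper: yield stripped lines, skipping empty ones."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


# Made-up sample input with an empty line and a whitespace-only line.
sample = ['Lastname1:Password1\n', '\n', 'Lastname2:Password2\n', '   \n']
print(list(nonblank_lines(sample)))
```

In merge(), writing "for line in nonblank_lines(usernameFile):" in place of "for line in usernameFile:" (and likewise for passwordFile) would then skip trailing blank lines.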