Batch rename part of a filename from a lookup file

Posted 2019-05-22 21:35

Question:

edit: see the bottom for my eventual solution

I have a directory of ~12,700 text files.

They have names like this:

1 - Re/ Report Novenator public call for bury - by Lizbett on Thu, 10 Sep 2009.txt

The leading number increments with each file (e.g. the last file in the directory begins with "12,700 - ").

Unfortunately, the files are not sorted by time, and I need them to be. Luckily, I have a separate CSV file that maps each ID number to its time-sorted position: for example, the 1 in the example above should really be 25 (since there are 24 messages before it), 2 should really be 8, 3 should be 1, and so forth, like so:

OLD_FILEID  TIMESORT_FILEID
21      0
23      1
24      2
25      3

I don't need to change anything in the file title except for this single leading number which I need to swap with its associated value. In my head, the way this would work is to open a file name, check the digits which appear before the dash, look them up in the CSV, replace them with the associated value, and then save the file with the adjusted title and go on to the next file.

What would be the best way to go about doing something like this? I'm a python newbie but have played around enough to feel comfortable following most directions or suggestions. Thanks :)

e: following the instructions below as best I could I did this, which doesn't work, but I'm not sure why:

import os
import csv
import sys

#open and store the csv file
with open('timesortmap.csv','rb') as csvfile:
    timeReader = csv.reader(csvfile, delimiter = ',', quotechar='"')

#get the list of files
for filename in os.listdir('DiggOutput-TIMESORT/'):
    oldID = filename.split(' - ')[0]
    newFilename = filename.replace(oldID, timeReader[oldID],1)
    os.rename(oldID, newFilename)

The error I get is:

TypeError: '_csv.reader' object is not subscriptable

I am not using DictReader, but that's because when I use csv.reader and print the rows, it looks like this:

['12740', '12738']
['12742', '12739']
['12738', '12740']
['12737', '12741']
['12739', '12742']

And when I use DictReader it looks like this:

{'FILEID-TS': '12738', 'FILEID-OLD': '12740'}
{'FILEID-TS': '12739', 'FILEID-OLD': '12742'}
{'FILEID-TS': '12740', 'FILEID-OLD': '12738'}
{'FILEID-TS': '12741', 'FILEID-OLD': '12737'}
{'FILEID-TS': '12742', 'FILEID-OLD': '12739'}

And I get this error in terminal:

File "TimeSorter.py", line 16, in <module>
newFilename = filename.replace(oldID, timeReader[oldID],1)
AttributeError: DictReader instance has no attribute '__getitem__'

Answer 1:

This should really be very simple to do in Python just using the csv and os modules.

Python has a built-in dictionary type called dict that could be used to store the contents of the csv file in-memory while you are processing. Basically, you would need to read the csv file using the csv module and convert each entry into a dictionary entry, probably using the OLD_FILEID field as the key and the TIMESORT_FILEID as the value.
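As a minimal sketch of that step, assuming Python 3, a comma-delimited file with one header row, and the old ID in the first column: the function name load_mapping is made up for illustration and is not part of the csv module.

```python
import csv

def load_mapping(csv_path):
    """Return a dict mapping each old file ID (str) to its time-sorted ID (str)."""
    mapping = {}
    # newline='' is what the csv module documentation recommends in Python 3
    with open(csv_path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if len(row) >= 2:  # ignore blank or malformed lines
                mapping[row[0].strip()] = row[1].strip()
    return mapping
```

The keys are kept as strings because the number extracted from a filename is also a string, so the lookup needs no conversion.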

You can then use os.listdir() to get the list of files and use a loop to get each file name in turn. (If you need to filter the list of file names to exclude some files, take a look at the glob module). Inside your loop, you just need to extract the number associated with the file, which can be done using something like this:

file_number = filename.split(' - ')[0] 

Then call os.rename() passing in the old file name and the new file name. The new filename can be found using something like:

new_filename = filename.replace(file_number, file_mapping[file_number], 1)

Where file_mapping is the dictionary created from the csv file. This will replace the first occurrence of the file_number with the number from your mapping file.
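Put together, the lookup could be sketched as below. This assumes Python 3 and that file_mapping has string keys and values; the helper names new_name_for and plan_renames are hypothetical. It only computes the (old, new) pairs, so the plan can be inspected before anything on disk is touched.

```python
import os

def new_name_for(filename, file_mapping):
    """Return filename with its leading ID swapped, or None if the ID is not mapped."""
    file_number = filename.split(' - ')[0]
    if file_number not in file_mapping:
        return None
    # count=1: only the leading occurrence is replaced
    return filename.replace(file_number, file_mapping[file_number], 1)

def plan_renames(directory, file_mapping):
    """Yield (old_name, new_name) pairs without renaming anything."""
    for filename in sorted(os.listdir(directory)):
        new_filename = new_name_for(filename, file_mapping)
        if new_filename is not None:
            yield filename, new_filename
```

Each pair could then be passed to os.rename() after joining both names with the directory path, for example with os.path.join().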

Edit

As TheodrosZelleke points out, there is the potential to overwrite an existing file by literally following what I laid out above. Several possible strategies:

  1. Use os.rename() to move the renamed versions of the files into a different directory (e.g. a subdirectory of the current directory or, even better, a temporary directory created using tempfile.mkdtemp()). Once all the files have been renamed, use os.rename() to move the files from the temporary directory to the current directory.
  2. Add an extension to the new filename, e.g., .tmp, assuming that the extension chosen will not cause other conflicts. Once all the renames are done, use a second loop to rename the files to exclude the .tmp extension.
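The second strategy might look roughly like this. It is a sketch, assuming Python 3, that no existing file already ends in the chosen suffix, and that the mapping never sends two old IDs to the same new ID; the function name rename_in_two_passes is invented for this example.

```python
import os

def rename_in_two_passes(directory, file_mapping, suffix='.tmp'):
    """Rename mapped files via temporary names; return how many were renamed."""
    staged = []
    # Pass 1: move every mapped file to its new name plus a temporary suffix,
    # so no rename can land on a file that is still waiting for its turn.
    for filename in os.listdir(directory):
        file_number = filename.split(' - ')[0]
        if file_number not in file_mapping:
            continue  # leave unmapped files alone
        final_name = filename.replace(file_number, file_mapping[file_number], 1)
        staged_name = final_name + suffix
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, staged_name))
        staged.append((staged_name, final_name))
    # Pass 2: every old name is gone now, so the suffix can be dropped safely.
    for staged_name, final_name in staged:
        os.rename(os.path.join(directory, staged_name),
                  os.path.join(directory, final_name))
    return len(staged)
```

Trying this on a copy of the directory first is a cheap safeguard, since a rename that goes wrong is awkward to undo across 12,700 files.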


Answer 2:

Here's what I ended up working out with friends, should anyone come looking for this:

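import os
import csv

IDs = {}

# open the CSV file and store its contents
# ('rb' is the Python 2 idiom; on Python 3 use open('timesortmap.csv', newline=''))
with open('timesortmap.csv', 'rb') as csvfile:
    timeReader = csv.reader(csvfile, delimiter=',', quotechar='"')

    # build a dictionary mapping each old ID to its time-sorted ID
    for row in timeReader:
        IDs[row[0]] = row[1]

# rename each file from the source directory into the target directory,
# which must already exist
path = 'DiggOutput-OLDID/'
tmpPath = 'DiggOutput-TIMESORT/'
for filename in os.listdir(path):
    oldID = filename.split(' - ')[0]
    # the final argument 1 replaces only the leading ID, not the same digits later in the name
    newFilename = filename.replace(oldID, IDs[oldID], 1)
    os.rename(os.path.join(path, filename), os.path.join(tmpPath, newFilename))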