Non-alphanumeric list order from os.listdir()

Posted 2019-01-02 20:49

I often use Python to process directories of data. Recently, I noticed that the default order of the lists returned by os.listdir() has changed to something almost nonsensical. For example, suppose the current directory contains the subdirectories run01, run02, ... run19, run20, and I generate a list with the following command:

dir = os.listdir(os.getcwd())

then I usually get a list in this order:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]

and so on. The order used to be alphanumeric, but this new order has persisted for a while now.

What is determining the (displayed) order of these lists?

10 answers
几人难应
Reply #2 · 2019-01-02 21:33

You can use the built-in sorted function to sort the strings however you want. Based on what you describe, this should do the trick:

sorted(os.listdir(whatever_directory))

Alternatively, you can use the .sort method of a list, which sorts in place:

lst = os.listdir(whatever_directory)
lst.sort()

Note that the order in which os.listdir returns the filenames probably depends entirely on your filesystem.
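
As a minimal sketch of the point above, the snippet below creates a scratch directory (the names run01 to run05 are only illustrative) and shows that wrapping os.listdir in sorted gives a predictable result regardless of what order the filesystem hands back:

```python
import os
import tempfile

# Create a temporary directory with a few subdirectories (illustrative names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ["run03", "run01", "run05", "run02", "run04"]:
        os.mkdir(os.path.join(tmp, name))

    raw = os.listdir(tmp)   # order is arbitrary and filesystem-dependent
    ordered = sorted(raw)   # plain string sort; enough for zero-padded names

print(ordered)
```

A plain string sort is sufficient here only because the numbers are zero-padded to the same width.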

像晚风撩人
Reply #3 · 2019-01-02 21:34

import os

aaa = ['row_163.pkl', 'row_394.pkl', 'row_679.pkl', 'row_202.pkl', 'row_1449.pkl', 'row_247.pkl', 'row_1353.pkl', 'row_749.pkl', 'row_1293.pkl', 'row_1304.pkl', 'row_78.pkl', 'row_532.pkl', 'row_9.pkl', 'row_1435.pkl']
# Sort by the integer between the '_' and the file extension.
sorted(aaa, key=lambda x: int(os.path.splitext(x.split('_')[1])[0]))

In my case, the names look like row_163.pkl. os.path.splitext('row_163.pkl') splits that into ('row_163', '.pkl'), so the name also has to be split on '_' to isolate the number.

For your case, you can do something like:

import re
sorted(aa, key=lambda x: (int(re.sub(r'\D', '', x)), x))

where

aa = ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']

To sort a directory listing, you can use sorted(os.listdir(path)).

For names such as 'run01.txt' or 'run01.csv', strip the extension and the non-digit characters before converting to int:

sorted(files, key=lambda x: int(re.sub(r'\D', '', os.path.splitext(x)[0])))
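
The one-off keys above can be generalized into a "natural sort" key. The following is only a sketch: natural_key is a hypothetical helper name, not something from the standard library, and it assumes that runs of digits should compare as integers while everything else compares as case-insensitive text.

```python
import re

def natural_key(name):
    """Split a name into alternating text and digit chunks.

    re.split with a capturing group keeps the digit runs, so the chunks at
    odd positions are always digits and those at even positions are text.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ["row_163.pkl", "row_9.pkl", "row_1449.pkl", "row_78.pkl", "row_202.pkl"]
ordered = sorted(names, key=natural_key)
print(ordered)
```

Because text and integer chunks always sit at matching positions, the comparison never mixes str with int, even for names with no digits or with several digit runs.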
几人难应
Reply #4 · 2019-01-02 21:35

Per the documentation:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

Order cannot be relied upon and is an artifact of the filesystem.

To sort the result, use sorted(os.listdir(path)).

高级女魔头
Reply #5 · 2019-01-02 21:35

I found that sorted does not always do what I expected. For example, with the directory below, it gives me a very strange result:

>>> os.listdir(pathon)
['2', '3', '4', '5', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472']
>>> sorted([ f for f in os.listdir(pathon)])
['2', '3', '4', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472', '5']

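It seems the names are compared as strings, character by character starting from the first, so '5' ends up last because the character '5' is greater than the '4' that every three-digit name starts with.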
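
To illustrate the difference, here is a small sketch using a subset of the names from the listing above. It assumes every name consists only of digits; key=int would raise ValueError on anything else.

```python
names = ['2', '3', '4', '5', '403', '404', '472']

as_strings = sorted(names)           # lexicographic: compares character by character
as_numbers = sorted(names, key=int)  # numeric: compares the integer values

print(as_strings)
print(as_numbers)
```

Passing key=int changes only how the names are compared; the returned list still contains the original strings.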
