How to open more than 19 files in parallel (Python)

Published 2019-07-24 12:55

Question:

I have a project that needs to read data, then write to more than 23 CSV files in parallel, depending on each line. For example, if a line is about temperature, we should write to temperature.csv; if about humidity, to humid.csv, etc.

I tried the following:

with open('Results\\GHCN_Daily\\MetLocations.csv', 'wb+') as locations, \
        open('Results\\GHCN_Daily\\Tmax.csv', 'wb+') as tmax_d, \
        open('Results\\GHCN_Daily\\Tmin.csv', 'wb+') as tmin_d, \
        open('Results\\GHCN_Daily\\Snow.csv', 'wb+') as snow_d, \
        .
        .
        # total of 23 'open' statements
        .
        open('Results\\GHCN_Daily\\SnowDepth.csv', 'wb+') as snwd_d, \
        open('Results\\GHCN_Daily\\Cloud.csv', 'wb+') as cloud_d, \
        open('Results\\GHCN_Daily\\Evap.csv', 'wb+') as evap_d, \

I got the following error:

SystemError: too many statically nested blocks

I searched for this error and got to this post, which says:

You will encounter this error when you nest more than 20 blocks. This is a design decision of the Python interpreter, which restricts nesting to 20.

But the open statements I wrote open the files in parallel, not nested.

What am I doing wrong, and how can I solve this problem?

Thanks in advance.

Answer 1:

Each open is a nested context; it's just that Python syntax allows you to put them in a comma-separated list. contextlib.ExitStack is a context container that lets you put as many contexts as you like on a stack and exits each of them when you are done. So, you could do:

import contextlib

files_to_process = (
    ('Results\\GHCN_Daily\\MetLocations.csv', 'locations'),
    ('Results\\GHCN_Daily\\Tmax.csv', 'tmax_d'),
    ('Results\\GHCN_Daily\\Tmin.csv', 'tmin_d'),
    # ...
)

with contextlib.ExitStack() as stack:
    # enter_context() registers each open file with the stack, which
    # closes all of them when the with block exits
    files = {varname: stack.enter_context(open(filename, 'wb+'))
             for filename, varname in files_to_process}
    # and for instance... ('wb+' is a binary mode, so write bytes)
    files['locations'].write(b'my location\n')

If you find dict access less tidy than attribute access, you could create a simple container class:

class SimpleNamespace:

    def __init__(self, name_val_pairs):
        self.__dict__.update(name_val_pairs)

with contextlib.ExitStack() as stack:
    files = SimpleNamespace(
        (varname, stack.enter_context(open(filename, 'wb+')))
        for filename, varname in files_to_process)
    # and for instance...
    files.locations.write(b'my location\n')
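
Note that Python 3.3+ already ships an equivalent class as types.SimpleNamespace, so you don't have to write your own. A minimal sketch using it, reusing files_to_process from above:

import contextlib
import types

with contextlib.ExitStack() as stack:
    # types.SimpleNamespace takes keyword arguments, so unpack the dict
    files = types.SimpleNamespace(**{
        varname: stack.enter_context(open(filename, 'wb+'))
        for filename, varname in files_to_process})
    files.locations.write(b'my location\n')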


Answer 2:

I would have a list of possible files, e.g. possible_files = ['humidity', 'temperature', ...], and make a dict that contains, for each possible file, a path to the file and a DataFrame, for example:

import pandas as pd

possible_files = ['humidity', 'temperature']  # etc.

main_dic = {}
for file in possible_files:
    main_dic[file] = {}  # each entry holds a path and a DataFrame
    main_dic[file]['path'] = '%s.csv' % file
    main_dic[file]['data'] = pd.DataFrame(
        [], columns=['value', 'other_column', 'another_column'])  # ...

Afterwards, I would read whatever document you are getting the values from and store them in the proper dictionary DataFrame.
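
A minimal sketch of that middle step, assuming the input has already been parsed into hypothetical (category, value) records:

import pandas as pd

# hypothetical parsed records; in practice these come from your input document
records = [('temperature', 21.5), ('humidity', 60.0)]

for category, value in records:
    df = main_dic[category]['data']
    # append the new row to the matching DataFrame
    main_dic[category]['data'] = pd.concat(
        [df, pd.DataFrame([{'value': value}])], ignore_index=True)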

When finished, just save the data to CSV, for example:

for file in main_dic:
    main_dic[file]['data'].to_csv('%s.csv' % file, index=False)

Hope it helps.



Answer 3:

If the data is not very huge, why not read in all the data, group it by category (e.g. put all data about temperature into one group), and then write each group to its corresponding file in one go?
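
A minimal sketch of that approach with pandas; the input path and the 'category' column name are assumptions:

import pandas as pd

# read everything once, then split by category and write each group in one go
data = pd.read_csv('input.csv')  # hypothetical input file
for category, group in data.groupby('category'):
    group.to_csv('%s.csv' % category, index=False)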



Answer 4:

It would be OK to open more than 20 files in this way.

# your list of file names
file_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
              'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u']

fh = []  # list of file handles
for f in file_names:
    fh.append(open(f + '.txt', 'w'))

# do what you need here
print("done")

for f in fh:
    f.close()

Though I'm not sure if you really need to do so.
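
A variant sketch of the same idea (the names are made up): keep the handles in a dict keyed by name, so each line can be dispatched to the right file, which is what the question actually needs:

# map each name to an open handle (reusing file_names from above)
handles = {name: open(name + '.txt', 'w') for name in file_names}

handles['a'].write('some line\n')  # write a line to the file for category 'a'

for f in handles.values():
    f.close()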