Generating pcolormesh images from very large data

Published 2019-02-19 02:15

Question:

I am collecting a large amount of data that will be saved into individual H5 files using h5py. I would like to patch these images together into one pcolormesh plot to be saved as a single image.

A quick example I have been working on generates 2000x2000 arrays of random data and saves each one to an H5 file with h5py. I then import the data from these files and try to plot it in matplotlib with pcolormesh, but I always run into a MemoryError (which is expected).

import os
import numpy
import h5py

out_dir = "TEST_HDF5_SAVE_FILES"
os.makedirs(out_dir, exist_ok=True)  # make sure the output folder exists

# Write 100 files (Plot_0.h5 ... Plot_99.h5), each holding one 2000x2000 array
for i in range(100):
    arr = numpy.random.random((2000, 2000))
    name = "Plot_" + str(i)
    with h5py.File(os.path.join(out_dir, name + ".h5"), "w") as f:
        f.create_dataset(name, data=arr)

This script generates my files. I picked 100 as an arbitrary number just to have a large enough set of files to pull from.

Then I import them using the following script:

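import os
import numpy
import h5py
import matplotlib.pyplot as plt

y = numpy.arange(0, 2000, 1)

for display_plot_num in range(0, 5):
    print(display_plot_num)
    # each tile is shifted 2000 columns to the right of the previous one
    x = numpy.arange(display_plot_num*2000, display_plot_num*2000 + 2000, 1)

    name = "Plot_" + str(display_plot_num)
    # open read-only ("r"): the data is only read here
    with h5py.File(os.path.join("TEST_HDF5_SAVE_FILES", name + ".h5"), "r") as f:
        data = f[name][...]  # load the dataset into memory as a NumPy array
    plt.pcolormesh(x, y, data, shading="nearest")
plt.show()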

The upper bound of the loop's range can be raised as high as 100, but the largest value that works without a MemoryError is 5 (that is, only five tiles can be combined in one pcolormesh plot), and even that is extremely clunky and slow. I need to be able to patch together many images.
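Some rough arithmetic shows why the limit arrives so quickly. The sketch below only counts the raw NumPy arrays, assuming float64 data (which is what numpy.random.random returns); pcolormesh typically builds coordinate and colour arrays for every cell on top of this, so real usage is higher.

```python
# Rough memory estimate for the tiles generated above (raw arrays only).
import numpy

TILE_ROWS, TILE_COLS = 2000, 2000
N_TILES = 100

cells_per_tile = TILE_ROWS * TILE_COLS

# numpy.random.random returns float64: 8 bytes per value
float64_tile_bytes = cells_per_tile * numpy.dtype(numpy.float64).itemsize
float64_total_bytes = N_TILES * float64_tile_bytes

# The same data quantised to 8-bit grey levels: 1 byte per value
uint8_total_bytes = N_TILES * cells_per_tile * numpy.dtype(numpy.uint8).itemsize

print(f"float64, one tile:  {float64_tile_bytes / 1e6:.0f} MB")   # 32 MB
print(f"float64, all tiles: {float64_total_bytes / 1e9:.1f} GB")  # 3.2 GB
print(f"uint8, all tiles:   {uint8_total_bytes / 1e6:.0f} MB")    # 400 MB
```

Storing the stitched image as 8-bit grey levels instead of float64 cuts the footprint by a factor of eight, at the cost of quantising the data to 256 levels.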

Is there another technique I should use to plot this data? Alternatively, it would be nice to convert the data from multiple H5 files directly into an image without going through matplotlib or a similar library (such as scipy).
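One way to skip matplotlib entirely is to quantise each tile to 8-bit grey levels, copy it into a preallocated mosaic, and write that array straight to a PNG. This is a sketch under several assumptions: the tiles sit side by side along x (as in the plotting script), every dataset has the same shape, and the data range is known in advance (0 to 1 for numpy.random.random). The helper name stitch_tiles_to_png and its parameters are hypothetical. The file is written with Pillow, which recent matplotlib versions already require.

```python
import os
import tempfile

import h5py
import numpy
from PIL import Image


def stitch_tiles_to_png(tile_files, out_png, vmin=0.0, vmax=1.0):
    """Hypothetical helper: place HDF5 tiles side by side in one 8-bit PNG.

    tile_files is a list of (h5_path, dataset_name) pairs, left to right.
    Every dataset is assumed to have the same 2-D shape.
    """
    with h5py.File(tile_files[0][0], "r") as f:
        rows, cols = f[tile_files[0][1]].shape

    # Preallocate the whole mosaic at 1 byte per pixel
    mosaic = numpy.empty((rows, cols * len(tile_files)), dtype=numpy.uint8)

    for i, (path, name) in enumerate(tile_files):
        with h5py.File(path, "r") as f:
            tile = f[name][...]  # only one float tile in memory at a time
        # Map [vmin, vmax] linearly onto 0..255, clipping out-of-range values
        scaled = numpy.clip((tile - vmin) / (vmax - vmin), 0.0, 1.0)
        mosaic[:, i * cols:(i + 1) * cols] = (scaled * 255).astype(numpy.uint8)

    Image.fromarray(mosaic).save(out_png)  # 2-D uint8 array -> greyscale PNG
    return mosaic


# Small self-contained demo: 3 tiles of 50x40 random values in a temp folder
demo_dir = tempfile.mkdtemp()
tiles = []
for i in range(3):
    name = "Plot_" + str(i)
    path = os.path.join(demo_dir, name + ".h5")
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=numpy.random.random((50, 40)))
    tiles.append((path, name))

out_path = os.path.join(demo_dir, "mosaic.png")
result = stitch_tiles_to_png(tiles, out_path)
```

For the full data set (100 tiles of 2000x2000) the mosaic is 2000 x 200000 pixels, about 400 MB as uint8. If a colour map is wanted instead of grey levels, converting to RGB or RGBA multiplies that by three or four.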

In summary, my problem is this:

  • I have a large number of HDF5 files with image data (2000x2000)
  • I need to patch together these files into a single image and save it
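If even an 8-bit mosaic is too large for RAM, the stitched result can itself live in an HDF5 file and be filled one tile at a time, so only a single tile is ever held in memory. This sketch makes the same assumptions (equal-shaped tiles laid out left to right); the helper name stitch_tiles_on_disk and the dataset name "mosaic" are hypothetical.

```python
import os
import tempfile

import h5py
import numpy


def stitch_tiles_on_disk(tile_files, out_h5, dataset_name="mosaic"):
    """Hypothetical helper: copy equal-sized 2-D tiles side by side into one
    HDF5 dataset, reading and writing one tile at a time."""
    with h5py.File(tile_files[0][0], "r") as f:
        src = f[tile_files[0][1]]
        rows, cols, dtype = src.shape[0], src.shape[1], src.dtype

    with h5py.File(out_h5, "w") as out:
        # Create the full-size dataset on disk without filling it in memory
        mosaic = out.create_dataset(
            dataset_name, shape=(rows, cols * len(tile_files)), dtype=dtype
        )
        for i, (path, name) in enumerate(tile_files):
            with h5py.File(path, "r") as f:
                # Write this tile into its column block of the mosaic
                mosaic[:, i * cols:(i + 1) * cols] = f[name][...]


# Small self-contained demo: 4 tiles of 30x20 random values
demo_dir = tempfile.mkdtemp()
tiles, originals = [], []
for i in range(4):
    name = "Plot_" + str(i)
    arr = numpy.random.random((30, 20))
    path = os.path.join(demo_dir, name + ".h5")
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=arr)
    tiles.append((path, name))
    originals.append(arr)

mosaic_path = os.path.join(demo_dir, "mosaic.h5")
stitch_tiles_on_disk(tiles, mosaic_path)

with h5py.File(mosaic_path, "r") as f:
    stitched = f["mosaic"][...]
```

The resulting dataset can later be sliced for display or converted to an image in strips without ever loading the whole thing at once.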

Any help is appreciated. Also, I would be glad to answer any further questions about my problem.


Edit (5.6.2013):

A closely related question is how to handle (import, manipulate, edit, etc.) very high-resolution images in Python. This is essentially what I am trying to do: generate one very high-resolution image from a collection of smaller images.
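For looking at a very high-resolution image interactively, one common approach is to display a decimated overview instead of every pixel. h5py supports step slicing, so each tile can be read at reduced resolution (here every 10th row and column) before stitching, and the result can be shown with imshow, whose extent argument maps it back onto the full-resolution coordinates. The helper name overview_from_tiles and the step of 10 are illustrative assumptions; plain striding skips pixels rather than averaging them.

```python
import os
import tempfile

import h5py
import numpy
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def overview_from_tiles(tile_files, step=10):
    """Hypothetical helper: read every `step`-th row and column of each tile
    and join the reduced tiles left to right."""
    reduced = []
    for path, name in tile_files:
        with h5py.File(path, "r") as f:
            reduced.append(f[name][::step, ::step])  # strided read from disk
    return numpy.hstack(reduced)


# Small self-contained demo: 3 tiles of 50x40 random values
demo_dir = tempfile.mkdtemp()
tiles = []
for i in range(3):
    name = "Plot_" + str(i)
    path = os.path.join(demo_dir, name + ".h5")
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=numpy.random.random((50, 40)))
    tiles.append((path, name))

overview = overview_from_tiles(tiles, step=10)

fig, ax = plt.subplots()
# origin="lower" matches pcolormesh orientation; extent keeps full-size axes
ax.imshow(overview, origin="lower", aspect="auto",
          extent=(0, 40 * len(tiles), 0, 50))
fig.savefig(os.path.join(demo_dir, "overview.png"))
```

With the real data, a step of 10 turns each 2000x2000 tile into a 200x200 one, a hundredfold reduction in pixel count.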

Answer 1:

One way to reduce the bloat of images in matplotlib (especially when saving to SVG) is to pass the rasterized=True keyword argument. This essentially "flattens" your pcolormesh into a bitmap, which makes the figure much faster to save, uses fewer resources, etc.
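A minimal sketch of this suggestion, using small random tiles in place of the HDF5 data: each pcolormesh call gets rasterized=True, so when the figure is saved to a vector format such as SVG the mesh is embedded as a bitmap instead of one vector element per cell. The tile size, file name and dpi value are illustrative assumptions. This shrinks the saved file and speeds up saving; it does not shrink the arrays pcolormesh builds in memory.

```python
import os
import tempfile

import numpy
import matplotlib
matplotlib.use("Agg")  # headless backend; SVG output still works via savefig
import matplotlib.pyplot as plt

TILE = 200  # small stand-in for the 2000x2000 tiles
y = numpy.arange(TILE)

fig, ax = plt.subplots()
for i in range(3):
    x = numpy.arange(i * TILE, (i + 1) * TILE)
    data = numpy.random.random((TILE, TILE))
    # rasterized=True makes vector backends embed this mesh as an image
    ax.pcolormesh(x, y, data, shading="nearest", rasterized=True)

out_svg = os.path.join(tempfile.mkdtemp(), "mosaic.svg")
fig.savefig(out_svg, dpi=150)  # dpi sets the resolution of rasterized parts
```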