I have a sensor unit that generates data in large binary files. File sizes can run to several tens of gigabytes. I need to:
- Read the data.
- Process it to extract the necessary information that I want.
- Display / Visualize the data.
The data in the binary file is single-precision float, i.e. numpy.float32.
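(For anyone who wants to reproduce the setup: a file with the same layout can be generated with something like the sketch below. The file name, trace count, and points per trace are made-up values for illustration, not my actual configuration.)

    import numpy as np

    # Made-up sizes for illustration; in my real setup,
    # no_of_points_per_trace comes from a separate .info file
    no_of_points_per_trace = 1000
    no_of_traces = 50

    # Write consecutive float32 traces to one flat binary file
    rng = np.random.default_rng(0)
    samples = rng.random(no_of_traces * no_of_points_per_trace).astype(np.float32)
    samples.tofile('test_data.bin')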
I have written code that works well, and I am now looking to optimize it for time. I observe that it takes a very long time to read the binary data. The following is what I have right now:
    import numpy as np

    def get_data(n):
        '''
        Function to get relevant trace data from the data file.
        Usage :
            get_data(n)
            where n is an integer giving the trace number to be read.
        Return :
            data_array : Python list containing single-wavelength data.
        '''
        with open(data_file, 'rb') as fid:
            # np.fromfile with no count argument reads the WHOLE file
            # into memory; the slice then keeps only the n-th trace
            data_array = list(np.fromfile(fid, np.float32)[n*no_of_points_per_trace:(no_of_points_per_trace*(n+1))])
        return data_array
This allows me to iterate over values of n and obtain different traces, i.e. chunks of data. As its name suggests, the variable no_of_points_per_trace
holds the number of points in each trace; I read it from a separate .info file.
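For context, the surrounding loop is essentially the following sketch (no_of_traces is a stand-in for however many traces the file holds; the processing step is elided):

    # Illustrative driver loop; no_of_traces stands in for
    # the actual number of traces in the data file
    for n in range(no_of_traces):
        trace = get_data(n)
        # ... process / visualize the trace here ...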
Is there an optimal way to do this?