To cut a long story short, I need to open a .bi5 file and read its contents. The problem: I have tens of thousands of .bi5 files containing time-series data that I need to decompress and process (read, dump into pandas).
I ended up installing Python 3 (I normally use 2.7) specifically for the lzma library, because I ran into compiling nightmares with the lzma back-ports for Python 2.7. So I conceded and went with Python 3, but with no success. The problems are too numerous to go into; no one reads long questions!
I have included one of the .bi5 files. If someone could manage to get it into a pandas DataFrame and show me how they did it, that would be ideal.
PS: the file is only a few KB, so it will download in a second. Thanks very much in advance.
(The file) http://www.filedropper.com/13hticks
Did you try using numpy to parse the data before transferring it to pandas? It may be the long way round, but it will allow you to manipulate and clean the data before you do the analysis in pandas, and the integration between the two is pretty straightforward.
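One way the numpy route might look is sketched below. It assumes each record is big-endian with three unsigned 32-bit integers followed by two 32-bit floats; that layout is a guess, not a documented fact, and the field names f0 to f4 and the function name read_bi5_numpy are placeholders invented for this example.

```python
import lzma

import numpy as np

# Assumed record layout: big-endian, three unsigned 32-bit ints, two 32-bit floats.
# The field names are placeholders, not documented column names.
RECORD_DTYPE = np.dtype([
    ("f0", ">u4"), ("f1", ">u4"), ("f2", ">u4"),
    ("f3", ">f4"), ("f4", ">f4"),
])


def read_bi5_numpy(path):
    """Decompress an LZMA file and return its records as a structured array."""
    with lzma.open(path, "rb") as f:
        raw = f.read()
    # frombuffer raises ValueError if len(raw) is not a multiple of 20 bytes.
    records = np.frombuffer(raw, dtype=RECORD_DTYPE)
    # Copy into native byte order so downstream libraries do not have to
    # deal with big-endian buffers.
    return records.astype(RECORD_DTYPE.newbyteorder("="))
```

The resulting structured array can be cleaned or filtered with ordinary numpy indexing and then passed to pandas.DataFrame, which takes the field names as column names.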
The code below should do the trick. First, it opens the file and decompresses it with lzma, and then it uses struct to unpack the binary data.
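A minimal sketch of those two steps follows. The name bi5_to_df is the one used in this answer, but its signature here, the default format string, and the helper read_bi5_records are assumptions made for illustration.

```python
import lzma
import struct


def read_bi5_records(path, fmt=">3I2f"):
    """Decompress an LZMA file and unpack it into a list of fixed-size records."""
    with lzma.open(path, "rb") as f:
        raw = f.read()
    # iter_unpack raises struct.error if len(raw) is not a multiple of
    # struct.calcsize(fmt), which is a useful sanity check on the format.
    return list(struct.iter_unpack(fmt, raw))


def bi5_to_df(path, fmt=">3I2f"):
    """Return the unpacked records as a pandas DataFrame (one row per record)."""
    import pandas as pd  # imported here so the parser above works without pandas

    return pd.DataFrame(read_bi5_records(path, fmt))
```

Calling bi5_to_df with the path to a .bi5 file should give a frame with five unnamed columns, one per field in the format string.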
The most important thing is to know the right format. I googled around and tried to guess, and '>3i2f' (or '>3I2f') works quite well. (It is big-endian: 3 ints, 2 floats. What you suggest, 'i4f', doesn't produce sensible floats, regardless of whether it is big or little endian.) For struct and its format syntax, see the docs.

Update
To compare the output of bi5_to_df with https://github.com/ninety47/dukascopy, I compiled and ran test_read_bi5 from there, then ran bi5_to_df on the same input file. The first lines of both outputs agree, so everything seems to be fine (ninety47's code reorders the columns).
Also, it's probably more accurate to use '>3I2f' instead of '>3i2f' (i.e. unsigned int instead of int).
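To illustrate the difference between the two format strings, here is a small sketch using made-up values (not data from the file): both formats describe a record of the same size, and they only disagree when an integer field has its top bit set.

```python
import struct

SIGNED, UNSIGNED = ">3i2f", ">3I2f"

# Both formats describe the same record size: 5 fields x 4 bytes, no padding
# because '>' selects standard sizes without alignment.
record_size = struct.calcsize(UNSIGNED)

# Made-up record whose first integer is too large for a signed 32-bit int.
raw = struct.pack(UNSIGNED, 4_000_000_000, 1, 2, 0.5, 0.25)

as_unsigned = struct.unpack(UNSIGNED, raw)  # first field stays 4000000000
as_signed = struct.unpack(SIGNED, raw)      # first field wraps to a negative value
```

For integer values below 2**31 the two formats give identical results, which would explain why both appear to work on the sample file.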