I have a .bin file, written by an application, whose contents I need to read and convert to a CSV file. I've been given documentation for the format, but I'm not sure how to read the file in Python.
Here are some details about how the dataset was serialized:
Datasets.bin is a list of DataSet classes serialized using Qt's QDataStream serialization using version QDataStream::Qt_4_7.
The format of the datasets.bin file is:
quint32    Magic Number      0x46474247
quint32    Version           1
quint32    DataSet Marker    0x44415441
qint32     # of DataSets     n
DataSet    DataSet 1
DataSet    DataSet 2
...
DataSet    DataSet n
The format of each DataSet is:
quint32             Magic Number    0x53455455
QString             Name
quint32             Flags           Bit field (Set Table)
QString             Id              [Optional]
QColor              Color           [Optional]
qint32              Units           [Optional]
QStringList         Creator Ids     [Optional]
bool                Hidden          [Optional]
QList<double>       Thresholds      [Optional]
QString             Source          [Optional]
qint32              Role            [Optional]
QVector<QPointF>    data points
I've been looking into the PyQt4 QDataStream documentation, but I can't find any specific examples. Any help pointing me in the right direction would be great.
PyQt cannot read all of the data the same way as in C++, because it cannot handle template classes (like QList<double> and QVector<QPointF>), which would require language-specific support that is not available in Python. This means a work-around must be used. Fortunately, the datastream format is quite straightforward, so reading arbitrary template classes can be reduced to a simple algorithm: read the length as a uint32, then iterate over a range and read the contained elements one-by-one into a list:
points = []
length = stream.readUInt32()
for index in range(length):
    point = QtCore.QPointF()
    stream >> point
    points.append(point)
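The same length-prefixed pattern can also be tried out without PyQt. The following is only a minimal sketch using the standard library, not part of the script below: it assumes the stream uses QDataStream's default big-endian byte order and 64-bit doubles, and the helper names read_uint32 and read_double_list are hypothetical.

```python
import io
import struct


def read_uint32(buf):
    """Read one big-endian unsigned 32-bit integer from a binary file object."""
    data = buf.read(4)
    if len(data) != 4:
        raise IOError('unexpected end of data')
    return struct.unpack('>I', data)[0]


def read_double_list(buf):
    """Read a length-prefixed list of doubles (assumed QList<double> layout)."""
    length = read_uint32(buf)
    values = []
    for _ in range(length):
        data = buf.read(8)
        if len(data) != 8:
            raise IOError('unexpected end of data')
        values.append(struct.unpack('>d', data)[0])
    return values


# Build a small sample by hand: a count of 2 followed by two doubles.
sample = struct.pack('>I', 2) + struct.pack('>dd', 1.5, -2.0)
thresholds = read_double_list(io.BytesIO(sample))
print(thresholds)
```

The sample bytes here are constructed by hand for illustration; they are not taken from a real datasets.bin file.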
Below is a script that shows how to read the whole dataset format correctly:
from PyQt4 import QtCore, QtGui
FLAG_HASSOURCE = 0x0001
FLAG_HASROLE = 0x0002
FLAG_HASCOLOR = 0x0004
FLAG_HASID = 0x0008
FLAG_COMPRESS = 0x0010
FLAG_HASTHRESHOLDS = 0x0020
FLAG_HASUNITS = 0x0040
FLAG_HASCREATORIDS = 0x0080
FLAG_HASHIDDEN = 0x0100
FLAG_HASMETADATA = 0x0200
MAGIC_NUMBER = 0x46474247
FILE_VERSION = 1
DATASET_MARKER = 0x44415441
DATASET_MAGIC = 0x53455455
def read_data(path):
    infile = QtCore.QFile(path)
    if not infile.open(QtCore.QIODevice.ReadOnly):
        raise IOError(infile.errorString())
    stream = QtCore.QDataStream(infile)
    # file header: magic number, version, dataset marker, dataset count
    magic = stream.readUInt32()
    if magic != MAGIC_NUMBER:
        raise IOError('invalid magic number')
    version = stream.readUInt32()
    if version != FILE_VERSION:
        raise IOError('invalid file version')
    marker = stream.readUInt32()
    if marker != DATASET_MARKER:
        raise IOError('invalid dataset marker')
    count = stream.readInt32()
    if count < 1:
        raise IOError('invalid dataset count')
    stream.setVersion(QtCore.QDataStream.Qt_4_7)
    rows = []
    while not stream.atEnd():
        row = []
        magic = stream.readUInt32()
        if magic != DATASET_MAGIC:
            raise IOError('invalid dataset magic number')
        row.append(('Name', stream.readQString()))
        # the flags bit field says which optional fields follow
        flags = stream.readUInt32()
        row.append(('Flags', flags))
        if flags & FLAG_HASID:
            row.append(('ID', stream.readQString()))
        if flags & FLAG_HASCOLOR:
            color = QtGui.QColor()
            stream >> color
            row.append(('Color', color))
        if flags & FLAG_HASUNITS:
            row.append(('Units', stream.readInt32()))
        if flags & FLAG_HASCREATORIDS:
            row.append(('Creators', stream.readQStringList()))
        if flags & FLAG_HASHIDDEN:
            row.append(('Hidden', stream.readBool()))
        if flags & FLAG_HASTHRESHOLDS:
            # QList<double>: length, then the elements one-by-one
            thresholds = []
            length = stream.readUInt32()
            for index in range(length):
                thresholds.append(stream.readDouble())
            row.append(('Thresholds', thresholds))
        if flags & FLAG_HASSOURCE:
            row.append(('Source', stream.readQString()))
        if flags & FLAG_HASROLE:
            row.append(('Role', stream.readInt32()))
        # QVector<QPointF>: length, then the elements one-by-one
        points = []
        length = stream.readUInt32()
        for index in range(length):
            point = QtCore.QPointF()
            stream >> point
            points.append(point)
        row.append(('Points', points))
        rows.append(row)
    infile.close()
    return rows
rows = read_data('datasets.bin')
for index, row in enumerate(rows):
    print('Row %s:' % index)
    for key, data in row:
        if isinstance(data, list) and len(data):
            print(' %s = [%s ... ] (%s items)' % (
                key, repr(data[:3])[1:-1], len(data)))
        else:
            print(' %s = %s' % (key, data))
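Since the end goal is a CSV file, the data points still need to be written out once they have been read. The following is one possible sketch, written for Python 3 with the standard csv module. It assumes that one CSV line per data point (dataset name, x, y) is the layout you want, and the function name write_points_csv is hypothetical. It takes plain (x, y) tuples, so the QPointF objects need to be unpacked first, as shown in the comment.

```python
import csv
import io


def write_points_csv(datasets, out):
    """Write one CSV line per data point: dataset name, x, y.

    `datasets` is an iterable of (name, points) pairs, where points is a
    list of (x, y) tuples.
    """
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['name', 'x', 'y'])
    for name, points in datasets:
        for x, y in points:
            writer.writerow([name, x, y])


# Hypothetical sample data standing in for the output of read_data().
# With real rows, the pairs could be built along these lines:
#   fields = dict(row)
#   pairs = [(p.x(), p.y()) for p in fields['Points']]
sample = [('Set A', [(0.0, 1.5), (1.0, 2.5)]), ('Set B', [])]

buffer = io.StringIO()
write_points_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open('datasets.csv', 'w', newline='') as the second argument. The other fields (units, thresholds and so on) could be added as extra columns in the same way if they are needed.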