How to go about serializing a large, complex object

Posted 2019-02-19 13:24

Question:

I have a "User" class with 40+ private variables including complex objects like private/public keys (QCA library), custom QObjects etc. The idea is that the class has a function called sign() which encrypts, signs, serializes itself and returns a QByteArray which can then be stored in a SQLite blob.

What's the best approach to serialize a complex object? Iterating through the properties with QMetaObject? Converting it to a protobuf object?

Could it be cast to a char array?

Answer 1:

Could it be cast to a char array?

No, because you'd be casting QObject's internals that you know nothing about, pointers that are not valid the second time you run your program, etc.

TL;DR: Implementing it manually is OK for explicit data elements, and leveraging the metaobject system for QObject and Q_GADGET classes will help with some of the drudgery.

The simplest solution might be to implement QDataStream operators for the object and the types you use. Make sure to follow good practice: each class that could conceivably ever change the format of data it holds must emit a format identifier.

For example, let's take the following classes:

class User {
  QString m_name;
  QList<CryptoKey> m_keys;
  QList<Address> m_addresses;
  QObject m_props;
  ...
  friend QDataStream & operator<<(QDataStream &, const User &);
  friend QDataStream & operator>>(QDataStream &, User &);
public:
  ...
};
Q_DECLARE_METATYPE(User) // no semi-colon

class Address {
  QString m_line1;
  QString m_line2;
  QString m_postCode;
  ...
  friend QDataStream & operator<<(QDataStream &, const Address &);
  friend QDataStream & operator>>(QDataStream &, Address &);
public:
  ...
};
Q_DECLARE_METATYPE(Address) // no semi-colon!

The Q_DECLARE_METATYPE macro makes the classes known to the QVariant and the QMetaType type system. Thus, for example, it's possible to assign an Address to a QVariant, convert such a QVariant to Address, to stream the variant directly to a datastream, etc.

First, let's address how to dump the QObject properties:

QList<QByteArray> publicNames(QList<QByteArray> names) {
  names.erase(std::remove_if(names.begin(), names.end(),
              [](const QByteArray & v){ return v.startsWith("_q_"); }), names.end());
  return names;
}

bool isDumpable(const QMetaProperty & prop) {
  return prop.isStored() && !prop.isConstant() && prop.isReadable() && prop.isWritable();
}

void dumpProperties(QDataStream & s, const QObject & obj)
{
  s << quint8(0); // format
  QList<QByteArray> names = publicNames(obj.dynamicPropertyNames());
  s << names;
  for (const auto & name : names) s << obj.property(name);
  auto mObj = obj.metaObject();
  for (int i = 0; i < mObj->propertyCount(); ++i) {
    auto prop = mObj->property(i);
    if (! isDumpable(prop)) continue;
    auto name = QByteArray::fromRawData(prop.name(), strlen(prop.name()));
    if (! name.isEmpty()) s << name << prop.read(&obj);
  }
  s << QByteArray();
}

In general, if we were to deal with data from a User that didn't have the m_props member, we'd need to be able to clear the properties. This idiom will come up every time you extend the stored object and upgrade the serialization format.

void clearProperties(QObject & obj)
{
  auto names = publicNames(obj.dynamicPropertyNames());
  const QVariant null;
  for (const auto & name : names) obj.setProperty(name, null);
  auto const mObj = obj.metaObject();
  for (int i = 0; i < mObj->propertyCount(); ++i) {
    auto prop = mObj->property(i);
    if (! isDumpable(prop)) continue;
    if (prop.isResettable()) {
      prop.reset(&obj);
      continue;
    }
    prop.write(&obj, null);
  }
}

Now we know how to restore the properties from a stream:

void loadProperties(QDataStream & s, QObject & obj)
{
  quint8 format;
  s >> format;
  // We only support one format at the moment.
  QList<QByteArray> names;
  s >> names;
  for (const auto & name : names) {
    QVariant val;
    s >> val;
    obj.setProperty(name, val);
  }
  auto const mObj = obj.metaObject();
  forever {
    QByteArray name;
    s >> name;
    if (name.isEmpty()) break;
    QVariant value;    
    s >> value;
    int idx = mObj->indexOfProperty(name);
    if (idx < 0) continue;
    auto prop = mObj->property(idx);
    if (! isDumpable(prop)) continue;
    prop.write(&obj, value);
  }
}

We can thus implement the stream operators to serialize our objects:

QDataStream & operator<<(QDataStream & s, const User & user) {
  s << quint8(1) // format
    << user.m_name << user.m_keys << user.m_addresses;
  dumpProperties(s, user.m_props);
  return s;
}

QDataStream & operator>>(QDataStream & s, User & user) {
  quint8 format;
  s >> format;
  switch (format) {
  case 0: // older format: no addresses or properties were stored
    s >> user.m_name >> user.m_keys;
    user.m_addresses.clear();
    clearProperties(user.m_props);
    break;
  case 1: // current format, matching operator<< above
    s >> user.m_name >> user.m_keys >> user.m_addresses;
    loadProperties(s, user.m_props);
    break;
  default: // unknown format: flag the stream instead of guessing
    s.setStatus(QDataStream::ReadCorruptData);
    break;
  }
  return s;
}

QDataStream & operator<<(QDataStream & s, const Address & address) {
  s << quint8(0) // format
    << address.m_line1 << address.m_line2 << address.m_postCode;
  return s;
}

QDataStream & operator>>(QDataStream & s, Address & address) {
  quint8 format;
  s >> format;
  switch (format) {
  case 0:
    s >> address.m_line1 >> address.m_line2 >> address.m_postCode;
    break;
  }
  return s;
}

The property system will also work for any other class, as long as you declare its properties and add the Q_GADGET macro (instead of Q_OBJECT). This is supported from Qt 5.5 onwards.

Suppose that we declared our Address class as follows:

class Address {
  Q_GADGET
  Q_PROPERTY(QString line1 MEMBER m_line1)
  Q_PROPERTY(QString line2 MEMBER m_line2)
  Q_PROPERTY(QString postCode MEMBER m_postCode)

  QString m_line1;
  QString m_line2;
  QString m_postCode;
  ...
  friend QDataStream & operator<<(QDataStream &, const Address &);
  friend QDataStream & operator>>(QDataStream &, Address &);
public:
  ...
};

Let's then declare the datastream operators in terms of [dump|clear|load]Properties modified for dealing with gadgets:

QDataStream & operator<<(QDataStream & s, const Address & address) {
  s << quint8(0); // format
  dumpProperties(s, &address);
  return s;
}

QDataStream & operator>>(QDataStream & s, Address & address) {
  quint8 format;
  s >> format;
  loadProperties(s, &address);
  return s;
}

We do not need to change the format designator even if the property set has been changed. We should retain the format designator in case we had other changes that couldn't be expressed as a simple property dump anymore. This is unlikely in most cases, but one must remember that a decision not to use a format specifier immediately sets the format of the streamed data in stone. It's not subsequently possible to change it!

Finally, the property handlers are slightly cut-down and modified variants of the ones used for the QObject properties:

template <typename T> void dumpProperties(QDataStream & s, const T * gadget) {
  dumpProperties(s, T::staticMetaObject, gadget);
}

void dumpProperties(QDataStream & s, const QMetaObject & mObj, const void * gadget)
{
  s << quint8(0); // format
  for (int i = 0; i < mObj.propertyCount(); ++i) {
    auto prop = mObj.property(i);
    if (! isDumpable(prop)) continue;
    auto name = QByteArray::fromRawData(prop.name(), strlen(prop.name()));
    if (! name.isEmpty()) s << name << prop.readOnGadget(gadget);
  }
  s << QByteArray();
}

template <typename T> void clearProperties(T * gadget) {
  clearProperties(T::staticMetaObject, gadget);
}

void clearProperties(const QMetaObject & mObj, void * gadget)
{
  const QVariant null;
  for (int i = 0; i < mObj.propertyCount(); ++i) {
    auto prop = mObj.property(i);
    if (! isDumpable(prop)) continue;
    if (prop.isResettable()) {
      prop.resetOnGadget(gadget);
      continue;
    }
    prop.writeOnGadget(gadget, null);
  }
}

template <typename T> void loadProperties(QDataStream & s, T * gadget) {
  loadProperties(s, T::staticMetaObject, gadget);
}

void loadProperties(QDataStream & s, const QMetaObject & mObj, void * gadget)
{
  quint8 format;
  s >> format;
  forever {
    QByteArray name;
    s >> name;
    if (name.isEmpty()) break;
    QVariant value;    
    s >> value;
    auto index = mObj.indexOfProperty(name);
    if (index < 0) continue;
    auto prop = mObj.property(index);
    if (! isDumpable(prop)) continue;
    prop.writeOnGadget(gadget, value);
  }
}

TODO: The loadProperties implementations do not yet clear properties that are present in the object but absent from the serialized data.

It is very important to establish how the entire data stream is versioned with respect to QDataStream's own internal format versions. The QDataStream documentation is required reading.

One also has to decide how compatibility between versions of the software is handled. There are several approaches:

  1. (Most typical and unfortunate) No compatibility: No format information is stored. New members are added to the serialization in an ad-hoc fashion. Older versions of the software will exhibit undefined behavior when faced with newer data. Newer versions will do the same with older data.

  2. Backward compatibility: Format information is stored in the serialization of each custom type. New versions can properly deal with older versions of the data. Older versions must detect an unhandled format, abort deserialization, and indicate an error to the user. Ignoring newer formats leads to undefined behavior.

  3. Full backward-and-forward compatibility: Each serialized custom type is stored in a QByteArray or a similar container. By doing this, you have information on how long the data record for the entire type is. The QDataStream version must be fixed. To read a custom type, its byte array is read first, then a QBuffer is set up that you use a QDataStream to read from. You read the elements you can parse in the formats you know of, and ignore the rest of the data. This forces an incremental approach to formats, where a newer format can only append elements over an existing format. But, if a newer format abandons some data element from an older format, it must still dump it, but with a null or otherwise safe default value that keeps the older versions of your code "happy".
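The third approach can be sketched roughly as follows. To keep it compilable on its own, this sketch avoids Qt and uses a std::string as the byte container in place of QByteArray/QBuffer; putU32, getU32, writeRecordV2 and readRecordV1 are hypothetical names, and the two-field record is an invented example. The point is only that a length prefix lets an older reader skip fields it does not know about and stay in sync with the rest of the stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append a 32-bit value, least significant byte first.
void putU32(std::string &out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Hypothetical helper: read a 32-bit value at pos and advance pos.
bool getU32(const std::string &in, std::size_t &pos, std::uint32_t &v) {
  if (pos > in.size() || in.size() - pos < 4) return false;
  v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
  pos += 4;
  return true;
}

// Newer writer: the payload holds two fields; the older format had only `id`.
std::string writeRecordV2(std::uint32_t id, std::uint32_t flags) {
  std::string payload;
  putU32(payload, id);
  putU32(payload, flags); // appended by the newer format
  std::string out;
  putU32(out, static_cast<std::uint32_t>(payload.size())); // record length
  out += payload;
  return out;
}

// Older reader: parses the field it knows and skips the rest of the record.
bool readRecordV1(const std::string &in, std::size_t &pos, std::uint32_t &id) {
  std::uint32_t len = 0;
  if (!getU32(in, pos, len)) return false;
  if (in.size() - pos < len || len < 4) return false;
  const std::size_t end = pos + len; // first byte after this record
  std::size_t p = pos;
  if (!getU32(in, p, id)) return false;
  pos = end; // ignore trailing fields added by newer formats
  return true;
}
```

Because the reader jumps to the end of the record rather than to the end of what it parsed, whatever follows the record in the stream is still read from the correct offset.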

If you think that the format bytes may ever run out, you can employ a variable-length encoding scheme, known as extension or extended octets, familiar across various ITU standards (e.g. Q.931 4.5.5 Bearer Capability information element). The idea is as follows: the highest bit of an octet (byte) is used to indicate whether the value needs more octets for representation. This makes the byte have 7 bits to represent the value, and 1 bit to mark extension. If the bit is set, you read the subsequent octets and concatenate them in little-endian fashion to the existing value. Here is how you might do this:

class VarLengthInt {
public:
  quint64 val;
  VarLengthInt(quint64 v) : val(v) { Q_ASSERT(v < (1ULL<<(7*8))); }
  operator quint64() const { return val; }
};

QDataStream & operator<<(QDataStream & s, VarLengthInt v) {
  while (v.val > 127) {
    s << (quint8)((v & 0x7F) | 0x80);
    v.val = v.val >> 7;
  }
  Q_ASSERT(v.val <= 127);
  s << (quint8)v.val;
  return s;
}

QDataStream & operator>>(QDataStream & s, VarLengthInt & v) {
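  v.val = 0;
  int shift = 0;
  forever {
    quint8 octet;
    s >> octet;
    // The writer emits the least significant 7 bits first.
    v.val |= quint64(octet & 0x7F) << shift;
    shift += 7;
    if (! (octet & 0x80) || shift >= 64) break;
  }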
  return s;
}

The serialization of VarLengthInt has variable length and always uses the minimum number of bytes possible for a given value: 1 byte up to 0x7F, 2 bytes up to 0x3FFF, 3 bytes up to 0x1F'FFFF, 4 bytes up to 0x0FFF'FFFF, etc. Apostrophes are valid in C++14 integer literals.
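As a quick sanity check of those sizes, here is a standalone sketch of the same scheme written without Qt so that it can be compiled on its own. The names encodeVarInt and decodeVarInt are hypothetical, and a std::vector of bytes stands in for the data stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode: low 7 bits first; bit 7 set means "more octets follow".
std::vector<std::uint8_t> encodeVarInt(std::uint64_t v) {
  std::vector<std::uint8_t> out;
  while (v > 0x7F) {
    out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
  return out;
}

// Decode: each octet contributes 7 bits at an increasing shift.
// Returns false on truncated or over-long input.
bool decodeVarInt(const std::vector<std::uint8_t> &in, std::uint64_t &v) {
  v = 0;
  int shift = 0;
  for (std::uint8_t octet : in) {
    if (shift >= 64) return false;
    v |= static_cast<std::uint64_t>(octet & 0x7F) << shift;
    if (!(octet & 0x80)) return true; // last octet of the value
    shift += 7;
  }
  return false; // ran out of octets while the extension bit was still set
}
```

For instance, 300 is 0b100101100, so it encodes as the two octets 0xAC 0x02: the low seven bits with the extension bit set, followed by the remaining bits.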

It would be used as follows:

QDataStream & operator<<(QDataStream & s, const User & user) {
  s << VarLengthInt(1) // format
    << user.m_name << user.m_keys << user.m_addresses;
  dumpProperties(s, user.m_props);
  return s;
}

QDataStream & operator>>(QDataStream & s, User & user) {
  VarLengthInt format(0);
  s >> format;
  ...
  return s;
}


Answer 2:

Binary dump serialization is a bad idea. It will include a lot of stuff you don't need, like the object's v-table pointer, as well as other pointers, contained directly or in other class members, which make no sense to serialize since they do not persist between application sessions.

If it is just a single class, just implement it by hand; it certainly won't kill you. If you have a family of classes, and they are QObject derived, you could use the meta system, but that will only register properties, whereas an int something member which is not tied to a property will be skipped. If you have a lot of data members which are not Qt properties, it will take you more typing to expose them as Qt properties (unnecessarily, I might add) than it would take to write the serialization method by hand.
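To give an idea of how little code "by hand" means, here is a minimal sketch under some assumptions: it is written without Qt, using standard streams, so it can be compiled standalone, and the Counter class, its members, and the save/load/writeU32/readU32 helpers are all hypothetical. Each member is written explicitly, in a fixed byte order, after a one-byte format identifier.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical class whose members are not exposed as properties.
struct Counter {
  std::int32_t something = 0;
  std::string label;
};

// Write a 32-bit value, least significant byte first.
void writeU32(std::ostream &os, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    os.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read a 32-bit value written by writeU32; false on end of stream.
bool readU32(std::istream &is, std::uint32_t &v) {
  v = 0;
  for (int i = 0; i < 4; ++i) {
    const int c = is.get();
    if (c == std::char_traits<char>::eof()) return false;
    v |= static_cast<std::uint32_t>(c) << (8 * i);
  }
  return true;
}

void save(std::ostream &os, const Counter &c) {
  os.put(0); // format identifier
  writeU32(os, static_cast<std::uint32_t>(c.something));
  writeU32(os, static_cast<std::uint32_t>(c.label.size()));
  os.write(c.label.data(), static_cast<std::streamsize>(c.label.size()));
}

bool load(std::istream &is, Counter &c) {
  if (is.get() != 0) return false; // unknown format or empty stream
  std::uint32_t raw = 0, len = 0;
  if (!readU32(is, raw) || !readU32(is, len)) return false;
  std::string label(len, '\0');
  is.read(&label[0], static_cast<std::streamsize>(len));
  if (is.gcount() != static_cast<std::streamsize>(len)) return false;
  c.something = static_cast<std::int32_t>(raw);
  c.label = std::move(label);
  return true;
}
```

Nothing but the listed members ends up in the output, so no v-table pointer or other session-specific address is ever written.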