I have a BitSet and want to write it to a file. I came across a solution that uses an ObjectOutputStream and its writeObject method.
I looked at ObjectOutputStream in the Java API and saw that you can write other things too (byte, int, short, etc.).
To try the class out, I wrote a single byte to a file using the following code, but the resulting file has 7 bytes instead of 1.
My question is: what are the first 6 bytes in the file, and why are they there?
This is relevant to a BitSet because I don't want to start writing lots of data to a file and then realize that extra bytes have been inserted without knowing what they are.
Here is the code:
byte[] bt = new byte[]{'A'};
File outFile = new File("testOut.txt");
FileOutputStream fos = new FileOutputStream(outFile);
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.write(bt);
oos.close();
thanks for any help
Avner
The other bytes will be type information.
Basically, ObjectOutputStream is a class used to write Serializable objects to some destination (usually a file). It makes more sense if you think about ObjectInputStream, which has a readObject() method. How does Java know what Object to instantiate? Easy: there is type information in the stream.
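To make the point about readObject() concrete, here is a minimal sketch that writes a BitSet with writeObject and reads it back. The class name RoundTripDemo and its helper methods are made up for illustration, and an in-memory buffer stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical demo class, not part of the original question or answers.
class RoundTripDemo {
    // Serialize an object; the stream records its class alongside its data.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(o);
        }
        return buf.toByteArray();
    }

    // readObject() is declared to return Object; the concrete class
    // is recovered from the type information stored in the stream.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(10);
        Object copy = deserialize(serialize(bits));
        System.out.println(copy.getClass().getName()); // java.util.BitSet
        System.out.println(copy.equals(bits));         // true
    }
}
```

The caller never tells the input stream which class to create; that comes from the serialized data itself.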
You could be writing any objects out to an ObjectOutputStream, so the stream holds information about the types written as well as the data needed to reconstitute the object.
If you know that the stream will always contain a BitSet, don't use an ObjectOutputStream. If space is at a premium, convert the BitSet to a set of bytes where each bit corresponds to a bit in the BitSet, then write that directly to the underlying stream (e.g. a FileOutputStream as in your example).
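One way to sketch this, assuming Java 7 or later where BitSet has toByteArray() and valueOf(byte[]): write the raw bytes straight to a FileOutputStream and rebuild the BitSet from them when reading. The class name RawBitSetFile and the temp file are illustrative only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.BitSet;

// Hypothetical helper, assuming Java 7+ (BitSet.toByteArray / BitSet.valueOf).
class RawBitSetFile {
    // Write only the bits themselves: no header, no type information.
    static void save(BitSet bits, File file) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(bits.toByteArray());
        }
    }

    // Rebuild the BitSet from the raw bytes.
    static BitSet load(File file) throws IOException {
        return BitSet.valueOf(Files.readAllBytes(file.toPath()));
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("bits", ".bin");
        file.deleteOnExit();

        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(9);
        save(bits, file);

        System.out.println(file.length());           // 2
        System.out.println(load(file).equals(bits)); // true
    }
}
```

Note that toByteArray() drops trailing zero bits, so if the logical length of the bit set matters you would need to store it separately.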
The serialisation format, like many others, includes a header with a magic number and version information. When you use DataOutput/OutputStream methods on ObjectOutputStream, the data are placed in the middle of the serialised data (with no type information). This is typically only done in writeObject implementations after a call to defaultWriteObject or use of putFields.
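To see the header for yourself, here is a small sketch that repeats the write from the question against an in-memory buffer and prints the bytes in hex. The class name HeaderDump is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical demo class reproducing the question's write in memory.
class HeaderDump {
    static byte[] writeOneByte() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf); // stream header is written here
        oos.write(new byte[] {'A'});                          // buffered as "block data"
        oos.close();                                          // flushes the block
        return buf.toByteArray();
    }

    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(writeOneByte())); // AC ED 00 05 77 01 41
    }
}
```

Reading the output left to right: AC ED is the magic number, 00 05 is the stream version, 77 marks a block of raw data, 01 is the block's length, and 41 is the 'A' that was written. So the six extra bytes are the four-byte header plus a two-byte block-data prefix.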
If you only use the saved BitSet in Java, serialization works fine. However, it's kind of annoying if you want to share the bitset across multiple platforms. Besides the overhead of Java serialization, the BitSet is stored in units of 8 bytes (its bits are serialized as an array of long). This can generate too much overhead if your bitset is small.
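As a rough illustration of that overhead, the sketch below compares the serialized size of a one-bit BitSet with its raw byte form. The class name SizeCompare is hypothetical, BitSet.toByteArray() assumes Java 7 or later, and the exact serialized size may vary between JDK versions, so no figure is claimed for it here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical demo class; assumes Java 7+ for BitSet.toByteArray().
class SizeCompare {
    // Number of bytes produced by standard Java serialization.
    static int serializedSize(BitSet bits) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(bits);
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        BitSet bits = new BitSet();
        bits.set(3);
        System.out.println("serialized: " + serializedSize(bits) + " bytes");
        System.out.println("raw: " + bits.toByteArray().length + " bytes"); // raw: 1 bytes
    }
}
```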
We wrote this small class so we can extract byte arrays from a BitSet. Depending on your use case, it might work better than Java serialization for you.
import java.util.BitSet;

public class ExportableBitSet extends BitSet {

    private static final long serialVersionUID = 1L;

    // Bit i of the set maps to mask BIT_MASK[i % 8] in byte i / 8,
    // i.e. bits are packed most-significant-bit first within each byte.
    protected static final int[] BIT_MASK =
        {0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01};

    public ExportableBitSet() {
        super();
    }

    public ExportableBitSet(int nbits) {
        super(nbits);
    }

    // Build a bit set from bytes previously produced by toByteArray().
    public ExportableBitSet(byte[] bytes) {
        this(bytes == null ? 0 : bytes.length * 8);
        // size() may exceed bytes.length * 8; isBitOn returns false past the end.
        for (int i = 0; i < size(); i++) {
            if (isBitOn(i, bytes))
                set(i);
        }
    }

    // Note: on Java 7+ this overrides BitSet.toByteArray(), which packs
    // bits least-significant-bit first; this version packs MSB first.
    public byte[] toByteArray() {
        // length() is the index of the highest set bit plus one (0 if empty).
        int n = (length() + 7) / 8;
        byte[] bytes = new byte[n]; // zero-filled by default
        for (int i = 0; i < n * 8; i++) {
            if (get(i))
                setBit(i, bytes);
        }
        return bytes;
    }

    protected static boolean isBitOn(int bit, byte[] bytes) {
        int size = bytes == null ? 0 : bytes.length * 8;
        if (bit >= size)
            return false;
        return (bytes[bit / 8] & BIT_MASK[bit % 8]) != 0;
    }

    protected static void setBit(int bit, byte[] bytes) {
        int size = bytes == null ? 0 : bytes.length * 8;
        if (bit >= size)
            throw new ArrayIndexOutOfBoundsException("Byte array too small");
        bytes[bit / 8] |= BIT_MASK[bit % 8];
    }
}
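One thing to watch when sharing these bytes with other platforms is bit order. The class above stores bit 0 in the highest bit of the first byte (mask 0x80), whereas the toByteArray() built into BitSet since Java 7 stores it in the lowest bit. The sketch below shows the difference with a standalone MSB-first packer; BitOrderDemo and packMsbFirst are made-up names, not part of the class above.

```java
import java.util.BitSet;

// Hypothetical demo class; assumes Java 7+ for BitSet.toByteArray().
class BitOrderDemo {
    // Pack bits most-significant-bit first, the same convention
    // as the 0x80, 0x40, ... mask table in the class above.
    static byte[] packMsbFirst(BitSet bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out[i / 8] |= (byte) (0x80 >> (i % 8));
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        System.out.printf("MSB first: %02X%n", packMsbFirst(bits)[0] & 0xFF); // MSB first: 80
        System.out.printf("built-in:  %02X%n", bits.toByteArray()[0] & 0xFF); // built-in:  01
    }
}
```

Whichever convention you pick, the reader on the other platform has to use the same one.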