Suppose I have the following JSON:
[
{"id":1,"text":"some text","user_id":1},
{"id":1,"text":"some text","user_id":2},
...
]
What would be an appropriate Avro schema for this array of objects?
[short answer]
An appropriate Avro schema for this array of objects, written with the avsc package, would look like this:
const avro = require('avsc');
const type = avro.Type.forSchema({
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' }
    ]
  }
});
[long answer]
We can let Avro help us build the schema above from a given data object.
Let's use the npm package "avsc", which is a "Pure JavaScript implementation of the Avro specification".
Since avsc can infer a value's schema, we can use the following trick to get a schema from sample data. Note that console.log abbreviates deeply nested objects as [Object], so we ask twice: once for the top-level structure (the array) and once for an array element:
// Don't forget to install avsc first:
//   npm install avsc
const avro = require('avsc');
// Infer the schema of the whole array...
const type = avro.Type.forValue([
  {"id":1,"text":"some text","user_id":1}
]);
// ...and of a single array element.
const type2 = avro.Type.forValue(
  {"id":1,"text":"some text","user_id":1}
);
console.log(type.getSchema());
console.log(type2.getSchema());
Output:
{ type: 'array',
items: { type: 'record', fields: [ [Object], [Object], [Object] ] } }
{ type: 'record',
fields:
[ { name: 'id', type: 'int' },
{ name: 'text', type: 'string' },
{ name: 'user_id', type: 'int' } ] }
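The [Object] placeholders in the first output come from Node's console.log, which by default stops expanding nested objects beyond a fixed depth; the schema itself is complete. Here is a minimal sketch of two ways to print it in full. It assumes a plain object literal standing in for the value returned by type.getSchema(), so that the snippet runs without avsc installed:

```javascript
const util = require('util');

// Stand-in for the object returned by type.getSchema() above (assumption:
// copied by hand from the inferred schema so this runs without avsc).
const schema = {
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' }
    ]
  }
};

// Option 1: serialize to JSON, which has no depth limit.
const asJson = JSON.stringify(schema, null, 2);
console.log(asJson);

// Option 2: ask util.inspect to expand every nesting level.
const inspected = util.inspect(schema, { depth: null });
console.log(inspected);
```

Either option should work the same way on the real schema object, which removes the need to ask for the element schema separately.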
Now let's compose a proper schema and use it to serialize an object and then deserialize it back!
const avro = require('avsc');
const fs = require('fs');
const type = avro.Type.forSchema({
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' }
    ]
  }
});
// Serialize the array into a compact binary buffer.
const buf = type.toBuffer([
  {"id":1,"text":"some text","user_id":1},
  {"id":1,"text":"some text","user_id":2}
]);
// Deserialize it back and pretty-print the result.
const val = type.fromBuffer(buf);
console.log("deserialized object: ", JSON.stringify(val, null, 4));
// Save the encoded bytes to a file so we can inspect them afterwards.
const full_filename = "/tmp/avro_buf.dat";
fs.writeFile(full_filename, buf, function(err) {
  if (err) {
    return console.log(err);
  }
  console.log("The file was saved to '" + full_filename + "'");
});
Output:
deserialized object: [
{
"id": 1,
"text": "some text",
"user_id": 1
},
{
"id": 1,
"text": "some text",
"user_id": 2
}
]
The file was saved to '/tmp/avro_buf.dat'
We can even inspect the compact binary representation produced above (26 bytes for both records):
hexdump -C /tmp/avro_buf.dat
00000000 04 02 12 73 6f 6d 65 20 74 65 78 74 02 02 12 73 |...some text...s|
00000010 6f 6d 65 20 74 65 78 74 04 00 |ome text..|
0000001a
Nice, isn't it? :-)
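To see why so few bytes are enough, here is a rough sketch that decodes the dump by hand in plain Node.js, without avsc. It assumes the Avro binary encoding rules as I understand them: ints are zigzag varints, strings are a length followed by UTF-8 bytes, and arrays are blocks prefixed by an item count and terminated by a zero count. The helper names readInt and readString are made up for this illustration, and the field order is hard-coded from the schema above:

```javascript
// Bytes copied from the two rows of the hexdump above.
const buf = Buffer.from(
  '040212736f6d65207465787402021273' + '6f6d6520746578740400',
  'hex'
);
let pos = 0;

// Read one Avro int: base-128 varint, then undo the zigzag encoding.
// (Sketch only: fine for small values, not for full 64-bit longs.)
function readInt() {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = buf[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return (result >>> 1) ^ -(result & 1);
}

// Read one Avro string: length as an int, then that many UTF-8 bytes.
function readString() {
  const len = readInt();
  const s = buf.toString('utf8', pos, pos + len);
  pos += len;
  return s;
}

// Read the array: blocks of (count, items...), ended by a count of 0.
// Negative counts (blocks that also carry a byte size) are not handled here.
const records = [];
for (let count = readInt(); count !== 0; count = readInt()) {
  for (let i = 0; i < count; i++) {
    records.push({ id: readInt(), text: readString(), user_id: readInt() });
  }
}

console.log(records);
console.log('bytes consumed:', pos, 'of', buf.length);
```

Field names and types never appear in the data; only the values do, which is why the reader needs the same schema that the writer used.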