I am trying to avro binary encode my JSON String. Below is my JSON String and I have created a simple method which will do the conversion but I am not sure whether the way I am doing is correct or not?
public static void main(String args[]) throws Exception{
try{
Schema schema = new Parser().parse((TestExample.class.getResourceAsStream("/3233.avsc")));
String json="{"+
" \"location\" : {"+
" \"devices\":["+
" {"+
" \"did\":\"9abd09-439bcd-629a8f\","+
" \"dt\":\"browser\","+
" \"usl\":{"+
" \"pos\":{"+
" \"source\":\"GPS\","+
" \"lat\":90.0,"+
" \"long\":101.0,"+
" \"acc\":100"+
" },"+
" \"addSource\":\"LL\","+
" \"add\":["+
" {"+
" \"val\":\"2123\","+
" \"type\" : \"NUM\""+
" },"+
" {"+
" \"val\":\"Harris ST\","+
" \"type\" : \"ST\""+
" }"+
" ],"+
" \"ei\":{"+
" \"ibm\":true,"+
" \"sr\":10,"+
" \"ienz\":true,"+
" \"enz\":100,"+
" \"enr\":10"+
" },"+
" \"lm\":1390598086120"+
" }"+
" }"+
" ],"+
" \"ver\" : \"1.0\""+
" }"+
"}";
byte[] avroByteArray = fromJsonToAvro(json,schema);
} catch (Exception ex) {
// log an exception
}
Below method will convert my JSON String to Avro Binary encoded -
private static byte[] fromJsonToAvro(String json, Schema schema) throws Exception {
InputStream input = new ByteArrayInputStream(json.getBytes());
DataInputStream din = new DataInputStream(input);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
Object datum = reader.read(null, decoder);
GenericDatumWriter<Object> w = new GenericDatumWriter<Object>(schema);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
w.write(datum, e);
e.flush();
return outputStream.toByteArray();
}
Can anyone take a look and let me know whether the way I am trying to avro binary my JSON String is correct or not?
I think OP is correct. This will write Avro records themselves without the schema that would be present if this were an Avro data file.
Here's a couple examples within Avro itself (useful if you are working with files.
• From JSON to Avro: DataFileWriteTool
• From Avro to JSON: DataFileReadTool
Here's a complete example going both ways.
@Grapes([
@Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
String schema = '''{
"type":"record",
"namespace":"foo",
"name":"Person",
"fields":[
{
"name":"name",
"type":"string"
},
{
"name":"age",
"type":"int"
}
]
}'''
String json = "{" +
"\"name\":\"Frank\"," +
"\"age\":47" +
"}"
assert avroToJson(jsonToAvro(json, schema), schema) == json
public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
InputStream input = null;
GenericDatumWriter<GenericRecord> writer = null;
Encoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
input = new ByteArrayInputStream(json.getBytes());
output = new ByteArrayOutputStream();
DataInputStream din = new DataInputStream(input);
writer = new GenericDatumWriter<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
encoder = EncoderFactory.get().binaryEncoder(output, null);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.write(datum, encoder);
}
encoder.flush();
return output.toByteArray();
} finally {
try { input.close(); } catch (Exception e) { }
}
}
public static String avroToJson(byte[] avro, String schemaStr) throws IOException {
boolean pretty = false;
GenericDatumReader<GenericRecord> reader = null;
JsonEncoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
reader = new GenericDatumReader<GenericRecord>(schema);
InputStream input = new ByteArrayInputStream(avro);
output = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.write(datum, encoder);
}
encoder.flush();
output.flush();
return new String(output.toByteArray());
} finally {
try { if (output != null) output.close(); } catch (Exception e) { }
}
}
For the sake of completeness, here's an example if you were working with streams (Avro calls these container files) instead of records. Note that when you go back from JSON to Avro, you don't need to pass the schema. This is because it is present in the stream.
@Grapes([
@Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
// writes Avro as a http://avro.apache.org/docs/current/spec.html#Object+Container+Files rather than a sequence of records
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
String schema = '''{
"type":"record",
"namespace":"foo",
"name":"Person",
"fields":[
{
"name":"name",
"type":"string"
},
{
"name":"age",
"type":"int"
}
]
}'''
String json = "{" +
"\"name\":\"Frank\"," +
"\"age\":47" +
"}"
assert avroToJson(jsonToAvro(json, schema)) == json
public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
InputStream input = null;
DataFileWriter<GenericRecord> writer = null;
Encoder encoder = null;
ByteArrayOutputStream output = null;
try {
Schema schema = new Schema.Parser().parse(schemaStr);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
input = new ByteArrayInputStream(json.getBytes());
output = new ByteArrayOutputStream();
DataInputStream din = new DataInputStream(input);
writer = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>());
writer.create(schema, output);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
GenericRecord datum;
while (true) {
try {
datum = reader.read(null, decoder);
} catch (EOFException eofe) {
break;
}
writer.append(datum);
}
writer.flush();
return output.toByteArray();
} finally {
try { input.close(); } catch (Exception e) { }
}
}
public static String avroToJson(byte[] avro) throws IOException {
boolean pretty = false;
GenericDatumReader<GenericRecord> reader = null;
JsonEncoder encoder = null;
ByteArrayOutputStream output = null;
try {
reader = new GenericDatumReader<GenericRecord>();
InputStream input = new ByteArrayInputStream(avro);
DataFileStream<GenericRecord> streamReader = new DataFileStream<GenericRecord>(input, reader);
output = new ByteArrayOutputStream();
Schema schema = streamReader.getSchema();
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
for (GenericRecord datum : streamReader) {
writer.write(datum, encoder);
}
encoder.flush();
output.flush();
return new String(output.toByteArray());
} finally {
try { if (output != null) output.close(); } catch (Exception e) { }
}
}
To add to Keegan's answer, this discussion is likely useful:
http://mail-archives.apache.org/mod_mbox/avro-user/201209.mbox/%3CCALEq1Z8s1sfaAVB7YE2rpZ=v3q1V_h7Vm39h0HsOzxJ+qfQRSg@mail.gmail.com%3E
The gist is that there is a special Json schema and you can use JsonReader/Writer to get to and from that. The Json schema you should use is defined here:
https://github.com/apache/avro/blob/trunk/share/schemas/org/apache/avro/data/Json.avsc
You can use the avro-tools
to convert the json file({input_file}.json
) to avro file({output_file}.avro
) when you know the schema({schema_file}.avsc
) of the json file. Just like below:
java -jar the/path/of/avro-tools-1.8.1.jar fromjson {input_file}.json --schema-file {schema_file}.avsc > {output_file}.avro
By the way, the contents of {schema_file}.avsc
file is as belows:
{"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
Download avro-tools-1.8.1
Download others avro-tools