I am using Apache Avro.
My schema has a map type:
{"name": "MyData",
  "type": {"type": "map",
    "values": {
      "type": "record",
      "name": "Person",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
      ]
    }
  }
}
After compiling the schema, the generated Java class uses CharSequence as the key type for the Map MyData.
It is very inconvenient to use CharSequence as a Map key. Is there a way to generate String keys for a Map in Apache Avro?
P.S. The problem is that, for example, dataMap.containsKey("SOME_KEY") returns false even though the key is present, just because the stored key is a CharSequence. Besides, putting a map entry with an existing key doesn't replace the old one. That's why I say it is inconvenient to use CharSequence as a key.
Apparently, there is a workaround for this problem in Avro 1.6. You specify the string type in your project's POM file:
<stringType>String</stringType>
This is mentioned in issue AVRO-803 ... though the plugin's web documentation doesn't reflect this.
This JIRA discussion is relevant. The main reason CharSequence is still used is backwards compatibility.
And, as Charles Forsythe pointed out, a workaround has been added for when String is necessary: setting the string property in the schema.
{ "type": "string", "avro.java.string": "String" }
The default type here is Avro's own Utf8 class. In addition to manual specification and the pom.xml setting, there is even an avro-tools compile option for it, the -string option:
java -jar avro-tools.1.7.5.jar compile -string schema /path/to/schema .
Apparently, by default, Avro uses CharSequence. I found a way to configure it to convert to String.
From Avro 1.6.0 onward, there is an option to have Avro always perform the conversion to String. There are a couple of ways to achieve this. The first is to set the avro.java.string property in the schema to String:
{ "type": "string", "avro.java.string": "String" }
I have not tested this.
Regardless of whether it's possible to force Avro to use a String, using CharSequence directly as a key type is a poor design, because CharSequence isn't Comparable<CharSequence> and doesn't even specify equality of two identical sequences. I suggest filing this as a bug against Avro.
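To make the equality point concrete, here is a minimal sketch using only the JDK. StringBuilder is a stand-in (my assumption, chosen so the example runs without Avro on the classpath) for whatever non-String CharSequence the generated class stores; the class name CharSequenceKeyDemo and the helper sampleMap are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CharSequenceKeyDemo {
    // Builds a map whose only key is a non-String CharSequence.
    // StringBuilder is an assumed stand-in for the key type Avro produces.
    public static Map<CharSequence, Integer> sampleMap() {
        Map<CharSequence, Integer> map = new HashMap<>();
        map.put(new StringBuilder("SOME_KEY"), 1);
        return map;
    }

    public static void main(String[] args) {
        Map<CharSequence, Integer> map = sampleMap();

        // Same characters, different classes: equals() does not match,
        // so the lookup with a String misses.
        System.out.println(map.containsKey("SOME_KEY")); // false

        // For the same reason, put() adds a second entry instead of replacing.
        map.put("SOME_KEY", 2);
        System.out.println(map.size()); // 2

        // Comparing by content needs an explicit call such as contentEquals().
        System.out.println("SOME_KEY".contentEquals(new StringBuilder("SOME_KEY"))); // true
    }
}
```

This reproduces both symptoms from the question: the failed containsKey and the put that does not replace.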
A quick solution (the value type could be any other object type; here it is CharSequence):
import java.util.HashMap;
import java.util.Map;

// Copies the map, converting every key and value to String via toString().
Map<String, String> convertToStringMap(Map<CharSequence, CharSequence> map) {
    if (map == null) {
        return null;
    }
    Map<String, String> result = new HashMap<String, String>();
    // Iterate over entries to avoid a second lookup per key.
    for (Map.Entry<CharSequence, CharSequence> entry : map.entrySet()) {
        result.put(entry.getKey().toString(), entry.getValue().toString());
    }
    return result;
}
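Since the value type could be something other than a string, a generic variant may be more convenient. The following is only a sketch: the class name StringKeys and the method withStringKeys are hypothetical, and it converts the keys only, leaving the values untouched.

```java
import java.util.HashMap;
import java.util.Map;

public class StringKeys {
    // Returns a copy of the map whose keys are converted with toString();
    // values are copied by reference and not converted.
    public static <V> Map<String, V> withStringKeys(Map<? extends CharSequence, V> source) {
        if (source == null) {
            return null;
        }
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<? extends CharSequence, V> entry : source.entrySet()) {
            result.put(entry.getKey().toString(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<CharSequence, Integer> ages = new HashMap<>();
        ages.put(new StringBuilder("SOME_KEY"), 42);

        Map<String, Integer> converted = withStringKeys(ages);
        System.out.println(converted.containsKey("SOME_KEY")); // true
    }
}
```

Note that the result is a copy, so later changes to the original map are not reflected in it.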
I think explicitly converting the String to Utf8 will work:
"some_key" -> new Utf8("some_key"), then use this as your key for the map.