Apache Avro: map uses CharSequence as key

Posted 2019-01-14 18:37

Question:

I am using Apache Avro.

My schema has map type:

{"name": "MyData",
 "type": {"type": "map",
          "values": {
              "type": "record",
              "name": "Person",
              "fields": [
                  {"name": "name", "type": "string"},
                  {"name": "age", "type": "int"}
              ]
          }
 }
}

After compiling the schema, the generated Java class uses CharSequence as the key type for the Map MyData.

It is very inconvenient to use CharSequence as a Map key. Is there a way to generate String keys for a Map in Apache Avro?

P.S.

The problem is that, for example, dataMap.containsKey("SOME_KEY") returns false even though such a key is present, just because the stored key is a CharSequence rather than a String. Besides, putting a map entry with an existing key doesn't replace the old one. That's why I say it is inconvenient to use CharSequence as the key.
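The behaviour described here can be reproduced without Avro. The following is a minimal sketch, assuming java.nio.CharBuffer as a stand-in for a non-String CharSequence key; it is not the class that Avro actually stores, and the class name CharSequenceKeyDemo is made up for illustration.

```java
import java.nio.CharBuffer;
import java.util.HashMap;
import java.util.Map;

// Sketch only: CharBuffer is a JDK stand-in for a non-String CharSequence key.
// It is not the key class that Avro generates or stores.
class CharSequenceKeyDemo {

    // Builds a map whose single key is a CharSequence that is not a String.
    static Map<CharSequence, Integer> sampleMap() {
        Map<CharSequence, Integer> dataMap = new HashMap<>();
        dataMap.put(CharBuffer.wrap("SOME_KEY"), 1);
        return dataMap;
    }

    public static void main(String[] args) {
        Map<CharSequence, Integer> dataMap = sampleMap();

        // String.equals(CharBuffer) is false, so the lookup misses.
        System.out.println(dataMap.containsKey("SOME_KEY")); // prints false

        // The String is treated as a new key, so the old entry survives.
        dataMap.put("SOME_KEY", 2);
        System.out.println(dataMap.size()); // prints 2

        // Looking up with the same key type works.
        System.out.println(dataMap.containsKey(CharBuffer.wrap("SOME_KEY"))); // prints true
    }
}
```

The map compares keys with equals and hashCode, and a String never equals an object of another CharSequence implementation, even when the characters match.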

Answer 1:

Apparently, there is a workaround for this problem in Avro 1.6. You specify the string type in your project's POM file:

  <stringType>String</stringType>

This is mentioned in issue AVRO-803, though the plugin's web documentation doesn't reflect it.
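For context, here is a sketch of where that element might sit in the avro-maven-plugin configuration. The version number and the source and output directories are assumptions for illustration, not values taken from this thread; adjust them to your project.

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <!-- Assumed version; use the one that matches your Avro dependency. -->
  <version>1.7.5</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <!-- Generate java.lang.String instead of CharSequence. -->
        <stringType>String</stringType>
        <!-- Assumed directories. -->
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```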



Answer 2:

This JIRA discussion is relevant. The main reason CharSequence is still used is backward compatibility.

And, as Charles Forsythe pointed out, a workaround has been added for cases where String is necessary: setting the string property in the schema.

 { "type": "string", "avro.java.string": "String" }

The default type here is their own Utf8 class. In addition to manual specification and the pom.xml setting, there is even an avro-tools compile option for it, the -string option:

java -jar avro-tools-1.7.5.jar compile -string schema /path/to/schema .


Answer 3:

Apparently, by default, Avro uses CharSequence. I found a way to configure it to convert to String.

From Avro 1.6.0 onward, there is an option to have Avro always perform the conversion to String. There are a couple of ways to achieve this. The first is to set the avro.java.string property in the schema to String:

         { "type": "string", "avro.java.string": "String" }

I have not tested this.



Answer 4:

Regardless of whether it's possible to force Avro to use a String, using CharSequence directly is a bad implementation because CharSequence isn't Comparable<CharSequence> and doesn't even specify equality of two identical sequences. I suggest filing this as a bug against Avro.
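The missing equality contract is easy to see with plain JDK classes. This is a minimal sketch; the class name CharSequenceEqualityDemo and the helper sameContent are hypothetical names introduced for illustration.

```java
// Sketch only: uses JDK classes to show that CharSequence defines no equality contract.
class CharSequenceEqualityDemo {

    // Compares two character sequences by content, regardless of implementation.
    static boolean sameContent(CharSequence a, CharSequence b) {
        return a.toString().contentEquals(b);
    }

    public static void main(String[] args) {
        CharSequence text = "abc";
        CharSequence builder = new StringBuilder("abc");

        // String only equals another String.
        System.out.println(text.equals(builder)); // prints false

        // StringBuilder inherits identity equality from Object.
        System.out.println(builder.equals(text)); // prints false

        // Content has to be compared explicitly.
        System.out.println(sameContent(text, builder)); // prints true
    }
}
```

Since each implementation is free to define equals and hashCode its own way, a hash-based map keyed by the CharSequence interface only behaves predictably when every key comes from the same implementation.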



Answer 5:

A quick solution (the value type could be any other object type; here the values are CharSequence as well):

import java.util.HashMap;
import java.util.Map;

// Copies a CharSequence-keyed map into a new String-keyed map.
Map<String, String> convertToStringMap(Map<CharSequence, CharSequence> map) {
    if (map == null) {
        return null;
    }
    Map<String, String> result = new HashMap<String, String>();
    // Iterate over entries to avoid a second lookup per key.
    for (Map.Entry<CharSequence, CharSequence> entry : map.entrySet()) {
        result.put(entry.getKey().toString(), entry.getValue().toString());
    }
    return result;
}
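Since the value type could be something other than CharSequence, the same idea can be written generically. This is a sketch under that assumption; the class name StringKeyMaps and the method withStringKeys are hypothetical, and values are copied by reference without conversion.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a generic variant that converts keys and leaves values untouched.
class StringKeyMaps {

    // Returns a new map with each key converted via toString(); null in, null out.
    static <V> Map<String, V> withStringKeys(Map<? extends CharSequence, ? extends V> source) {
        if (source == null) {
            return null;
        }
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<? extends CharSequence, ? extends V> entry : source.entrySet()) {
            result.put(entry.getKey().toString(), entry.getValue());
        }
        return result;
    }
}
```

The result is a copy, so later changes to the original map are not reflected in it, and two source keys with the same characters collapse into a single entry.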


Answer 6:

I think explicitly converting the String to Utf8 will work: turn "some_key" into new Utf8("some_key") and use that as your key for the map.



Tags: java avro