Parse a string with delimiters and load it in a map

Published 2020-05-05 16:50

Question:

I have the String below, in the format key1=value1, key2=value2, which I need to load into a map (Map<String, String>) as key=value pairs. So I need to split on the comma and then load, for example, cossn as the key and 0 as its value.

String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";

HashMap<String, String> holder = new HashMap<>();
String[] keyVals = payload.split(", ");
for(String keyVal:keyVals) {
  String[] parts = keyVal.split("=",2);
  holder.put(parts[0], parts[1]);
}   

I am getting a java.lang.ArrayIndexOutOfBoundsException at the line holder.put(parts[0], parts[1]);. It happens because of the entry Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36, whose value contains an extra comma in KHTML, like Gecko. Splitting on ", " cuts that value in two, and the second piece has no = in it, so parts has only one element.
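
The failure can be reproduced in isolation with a small sketch like the one below. The class name SplitFailureDemo and the helper fragmentsWithoutEquals are made up for illustration; the helper only collects the pieces of the naive ", " split that contain no equals sign, which are exactly the pieces for which parts[1] does not exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original code.
class SplitFailureDemo {

    // Returns the fragments of a naive ", " split that contain no '=' at all.
    // For each of these, keyVal.split("=", 2) has length 1, so parts[1] would throw.
    static List<String> fragmentsWithoutEquals(String payload) {
        List<String> bad = new ArrayList<>();
        for (String keyVal : payload.split(", ")) {
            if (keyVal.split("=", 2).length < 2) {
                bad.add(keyVal);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";
        // Prints the fragment(s) that break holder.put(parts[0], parts[1])
        System.out.println(fragmentsWithoutEquals(payload));
    }
}
```

For the payload above, the only offending fragment is the tail of the Agent value that starts at "like Gecko)".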

How can I fix this? After loading, the map should contain the following keys and values:

Key         Value
cossn       0
abc         hello/=world
Agent       Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36

Answer 1:

Since, as you said, your keys contain only alphanumerics, the following would probably be a good heuristic for splitting:

payload.split("\\s*,\\s*(?=[a-zA-Z0-9_]+\\s*=|$)");

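This splits on commas, optionally surrounded by whitespace, that are followed either by the end of the string or by an alphanumeric key, optional whitespace, and an equals sign. A comma inside a value, such as the one in KHTML, like Gecko, is not followed by key= and is therefore left alone.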
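
As a minimal sketch of how this split could be plugged into the loop from the question: the class name RegexSplitDemo, the method parse, and the constant PAIR_SEPARATOR are hypothetical names chosen for this example, and the use of LinkedHashMap (to keep insertion order) and the length check are assumptions, not part of the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper class around the regex above.
class RegexSplitDemo {

    // Split only on commas that are followed by "key=" or by the end of the string.
    private static final String PAIR_SEPARATOR = "\\s*,\\s*(?=[a-zA-Z0-9_]+\\s*=|$)";

    static Map<String, String> parse(String payload) {
        Map<String, String> holder = new LinkedHashMap<>();
        for (String keyVal : payload.split(PAIR_SEPARATOR)) {
            // Limit 2 keeps any further '=' inside the value (e.g. hello/=world)
            String[] parts = keyVal.split("=", 2);
            if (parts.length == 2) { // skip fragments without '=' instead of throwing
                holder.put(parts[0].trim(), parts[1]);
            }
        }
        return holder;
    }

    public static void main(String[] args) {
        String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";
        parse(payload).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Being a heuristic, this still splits in the wrong place if a value itself contains a comma followed by something that looks like key=, for example a value of "x, y=z".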



Answer 2:

Given that you have no control over the payload, you need to do something to make the "illegal commas" not match your ", " regex.

Vampire provided a great regex. Since I've already gone down the road of manual parsing, I'll provide a non-regex solution below.

An alternative is to find the split points yourself by iterating over the string character by character and saving substrings. Keep track of the last comma-space you have seen; when you reach the next equals sign, that comma-space is the one to split on, because it is the one that immediately precedes a key. A comma-space that is never followed by an equals sign is part of a value and is ignored.

Here's some code that demonstrates what I'm trying to explain.

import java.util.Arrays;

public class ParseTest {

    static String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";

    public static void main(String[] args) {
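        // Index of the most recent ", ". While it equals beginIndex - 2, no ", "
        // has been seen since the current pair started.
        int lastCommaSpace = -2;
        // Start index of the key=value pair currently being scanned
        int beginIndex = 0;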

        // Iterate over string
        // We are looking for comma-space pairs so we stop one short of end of
        // string
        for (int i = 0; i < payload.length() - 1; i++) {
            if (payload.charAt(i) == ',' && payload.charAt(i + 1) == ' ') {
                // This is the point we want to split at
                lastCommaSpace = i;
            }
            if (payload.charAt(i) == '=' && lastCommaSpace != beginIndex - 2) {
                // We've found the next equals, split at the last comma we saw
                String pairToSplit = payload.substring(beginIndex, lastCommaSpace);
                System.out.println("Split and add this pair:" + Arrays.toString(pairToSplit.split("=", 2)));
                beginIndex = lastCommaSpace + 2;
            }
        }
        // We got to the end, split the last one
        String pairToSplit = payload.substring(beginIndex, payload.length());
        System.out.println("Split and add this pair:" + Arrays.toString(pairToSplit.split("=", 2)));
    }

}
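
The demo above only prints the pairs. A sketch of the same scan that fills the map from the question could look like the following. The names ManualSplitDemo, parse, and putPair are hypothetical, and two details differ from the code above as assumptions of this sketch: the marker is reset to -1 after each split instead of relying on the beginIndex - 2 sentinel, and a LinkedHashMap is used to keep the pairs in input order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical variant of ParseTest that returns a map instead of printing.
class ManualSplitDemo {

    static Map<String, String> parse(String payload) {
        Map<String, String> holder = new LinkedHashMap<>();
        int beginIndex = 0;      // start of the pair currently being scanned
        int lastCommaSpace = -1; // latest ", " seen in the current pair, -1 if none

        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            if (c == ',' && i + 1 < payload.length() && payload.charAt(i + 1) == ' ') {
                // Candidate split point; confirmed only if an '=' follows later
                lastCommaSpace = i;
            } else if (c == '=' && lastCommaSpace >= beginIndex) {
                // An '=' after a ", " means a new key started right after that ", "
                putPair(holder, payload.substring(beginIndex, lastCommaSpace));
                beginIndex = lastCommaSpace + 2;
                lastCommaSpace = -1;
            }
        }
        // Whatever is left is the last pair
        putPair(holder, payload.substring(beginIndex));
        return holder;
    }

    private static void putPair(Map<String, String> holder, String pair) {
        String[] parts = pair.split("=", 2);
        if (parts.length == 2) { // ignore text without '=' instead of throwing
            holder.put(parts[0], parts[1]);
        }
    }

    public static void main(String[] args) {
        String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";
        parse(payload).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Like the original loop, this treats any ", " that is later followed by an equals sign as a separator, so a value containing both a comma-space and a later equals sign would still be split incorrectly.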