Parsing query strings on Android

Posted 2018-12-31 19:58

Java EE has ServletRequest.getParameterValues().

On non-EE platforms, URL.getQuery() simply returns a string.

What's the normal way to properly parse the query string in a URL when not on Java EE?


<rant>

It is popular in the answers to try to write your own parser. This is a very interesting and exciting micro-coding project, but I cannot say that it is a good idea :(

The code snippets below are generally flawed or broken, btw. Breaking them is an interesting exercise for the reader, and for the hackers attacking the websites that use them.

Parsing query strings is a well-defined problem, but reading the spec and understanding the nuances is non-trivial. It is far better to let some platform library coder do the hard work, and do the fixing, for you!

</rant>

24 answers
回忆,回不去的记忆
#2 · 2018-12-31 20:24

I have two methods to achieve this:

1):

public static String getQueryString(String url, String tag) {
    // Only the part after '?' is the query string.
    int start = url.indexOf('?');
    if (start < 0) {
        return "";
    }
    for (String param : url.substring(start + 1).split("&")) {
        // Limit of 2 keeps any '=' inside the value intact.
        String[] pair = param.split("=", 2);
        if (pair[0].equals(tag)) {
            // Note: the value is returned as-is, without percent-decoding.
            return pair.length > 1 ? pair[1] : "";
        }
    }
    return "";
}

2) The easiest way is to use Android's Uri class:

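// Requires: import android.net.Uri; import android.util.Log;
// TAG is assumed to be a log-tag constant defined in the enclosing class.
public static String getQueryString(String url, String tag) {
    try {
        Uri uri = Uri.parse(url);
        // getQueryParameter() returns null when the parameter is absent.
        String value = uri.getQueryParameter(tag);
        return value != null ? value : "";
    } catch (Exception e) {
        Log.e(TAG, "getQueryString() " + e.getMessage());
    }
    return "";
}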

Here is an example of how to use either of the two methods:

String url = "http://www.jorgesys.com/advertisements/publicidadmobile.htm?position=x46&site=reform&awidth=800&aheight=120";      
String tagValue = getQueryString(url,"awidth");

The value of tagValue is 800.

笑指拈花
#3 · 2018-12-31 20:24

This works for me. I'm not sure why everyone was after a Map<String, List<String>>; all I needed was a simple name-value Map.

To keep things simple I used the built-in URI.getRawQuery(), so that each name and value is decoded exactly once:

public static Map<String, String> getUrlParameters(URI uri)
    throws UnsupportedEncodingException {
    Map<String, String> params = new HashMap<String, String>();
    String query = uri.getRawQuery();
    if (query == null) {
        return params; // the URI has no query string at all
    }
    for (String param : query.split("&")) {
        // Limit of 2 keeps any '=' inside the value intact.
        String[] pair = param.split("=", 2);
        String key = URLDecoder.decode(pair[0], "UTF-8");
        String value = "";
        if (pair.length > 1) {
            value = URLDecoder.decode(pair[1], "UTF-8");
        }
        params.put(key, value);
    }
    return params;
}
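
One thing to watch with java.net.URI: getQuery() returns the query with its percent escapes already decoded, while getRawQuery() returns it untouched. Splitting on & is only safe on the raw form. A small sketch of the difference (the example.com URL and the class name are made up for illustration):

```java
import java.net.URI;

final class RawQueryDemo {
    // Returns {decoded query, raw query} for the given URI string.
    static String[] queries(String uriString) {
        URI uri = URI.create(uriString);
        return new String[] { uri.getQuery(), uri.getRawQuery() };
    }

    public static void main(String[] args) {
        // %26 is an encoded '&' that belongs to the value of "q".
        String[] q = queries("http://example.com/search?q=a%26b&lang=en");
        System.out.println(q[0]); // q=a&b&lang=en   (now looks like three parameters)
        System.out.println(q[1]); // q=a%26b&lang=en (still two parameters)
    }
}
```

Running URLDecoder.decode() on the output of getQuery() would also decode the values a second time, which corrupts any value that legitimately contains a % sign.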
无与为乐者.
#4 · 2018-12-31 20:25

Here is BalusC's answer, but it compiles and returns results:

public static Map<String, List<String>> getUrlParameters(String url)
        throws UnsupportedEncodingException {
    Map<String, List<String>> params = new HashMap<String, List<String>>();
    String[] urlParts = url.split("\\?");
    if (urlParts.length > 1) {
        String query = urlParts[1];
        for (String param : query.split("&")) {
            String[] pair = param.split("=", 2); // limit of 2 keeps any '=' inside the value
            String key = URLDecoder.decode(pair[0], "UTF-8");
            String value = "";
            if (pair.length > 1) {
                value = URLDecoder.decode(pair[1], "UTF-8");
            }
            List<String> values = params.get(key);
            if (values == null) {
                values = new ArrayList<String>();
                params.put(key, values);
            }
            values.add(value);
        }
    }
    return params;
}
浮光初槿花落
#5 · 2018-12-31 20:26

Parsing the query string is a bit more complicated than it seems, depending on how forgiving you want to be.

First, the query string is ASCII bytes. You read in these bytes one at a time and convert them to characters. If the character is ? or & then it signals the start of a parameter name. If the character is = then it signals the start of a parameter value. If the character is % then it signals the start of an encoded byte. Here is where it gets tricky.

When you read in a % char you have to read the next two bytes and interpret them as hex digits. That means the next two bytes will be 0-9, a-f or A-F. Glue these two hex digits together to get your byte value. But remember, bytes are not characters. You have to know what encoding was used to encode the characters. The character é does not encode the same in UTF-8 as it does in ISO-8859-1. In general it's impossible to know from the bytes alone which character encoding was used. I always use UTF-8 because my web site is configured to always serve everything using UTF-8, but in practice you can't be certain. Some user-agents will tell you the character encoding in the request; you can try to read that if you have a full HTTP request. If you just have a URL in isolation, good luck.
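
To make the é example concrete, here is a small sketch using the standard URLEncoder and URLDecoder classes. The class and method names are only for illustration; the character is written as \u00e9 so the result does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class CharsetDemo {
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the character é
        System.out.println(encode(eAcute, "UTF-8"));      // %C3%A9 (two bytes)
        System.out.println(encode(eAcute, "ISO-8859-1")); // %E9    (one byte)
        // Guessing the wrong charset on the way back gives two wrong characters:
        System.out.println(decode("%C3%A9", "ISO-8859-1").length()); // 2
    }
}
```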

Anyway, assuming you are using UTF-8 or some other multi-byte character encoding, now that you've decoded one encoded byte you have to set it aside until you capture the next byte. You need all the encoded bytes that are together because you can't url-decode properly one byte at a time. Set aside all the bytes that are together then decode them all at once to reconstruct your character.
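
A minimal sketch of that "set the bytes aside" step, assuming UTF-8 and a single name or value that has already been split out of the query string. PercentDecoder is a hypothetical helper, not a platform class; it leaves '+' alone and keeps malformed % sequences as literal text, which is one possible policy among several.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class PercentDecoder {
    // Decodes %XX escapes in one query-string component, assuming UTF-8.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                // Glue the two hex digits into one byte and set it aside.
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                pending.write((hi << 4) | lo);
                i += 3;
            } else {
                // A plain character ends the current run of encoded bytes.
                flush(pending, out);
                out.append(c);
                i++;
            }
        }
        flush(pending, out);
        return out.toString();
    }

    // Converts all bytes collected so far into characters in one go.
    private static void flush(ByteArrayOutputStream pending, StringBuilder out) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
            pending.reset();
        }
    }
}
```

With this, PercentDecoder.decode("caf%C3%A9") gives "café", because the bytes C3 and A9 are turned into a character together rather than one at a time.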

Plus it gets more fun if you want to be lenient and account for user-agents that mangle URLs. For example, some webmail clients double-encode things, or double up the ?&= chars (for example: http://yoursite.com/blah??p1==v1&&p2==v2). If you want to try to gracefully deal with this, you will need to add more logic to your parser.
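
As a rough idea of what that extra logic could look like for the doubled ?&= case only, here is a sketch. LenientSplit is a made-up name, the pairs it returns are still percent-encoded, and it does nothing about double-encoding or a trailing #fragment; how forgiving to be is a policy choice, not something the spec decides for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LenientSplit {
    // Splits a URL's query into raw (still percent-encoded) name/value pairs,
    // tolerating doubled '?', '&' and '=' characters.
    static Map<String, String> rawPairs(String url) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return pairs;
        }
        // Drop any extra '?' characters right after the first one.
        String query = url.substring(q + 1).replaceFirst("^\\?+", "");
        for (String part : query.split("&+")) {
            // Collapse a run of '=' between name and value into one separator.
            String[] nv = part.split("=+", 2);
            if (nv[0].isEmpty()) {
                continue; // nothing usable as a name
            }
            pairs.put(nv[0], nv.length > 1 ? nv[1] : "");
        }
        return pairs;
    }
}
```

For the URL above this yields p1 -> v1 and p2 -> v2.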

旧人旧事旧时光
#6 · 2018-12-31 20:28

On Android, I tried using @diyism's answer but I encountered the space character issue raised by @rpetrich. For example, I fill out a form where username = "us+us" and password = "pw pw", causing a URL string to look like:

http://somewhere?username=us%2Bus&password=pw+pw

However, @diyism's code returns "us+us" and "pw+pw", i.e. it doesn't detect the space character. If the URL is rewritten with %20, the space character gets identified:

http://somewhere?username=us%2Bus&password=pw%20pw

This leads to the following fix:

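// Replace '+' before parsing; a literal plus sign stays safe because it is encoded as %2B.
Uri uri = Uri.parse(url_string.replace("+", "%20"));
String password = uri.getQueryParameter("password");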
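
The same effect can be reproduced off-device with java.net.URI, whose getQuery() decodes percent escapes but leaves '+' alone. This is only a stand-in to illustrate the ambiguity, not a statement about how every version of Android's Uri behaves; PlusDemo is a made-up class name.

```java
import java.net.URI;

final class PlusDemo {
    // Decodes percent escapes only, like a decoder that does not treat '+' as a space.
    static String decodedQuery(String url) {
        return URI.create(url).getQuery();
    }

    public static void main(String[] args) {
        String url = "http://somewhere?username=us%2Bus&password=pw+pw";
        // Both plus signs look the same after decoding, so the space is lost:
        System.out.println(decodedQuery(url));
        // username=us+us&password=pw+pw

        // Replacing '+' BEFORE decoding keeps the real plus (%2B) intact:
        System.out.println(decodedQuery(url.replace("+", "%20")));
        // username=us+us&password=pw pw
    }
}
```

The order matters: doing the replacement after decoding would also turn the genuine plus sign in the username into a space.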
呛了眼睛熬了心
#7 · 2018-12-31 20:29

You say "Java" but "not Java EE". Do you mean you are using JSP and/or servlets but not a full Java EE stack? If that's the case, then you should still have request.getParameter() available to you.

If you mean you are writing Java but you are not writing JSPs or servlets, or that you're just using Java as your reference point but you're on some other platform that doesn't have built-in parameter parsing ... Wow, that just sounds like an unlikely question, but if so, the principle would be:

xparm=0
word=""
loop
  get next char
  if no char
    param_value[xparm]=word
    exit loop
  if char=='='
    param_name[xparm]=word
    word=""
  else if char=='&'
    param_value[xparm]=word
    word=""
    xparm=xparm+1
  else if char=='%'
    read next two chars
    word=word+interpret the chars as hex digits to make a byte
  else
    word=word+char

(I could write Java code but that would be pointless, because if you have Java available, you can just use request.getParameter().)
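
For the case where no servlet container is available after all, here is one way the loop above might look in Java. It is a sketch under a few assumptions that are not in the pseudocode: UTF-8 is used for the decoded bytes, the end of input is treated like a final '&' so the last value is stored, '+' is read as a space, and non-ASCII input is rejected. LoopParser is a made-up name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LoopParser {
    // Parses a query such as "a=1&b=2" into {name, value} pairs, assuming UTF-8.
    static List<String[]> parse(String query) {
        List<String[]> params = new ArrayList<>();
        ByteArrayOutputStream word = new ByteArrayOutputStream();
        String name = null;
        int i = 0;
        while (i <= query.length()) {
            // Treat the end of input like a final '&' so the last pair is stored.
            char c = (i == query.length()) ? '&' : query.charAt(i);
            if (c == '=' && name == null) {
                name = take(word);
                i++;
            } else if (c == '&') {
                String w = take(word);
                if (name != null) {
                    params.add(new String[] { name, w });
                } else if (!w.isEmpty()) {
                    params.add(new String[] { w, "" }); // name without a value
                }
                name = null;
                i++;
            } else if (c == '%' && hex(query, i + 1) >= 0 && hex(query, i + 2) >= 0) {
                // Two hex digits make one byte; bytes become characters in take().
                word.write((hex(query, i + 1) << 4) | hex(query, i + 2));
                i += 3;
            } else {
                if (c > 0x7F) {
                    throw new IllegalArgumentException("non-ASCII character at index " + i);
                }
                word.write(c == '+' ? ' ' : c); // assumption: form-style '+' means space
                i++;
            }
        }
        return params;
    }

    // Value of the hex digit at index i, or -1 if there is none.
    private static int hex(String s, int i) {
        return i < s.length() ? Character.digit(s.charAt(i), 16) : -1;
    }

    // Decodes the collected bytes as UTF-8 and clears the buffer.
    private static String take(ByteArrayOutputStream word) {
        String s = new String(word.toByteArray(), StandardCharsets.UTF_8);
        word.reset();
        return s;
    }
}
```

For example, LoopParser.parse("name=caf%C3%A9&q=a%26b+c&flag") produces three pairs: name/café, q/"a&b c" and flag with an empty value.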
