I am looking for a correct regular expression for validating URI query strings. I found some answers here or here, but I still have doubts about the edge cases where the key or the value could be empty. For example, should the following be treated as valid query strings?
?&&
?=
?a=
?a=&
?=a
?&=a
Sure thing, no prob. As per RFC 3986, appendix B, here it is:
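A minimal Python sketch of the appendix B expression follows. The regex is the one printed in the RFC for splitting a URI reference into its components; the names `URI_RE` and `extract_query` are only illustrative. Note that this expression splits a URI rather than validating it, and it matches any input string.

```python
import re

# Component-splitting regex from RFC 3986, appendix B.
# Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def extract_query(uri):
    """Return the query component without the leading '?', or None if absent."""
    match = URI_RE.match(uri)  # every group is optional, so this always matches
    return match.group(7)


if __name__ == "__main__":
    for uri in ["http://example.com/path?a=&b#frag", "?&&", "http://example.com/"]:
        print(repr(uri), "->", repr(extract_query(uri)))
```

Everything between the first `?` and the first `#` ends up in group 7, so this only tells you where the query is, not whether its characters are allowed.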
If you want something more elaborate, you can check section 3.4 for the allowed characters in addition to percent-encoded entities. The regex would look something like this:
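As a sketch, assuming the query is passed with its leading `?` as in your examples, the section 3.4 grammar (`query = *( pchar / "/" / "?" )`) could be written in Python like this. The names `QUERY_RE` and `is_valid_query` are made up for the example.

```python
import re

# query      = *( pchar / "/" / "?" )
# pchar      = unreserved / pct-encoded / sub-delims / ":" / "@"
# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
# sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
QUERY_RE = re.compile(
    r"\?(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_valid_query(query):
    """Check that the string is '?' followed only by characters RFC 3986 allows in a query."""
    return QUERY_RE.fullmatch(query) is not None


if __name__ == "__main__":
    for q in ["?&&", "?=", "?a=", "?a=&", "?=a", "?&=a", "?a=%zz", "?a b"]:
        print(q, is_valid_query(q))
```

This checks only the encoding: a stray `%` that is not followed by two hex digits, a space, or a `#` is rejected, while empty keys and values pass.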
As far as RFC 3986 is concerned, all your examples are valid so far. The RFC is telling us how the query string has to be encoded while saying little about how the query string has to be structured. Older RFCs are continuously shifting authority over the structure of query strings between CGI and HTTP without ever formally specifying a grammar (see e.g. RFC 3875, sec. 4.1.7, RFC 2396, sec. 3.4, RFC 1808, sec. 2.1, …).
An interesting note can be found in RFC 7230, section 2.4:
For a full validity check on such query strings, you would have to implement the algorithm for decoding form data recommended by the W3C. It could be done with a regex, but I would advise against it for reasons of sanity.
With regard to your examples: I believe they are all valid. How they are interpreted should be left to the receiving application. Some are not as much of a fringe case as you may think, though:
?&& is simply an empty dictionary, while ?=a could map to { "": "a" }.
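To see how one receiving application reads these strings, here is a small sketch using Python's standard `urllib.parse.parse_qsl`. This is Python's interpretation only, not something the RFC mandates, and the helper name `interpret` is invented for the example.

```python
from urllib.parse import parse_qsl


def interpret(query):
    """Decode a query string (with or without leading '?') into a dict.

    keep_blank_values=True keeps pairs whose value is empty, such as 'a='.
    Converting to a dict means a repeated key keeps only its last value.
    """
    if query.startswith("?"):
        query = query[1:]
    return dict(parse_qsl(query, keep_blank_values=True))


if __name__ == "__main__":
    for q in ["?&&", "?=", "?a=", "?a=&", "?=a", "?&=a"]:
        print(q, "->", interpret(q))
```

With this parser, `?&&` comes out as `{}` and `?=a` as `{"": "a"}`; other frameworks may drop empty keys or treat them differently.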