Strange Base64 encode/decode problem

2020-02-23 07:00发布

问题:

I'm using Grails 1.3.7. I have some code that uses the built-in base64Encode function and base64Decode function. It all works fine in simple test cases where I encode some binary data and then decode the resulting string and write it to a new file. In this case the files are identical.

But then I wrote a web service that took the base64 encoded data as a parameter in a POST call. Although the length of the base64 data is identical to the string I passed into the function, the contents of the base64 data are being modified. I spend DAYS debugging this and finally wrote a test controller that passed the data in base64 to post and also took the name of a local file with the correct base64 encoded data, as in:

data=AAA-base-64-data...&testFilename=/name/of/file/with/base64data

Within the test function I compared every byte in the incoming data parameter with the appropriate byte in the test file. I found that somehow every "+" character in the input data parameter had been replaced with a " " (space, ordinal ascii 32). Huh? What could have done that?

To be sure I was correct, I added a line that said:

data = data.replaceAll(' ', '+')

and sure enough the data decoded exactly right. I tried it with arbitrarily long binary files and it now works every time. But I can't figure out for the life of me what would be modifying the data parameter in the post to convert the ord(43) character to ord(32)? I know that the plus sign is one of the 2 somewhat platform dependent characters in the base64 spec, but given that I am doing the encoding and decoding on the same machine for now I am super puzzled what caused this. Sure I have a "fix" since I can make it work, but I am nervous about "fixes" that I don't understand.

The code is too big to post here, but I get the base64 encoding like so:

def inputFile = new File(inputFilename)
def rawData =  inputFile.getBytes()
def encoded = rawData.encodeBase64().toString()

I then write that encoded string out to new a file so I can use it for testing later. If I load that file back in as so I get the same rawData:

def encodedFile = new File(encodedFilename)
String encoded = encodedFile.getText()
byte[] rawData = encoded.decodeBase64()

So all that is good. Now assume I take the "encoded" variable and add it to a param to a POST function like so:

String queryString = "data=$encoded"
String url = "http://localhost:8080/some_web_service"

def results = urlPost(url, queryString)

def urlPost(String urlString, String queryString) {
    def url = new URL(urlString)
    def connection = url.openConnection()
    connection.setRequestMethod("POST")
    connection.doOutput = true

    def writer = new OutputStreamWriter(connection.outputStream)
    writer.write(queryString)
    writer.flush()
    writer.close()
    connection.connect()

    return (connection.responseCode == 200) ? connection.content.text : "error                         $connection.responseCode, $connection.responseMessage"
}

on the web service side, in the controller I get the parameter like so:

String data = params?.data
println "incoming data parameter has length of ${data.size()}" //confirm right size

//unless I run the following line, the data does not decode to the same source
data = data.replaceAll(' ', '+')

//as long as I replace spaces with plus, this decodes correctly, why?
byte[] bytedata = data.decodeBase64()

Sorry for the long rant, but I'd really love to understand why I had to do the "replace space with plus sign" to get this to decode correctly. Is there some problem with the plus sign being used in a request parameter?

回答1:

Whatever populates params expects the request to be a URL-encoded form (specifically, application/x-www-form-urlencoded, where "+" means space), but you didn't URL-encode it. I don't know what functions your language provides, but in pseudo code, queryString should be constructed from

concat(uri_escape("data"), "=", uri_escape(base64_encode(rawBytes)))

which simplifies to

concat("data=", uri_escape(base64_encode(rawBytes)))

The "+" characters will be replaced with "%2B".



回答2:

You have to use a special base64encode which is also url-safe. The problem is that standard base64encode includes +, / and = characters which are replaced by the percent-encoded version.

http://en.wikipedia.org/wiki/Base64#URL_applications

I'm using the following code in php:

    /**
     * Custom base64 encoding. Replace unsafe url chars
     *
     * @param string $val
     * @return string
     */
    static function base64_url_encode($val) {

        return strtr(base64_encode($val), '+/=', '-_,');

    }

    /**
     * Custom base64 decode. Replace custom url safe values with normal
     * base64 characters before decoding.
     *
     * @param string $val
     * @return string
     */
    static function base64_url_decode($val) {

        return base64_decode(strtr($val, '-_,', '+/='));

    }


回答3:

Because it is a parameter to a POST you must URL encode the data.

See http://en.wikipedia.org/wiki/Percent-encoding



回答4:

paraquote from the wikipedia link

The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20"

another hidden pitfall everyday web developers like myself know little about