Why do we use Base64?

Posted 2020-01-23 10:29

Wikipedia says

Base64 encoding schemes are commonly used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remains intact without modification during transport.

But isn't data always stored/transmitted in binary anyway, because the memory in our machines stores binary and it just depends on how you interpret it? So whether you encode the bit pattern 010011010110000101101110 as Man in ASCII or as TWFu in Base64, you are eventually going to store the same bit pattern.

If the ultimate encoding is in terms of zeros and ones, and every machine and medium can deal with them, how does it matter whether the data is represented as ASCII or Base64?

What does "media that are designed to deal with textual data" mean? If they can deal with binary, they can deal with anything.


Thanks everyone, I think I understand now.

When we send data, we cannot be sure that it will be interpreted in the same format we intended. So we send the data encoded in some format (like Base64) that both parties understand. That way, even if the sender and receiver interpret the same bytes differently, because they agree on the encoded format the data will not be misinterpreted.

From Mark Byers' example:

If I want to send

Hello
world!

One way is to send it in ASCII like

72 101 108 108 111 10 119 111 114 108 100 33

But byte 10 might not be interpreted correctly as a newline at the other end. So, we use a subset of ASCII to encode it like this

83 71 86 115 98 71 56 75 100 50 57 121 98 71 81 104

which, at the cost of transferring more data for the same amount of information, ensures that the receiver can decode the data in the intended way, even if the receiver happens to interpret the rest of the character set differently.
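
For anyone who wants to verify those numbers, here is a minimal Python sketch (my own illustration, not part of the original example; the standard base64 module is assumed) that encodes the same twelve bytes and prints the ASCII codes of the result:

import base64

message = bytes([72, 101, 108, 108, 111, 10, 119, 111, 114, 108, 100, 33])  # "Hello\nworld!"
encoded = base64.b64encode(message)           # b"SGVsbG8Kd29ybGQh"
print(list(encoded))                          # [83, 71, 86, 115, 98, 71, 56, 75, 100, 50, 57, 121, 98, 71, 81, 104]
print(base64.b64decode(encoded) == message)   # True: the round trip is lossless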

12 answers
爱情/是我丢掉的垃圾
#2 · 2020-01-23 10:41

One example of when I found it convenient was when trying to embed binary data in XML. Some of the binary data was being misinterpreted by the SAX parser because that data could be literally anything, including XML special characters. Base64 encoding the data on the transmitting end and decoding it on the receiving end fixed that problem.
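
A minimal sketch of that idea, in Python only because I needed some language for illustration (the answer doesn't name one): base64-encode the raw bytes before placing them in the XML text on the transmitting end, and decode after parsing on the receiving end, so bytes that look like XML markup never reach the parser.

import base64

payload = b"\x00<tag>&\xff"                              # arbitrary bytes, including XML special characters
safe_text = base64.b64encode(payload).decode("ascii")    # what actually goes inside the element
element = "<blob>%s</blob>" % safe_text                  # no raw <, & or control bytes in the document
recovered = base64.b64decode(safe_text)                  # receiving side, after the parser extracts the text
assert recovered == payload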

劫难
#3 · 2020-01-23 10:42

Most computers store data in 8-bit binary format, but this is not a requirement. Some machines and transmission media can only handle 7 bits (or maybe even fewer) at a time. Such a medium interprets the stream in multiples of 7 bits, so if you were to send 8-bit data, you wouldn't receive what you expect on the other side. Base64 is just one way to solve this problem: you encode the input into a 6-bit format, send it over your medium, and decode it back to 8-bit format at the receiving end.
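
As a rough illustration of that regrouping (my own Python sketch, not from the answer), three 8-bit bytes are packed together and re-sliced into four 6-bit values, and each 6-bit value maps to a printable character whose code is well below 128, so even a 7-bit channel carries it untouched:

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_three_bytes(b1, b2, b3):
    # Pack 3 x 8 bits into one 24-bit number, then slice it into 4 x 6 bits.
    n = (b1 << 16) | (b2 << 8) | b3
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))

chunk = encode_three_bytes(0b01001101, 0b01100001, 0b01101110)  # the "Man" bit pattern from the question
print(chunk)                                # TWFu
print(all(ord(c) < 128 for c in chunk))     # True: every output character fits in 7 bits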

chillily
#4 · 2020-01-23 10:43

Encoding binary data in XML

Suppose you want to embed a couple images within an XML document. The images are binary data, while the XML document is text. But XML cannot handle embedded binary data. So how do you do it?

One option is to encode the images in base64, turning the binary data into text that XML can handle.

Instead of:

<images>
  <image name="Sally">{binary gibberish that breaks XML parsers}</image>
  <image name="Bobby">{binary gibberish that breaks XML parsers}</image>
</images>

you do:

<images>
  <image name="Sally" encoding="base64">j23894uaiAJSD3234kljasjkSD...</image>
  <image name="Bobby" encoding="base64">Ja3k23JKasil3452AsdfjlksKsasKD...</image>
</images>

And the XML parser will be able to parse the XML document correctly and extract the image data.
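
To round the example off, here is a hedged sketch of the receiving side (Python's xml.etree.ElementTree and base64 are my choice for illustration, and the tiny payload stands in for real image bytes): the parser sees only ordinary text, and the application decodes each image's text content back into binary afterwards.

import base64
import xml.etree.ElementTree as ET

doc = """<images>
  <image name="Sally" encoding="base64">aGVsbG8=</image>
</images>"""                               # placeholder data instead of a real image

root = ET.fromstring(doc)                  # plain text, so the parser is perfectly happy
for image in root.findall("image"):
    raw = base64.b64decode(image.text)     # back to the original binary data
    print(image.get("name"), raw)          # Sally b'hello'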

beautiful°
#5 · 2020-01-23 10:45

Media that is designed for textual data is of course eventually binary as well, but textual media often use certain binary values for control characters. Also, textual media may reject certain binary values as non-text.

Base64 encoding encodes binary data as values that can only be interpreted as text in textual media, and is free of any special characters and/or control characters, so that the data will be preserved across textual media as well.
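
A quick way to convince yourself of that (my own check, not part of the answer): the entire Base64 output alphabet is 64 printable characters plus the = padding sign, so no control or markup character can ever appear in the encoded stream.

import base64, os, string

encoded = base64.b64encode(os.urandom(1000)).decode("ascii")  # encode some arbitrary bytes
allowed = set(string.ascii_letters + string.digits + "+/=")   # the Base64 alphabet plus padding
print(set(encoded) <= allowed)                                # True
print(any(c in encoded for c in "\r\n\x00<>&"))               # False: no control or special characters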

我欲成王,谁敢阻挡
#6 · 2020-01-23 10:47

It's more that the medium validates the string encoding, so we want to ensure that the data is acceptable to a handling application (and doesn't contain a binary sequence representing EOL, for example).

Imagine you want to send binary data in an email with encoding UTF-8. The email may not display correctly if the stream of ones and zeros creates a sequence which isn't valid UTF-8.

The same type of thing happens in URLs when we want to encode characters not valid for a URL in the URL itself:

http://www.foo.com/hello my friend -> http://www.foo.com/hello%20my%20friend

This is because we want to send a space over a system that will think the space is smelly.
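
That particular mapping is percent-encoding rather than Base64, but the idea is the same; here is a one-line sketch using Python's urllib (my choice of library, the answer doesn't name one):

from urllib.parse import quote, unquote

encoded = quote("hello my friend")   # 'hello%20my%20friend'
print(encoded)
print(unquote(encoded))              # back to 'hello my friend'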

All we are doing is ensuring there is a 1-to-1 mapping from a known good, acceptable and non-detrimental sequence of bits to another literal sequence of bits, and that the handling application doesn't treat the encoded data as anything special.

In your example, Man may be valid ASCII in its first form; but often you will want to transmit values that are arbitrary binary (e.g. sending an image in an email):

MIME-Version: 1.0
Content-Description: "Base64 encode of a.gif"
Content-Type: image/gif; name="a.gif"
Content-Transfer-Encoding: Base64
Content-Disposition: attachment; filename="a.gif"

Here we see that a GIF image is encoded in base64 as a chunk of an email. The email client reads the headers and decodes it. Because of the encoding, we can be sure the GIF doesn't contain anything that may be interpreted as protocol and we avoid inserting data that SMTP or POP may find significant.
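
A hedged sketch of building such a message with Python's email package (my choice of library; the answer only shows the headers, and the payload bytes here are a placeholder, not a real GIF): encoders.encode_base64 base64-encodes the payload and sets the Content-Transfer-Encoding header, so the attachment survives SMTP unchanged.

from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders

msg = MIMEMultipart()
part = MIMEBase("image", "gif")
part.set_payload(b"GIF89a...")               # placeholder bytes standing in for a real a.gif
encoders.encode_base64(part)                 # base64-encode the payload and set the header
part.add_header("Content-Disposition", "attachment", filename="a.gif")
msg.attach(part)
print(part["Content-Transfer-Encoding"])     # base64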

霸刀☆藐视天下
#7 · 2020-01-23 10:48

What does it mean "media that are designed to deal with textual data"?

Back in the day when ASCII ruled the world, dealing with non-ASCII values was a headache. People jumped through all sorts of hoops to get these transferred over the wire without losing information.
