What exactly causes binary file “gibberish”?

Posted 2019-02-08 11:45

I haven't found an answer to this particular question; perhaps there isn't one. But I've been wondering for a while about it.

What exactly causes a binary file to display as "gibberish" when you look at it in a text editor? It's the same thing with encrypted files. Are the binary values of the file trying to be converted into ASCII? Is it possible to convert the view to display raw binary values, i.e. to show the 1s and 0s that make up the file?

Finally, is there a way to determine what program will properly open a data file? Many times, especially with Windows, a file is orphaned or otherwise not associated w/ a particular program. Opening it in a text editor sometimes tells you where it belongs but most of the time doesn't, due to the gibberish. If the extension doesn't provide any information, how can you determine what program it belongs to?

Tags: binaryfiles
7 answers
倾城 Initia
#2 · 2019-02-08 12:04

Binary files display as gibberish in standard text editors such as Notepad because the editor interprets every byte according to a text encoding (e.g. ASCII or UTF-8) and maps it to a character for display. The bytes were never meant to be text, so the resulting characters make as little sense to humans as the raw binary data does; hence the gibberish you see.

As previously mentioned, these files make more sense when viewed in a different way, such as with a hex editor.

Certain file types can be recognized by data present in all files of a given type; for example, all Windows executable files (*.exe) begin with the letters MZ.

祖国的老花朵
#3 · 2019-02-08 12:06

Binary data often looks very random, and encrypted data in particular is designed to. Each byte can take one of 256 values, so it can be shown as one of 256 characters (leaving Unicode out of the equation). ASCII only covers 128 of these, and only 95 of those are printable characters (94 if you don't count the space). Outside the ASCII range, you have a number of international characters and strange symbols. There are certainly more than 128 of these, so one must specify a codepage to select a specific set of symbols.

Anyway, since binary files can be represented as a very random assortment of familiar and unfamiliar characters, the file will look like gibberish if you open it in an editor.
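To make the codepage point concrete, here is a minimal Python sketch. The sample bytes (the 8-byte PNG signature plus a few arbitrary values) and the helper name `as_text` are my own assumptions for illustration; the point is only that the same bytes turn into different "gibberish" depending on the encoding the viewer assumes.

```python
# Hypothetical sample: the 8-byte PNG signature followed by a few arbitrary bytes.
data = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0xFF, 0xE9, 0x07])


def as_text(raw: bytes, codepage: str) -> str:
    """Decode raw bytes roughly the way a naive text viewer might.

    Bytes that have no meaning in the chosen codepage are replaced
    with U+FFFD instead of raising an error.
    """
    return raw.decode(codepage, errors="replace")


ascii_view = as_text(data, "ascii")      # bytes >= 0x80 become U+FFFD
latin1_view = as_text(data, "latin-1")   # every byte maps to some character
cp437_view = as_text(data, "cp437")      # old DOS codepage, different symbols

# repr() is used so that control characters are visible as escapes.
print(repr(ascii_view))
print(repr(latin1_view))
print(repr(cp437_view))
```

The three bytes that happen to be the letters "PNG" come out the same in every view; everything else changes with the codepage, or is a control character that an editor has to render somehow.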

You could always open a file (binary or text file, there really is no difference) in a hex editor, and look at the raw binary data.

There is no general way to tell which program created a specific file. In particular, if the program has encrypted its data, all hope is lost. For unencrypted files, however, it is often easy to recognize certain "signatures."

霸刀☆藐视天下
#4 · 2019-02-08 12:07

A text editor makes very few assumptions about the data coming into it, besides things like character encodings. Thus, it will (as you say) read the file's data as ASCII and display it that way. Since binary data doesn't always fall within the printable range, you get gibberish. As for showing the raw binary values, you need a hex editor like XVI32.

Binary files often have no context outside of the program that uses them. Some binary formats contain a 4-byte magic sequence at the beginning (for example, Java .class files start with the bytes CA FE BA BE, i.e. 0xCAFEBABE), but to recognize them without their program, you need a mapping of those 4-byte sequences. I believe some Linux distros contain this information for a wide variety of binary formats and will examine the beginning of the file to attempt to identify it. Other than that, there's not much you can do.
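A mapping like that could be sketched in Python roughly as follows. The table `MAGIC` and the functions `identify` and `identify_file` are hypothetical names of mine, and the table holds only a handful of well-known signatures, so treat the result as a guess rather than a definitive answer.

```python
# Hypothetical lookup table: a few well-known 4-byte signatures, far from exhaustive.
MAGIC = {
    b"\xCA\xFE\xBA\xBE": "Java class file",
    b"\x89PNG": "PNG image",
    b"GIF8": "GIF image",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"Rar!": "RAR archive",
    b"\x7fELF": "ELF executable",
}


def identify(header: bytes) -> str:
    """Guess a format from the first 4 bytes; return "unknown" if not in the table."""
    return MAGIC.get(header[:4], "unknown")


def identify_file(path: str) -> str:
    """Read the first 4 bytes of a file in binary mode and look them up."""
    with open(path, "rb") as f:
        return identify(f.read(4))


print(identify(b"\xCA\xFE\xBA\xBE\x00\x00\x00\x34"))
print(identify(b"GIF89a"))
print(identify(b"just some text"))
```

A match only means the file starts with the same bytes as that format; some signatures are shared by more than one format, so the name it prints is a hint and nothing more.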

地球回转人心会变
#5 · 2019-02-08 12:08

Yes, Wordpad, Notepad and many other text editors assume that any file you open with them is a text file and will try to display the ASCII characters represented by the bytes in the file.

Hex Editors are made to view and edit binary files. They usually display each byte as a pair of hexadecimal digits instead of "1s and 0s" because it's easier to read that way.
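For comparison, a minimal Python sketch of both views of the same bytes. The helper names `to_bits` and `to_hex` are made up for this example, and the sample is just the two letters "MZ".

```python
def to_bits(raw: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{b:08b}" for b in raw)


def to_hex(raw: bytes) -> str:
    """Render each byte as a pair of hexadecimal digits, separated by spaces."""
    return " ".join(f"{b:02x}" for b in raw)


sample = b"MZ"
print(to_bits(sample))  # 01001101 01011010
print(to_hex(sample))   # 4d 5a
```

Each hex digit stands for exactly four bits, so the hex view carries the same information in a quarter of the characters.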

不美不萌又怎样
#6 · 2019-02-08 12:09
  • Are the binary values of the file trying to be converted into ASCII?

Yes, that's exactly what's happening. Typically, the binary values of the file also include ASCII control characters that aren't printable, resulting in even more bizarre display in a typical text editor.

  • Is it possible to convert the view to display raw binary values, i.e. to show the 1s and 0s that make up the file?

It depends on your editor. What you want is a "hex editor", rather than a normal text editor. This will show you the raw contents of the file (typically in hexadecimal rather than binary, since the zeros and ones would take up a lot of space and be harder to read).

  • Finally, is there a way to determine what program will properly open a data file?

There is a Linux command-line program called "file" that will attempt to analyze the file (typically looking for common header patterns) and tell you what sort of file it is (for example text, or audio, or video, or XML, etc). I'm not sure if there is an equivalent program for Windows. Of course, the output of this program is just a guess, but it can be very useful when you don't know what the format of a file is.

Lonely孤独者°
#7 · 2019-02-08 12:11

The display looks interesting, because a binary file can contain non-printable characters. It is up to the displaying program to replace such characters with something else.

This can be prevented by using a hex editor. Such a program displays each byte from the file as its hexadecimal value. That makes for a nice tabular view of the file, but it is not easy for the average person to decipher this view, because we are not used to looking at data that way.
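The tabular view can be approximated in a few lines of Python. This is only a sketch: the function name `hexdump`, the 16-byte row width and the choice of "." as the stand-in for non-printable bytes are my assumptions, not how any particular hex editor does it.

```python
def hexdump(raw: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable-character column."""
    pad = width * 3 - 1  # width of a full row of "xx xx ... xx"
    lines = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else is replaced with a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: the start of a GIF header followed by a few arbitrary bytes.
print(hexdump(b"GIF89a\x01\x00\x01\x00\x80\x00\x00"))
```

The right-hand column is where the displaying program substitutes something harmless for the non-printable bytes, which is exactly the part a plain text editor handles badly.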

There are a few ways to find out what program a file might belong to. You can look at the beginning of the file and with some knowledge, you might recognize the file type. There are some types that begin with the same characters (RAR, GIF etc.). For other types it might not be as easy.

In Linux you can use the "file" command to help you determine file type. There are probably programs for Windows that will do the same.
