Question:
I know that computers store all information as 1s and 0s (low and high voltages, yada yada). When we compile a program, it, like any other data on the computer, ends up in binary form. So how does the computer differentiate between two pieces of data, when all it has is a stream of 0s and 1s? To make my question clearer, let's take a ridiculously simple piece of C code:
int main() {
    int A = 0;
    int* pA = &A;
    char c = 'c';
    char* pC = &c;
    return 0;
}
It doesn't do anything; it just creates four variables of types int, pointer to int, char, and pointer to char. These will be stored somewhere in the form of 0s and 1s. So how does the computer know at which bit such-and-such variable starts and where it ends? For the start, you might say the computer has its address; okay, granted. But what about the end? And what about complex data types like objects/structs?
And last but not least, what about functions/procedures?
Answer 1:
The paragraph that you're reading right now is nothing but a stream of letters and punctuation. How do you know where one word starts and ends? How do you know what the words mean? How does this stream of text convey useful information?
You can say the same thing about mathematics. When you see mathematical expressions written on a page, they're just a series of numbers and symbols, but they're a powerful way to convey deep ideas in a compact form. And then there's music. How does that stream of dots, flags, and lines represent something as transient as music?
The answer, of course, is that there are rules. The letters aren't just combined randomly -- they have a specific sequence. When you follow the rules that you and I both know, you're able to discern the words, understand their individual meanings, and combine them into thoughts.
It's the same with binary data. The thing that distinguishes data from random bits is the existence of rules that, if followed, allow interpretation of the bits in a meaningful way. Now, you've asked a lot of questions that involve a variety of rules. Trying to explain them all would take up more space than is reasonable in an answer like this one (and more time than I'm willing to devote to the endeavor). But if you pick up a book on computer architecture, you'll find a full discussion of the rules, how they work, how they're organized, and how they're implemented. It's really interesting stuff!
If you're not ready to dive into actual computer architecture yet, one excellent book that will give you a lot of insight is Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter. It's a thick book, and dense with ideas. But it's also well written and interesting, and you don't necessarily have to read it from cover to cover to learn a lot of fascinating stuff.
Answer 2:
You can answer all these questions (and many more about computers) by getting as close to the metal as possible: that is, learn assembly. I suggest reading the book The Art of Assembly Language (freely available online), which covers these topics too. Also, read my answer on assembly learning resources. Now, let me answer your questions briefly:
You are right in that the computer only sees an endless stream of bits. The operating system does the job of creating a file system. Even RAM can be thought of as a very simple file system (with pages or segments being the files). What this means is that the OS keeps a table somewhere that tracks where each program has stored what, what is data, what is code, and so on.
At the fundamental level, variables are nothing more than bytes. Now, when you write a statement such as
a = b + 1
the compiler assigns an arbitrary address to the variable and hard-codes that address (i.e. writes the actual constant, e.g. 0xA3F0) into every instruction that refers to it.
Data structures are stored in many different ways. When talking about C structures, however, things are simpler: they just store the variables the structure contains one after the other, if we ignore things like padding. That is the reason why a structure's length is always known.
Functions are just places in memory where code is stored. To 'call' a function, the arguments are loaded onto the stack (or into some other agreed-upon memory), and then a jump, i.e. a goto, to the function's address is made. When the function is done, it jumps back to the address that called it (that return address is stored on the stack too).
It is important to understand that the compiler does all the hard work of translating your code in the ways described above. All the features that high-level languages have are just abstractions to make your job easier. In the end, however, it is just bits and bytes, 0s and 1s, 5 volts and 0 volts.
What is more, modern architectures do not leave all that bookkeeping to the OS alone. Much of the housekeeping happens at the hardware level too, e.g. memory management and labeling which memory address serves what purpose.
Answer 3:
It doesn't. The same sequence of bits can be interpreted as numbers, strings, code, structs, whatever. The computer has no way of knowing what a bunch of bits was intended to be.
Try this:
int main() {
    int A = 0;
    char* pC = (char*)&A;
    return 0;
}
You'll find that it works: it takes the integer's memory and says, "I want to treat it as a character array," and the computer happily goes along with it. It's rarely useful, but it can be done.
The only thing that differs between the types is how they are treated: floats are treated differently from integers, which are treated differently from strings. If you look at the low-level version of your program, you'll find that every operation is specific to a certain type of data. The difference isn't in the bits; it's in how the program operates on the bits.
Answer 4:
The computer does not know, and the computer does not care. All it does is follow instructions. One such instruction might say: "Take 32 bits from this address and another 32 bits from that address; combine these two 32-bit strings by using the method called 'two's complement addition'; and store the result in the 32 bits at the first mentioned address". Each instruction specifies:
the address(es) from which data is to be read and to which data is to be written
the number of bits to read or to write
the operation to be performed on the bits read
The computer doesn't care what the operation does. It's just that the computer designer was good enough to make the operation useful to us humans.
A program such as the one you give is, in a very real sense, at a high level. It takes translation to produce a form that the computer can understand. Such a translator knows what int is, what int * is, and knows for both how many bits they take in memory and which computer operations can be usefully applied to them.
Thus, you almost answered your own question:
For start you might say the computer has the address of it, okay, granted. But what about the end?
The end is known if you know the start and the length.
More complex data structures are generally composed of individual, simpler parts. So, when translating such code, you take the parts, assign them offsets, making sure that no part overlaps another, and then use the offsets to compute the address used to access the parts.
Procedures and functions are too complex to be explained here.
But a brief note, finally, about your example program. As you say, it doesn't do anything. A clever translator will simply emit a "do nothing" instruction. A less clever translator will assign addresses to each of the variables you declare and emit two instructions: "reserve space for this many bits; then do nothing" (the number of bits being the space required to store all the variables). At no point does the computer need to know anything about the variables in your program.
Answer 5:
The compiled program will consist of machine instructions that access the data in patterns that reflect the high-level types. Most assembly languages have different instructions for loading and manipulating data of different sizes (loading bytes, words, longs, etc.) or types (signed and unsigned integers, floats and longs, etc.). Because the compiler has type information available to it during compilation, it can emit assembly instructions that treat the data in memory, which is all just zeros and ones, as having the appropriate structure by issuing commands to operate over data in a way that is consistent with the type system.
For structs and functions, there are many possible encodings depending on what language you're using. I taught a compilers course last summer and we spent two lectures on function and object layouts. The slides for the first and second lectures are available at the previous links.
Hope this helps!
Answer 6:
When you write in a high-level language, the rules of the language and the compiler embed that information into the program they produce. The CPU couldn't care less; to it, they are just bits, meaningful only for the very brief moment an instruction executes. For an add instruction the bits are the operands or the result of the addition; for a load or store they might be an address or an offset to an address; and immediately afterwards they go back to being meaningless bits.
As another answer mentioned, the words you are reading are just combinations of letters from the alphabet: taken one at a time they have no meaning, and they have no meaning to the web browser or to the video card displaying the pixels, but to the high-level reader they do. It's the same with programs. Zoom out a bit, look at the program as a whole, and you will see that the combinations of instructions and bits form sequences that implement the variable types and the high-level logic you wrote and compiled.
There is no magic to it.