I have an assembler/C question. I just read about segment prefixes, for example ds:varX and so on. The prefix is important for the calculation of the logical address. I also read that the default is "ds", and that as soon as you use the ebp register to calculate an address, "ss" is used. For code, "cs" is the default. That all makes sense. Now I have the following in C:
```c
int x; // some static variable in ds

void test(int *p) {
    /* ... */
    *p = 5;
}

int main(void) {
    test(&x);
    // now x is 5
    return 0;
}
```
Now think about the implementation of the test function: you get the pointer to x on the stack. To dereference the pointer, you first load the pointer value (the address of x) from the stack and store it in, say, eax. Then you can dereference eax to change the value of x. But how does the C compiler know whether the given pointer (address) refers to memory on the stack (for example, if I call test from another function and pass the address of a local variable as its parameter) or to the data segment? How is the full logical address calculated? The function cannot know which segment the given address offset relates to!?
Segmentation is a legacy artifact of the Intel 16-bit 8086 processor. In reality, you are probably running in virtual memory, where everything is just a linear address. Compile with the -S flag and look at the resulting assembly.

Under x86, direct use of the stack (addressing through esp or ebp) uses the stack segment, but indirect use (through a pointer loaded into another register) treats it as the data segment. You can see this if you disassemble a dereference that writes through a pointer to a stack variable. Under x86, cs, ss and ds are treated pretty much the same (at least in non-kernel modes) due to linear addressing. The Intel reference manuals should also have a section on segment addressing.
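One way to see the rule being described is to write it down as a function. This is a hypothetical model, not a real API (the enum and function names are made up): for an explicit memory operand in 32-bit addressing with no override prefix, a base register of EBP or ESP selects SS and anything else selects DS. Instruction fetch (CS) and the ES destination of string instructions are deliberately left out. For the question's test(), the load mov eax, [ebp+8] uses SS, but the store mov dword [eax], 5 has EAX as its base, so it goes through DS regardless of where the pointed-to object was allocated.

```c
#include <assert.h>

/* Hypothetical enums for illustration only; they model the x86 rule. */
typedef enum { SEG_CS, SEG_DS, SEG_SS, SEG_ES } segment_t;
typedef enum { REG_NONE, REG_EAX, REG_EBX, REG_ECX, REG_EDX,
               REG_ESI, REG_EDI, REG_EBP, REG_ESP } base_reg_t;

/* Default segment for an explicit data operand in 32-bit addressing:
   a base of EBP or ESP selects SS; any other base, or no base at all
   (a plain displacement), selects DS. The index register does not matter. */
static segment_t default_data_segment(base_reg_t base) {
    return (base == REG_EBP || base == REG_ESP) ? SEG_SS : SEG_DS;
}

/* A segment-override prefix such as "es:" replaces the default. */
static segment_t effective_segment(base_reg_t base, int has_override,
                                   segment_t override_seg) {
    return has_override ? override_seg : default_data_segment(base);
}
```

In a flat model this choice is harmless either way, because DS and SS describe the same address space.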
On a machine with a segmented memory model, the C implementation must do one of the following things to be conformant:
longjmp
!).

Perhaps there are other ways of doing it, but these are the only ones I can think of. Segmented memory models were really pretty disagreeable with C, and they were abandoned for good reason.
In the general case, on a segmented platform you can't just read the pointer value "into eax" as you suggest. On a segmented platform the pointer would generally hold both the segment value and the offset value, meaning that reading such a pointer would imply initializing at least two registers (segment and offset), not just one eax.

But in specific cases it depends on the so-called memory model. Compilers on segmented platforms supported several memory models.
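To make the two-part pointer concrete, here is a small simulation of the question's test() under a large-style model on a real-mode 8086. Everything here is an assumption for illustration: a 1 MiB byte array stands in for memory, far_ptr and read_word are hypothetical names, and the segment values are arbitrary. Because the segment travels inside the pointer, the same test() works for a static variable in the data segment and for a local in the stack segment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulation of far pointers on a real-mode 8086:
   1 MiB of memory, addresses formed as segment*16 + offset. */
#define MEM_SIZE (1u << 20)
static uint8_t memory[MEM_SIZE];

/* A far pointer carries both halves of the address. */
typedef struct { uint16_t seg, off; } far_ptr;

static uint32_t far_linear(far_ptr p) {
    /* the 8086 wraps addresses around at 1 MiB */
    return (((uint32_t)p.seg << 4) + p.off) & (MEM_SIZE - 1);
}

/* Roughly what "les di, [p] ; mov word ptr es:[di], 5" does: put the
   segment in a segment register and the offset in a general register,
   then use both. test() never asks which segment p points into. */
static void test(far_ptr p) {
    uint32_t a = far_linear(p);
    assert(a + 1 < MEM_SIZE);   /* keep the 2-byte store inside the array */
    memory[a]     = 5;          /* low byte of a 16-bit int (little-endian) */
    memory[a + 1] = 0;          /* high byte */
}

static uint16_t read_word(far_ptr p) {
    uint32_t a = far_linear(p);
    assert(a + 1 < MEM_SIZE);
    return (uint16_t)(memory[a] | (memory[a + 1] << 8));
}
```

Calling test() once with a pointer whose segment half came from DS and once with one whose segment half came from SS stores 5 at both places; the callee simply uses whatever segment value it is handed.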
For starters, for obvious reasons it does not matter which segment register you use as long as that register holds the correct value. For example, if the DS and ES registers hold the same value, then DS:<offset> will point to the same location in memory as ES:<offset>.

In the so-called "tiny" memory model, for one example, all segment registers held the same value, i.e. everything (code, data, stack) would fit in one segment, which is why it was called "tiny". In this memory model each pointer was just an offset within that segment and, of course, it simply didn't matter which segment register was used with that offset.
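The register-name independence can be sketched with real-mode translation (linear = segment * 16 + offset). The struct and helpers below are hypothetical; the translation only ever sees the value loaded into a segment register, so DS:<offset> and ES:<offset> coincide whenever DS and ES hold the same value, which is exactly what the tiny model arranges for every register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file with the four classic 8086 segment registers. */
typedef struct { uint16_t cs, ds, ss, es; } seg_regs;

/* Real-mode translation. It receives only the value stored in a segment
   register, so the name of the register that held it is irrelevant. */
static uint32_t real_linear(uint16_t seg_value, uint16_t offset) {
    return (((uint32_t)seg_value << 4) + offset) & 0xFFFFFu;
}

/* Tiny-model setup (sketch): one 64 KiB segment for code, data and stack,
   so all four registers are loaded with the same segment value. */
static seg_regs tiny_model(uint16_t seg) {
    seg_regs r = { seg, seg, seg, seg };
    return r;
}
```

With tiny_model(0x0700), offset 0x1234 resolves to linear address 0x08234 through DS, ES or SS alike.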
In "larger" memory models you could have separate segments for code (CS), stack (SS) and data (DS). But in such memory models a pointer object would normally hold both the offset and the segment part of the address. In your example, pointer p would actually be a two-part object, holding both a segment value and an offset value. In order to dereference such a pointer, the compiler would generate code that reads both the segment and offset values from p and uses both of them. For example, the segment value would be read into the ES register, while the offset value would be read into the DI register. The code would then access ES:[DI] in order to read the *p value.

There were also "intermediate" memory models, in which code was stored in one segment (CS), while data and stack were both stored in another segment, so DS and SS held the same value. On such a platform, obviously, there was no need to differentiate between DS and SS.

In the largest memory models you could have multiple data segments. In this case it is rather obvious that proper data addressing in segmented mode is not really a matter of choosing the proper segment register (as you seem to believe), but rather a matter of taking pretty much any segment register and initializing it with the correct value before performing the access.
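For the intermediate models, a data pointer is only an offset that the compiler pairs with DS. The hypothetical check below (real_linear and near_ptr_to_local_ok are made-up names) shows why a near pointer to a local variable is only usable when DS and SS hold the same value: the local's offset was formed relative to SS, but the dereference goes through DS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: real-mode translation, linear = segment*16 + offset. */
static uint32_t real_linear(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* With near data pointers, a pointer is just a 16-bit offset and the
   compiler dereferences it through DS. A local variable's offset, however,
   was formed relative to SS (for example via BP). The near pointer reaches
   the local only if both translations land on the same linear address. */
static int near_ptr_to_local_ok(uint16_t ds, uint16_t ss, uint16_t local_off) {
    return real_linear(ds, local_off) == real_linear(ss, local_off);
}
```

near_ptr_to_local_ok(0x2000, 0x2000, 0xFFF0) holds, while near_ptr_to_local_ok(0x2000, 0x3000, 0xFFF0) does not, which is why such models keep DS == SS.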
Since you move the address into eax before dereferencing it, the access defaults to the ds segment. However, as Nikolai mentioned, in user-level code the segments probably all have the same base address.
What AndreyT described is what happened in the DOS days. Modern operating systems use the so-called flat memory model (or something very similar), in which all (protected-mode) segments are set up so that each of them can access the whole address space (i.e. they have a base of 0 and a limit covering the whole address space).
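The flat model can be sketched with a simplified protected-mode descriptor that keeps only base and limit; type, privilege and granularity bits are ignored, and seg_desc and translate are hypothetical names. With base 0 and a 4 GiB limit for the code, data and stack descriptors, every segment maps an offset to the identical linear address, so it no longer matters which segment a pointer "belongs" to. (In 64-bit long mode the CPU goes further and treats the CS, DS, ES and SS bases as zero.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified protected-mode segment descriptor:
   only base and (byte-granular, inclusive) limit are modelled. */
typedef struct { uint32_t base; uint32_t limit; } seg_desc;

/* Returns 1 and stores the linear address if offset is within the limit;
   returns 0 where the CPU would fault (#GP, or #SS for the stack segment). */
static int translate(seg_desc d, uint32_t offset, uint32_t *linear) {
    if (offset > d.limit) return 0;
    *linear = d.base + offset;   /* unsigned arithmetic wraps modulo 2^32 */
    return 1;
}

/* Flat model: code, data and stack descriptors share base 0 and a 4 GiB limit. */
static const seg_desc flat_code  = { 0, 0xFFFFFFFFu };
static const seg_desc flat_data  = { 0, 0xFFFFFFFFu };
static const seg_desc flat_stack = { 0, 0xFFFFFFFFu };
```

Translating the same offset through flat_code, flat_data and flat_stack always yields that offset unchanged, whereas a descriptor with a nonzero base or a small limit would not.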