I have an assembler/c question. I just read about segment prefixes, for example ds:varX and so on. The prefix is important for the calculation of the logical address. I read too, that default is "ds" and as soon as you use the ebp register to calculate an address, "ss" is used. For code "cs" is default. That all makes sense.
Now I have the following in c:
int x; // some static var in ds
void test(int *p){
...
*p =5;
}
... main(){
test(&x);
//now x is 5
}
If you now think about the implemention of test-function... you get the pointer to x on the stack. If you want to dereference the pointer, you first get the pointer-value(address of x) from the stack and save it in eax for example. Then you can dereference eax to change the value of x. But how does the c-compiler know if the given pointer(address) references memory on the stack (for example if i call test from another function and push the address of a localvariable as parameter for test) or the data segment? How is the full logical address calculated? The function cannot know which segment the given address offset relates to..?!
In general case, on a segmented platform your can't just read the pointer value "into eax
" as you suggest. On a segmented platform the pointer would generally hold both the segment value and offset value, meaning that reading such a pointer would imply initializing at least two registers - segment and offset - not just one eax
.
But in specific cases it depends on so called the memory model. Compilers on segmented platforms supported several memory models.
For starters, for obvious reasons it does not matter which segment register you use as long as the segment register holds the correct value. For example, if DS
and ES
registers hold the same value inside, then DS:<offset>
will point to the same location in memory as ES:<offset>
.
In so called "tiny" memory model, for one example, all segment registers were holding the same value, i.e. everything - code, data, stack - would fit in one segment (which is why it was called "tiny"). In this memory model each pointer was just an offset in this segment and, of course, it simply didn't matter which segment register to use with that offset.
In "larger" memory models you could have separate segments for code (CS), stack (SS) and data (DS). But on such memory models pointer object would normally hold both the offset and segment part of the address inside of it. In your example pointer p
would actually be a two-part object, holding both segment value and offset value at the same time. In order to dereference such pointer the compiler would generate the code that would read both segment and offset values from p
and use both of them. For example, the segment value would be read into ES
register, while the offset value would be read into si
register. The code would then access ES:[di]
in order to read *p
value.
There were also "intermediate" memory models, when code would be stored in one segment (CS), while data and stack would both be stored in another segment, so DS
and SS
would hold the same value. On that platform, obviously, there was no need to differentiate between DS
and SS
.
In the largest memory models you could have multiple data segments. In this case it is rather obvious that proper data addressing in segmented mode is not really a matter of choosing the proper segment register (as you seem to believe), but rather a matter of taking pretty much any segment register and initializing it with the correct value before performing the access.
What AndreyT described was what happened on DOS days. These days, modern operating systems use the so called flat memory model (or rather something very similar), in which all (protected mode) segments are setup so that they all can access the whole address space (i.e: they have a base of 0 and a limit = the whole address space).
On a machine with a segmented memory model, the C implementation must do one of the following things to be conformant:
- Store the full address (with segment) in each pointer, OR
- Ensure that all stack addresses that will be used for variables whose addresses are taken can be accessed via the data segment, either at the same relative address or via some magic offset the compiler can apply when taking the address of local variables, OR
- Not use the stack for local variables whose addresses are taken, and perform a hidden malloc/free on every function entry/return (with special handling for
longjmp
!).
Perhaps there are other ways of doing it, but these are the only ones I can think of. Segmented memory models were really pretty disagreeable with C, and they were abandoned for good reason.
Segmentation is the legacy artifact of the Intel 16-bit 8086 processor. In reality, you probably operate in virtual memory, where everything is just a linear address. Compile with -S
flag and see the resulting assembly.
Since you move the address to eax before dereferencing it, it defaults to the ds segment. However, as Nikolai mentioned, in user level code the segments probably all point to the same address.
Under x86, direct usage of the stack will use the stack segment, but indirect usage treats it as a data segment. You can see this if you disassemble a pointer dereference and write to a stack section pointer. Under x86 cs, ss and ds are treated pretty much the same(atleast in non kernel modes) due to linear addressing. the intel reference manuals should also have a section on segment addressing