As the function is set of instruction stored in one contiguous block of memory.
And address of a function (entry point) is the address of the first instruction in the function. (from my knowledge)
And thus we can say that the address of function and the address of the first instruction in the function will be the same (In this case the first instruction is the initialization of a variable.).
But the program below contradicts the above line.
code:
#include<iostream>
#include<stdio.h>
#include<string.h>
using namespace std;
char ** fun()
{
static char * z = (char*)"Merry Christmas :)";
return &z;
}
int main()
{
char ** ptr = NULL;
char ** (*fun_ptr)(); //declaration of pointer to the function
fun_ptr = &fun;
ptr = fun();
printf("\n %s \n Address of function = [%p]", *ptr, fun_ptr);
printf("\n Address of first variable created in fun() = [%p]", (void*)ptr);
cout<<endl;
return 0;
}
One output example is:
Merry Christmas :)
Address of function = [0x400816]
Address of first variable created in fun() = [0x600e10]
So, here the address of function and the address of first variable in function is not same. Why so?
I searched on google but can't come up with the exact required answer and being new to this language I exactly can't catch some of contents on net.
In the text of your question you say:
but in the code you don't get the address of the first instruction in the function but the address of some local variable declared in the function.
A function is code, a variable is data. They are stored in different memory areas; they don't even reside in the same memory block. Due to security restrictions imposed by the OSes nowadays, the code is stored in memory blocks that are marked as read-only.
As far as I know, the C language doesn't provide any way to get the address of a statement in the memory. Even if it would provide such a mechanism, the start of the function (the function's address in memory) is not the same as the address of the machine code generated from the first C statement.
Before the code generated from the first C statement, the compiler generates a function prolog that (at least) saves the current value of the stack pointer and makes room for function's local variables. This means several assembly instructions before any code generated from the first statement of the C function.
variables declared within a function is not allocated where you see in the code automatic variables (variables defined locally in a function) is given a suitable place in the stack memory when function is about to be called , this is done during compile time by the compiler, thus the address of the first instruction is have nothing to do with variables it's about the executable instructions
Like other variables, an address of a function is the space allocated for it. In other words, it is the memory location where the instructions (machine code) for the operation performed by the function is stored.
To understand this, take a deep look on memory layout of a program.
A program's variables and executable code/instructions are stored at different segments of memory (RAM). Variables goes to any of STACK, HEAP, DATA and BSS segment while executable code goes to the CODE segment. Look at the general memory layout of a program
Now you can see that there are different memory segments for variables and instructions. They are stored at different memory locations. The function address is the address which is located at the CODE segment.
So, you are confusing the term first statement with first executable instruction. When function call is invoked, program counter is updated with the address of the function. Therefore, the function pointer points to the first instruction of the function stored in memory.
As you say, the address of function may be (it will depend on the system) the address of the first instruction of the function.
It is the answer. Instruction will not share address with variables in typical environment in which same address space is used for instructions and data.
If they share the samee address, instrution will be destroyed by assigning to the variables!
The address of a normal function is where the instructions start (if there is no vTable involved).
For the variables it depends:
Unless the function is inlined or optimized away.
Alright, this is going to be fun. We get to go from the extremely abstract concept of what a function pointer is in C++ all the way down to the assembly code level, and thanks to some of the particular confusions we're having, we even get to discuss stacks!
Let's start at the highly abstract side, because that's clearly the side of things you're starting from. you have a function
char** fun()
that you're playing with. Now, at this level of abstraction, we can look at what operations are permitted on function pointers:fun_ptr()
. The meaning of this is identical to calling whatever function is being pointed at.That's all they do at the abstract level. Underneath that, compilers are free to implement it however they see fit. If a compiler wanted to have a
FunctionPtrType
which is actually an index into some big table of every function in the program, they could.However, this is typically not how it is implemented. When compiling C++ down to assembly/machine code, we tend to take advantage of as many architecture-specific tricks as possible, to save runtime. On real life computers, there is almost always an "indirect jump" operation, which reads a variable (usually a register), and jumps over to begin executing the code that's stored at that memory address. Its almost univeral that functions are compiled into contiguous blocks of instructions, so if you ever jump to the first instruction in the block, it has the logical effect of calling that function. The address of the first instruction happens to satisfy every one of the comparisons required by C++'s abstract concept of a function pointer and it happens to be exactly the value the hardware needs to use an indirect jump to call the function! That's so convenient, that virtually every compiler chooses to implement it that way!
However, when we start talking about why the pointer you thought you were looking at was the same as the function pointer, we have to get into something a bit more nuanced: segments.
Static variables are stored separate from the code. There's a few reasons for that. One is that you want your code as tight as possible. You don't want your code speckled with the memory spaces to store variables. It'd be inefficient. You'd have to skip over all sorts of stuff, rather than just getting to plow through it. There's also a more modern reason: most computers allow you to mark some memory as "executable" and some "writable." Doing this helps tremendously for dealing with some really evil hacker tricks. We try to never mark something as both executable and writable at the same time, in case a hacker cleverly finds a way to trick our program into overwriting some of our functions with their own!
Accordingly, there is typically a
.code
segment (using that dotted notation simply because its a popular way to notate it in many architectures). In this segment, you find all of the code. The static data will go in somewhere like.bss
. So you may find your static string stored quite far away from the code that operates on it (typically at least 4kb away, because most modern hardware allows you to set execute or write permissions at the page level: pages are 4kb in lots of modern systems)Now the last piece... the stack. You mentioned storing things on the stack in a confusing way, which suggests it may be helpful to give it a quick going over. Let me make a quick recursive function, because they are more effective at demonstrating what is going on in the stack.
This function calculates the Fibonacci sequence using a rather inefficient but clear way of doing it.
We have one function,
fib
. This means&fib
is always a pointer to the same place, but we're clearly calling fib many times, so each one needs its own space right?On the stack we have what are called "frames." Frames are not the functions themselves, but rather they are sections of memory which this particular invocation of the function is allowed to use. Every time you call a function, like
fib
, you allocate a little more space on the stack for its frame (or, more pedantically, it will allocate it after you make the call).In our case,
fib(x)
clearly needs to store the result offib(x-1)
while executingfib(x-2)
. It can't store this in the function itself, or even in the.bss
segment because we don't know how many times it is going to get recursed. Instead, it allocates space on the stack to store its own copy of the result offib(x-1)
whilefib(x-2)
is operating in its own frame (using the exact same function, and the same function address). Whenfib(x-2)
returns,fib(x)
simply loads up that old value, which it is certain has not been touched by anyone else, adds the results, and returns it!How does it do this? Virtually every processor out there has some support for a stack in hardware. On x86, this is known as the ESP register (extended-stack pointer). Programs generally agree to treat this as a pointer to the next spot in the stack where you can start storing data. You're welcome to move this pointer around to build yourself space for a frame, and move in. When you finish executing, you are expected to move everything back.
In fact, on most platforms, the first instruction in your function is not the first instruction in the final compiled version. Compilers inject a few extra ops to manage this stack pointer for you, so that you never even have to worry about it. On some platforms, like x86_64, this behavior is often even mandatory and specified in the ABI!
So in all we have:
.code
segment - where your function's instructions are stored. The function pointer will point to the first instruction in here. This segment is typically marked "execute/read only," preventing your program from writing to it after it has been loaded..bss
segment - where your static data will get stored, because it can't be part of the "execute only".code
segment if it wants to be data.