Just to give you some context, here's what I'm trying to achieve:
I am embedding a const char* in a shared object file in order to have a version string in the .so file itself. I am doing data analysis and this string enables me to let the data know which version of the software produced it. This all works fine.
The issue I am having is when I try to read the string out of the .so library directly. I tried to use
nm libSMPselection.so | grep _version_info
and get
000000000003d968 D __SMPselection_version_info
This is all fine and as expected (the char* is called _SMPselection_version_info).
However, I would then have expected to be able to open the file, seek to 0x3d968 and start reading my string, but all I get is garbage.
When I open the .so file and simply search for the contents of the string (I know how it starts), I can find it at address 0x2e0b4. At this address it's there, zero terminated and as expected. (I am using this method for now.)
I am not a computer scientist. Could someone please explain to me why the symbol value shown by nm isn't correct or, put differently, what the symbol value is if it isn't the address of the symbol?
(By the way, I am working on a Mac with OS X 10.7.)
Nobody suggested the simplest way: write a small program that dynamically loads your library (pass its name on the command line), calls dlsym() for your symbol (which can also be given on the command line), casts the result to a string pointer and prints it to stdout.
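A minimal sketch of that idea, written as a helper function rather than a complete tool. The name read_version_string and the via_pointer flag are my own, not from the question. The flag matters because dlsym() returns the address of the symbol itself: for a `const char *` variable that is the address of the pointer, which has to be dereferenced once, while for a `const char[]` array it is already the address of the text.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Copy the string behind `symbol` in the library at `lib_path` into `out`.
 * via_pointer != 0: the symbol is a `const char *` variable, so dlsym()
 *                   yields the address of the pointer; dereference it once.
 * via_pointer == 0: the symbol is a `const char[]` array, so dlsym()
 *                   yields the address of the characters themselves.
 * Returns 0 on success, -1 on failure. */
int read_version_string(const char *lib_path, const char *symbol,
                        int via_pointer, char *out, size_t out_len)
{
    if (out == NULL || out_len == 0)
        return -1;
    out[0] = '\0';

    void *handle = dlopen(lib_path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    void *addr = dlsym(handle, symbol);
    if (addr == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    const char *text = via_pointer ? *(const char *const *)addr
                                   : (const char *)addr;
    if (text == NULL) {
        dlclose(handle);
        return -1;
    }

    /* Copy before dlclose(): the string lives inside the library image. */
    strncpy(out, text, out_len - 1);
    out[out_len - 1] = '\0';

    dlclose(handle);
    return 0;
}
```

From a main() you would call something like read_version_string(argv[1], argv[2], 1, buf, sizeof buf) and print buf. On OS X, dlsym() expects the C-level name, so the symbol that nm lists as __SMPselection_version_info is looked up as _SMPselection_version_info. On older Linux systems you may need to link with -ldl.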
Assuming it's an ELF or similarly structured binary, you have to take into account the address where things are loaded, which is determined by information in the ELF header.
Using objdump -Fd on your binary, you can have the disassembler also show the exact file offset of a symbol.
Using objdump -x you can find this load address, usually 0x400000 for standard Linux executables.
The next thing to be careful about is whether it is an indirect string; this is most easily checked using objdump -g. When the string is indirect, you will not find the string itself at the position output by objdump -Fd, but its address. From this you need to subtract the load address again. Let me show you an example for one of my binaries:
objdump -Fd BIN | grep VersionString
45152f: 48 8b 1d 9a df 87 00 mov 0x87df9a(%rip),%rbx # ccf4d0 <acVersionString> (File Offset: 0x8cf4d0)
objdump -x BIN
...
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**12
...
So we look at 0x8cf4d0 in the file and find in the hex editor:
008C:F4D0 D8 C1 89 00 00 00 00 00 01 00 00 00 FF FF FF FF
So we take the 0x89C1D8 stored there (the bytes are in little-endian order), subtract 0x400000 and get 0x49C1D8. When we look there in the hex editor we find:
0049:C1D0 FF FF 7F 7F FF FF 7F FF 74 72 75 6E 6B 5F 38 30
0049:C1E0 34 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The bytes starting at 0x49C1D8 spell "trunk_8043".
YMMV, especially when it's some other file format, but that is the general way these things are structured, with lots of warts and details that deviate in special cases.
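The arithmetic in the example above can be sketched in C. The struct and function names here are hypothetical. The file_size field corresponds to the "filesz" value that objdump -x prints for each LOAD entry; it is not shown in the excerpt above, so any value you plug in for it is an assumption.

```c
#include <stdint.h>

/* One loadable segment, as listed in the program headers by `objdump -x`. */
struct segment {
    uint64_t file_off;   /* "off":    where the segment starts in the file  */
    uint64_t vaddr;      /* "vaddr":  where the segment is mapped in memory */
    uint64_t file_size;  /* "filesz": how many bytes come from the file     */
};

/* Translate a virtual address into a file offset.
 * Returns 0 and stores the offset in *out on success,
 * or -1 if no segment in the list covers the address. */
int vaddr_to_file_offset(const struct segment *segs, int count,
                         uint64_t vaddr, uint64_t *out)
{
    for (int i = 0; i < count; i++) {
        if (vaddr >= segs[i].vaddr &&
            vaddr - segs[i].vaddr < segs[i].file_size) {
            *out = vaddr - segs[i].vaddr + segs[i].file_off;
            return 0;
        }
    }
    return -1;
}

/* Decode the 8 bytes of a little-endian pointer as read from the file. */
uint64_t read_le64(const unsigned char *p)
{
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | p[i];
    return value;
}
```

With a single segment at offset 0 and vaddr 0x400000, read_le64 on the bytes D8 C1 89 00 00 00 00 00 gives 0x89C1D8, and vaddr_to_file_offset maps that to 0x49C1D8, matching the hex dump. A binary with several LOAD segments needs all of them in the list, because the data segment usually has a different off/vaddr relationship than the first one.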
On Linux you have the 'strings' command, which helps you extract strings from binaries.
http://linux.about.com/library/cmd/blcmdl1_strings.htm
On HP-UX (and I think on other Unix flavors too) there's a similar command called 'what'. It extracts only strings that start with "@(#)", but if you control the content of the string this is not a problem.
Why would you expect the offset displayed by nm to be the offset in the .so file? .so files are not simply memory images; they contain a lot of other information as well, and have a more or less complicated format. Under Unix (at least under most Unices), shared objects use the ELF format. To find the information, you will have to interpret the various fields in the file to find where the symbol you want is located, in which segment, and where that segment starts in the file. (You can probably find a library which will simplify reading them.)
Also, if you are correct in saying that you've embedded a char const*, i.e. that your code contained something like:
char const* version = "...";
then the address or offset of version is the address or offset of the pointer, not of the string data it points to. Defining it as:
char const version[] = "...";
will solve this.
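A small illustration of the difference, using hypothetical variable names and a placeholder string. For the pointer form, the storage at the symbol's address holds only an address; for the array form, it holds the characters themselves.

```c
/* Pointer form: the symbol names a pointer-sized slot that holds the
 * address of the text, which is stored elsewhere in the file. */
const char *version_ptr = "trunk_8043";

/* Array form: the symbol names the characters themselves. */
const char version_arr[] = "trunk_8043";

/* Returns 1 if the storage at the symbol's own address is the text,
 * 0 if the text lives somewhere else. */
int symbol_holds_text(const void *symbol_addr, const char *text)
{
    return symbol_addr == (const void *)text;
}
```

Here symbol_holds_text(&version_arr, version_arr) is 1, while symbol_holds_text(&version_ptr, version_ptr) is 0. Likewise, sizeof version_arr is the length of the text plus the terminating NUL, whereas sizeof version_ptr is just the size of a pointer.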
Finally, the simplest solution might be to just make sure that the string has some highly identifiable pattern, and scan the entire file linearly looking for this pattern.
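A minimal sketch of such a linear scan over a buffer that already holds the file contents. The function name find_marker is my own, and the marker itself is whatever prefix you choose to embed (for instance the "@(#)" convention).

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of `marker` in `buf`,
 * or -1 if it does not occur. The buffer may contain NUL bytes,
 * which is why memcmp is used rather than strstr. */
long find_marker(const unsigned char *buf, size_t len, const char *marker)
{
    size_t mlen = strlen(marker);
    if (mlen == 0 || mlen > len)
        return -1;
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return (long)i;
    }
    return -1;
}
```

To use it, read the whole .so into memory (for example with fopen and fread), call find_marker on the buffer, and treat the bytes from the returned offset up to the next NUL as the version string.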