Finding the address range of the data segment

2019-01-06 13:41发布

问题:

As a programming exercise, I am writing a mark-and-sweep garbage collector in C. I wish to scan the data segment (globals, etc.) for pointers to allocated memory, but I don't know how to get the range of the addresses of this segment. How could I do this?

回答1:

The bounds for text (program code) and data for linux (and other unixes):

#include <stdio.h>
#include <stdlib.h>

/* these are in no header file, and on some
systems they have a _ prepended 
These symbols have to be typed to keep the compiler happy
Also check out brk() and sbrk() for information
about heap */

extern char  etext, edata, end; 

int
main(int argc, char **argv)
{
    printf("First address beyond:\n");
    printf("    program text segment(etext)      %10p\n", &etext);
    printf("    initialized data segment(edata)  %10p\n", &edata);
    printf("    uninitialized data segment (end) %10p\n", &end);

    return EXIT_SUCCESS;
}

Where those symbols come from: Where are the symbols etext ,edata and end defined?



回答2:

If you're working on Windows, then there are Windows API that would help you.

//store the base address the loaded Module
dllImageBase = (char*)hModule; //suppose hModule is the handle to the loaded Module (.exe or .dll)

//get the address of NT Header
IMAGE_NT_HEADERS *pNtHdr = ImageNtHeader(hModule);

//after Nt headers comes the table of section, so get the addess of section table
IMAGE_SECTION_HEADER *pSectionHdr = (IMAGE_SECTION_HEADER *) (pNtHdr + 1);

ImageSectionInfo *pSectionInfo = NULL;

//iterate through the list of all sections, and check the section name in the if conditon. etc
for ( int i = 0 ; i < pNtHdr->FileHeader.NumberOfSections ; i++ )
{
     char *name = (char*) pSectionHdr->Name;
     if ( memcmp(name, ".data", 5) == 0 )
     {
          pSectionInfo = new ImageSectionInfo(".data");
          pSectionInfo->SectionAddress = dllImageBase + pSectionHdr->VirtualAddress;

          **//range of the data segment - something you're looking for**
          pSectionInfo->SectionSize = pSectionHdr->Misc.VirtualSize;
          break;
      }
      pSectionHdr++;
}

Define ImageSectionInfo as,

struct ImageSectionInfo
{
      char SectionName[IMAGE_SIZEOF_SHORT_NAME];//the macro is defined WinNT.h
      char *SectionAddress;
      int SectionSize;
      ImageSectionInfo(const char* name)
      {
            strcpy(SectioName, name); 
       }
};

Here's a complete, minimal WIN32 console program you can run in Visual Studio that demonstrates the use of the Windows API:

#include <stdio.h>
#include <Windows.h>
#include <DbgHelp.h>
#pragma comment( lib, "dbghelp.lib" )

void print_PE_section_info(HANDLE hModule) // hModule is the handle to a loaded Module (.exe or .dll)
{
   // get the location of the module's IMAGE_NT_HEADERS structure
   IMAGE_NT_HEADERS *pNtHdr = ImageNtHeader(hModule);

   // section table immediately follows the IMAGE_NT_HEADERS
   IMAGE_SECTION_HEADER *pSectionHdr = (IMAGE_SECTION_HEADER *)(pNtHdr + 1);

   const char* imageBase = (const char*)hModule;
   char scnName[sizeof(pSectionHdr->Name) + 1];
   scnName[sizeof(scnName) - 1] = '\0'; // enforce nul-termination for scn names that are the whole length of pSectionHdr->Name[]

   for (int scn = 0; scn < pNtHdr->FileHeader.NumberOfSections; ++scn)
   {
      // Note: pSectionHdr->Name[] is 8 bytes long. If the scn name is 8 bytes long, ->Name[] will
      // not be nul-terminated. For this reason, copy it to a local buffer that's nul-terminated
      // to be sure we only print the real scn name, and no extra garbage beyond it.
      strncpy(scnName, (const char*)pSectionHdr->Name, sizeof(pSectionHdr->Name));

      printf("  Section %3d: %p...%p %-10s (%u bytes)\n",
         scn,
         imageBase + pSectionHdr->VirtualAddress,
         imageBase + pSectionHdr->VirtualAddress + pSectionHdr->Misc.VirtualSize - 1,
         scnName,
         pSectionHdr->Misc.VirtualSize);
      ++pSectionHdr;
   }
}

// For demo purpopses, create an extra constant data section whose name is exactly 8 bytes long (the max)
#pragma const_seg(".t_const") // begin allocating const data in a new section whose name is 8 bytes long (the max)
const char const_string1[] = "This string is allocated in a special const data segment named \".t_const\".";
#pragma const_seg() // resume allocating const data in the normal .rdata section

int main(int argc, const char* argv[])
{
   print_PE_section_info(GetModuleHandle(NULL)); // print section info for "this process's .exe file" (NULL)
}

This page may be helpful if you're interested in additional uses of the DbgHelp library.

You can read the PE image format here, to know it in details. Once you understand the PE format, you'll be able to work with the above code, and can even modify it to meet your need.

  • PE Format

Peering Inside the PE: A Tour of the Win32 Portable Executable File Format

An In-Depth Look into the Win32 Portable Executable File Format, Part 1

An In-Depth Look into the Win32 Portable Executable File Format, Part 2

  • Windows API and Structures

IMAGE_SECTION_HEADER Structure

ImageNtHeader Function

IMAGE_NT_HEADERS Structure

I think this would help you to great extent, and the rest you can research yourself :-)

By the way, you can also see this thread, as all of these are somehow related to this:

Scenario: Global variables in DLL which is used by Multi-threaded Application



回答3:

Since you'll probably have to make your garbage collector the environment in which the program runs, you can get it from the elf file directly.



回答4:

Load the file that the executable came from and parse the PE headers, for Win32. I've no idea about on other OSes. Remember that if your program consists of multiple files (e.g. DLLs) you may have multiple data segments.



回答5:

For iOS you can use this solution. It shows how to find the text segment range but you can easily change it to find any segment you like.