Does Symbol table for C++ code contain function na

Posted 2019-02-05 08:02

I have been searching through various posts on whether the symbol table for C++ code contains function names along with the class name. One post I found says that it depends on the type of compiler:

if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table

but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.

I could not understand whether this is actually compiler-dependent. I was assuming that a C++ compiler would put function names together with class names in the table whether it is a single-pass or a multi-pass compiler. How does it depend on the passes? I don't have deep knowledge of this. Moreover, could anyone show a sample symbol table for a simple C++ class? What would it look like (function names with class name)?

3 answers
对你真心纯属浪费
#2 · 2019-02-05 08:45

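Short answer: yes. You can see this with 'nm --demangle' on Linux.

Long answer: For each function, the symbol table of a compiled object file stores a name that encodes the function name, its parameter types and, if the function belongs to a class, the class name too. Whether the return type is encoded depends on the ABI (with GCC and Clang on Linux it generally is not for ordinary, non-template functions). The names, types, and classes are not written out in full; they are stored in a compact encoded form to use less space. This encoding is called name mangling, and turning a mangled name back into the original is called demangling. The mangled name is unique, and the full class and function name can be parsed back out of it. To view the symbol table of your program you can use 'nm' on Linux.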

http://linux.about.com/library/cmd/blcmdl1_nm.htm

nm has a --demangle flag to show the original names. You can compile a few short programs to see what comes out.
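To make this concrete, here is a small sketch you could try. It assumes GCC or Clang on Linux (Itanium C++ ABI); the class A and the demangle helper are made up for illustration. Compiling the file with 'g++ -c a.cpp' and running 'nm a.o' should list the member function under its mangled name _ZN1A1fEii, while 'nm --demangle a.o' should show it as A::f(int, int). The helper does roughly the same translation from inside a program.

```cpp
// Sketch only: assumes GCC or Clang on Linux (Itanium C++ ABI).
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang extension, not standard C++)
#include <cstdlib>
#include <string>

// Hypothetical example class.
class A {
    int a;
public:
    void f(int, int);
};

// Out-of-line definition, so the object file gets a symbol for A::f.
// Under the Itanium ABI its mangled name is _ZN1A1fEii.
void A::f(int x, int y) { a = x + y; }

// Hypothetical helper: turns a mangled name back into readable form,
// roughly what `nm --demangle` does for each symbol it prints.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;  // could not be demangled, return it unchanged
    }
    std::string result(text);
    std::free(text);  // the buffer returned by __cxa_demangle is malloc-allocated
    return result;
}
```

Calling demangle("_ZN1A1fEii") gives back A::f(int, int). Note that the data member a does not show up in the nm output at all.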

仙女界的扛把子
#3 · 2019-02-05 08:49

A symbol table maps names to constructs within the program. As such it is used to record the names of classes, functions, variables, and anything else that has a user-specified name within the program.

(There are two common kinds of symbol table: one that the compiler maintains while it is compiling your program, and another that exists in the object file so that it can be linked with other objects. The two are strongly related, but need not have similar internal representations. Typically only some of the symbols from the compiler's symbol table will be output into the object file.)

Part of what you say makes no sense:

if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table

How can the compiler determine to what construct a name refers if it cannot look it up in the symbol table?

but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.

There's no reason it could not do this in a single pass.

I could not understand whether it is actually compiler dependent or not?

All compilers are going to use a symbol table, but its use will be hidden inside the implementation.

I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes?

How is what dependent on the passes? All names go in the symbol table - that's what it's for - and usually symbol resolution is important for just about everything else the compiler does, so it needs to be done early (i.e. in the first pass - and in fact the main purpose of the first pass in a multi-pass compiler may well be just to build the symbol table!).
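As a rough illustration of why the number of passes does not matter, here is a toy sketch (C++17, all names hypothetical, nothing like a real compiler). It walks declarations and uses once, in source order, and still needs a table to decide what each used name refers to.

```cpp
// Sketch only: a toy front end that sees declarations and uses in source
// order. Every name in this block is hypothetical.
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum class Kind { Class, Variable, Function };

struct Event {            // one step of the single pass
    bool is_declaration;  // true: declares `name`; false: uses `name`
    std::string name;
    Kind kind;            // only meaningful for declarations
};

class SymbolTable {
public:
    void declare(const std::string& name, Kind kind) { table_[name] = kind; }

    std::optional<Kind> lookup(const std::string& name) const {
        auto it = table_.find(name);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, Kind> table_;
};

// Walks the events exactly once and returns the names that were used
// before any declaration for them had been seen.
std::vector<std::string> single_pass(const std::vector<Event>& events) {
    SymbolTable symbols;
    std::vector<std::string> unresolved;
    for (const Event& e : events) {
        if (e.is_declaration) {
            symbols.declare(e.name, e.kind);
        } else if (!symbols.lookup(e.name)) {
            unresolved.push_back(e.name);  // a compiler would report an error here
        }
    }
    return unresolved;
}
```

A multi-pass design would fill the same table in an early pass and query it in later ones; the table itself is needed either way.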

Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?

I'll give it a stab:

class A
{
    int a;
    void f(int, int);
};

This will yield a symbol table containing the symbols "A", "a", and "f". Typically "a" and "f" would be marked with a scope to simplify lookup, e.g.:

"A"  -> (class)
"A::a"  ->  (class variable member)
"A::f(int,int)"  ->  (class function member)

It's also possible that the a and f symbols will not be stored in the top-level symbol table, but rather that each name space (including C++ namespaces and classes) will have its own symbol table, containing the symbols defined inside it. But this is, arguably, just a data structure choice. You can still abstractly view the symbol table as a flat table, where a name maps to a construct.
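A minimal sketch of that data structure choice, assuming C++17 and using made-up names (Scope, flatten): each scope owns its own table, and a small function recovers the flat view with qualified names. This is only an illustration, not how any particular compiler stores it.

```cpp
// Sketch only: one table per scope, plus a function that flattens the
// nested tables into the qualified-name view. All names are hypothetical.
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Scope {
    // name declared directly in this scope -> description of the construct
    std::map<std::string, std::string> symbols;
    // nested scopes (classes, namespaces), keyed by their name
    std::map<std::string, std::unique_ptr<Scope>> children;
};

// Copies every symbol into `out` under its fully qualified name.
void flatten(const Scope& scope, const std::string& prefix,
             std::map<std::string, std::string>& out) {
    for (const auto& [name, what] : scope.symbols) {
        out[prefix + name] = what;
    }
    for (const auto& [name, child] : scope.children) {
        flatten(*child, prefix + name + "::", out);
    }
}

// Builds the nested tables for: class A { int a; void f(int, int); };
// and returns the equivalent flat table.
std::map<std::string, std::string> flat_table_for_class_A() {
    Scope global;
    global.symbols["A"] = "class";

    auto body = std::make_unique<Scope>();
    body->symbols["a"] = "class variable member";
    body->symbols["f(int,int)"] = "class function member";
    global.children["A"] = std::move(body);

    std::map<std::string, std::string> flat;
    flatten(global, "", flat);
    return flat;
}
```

The returned map has the three entries "A", "A::a" and "A::f(int,int)", matching the table above.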

In general the "A::a" symbol would not be output to the object file, since it is not required for linking.

不美不萌又怎样
#4 · 2019-02-05 08:52

Most compiler textbooks will tell you about symbol tables, and often show you details for a language of modest complexity such as Pascal. You won't find information about C++ symbol tables in a textbook; it is too arcane.

We offer a complete C++14 front end for our DMS Software Reengineering Toolkit. It parses C++, builds detailed ASTs, and performs name-and-type resolution, which includes building a precise symbol table.

What follows are slides from our tutorial on how to use DMS, focused on the C++ symbol table structures.

OP asked specifically for a view of what happens with classes. The following diagram shows this for the tiny C++ program in the upper left corner. The rest of the diagram shows boxes, which represent what we call "symbol spaces" (or "scopes"). These are essentially hash tables mapping symbol names (each box lists the symbols it owns) to the information that DMS knows about that symbol: the source file location of the definition, a list of AST nodes that reference the definition, and a complex union that represents the type and that may in turn point to other types.

The arrows show how symbol spaces are connected; an arrow from space A to space B means "scope A is contained within scope B". Typically the symbol space lookup process, searching scope A for a symbol x, will continue the search in scope B if x is not found in A. You'll note the arrows are numbered with an integer; this tells the search machinery to look in the least-numbered parent scope first, before trying to search scopes using arrows with larger numbers. This is how scopes are ordered. (Note that class C inherits from A and B; any lookup of a field in class C such as "b" will be forced to first look in the scope for A, and then in the scope for B.) In this way, the C++ lookup rules are achieved.

Note that the class names are recorded in the (unique) global namespace because they are declared at top level. If they had been defined in some explicit namespace, then the namespace would have a corresponding symbol space of its own that recorded the declared classes, and the namespace itself would be recorded in the global symbol space.
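The ordered search described above can be sketched roughly as follows. This is not DMS code; the SymbolSpace struct, the arc numbering and the example classes are assumptions made up for illustration (C++17). The sketch stops at the first match, so it does not model the ambiguity checks that real C++ lookup also requires.

```cpp
// Sketch only: symbol spaces with integer-numbered arcs to parent spaces.
// Not DMS's implementation; every name here is hypothetical.
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct SymbolSpace {
    std::string label;                           // e.g. "class C"
    std::map<std::string, std::string> symbols;  // name -> info about it
    // (arc number, parent space); lower-numbered arcs are searched first
    std::vector<std::pair<int, const SymbolSpace*>> parents;
};

// Returns the space in which `name` is found, or nullptr if it is unknown.
const SymbolSpace* lookup(const SymbolSpace& space, const std::string& name) {
    if (space.symbols.count(name) != 0) return &space;
    auto ordered = space.parents;
    std::sort(ordered.begin(), ordered.end(),
              [](const auto& l, const auto& r) { return l.first < r.first; });
    for (const auto& arc : ordered) {
        if (const SymbolSpace* hit = lookup(*arc.second, name)) return hit;
    }
    return nullptr;
}

// Assumed example program:
//   class A { int a; };  class B { int b; };  class C : A, B { int c; };
// Returns the label of the space that owns `name` as seen from class C.
std::string owner_seen_from_C(const std::string& name) {
    SymbolSpace global{"global", {{"A", "class"}, {"B", "class"}, {"C", "class"}}, {}};
    SymbolSpace a{"class A", {{"a", "int"}}, {{1, &global}}};
    SymbolSpace b{"class B", {{"b", "int"}}, {{1, &global}}};
    // Base classes come first (arcs 1 and 2), the enclosing scope last (arc 3).
    SymbolSpace c{"class C", {{"c", "int"}}, {{1, &a}, {2, &b}, {3, &global}}};

    const SymbolSpace* hit = lookup(c, name);
    return hit ? hit->label : "";
}
```

With these assumed spaces, owner_seen_from_C("b") searches class C, then class A (and its parent), then class B, and reports "class B".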

[Figure: C++ Symbol Table: Class Perspective]

OP did not ask what the symbol table looks like for function bodies, but I just so happen to have an illustrative slide for that, too, below. The symbol spaces work the same way. What is shown in this slide is the linkage between a symbol space and the scoped region it represents. That linkage is actually implemented by a pointer, associated with the symbol space, to the corresponding AST (or ASTs, since namespace definitions can be scattered around in multiple places).

Note that in this case, the function name is recorded in the global namespace because it is declared at top level. If it had been defined inside the scope of a class, the function name would have been recorded in the symbol space for the class body (as in the previous diagram).

[Figure: C++ Symbol Table: Function Perspective]

As a general rule, the details of how the symbol table is organized are completely dependent on the compiler and the choices its designers made. In our case, we designed a very general symbol table management package because we planned to use (and have used) the same package to handle multiple languages (C, C++, Java, COBOL, several legacy languages) in a uniform way. However, the abstract structures of symbol spaces and inheritance will have to be implemented in essentially equivalent ways across C++ compilers; after all, they have to model the same information. I'd expect similar structures in the GCC and Clang compilers (well, the integer-numbered inheritance arcs, maybe not :)

As a practical matter, it doesn't matter how many "passes" your compiler has. It pretty much has to build these structures to remember what it knows about the symbols, within a pass, and across passes.

While building a C++ parser is very hard by itself, building such a symbol table is much harder. The effort dwarfs the effort to build the C++ parser. Our C++ name resolver is some 250K SLOC of attribute-grammar code compiled and executed by DMS. Getting the details right is an enormous headache: the C++ reference manual is enormous and confusing, the facts are scattered everywhere across the document, and in a variety of places it is contradictory (we try to send complaints about this to the committee) and/or inconsistent between compilers (we have versions for GCC and Visual Studio 201x).

Update March 2017: Now have symbol tables for C++2014. Update June 2018: Now have symbol tables for C++2017.
