Is it at all possible to parse C++ with incomplete declarations with clang with its existing libclang API ? I.e. parse .cpp file without including all the headers, deducing declarations on the fly. so, e.g. The following text:
A B::Foo(){return stuff();}
Will detect unknown symbol A, call my callback that deducts A is a class using my magic heuristic, then call this callback the same way with B and Foo and stuff. In the end I want to be able to infer that I saw a member Foo of class B returning A, and stuff is a function.. Or something to that effect. context: I wanna see if I can do sensible syntax highlighting and on the fly code analysis without parsing all the headers very quickly.
[EDIT] To clarify, I'm looking for very heavily restricted C++ parsing, possibly with some heuristic to lift some of the restrictions.
C++ grammar is full of context dependencies. Is Foo() a function call or a construction of a temporary of class Foo? Is Foo<Bar> stuff; a template Foo<Bar> instantiation and declaration of variable stuff, or is it weird-looking 2 calls to overloaded operator < and operator > ? It's only possible to tell in context, and context often comes from parsing the headers.
What I'm looking for is a way to plug my custom convention rules. E.g. I know that I don't overload Win32 symbols, so I can safely assume that CreateFile is always a function, and I even know its signature. I also know that all my classes start with a capital letter and are nouns, and functions are usually verbs, so I can reasonably guess that Foo and Bar are class names. In a more complex scenario, I know I don't write side-effect-free expressions like a < b > c; so I can assume that a is always a template instantiation. And so on.
So, the question is whether it's possible to use Clang API to call back every time it encounters an unknown symbol, and give it an answer using my own non-C++ heuristic. If my heuristic fails, then the parse fails, obviously. And I'm not talking about parsing Boost library :) I'm talking about very simple C++, probably without templates, restricted to some minimum that clang can handle in this case.
I know the question is fairly old, but have a look here :
It is a sub-project from clang-highlight, an (experimental?) tool which seems to be no longer developed.
I'm only interested in the fuzzy parsing part and forked it on my github page where I fixed several minor issues and made the tool autonomous (it can be compiled outside clang's source tree). Don't try to compile it with C++14 (which G++ 6's default mode), because there will be conflicts with
make_unique
.According to this page, clang-format has its own fuzzy parser (and is actively developed), but the parser was (is ?) more tighly coupled to the tool.
OP doesn't want "fuzzy parsing". What he wants is full context-free parsing of the C++ source code, without any requirement for name and type resolution. He plans to make educated guesses about the types based on the result of the parse.
Clang proper tangles parsing and name/type resolution, which means it must have all that background type information available when it parses. Other answers suggest a LibFuzzy that produces incorrect parse trees, and some fuzzy parser for clang-format about which I know nothing. If one insists on producing a classic AST, none of these solutions will produce the "right" tree in the face of ambiguous parses.
Our DMS Software Reengineering Toolkit with its C++ front end can parse C++ source without the type information, and produces accurate "ASTs"; these are actually abstract syntax dags where forks in trees represent different possible interpretations of the source code according to a language-precise grammar (ambiguous (sub)parses).
What Clang tries to do is avoid producing these multiple sub-parses by using type information as it parses. What DMS does is produce the ambiguous parses, and in (optional) post-parsing (attribute-grammar-evaluation) pass, collect symbol table information and eliminate the sub-parses which are inconsistent with the types; for well-formed programs, this produces a plain AST with no ambiguities left.
If OP wants to make heuristic guesses about the type information, he will need to know these possible interpretations. If they are eliminated in advance, he cannot straightforwardly guess what types might be needed. An interesting possibility is the idea of modifying the attribute grammar (provided in source form as part of the DMS's C++ front end), which already knows all of C++ type rules, to do this with partial information. That would be an enormous head start over building the heuristic analyzer from scratch, given that it has to know about 600 pages of arcane name and type resolution rules from the standard.
You can see examples of the (dag) produced by DMS's parser.
Unless you heavily restrict the code that people are allowed to write, it is basically impossible to do a good job of parsing C++ (and hence syntax highlighting beyond keywords/regular expressions) without parsing all the headers. The pre-processor is particularly good at screwing things up for you.
There are some thoughts on the difficulties of fuzzy parsing (in the context of visual studio) here which might be of interest: http://blogs.msdn.com/b/vcblog/archive/2011/03/03/10136696.aspx
Another solution which I think will suit more the OP than fuzzy parsing.
When parsing, clang maintains Semantic information through the Sema part of the analyzer. When encountering an unknown symbol, Sema will fallback to ExternalSemaSource to get some information about this symbol. Through this, you could implement what you want.
Here is a quick example how to set up it. It is not entirely functional (I'm not doing anything in the LookupUnqualified method), you might need to do further investigations and I think it is a good start.
The idea and the DynamicIDHandler class are from cling project where unknown symbols are variable (hence the comments and the code).