How much time would it take to write a C++ compile

2019-02-04 08:49发布

How much time would it take to write a C++ compiler using lex/yacc?

Where can I get started with it?

13条回答
做自己的国王
2楼-- · 2019-02-04 09:02

A long time, and lex and yacc won't help

If you have the skills to write a compiler for such a large language, you will not need the small amount of help that lex and yacc give you. In fact, while lex is OK it may take longer to use yacc, as it's not really quite powerful enough for C or C++, and you can end up spending far more time getting it to work right than it would take to just write a recursive descent parser.

I believe lex and yacc are best used for simple grammars, or when it is worth the extra effort to have a nicely readable grammar file, perhaps because the grammar is experimental and subject to change.

For that matter, the entire parser is possibly not the major part of your job, depending on exactly what goals you have for the code generator.

查看更多
我想做一个坏孩纸
3楼-- · 2019-02-04 09:03

Well, what do you mean by write a compiler?

I doubt any one guy has made a true C++ compiler that took it down all the way to assembly code, but I have used lex and yacc to make a C compiler and I have done it without.

Using both you can make a compiler that leaves out the semantics in a couple days, but figuring out how to use them can take weeks or months easily. Figuring out how to make a compiler at all will take weeks or months no matter what, but the figure I remember is once you know how it works it took a few days with lex and yacc and a few weeks without but the second had better results and fewer bugs so really it's questionable whether they are worth using at all.

The 'semantics' is the actual code production. That can be very simple code that's just enough to work and might not take long at all, or you could spend your whole life doing optimization on it.

With C++ the big issue is templates, but there's so many little issues and rules I can't imagine someone ever wanting to do this. Even if you DO finish, the problem is you won't necessarily have binary compatibility ie be able to be recognized as a runnable program by a linker or the OS because there's more to it than just C++ and its hard to pin down standard but there's also yet more standards to worry about which are even less widely available.

查看更多
戒情不戒烟
4楼-- · 2019-02-04 09:05

Unless you have already written several other compilers; C++ is not a language you even want to start writing a compiler from scratch for, the language has a lot of places were the meaning requires a lot of context before the situation can be disambiguated.

Even if you have lots of experience writing compilers you are looking at several years for a team of developers. This is just to parse the code correctly into an intermediate format. Writing the backend to generate code is yet another specialized task (though you could steal the gcc backend).

If you do a google for "C++ grammars" there are a couple around to get you started.

C++ LEX  Tokens:   http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxLexer.l
C++ YACC Grammer:  http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y
                   http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxTester.y
查看更多
我命由我不由天
5楼-- · 2019-02-04 09:08

It will probably take you years, and you'll probably switch to some other parser generator in the process.

Parsing C++ is notoriously error-prone. The grammar is not fully LR-parsable, as many parts are context-sensitive. You won't be able to get it working right in flex/yacc, or at least it'll be really awkward to implement. There are only two front-ends I know of that get it right. Your best bet is to use one of these and focus on writing the back-end. That's where the interesting stuff is anyway :-).

Existing C++ Front Ends:

  1. The EDG front-end is used by most of the commercial vendors (Intel, Portland Group, etc.) in their compilers. It costs money, but it's very thorough. People pay big bucks for it because they don't want to deal with the pain of writing their own C++ parser.

  2. GCC's C++ front-end is thorough enough for production code, but you'd have to figure out how to integrate this into your project. I believe it's fairly involved to separate it from GCC. This would also be GPL, but I'm not sure whether that's a problem for you. You can use the GCC front-end in your project via gcc_xml, but this will only give you XML for classes, functions, namespaces, and typedefs. It won't give you a syntax tree for the code.

  3. Another possibility is to use clang, but their C++ support is currently spotty. It'll be nice to see them get all the bugs out, but if you look at their C++ status page you'll notice there are more than a few test cases that still break. Take heed -- clang is a big project. If it's taking these guys years to implement a C++ front-end, it's going to take you longer.

  4. Others have mentioned ANTLR, and there is a C++ grammar available for it, but I'm skeptical. I haven't heard of an ANTLR front end being used in any major compilers, though I do believe it's used in the NetBeans IDE. It might be suitable for an IDE, but I'm skeptical that you'd be able to use it on production code.

查看更多
欢心
6楼-- · 2019-02-04 09:14

Firstly, the "flex" tag on SO is about Adobe's product, not the lexer generator. Secondly, Bjarne Stroustrup is on record as saying he wished he had implemented Cfront (the first C++ compiler) using recursive descent rather than a table driven tool. And thirdly, to answer your question directly - lots. If you feel you need to write one, take a look at ANTLR - not my favourite tool, but there are already C++ parsers for it.

查看更多
Melony?
7楼-- · 2019-02-04 09:15

Recursive decent is a good choice to parse C++. GCC and clang use it.

The Elsa parser (and my ellcc compiler) use the Elkhound GLR compiler generator.

In either case, writing a C++ compiler is a BIG job.

查看更多
登录 后发表回答