Is it possible to know in what language an exe file was written?

Posted 2020-02-21 16:19

Question:

I have an exe file that I decompiled with IDA. I was told the program was coded in Delphi, so I tried to decompile it with DeDe, but it failed with no output and no error. Is it possible to find the language used to create an exe by trying different decompilers, each written specifically for one programming language? Or can they fail for some other reason?

Answer 1:

In many cases it is possible to identify the compiler used to compile the code, and from that, the original language.

Most language implementations include some kind of runtime library that implements the language's various high-level operations. For example, C has the CRT, which implements file I/O operations (fopen, fread, etc.); Delphi has compiler helpers for its string type (concatenation, assignment and others); Ada has various low-level functions to ensure language safety; and so on. By comparing the program's code with the runtime libraries of the candidate compilers, you may be able to find a match.
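A much cruder relative of this idea can be sketched in Python. Instead of comparing function bodies, it scans the file for strings that a language's runtime library tends to embed. The marker strings below are illustrative assumptions, not a vetted signature set. A missing marker proves nothing, because the binary may be packed or stripped, or the strings may be stored as UTF-16.

```python
import sys

# Hypothetical, illustrative marker strings. They are NOT a vetted signature set:
# a marker can be absent (packed/stripped binary, UTF-16 strings) or present by
# coincidence, so treat a hit as a hint only.
RUNTIME_MARKERS: dict[str, list[bytes]] = {
    "Go": [b"Go build ID:"],
    "Rust": [b"panicked at"],
    "Borland/Embarcadero (Delphi or C++Builder)": [b"Borland", b"Embarcadero"],
    "MSVC C/C++": [b"Microsoft Visual C++ Runtime Library"],
}


def guess_runtimes(data: bytes) -> list[str]:
    """Return runtimes whose marker strings occur in `data` (ASCII, case-insensitive)."""
    haystack = data.lower()
    return [
        runtime
        for runtime, markers in RUNTIME_MARKERS.items()
        if any(marker.lower() in haystack for marker in markers)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python guess_runtime.py program.exe
    with open(sys.argv[1], "rb") as exe:
        print(guess_runtimes(exe.read()) or "no known runtime markers found")
```

A hit only tells you which runtime was linked in. It does not tell you which language the rest of the program was written in.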

IDA implements this approach in FLIRT (Fast Library Identification and Recognition Technology). Using its signatures, IDA can identify most of the major compilers for DOS and Windows. This is somewhat harder on Linux because there is no single provider of compiler binaries, so signatures would have to be made for every distribution.
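The core of FLIRT-style matching is to compare function bytes against patterns in which bytes that change from build to build are masked out. Relocated addresses are a typical example of such bytes. The sketch below shows only this masking idea, and the signature name and pattern are made up. Real FLIRT signatures also check a CRC of the following bytes and other metadata, so this is not IDA's actual algorithm.

```python
from typing import Optional

# Hypothetical signature table: the name and byte pattern are invented for this
# example and are not taken from any real FLIRT signature file.
SIGNATURES = {
    "hypothetical_rtl_helper": "55 8B EC ?? ?? ?? ?? 5D C3",
}


def parse_pattern(pattern: str) -> list[Optional[int]]:
    """Turn '55 8B EC ??' into [0x55, 0x8B, 0xEC, None]; None marks a masked byte."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]


def find_pattern(data: bytes, pattern: str) -> list[int]:
    """Return every offset in `data` where the masked pattern matches."""
    sig = parse_pattern(pattern)
    return [
        start
        for start in range(len(data) - len(sig) + 1)
        if all(b is None or data[start + i] == b for i, b in enumerate(sig))
    ]


def identify_functions(data: bytes) -> dict[str, list[int]]:
    """Map each known signature name to the offsets where it matches."""
    matches = {}
    for name, pattern in SIGNATURES.items():
        offsets = find_pattern(data, pattern)
        if offsets:
            matches[name] = offsets
    return matches
```

Masking is what lets one signature match the same library function linked at different addresses. Without it, every relocated call or global reference would break the match.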

However, even without resorting to the runtime library code, it may be possible to identify the compiler used. Many compilers use very distinctive idioms to represent various operations. For example, I was able to guess that the compiler used for the Duqu virus was Visual C++, which was later confirmed.
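As a toy example of idiom spotting, the sketch below counts two x86 prologue byte sequences. The five bytes `8B FF 55 8B EC` (`mov edi, edi; push ebp; mov ebp, esp`) form the hot-patchable prologue that MSVC emits with `/hotpatch`. The idiom table and the choice to scan the whole file are simplifying assumptions. A real analysis would look only at function entry points and at many more idioms.

```python
# Hypothetical idiom table. The byte encodings are standard x86, but which
# idioms to look for, and what they suggest, is an assumption for this sketch.
IDIOMS = {
    # mov edi,edi; push ebp; mov ebp,esp: hot-patchable prologue, typical of MSVC
    "msvc_hotpatch_prologue": bytes.fromhex("8BFF558BEC"),
    # push ebp; mov ebp,esp: plain frame setup used by many compilers
    "frame_setup": bytes.fromhex("558BEC"),
}


def count_idioms(code: bytes) -> dict[str, int]:
    """Count non-overlapping occurrences of each idiom in `code`.

    Every hot-patch prologue also contains a plain frame setup, so it is
    counted under both entries.
    """
    return {name: code.count(seq) for name, seq in IDIOMS.items()}
```

A high ratio of hot-patch prologues to plain frame setups is a weak hint toward MSVC. It is only a hint, because the absence of the idiom is equally consistent with MSVC built without `/hotpatch`.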



Answer 2:

Compilation is a lossy process, so it is not, in general, possible to decompile an executable (or other compiled program module, such as a .so or .dll) and recover source code in the original language, or even to determine unambiguously what the original language was. There need not even be a single original language, since different modules may have been written in different languages before being linked together. Ordinarily, you can disassemble a binary and recover assembly language, although that may be of very limited value.
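Python bytecode is not native code, but CPython makes this lossiness easy to observe. In the sketch below, the function names are made up. Constant folding turns `2 * 3` into `6` at compile time, so both functions compile to the same instruction stream. Note that `co_code` is a CPython implementation detail, so treat this as an illustration rather than a portable test.

```python
def product_as_expression():
    return 2 * 3  # spelled as an expression


def product_as_literal():
    return 6  # spelled as a literal


# CPython folds 2 * 3 into the constant 6 at compile time, so the two functions
# compile to the same instruction stream: the multiplication is gone.
print(product_as_expression.__code__.co_code == product_as_literal.__code__.co_code)
```

Running `dis.dis()` on either function shows no multiplication, only the constant 6. A decompiler looking at this bytecode has no way to know which spelling the programmer used. Native compilers discard far more than this.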

In many cases, you can tell something about the original language provided that the binary has not been stripped of symbols. For example, you can usually tell whether a binary was originally written in C++ by looking at its symbols (on Linux, using objdump; I have no idea what the equivalent might be on Windows), because C++ compilers mangle symbol names in a particular way. It's not a 100% guarantee, but it is a strong indication.
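Here is a rough sketch of this check. It assumes you already have a list of symbol names, for example from the output of `nm` or `objdump -t`. Itanium-ABI C++ names (GCC, Clang) start with `_Z`, and MSVC-decorated C++ names start with `?`. Rust's legacy scheme also uses `_ZN` but ends each name in a `17h<16 hex digits>E` hash, which is one reason this is only a likelihood and not a guarantee. The category labels are made up for this example.

```python
import re
from collections import Counter

# Rust's legacy mangling reuses the Itanium "_ZN" prefix but ends every symbol
# with a 16-hex-digit hash: ...17h<hash>E
RUST_LEGACY_HASH = re.compile(r"17h[0-9a-f]{16}E$")


def classify_symbol(name: str) -> str:
    """Guess which name-mangling scheme produced `name` (ELF-style names assumed)."""
    if name.startswith("_R"):
        return "Rust (v0 mangling)"
    if name.startswith("_ZN") and RUST_LEGACY_HASH.search(name):
        return "Rust (legacy mangling)"
    if name.startswith("_Z"):
        return "C++ (Itanium ABI: GCC/Clang)"
    if name.startswith("?"):
        return "C++ (MSVC decoration)"
    return 'unmangled (C, or extern "C" code)'


def summarize(symbols: list[str]) -> Counter[str]:
    """Count how many symbols fall into each mangling category."""
    return Counter(classify_symbol(s) for s in symbols)
```

Look at the distribution over all symbol names printed by `nm` rather than at any single symbol. A handful of mangled names can come from a linked C++ library even when the main program is plain C.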

That said, some decompilers do a pretty reasonable job of a very difficult task. Inferring likely high-level constructs from a binary is not easy. In my (very limited) experience, they tend to work for fairly trivial programs or for software compiled with a narrow range of versions of the original compiler, but choke on anything substantial: it's very difficult for the author of a decompiler to keep up with changes in the compilers, and there may be very little incentive for her to do so.

Even in cases where decompilation is very successful, the result is essentially completely uncommented code with meaningless variable names that is extremely difficult to understand. Decompilation is one thing, extracting the intended semantic meaning from the result is another. Remember that many variables, branches, loops, and functions will have been completely optimized away, many functions will have been inlined, etc. So the “source code”, even if you can obtain it in this way, may not be a whole lot of use to you.