What would be the best way to detect what programming language is used in a snippet of code?
Prettify is a JavaScript package that does an okay job of detecting programming languages:
http://code.google.com/p/google-code-prettify/
It is mainly a syntax highlighter, but there is probably a way to extract the detection part and use it to identify the language of a snippet.
Nice puzzle.
I think it is impossible to detect all languages, but you could trigger on key tokens (certain reserved words and frequently used character combinations).
But there are a lot of languages with similar syntax, so it depends on the size of the snippet.
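To make the key-token idea concrete, here is a rough Python sketch. The `SIGNATURES` table and the `guess_language` function are hypothetical names made up for illustration, and the token lists are deliberately tiny; a usable detector would need far more tokens per language and some weighting.

```python
from collections import Counter

# Hypothetical, deliberately small token tables (an assumption, not a real dataset).
SIGNATURES = {
    "python": ["def ", "import ", "elif ", "self.", "print("],
    "c": ["#include", "printf(", "int main", "->", "char *"],
    "javascript": ["function ", "var ", "===", "console.log(", "=> "],
}


def guess_language(snippet):
    """Return (language, hits) for the best-scoring language, or (None, 0)."""
    scores = Counter()
    for lang, tokens in SIGNATURES.items():
        for tok in tokens:
            # Count every occurrence of each characteristic token.
            scores[lang] += snippet.count(tok)
    if not scores or max(scores.values()) == 0:
        return None, 0
    return scores.most_common(1)[0]
```

As noted above, a short snippet may not hit any token at all, in which case this sketch returns `(None, 0)` rather than guessing.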
Guesslang is a possible solution:
http://guesslang.readthedocs.io/en/latest/index.html
There's also SourceClassifier:
https://github.com/chrislo/sourceclassifier/tree/master
I became interested in this problem after finding some code in a blog article which I couldn't identify. Adding this answer since this question was the first search hit for "identify programming language".
You might find some useful material here: http://alexgorbatchev.com/wiki/SyntaxHighlighter. Alex has spent a lot of time figuring out how to parse a large number of different languages, and what the key syntax elements are.
An alternative is to use highlight.js, which performs syntax highlighting but uses the success rate of the highlighting process to identify the language. In principle, any syntax highlighter codebase could be used in the same way, but the nice thing about highlight.js is that language detection is considered a feature and is used for testing purposes.
UPDATE: I tried this and it didn't work that well. Compressed JavaScript completely confused it, which suggests the tokenizer is whitespace-sensitive. In general, just counting highlight hits does not seem very reliable. A stronger parser, or perhaps counting the unmatched sections instead, might work better.
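For what it's worth, here is one way the "unmatched section" idea could be sketched in Python. This is not how highlight.js works internally; `GRAMMARS`, `unmatched_ratio`, and `rank_languages` are hypothetical names, and the regexes are toy stand-ins for a real highlighter grammar. Whitespace is ignored when measuring coverage, so minified code is not penalized for lacking spaces.

```python
import re

# Hypothetical toy "grammars": one regex per language (an assumption for illustration).
GRAMMARS = {
    "python": re.compile(
        r"\b(?:def|import|elif|lambda|None|True|False)\b|#.*"
    ),
    "javascript": re.compile(
        r"\b(?:function|var|let|const|typeof|undefined|null)\b|//.*|===|=>"
    ),
}


def unmatched_ratio(snippet, pattern):
    """Fraction of non-whitespace characters that no token match covers."""
    total = sum(1 for ch in snippet if not ch.isspace())
    if total == 0:
        return 1.0
    matched = sum(
        sum(1 for ch in m.group(0) if not ch.isspace())
        for m in pattern.finditer(snippet)
    )
    return 1.0 - matched / total


def rank_languages(snippet):
    """Return (unmatched_ratio, language) pairs, best candidate first."""
    return sorted((unmatched_ratio(snippet, p), lang) for lang, p in GRAMMARS.items())
```

The first entry of `rank_languages(code)` is the language whose patterns leave the least text unexplained. With grammars this small the ranking is only a rough signal.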