What would be the best way to detect what programming language is used in a snippet of code?
First, I would try to find the specific keywords of a language, e.g.
Language detection solved by others:
Ohloh's approach: https://github.com/blackducksw/ohcount/
GitHub's approach: https://github.com/github/linguist
The best solution I have come across is using the linguist gem in a Ruby on Rails app. It's a rather specific way to do it, but it works. This was mentioned above by @nisc, but I will give my exact steps for using it. (Some of the following commands are specific to Ubuntu but should translate easily to other operating systems.)
If you have any Rails app that you don't mind temporarily messing with, create a new file in it to hold the code snippet in question. (If you don't have Rails installed, there's a good guide here, although for Ubuntu I recommend this. Then run
rails new <name-your-app-dir>
and cd into that directory. Everything you need to run a Rails app is already there.)
After you have a Rails app to use this with, add
gem 'github-linguist'
to your Gemfile (literally just called Gemfile in your app directory, no extension).
Then install ruby-dev:
sudo apt-get install ruby-dev
Then install cmake:
sudo apt-get install cmake
Now you can run
gem install github-linguist
(If you get an error that says icu is required, run
sudo apt-get install libicu-dev
and try again.)
(If the above did not work, you may need to run
sudo apt-get update
or
sudo apt-get install make
or
sudo apt-get install build-essential
first.)
Now everything is set up, and you can use this any time you want to check code snippets. In a text editor, open the file you made to hold your code snippet (let's just say it's app/test.tpl, but if you know the extension of your snippet, use that instead of .tpl; if you don't know the extension, don't use one). Paste your code snippet into this file. Go to the command line and run
bundle install
(you must be in your application's directory). Then run
linguist app/test.tpl
(more generally, linguist <path-to-code-snippet-file>). It will tell you the type, MIME type, and language. For multiple files (or for general use with a Ruby/Rails app) you can run
bundle exec linguist --breakdown
in your application's directory.
It seems like a lot of extra work, especially if you don't already have Rails, but you don't actually need to know ANYTHING about Rails if you follow these steps, and I really haven't found a better way to detect the language of a file or code snippet.
I believe that there is no single solution that could possibly identify what language a snippet is in, based upon that single snippet alone. Take the keyword print. It could appear in any number of languages, each of which serves a different purpose and has a different syntax.
I do have some advice. I'm currently writing a small piece of code for my website that can be used to identify programming languages. As most of the other posts point out, there is a huge range of programming languages that you may simply never have heard of, and you can't account for them all.
What I have done is identify each language by a selection of keywords. For example, Python could be identified in a number of ways. It's probably easier if you pick 'traits' that are almost certainly unique to the language. For Python, I chose the trait of using colons to start a set of statements, which I believe is a fairly unique trait (correct me if I'm wrong).
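A minimal sketch of such a trait check in Python. The function name, the regular expression, and the list of block-opening keywords are my own assumptions for illustration, not part of the code mentioned above:

```python
import re

# Assumed trait: a line that begins with a block-opening keyword and ends
# with a colon (optionally followed by a comment). The keyword list is
# illustrative and not exhaustive.
BLOCK_OPENER = re.compile(
    r"^\s*(?:def|class|if|elif|else|for|while|try|except|finally|with)\b"
    r".*:\s*(?:#.*)?$"
)


def has_colon_block_trait(snippet: str) -> bool:
    """Return True if any line looks like a colon-terminated block header."""
    return any(BLOCK_OPENER.match(line) for line in snippet.splitlines())
```

A hit only says the snippet might be Python. A miss means you should move on to the next trait.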
If, in my example, you can't find a colon starting a statement set, move on to another possible trait, say the def keyword used to define a function. Now this can cause some problems, because Ruby also uses the keyword def to define a function. The key to telling the two (Python and Ruby) apart is to use various levels of filtering to get the best match. Ruby uses the keyword end to finish a function, whereas Python doesn't have anything to finish a function, just a de-indent, but you don't want to go there. But again, end could also be Lua, yet another programming language to add to the mix.
You can see that programming languages simply overlap too much. A keyword in one language could happen to be a keyword in another language. Using a combination of keywords that often go together, like Java's
public static void main(String[] args)
helps to eliminate those problems.
Like I've already said, your best chance is looking for relatively unique keywords or sets of keywords to separate one from the other. And, if you get it wrong, at least you had a go.
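The levels of filtering described above could look roughly like this in Python. This is only a sketch: the patterns, the order of the levels, and the name guess_language are assumptions of mine, and it only knows the four languages mentioned in this answer.

```python
import re
from typing import Optional

# Level 1: a combination of keywords that rarely appears outside Java.
JAVA_MAIN = re.compile(
    r"public\s+static\s+void\s+main\s*\(\s*String\s*\[\s*\]\s*\w+\s*\)"
)
# Level 2: "def" is shared by Python and Ruby.
DEF_LINE = re.compile(r"^\s*def\s+\w+", re.MULTILINE)
# A line containing only "end" closes blocks in Ruby and Lua.
END_LINE = re.compile(r"^\s*end\s*$", re.MULTILINE)
# Level 3: Lua defines functions with "function" (optionally "local").
LUA_FUNCTION = re.compile(r"^\s*(?:local\s+)?function\b", re.MULTILINE)


def guess_language(snippet: str) -> Optional[str]:
    """Apply the filters in order; return None when nothing matches."""
    if JAVA_MAIN.search(snippet):
        return "Java"
    has_end = END_LINE.search(snippet) is not None
    if DEF_LINE.search(snippet):
        # "def" plus a closing "end" points at Ruby, otherwise Python.
        return "Ruby" if has_end else "Python"
    if has_end and LUA_FUNCTION.search(snippet):
        return "Lua"
    return None
```

For example, guess_language("def f(x)\n  x + 1\nend") returns "Ruby", while the same function written with a trailing colon and no end returns "Python".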
It's very hard and sometimes impossible. Which language is this short snippet from?
(Hint: It could be any one out of several.)
You can try to analyze various languages and decide using frequency analysis of keywords. If certain sets of keywords occur with certain frequencies in a text, it's likely that the language is Java, etc. But I don't think you will get anything that is completely foolproof: you could, for example, give a variable in C the same name as a keyword in Java, and the frequency analysis would be fooled.
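A rough sketch of keyword frequency analysis in Python. The keyword sets are deliberately tiny and hand-picked by me for illustration. A real detector would need much larger sets, ideally derived from a corpus of known code.

```python
import re
from collections import Counter

# Hypothetical keyword sets, far from complete.
KEYWORDS = {
    "Java": {"public", "static", "void", "class", "new", "final", "extends"},
    "Python": {"def", "self", "elif", "None", "lambda", "except", "yield"},
    "C": {"int", "char", "struct", "void", "static", "sizeof", "typedef"},
}


def keyword_scores(snippet: str) -> dict:
    """Return, per language, the fraction of tokens that are its keywords."""
    tokens = re.findall(r"[A-Za-z_]\w*", snippet)
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {lang: 0.0 for lang in KEYWORDS}
    return {
        lang: sum(counts[word] for word in words) / total
        for lang, words in KEYWORDS.items()
    }
```

Note that "static" and "void" count for both Java and C here, which is exactly the kind of overlap that keeps this approach from being foolproof.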
If you take it up a notch in complexity, you could look for structures: if a certain keyword always comes after another one, that will give you more clues. But it will also be much harder to design and implement.
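One simple form of this is counting adjacent keyword pairs. The sketch below is in Python. The pairs are examples I picked by hand and the function name is hypothetical, so treat it as an illustration of the idea and not as a tested classifier.

```python
import re
from collections import Counter

# Hypothetical ordered keyword pairs that often appear side by side.
BIGRAMS = {
    "Java": {("public", "static"), ("static", "void"), ("public", "class")},
    "Python": {("is", "None"), ("not", "in"), ("yield", "from")},
    "C": {("unsigned", "int"), ("typedef", "struct"), ("const", "char")},
}


def bigram_hits(snippet: str) -> dict:
    """Count, per language, how often its keyword pairs occur adjacently."""
    tokens = re.findall(r"[A-Za-z_]\w*", snippet)
    pairs = Counter(zip(tokens, tokens[1:]))
    return {
        lang: sum(pairs[pair] for pair in wanted)
        for lang, wanted in BIGRAMS.items()
    }
```

Punctuation between tokens is ignored, so this only captures ordering and says nothing about real syntax. Pairs such as "static void" are also valid C, so the ambiguity does not go away entirely.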
It would depend on what type of snippet you have, but I would run it through a series of tokenizers and see which language's BNF it came up as valid against.
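As a small illustration of that idea in Python, using only parsers that ship with the standard library (ast for Python source, json for JSON). This swaps real parsers in for a BNF check, the names below are my own, and covering other languages would need their own parsers or grammars.

```python
import ast
import json


def _parses_as_python(snippet: str) -> bool:
    try:
        ast.parse(snippet)  # parses only, never executes the snippet
        return True
    except (SyntaxError, ValueError):
        return False


def _parses_as_json(snippet: str) -> bool:
    try:
        json.loads(snippet)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# Hypothetical registry: language name -> function that tries to parse it.
VALIDATORS = {"Python": _parses_as_python, "JSON": _parses_as_json}


def languages_accepting(snippet: str) -> list:
    """Return every language whose parser accepts the snippet."""
    return [name for name, accepts in VALIDATORS.items() if accepts(snippet)]
```

A snippet such as {"a": [1, 2]} is accepted by both parsers, so even a full grammar check can leave more than one candidate.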