Can you please let me know how to represent the attributes and the class for text classification in Weka? Which attribute should I use for classification: word frequency or just the word? What would a possible structure of the ARFF file be? Can you give me several example lines of that structure?
Thank you very much in advance.
In Weka, you can choose your own attributes. In this example, there are only 2 classes, and every unique word is used as an attribute. If you choose word frequency as the attribute value, then you assign '0' if the word does not occur in your text, '1' if it occurs only once, and '2' if it occurs twice.
Here is an example in .arff format.
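As a rough sketch of the word-frequency layout described above, the following Python snippet builds such an ARFF file from two made-up documents. The corpus, the relation name `text_counts`, the attribute name `class_label`, and the helper `build_count_arff` are all assumptions for illustration; they are not part of Weka.

```python
from collections import Counter

# Hypothetical two-class corpus: (text, class label) pairs.
docs = [
    ("cheap pills buy cheap", "spam"),
    ("meeting agenda for monday", "ham"),
]

def build_count_arff(docs, relation="text_counts"):
    """Return ARFF text with one numeric attribute per unique word."""
    # Count how often each word occurs in each document.
    counts = [Counter(text.lower().split()) for text, _ in docs]
    # Every unique word in the corpus becomes one attribute.
    vocab = sorted(set().union(*counts))
    labels = sorted({label for _, label in docs})

    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {word} numeric" for word in vocab]
    # The class is a nominal attribute listing the allowed labels.
    lines.append("@attribute class_label {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    # One data row per document: word counts followed by the class.
    for doc_counts, (_, label) in zip(counts, docs):
        row = [str(doc_counts[word]) for word in vocab]
        lines.append(",".join(row + [label]))
    return "\n".join(lines)

arff_counts = build_count_arff(docs)
print(arff_counts)
```

Each `@data` row has one count per word, in the same order as the `@attribute` lines, with the class label last. A word that happens to be spelled `class_label` would clash with the class attribute name, so a real script would need to guard against that.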
One of the easiest alternatives is to start with an ARFF file for a two-class problem like:
The text is represented as a String type and the class is a nominal with two values.
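A file of that shape, with a String attribute for the text and a nominal class with two values, could be generated along these lines. This is only a sketch: the documents, the relation name `text_raw`, the attribute names, and the helpers `quote_arff` and `build_string_arff` are assumptions, and the quoting rule (single quotes with backslash escapes) is a simplified version of what ARFF readers accept.

```python
def quote_arff(text):
    """Quote a string value for ARFF, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

def build_string_arff(docs, relation="text_raw"):
    """Return ARFF text with one string attribute and a nominal class."""
    labels = sorted({label for _, label in docs})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute class_label {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    # One row per document: the quoted raw text, then its class label.
    lines += [f"{quote_arff(text)},{label}" for text, label in docs]
    return "\n".join(lines)

# Hypothetical two-class corpus.
docs = [
    ("Buy cheap pills now", "spam"),
    ("Don't forget the meeting", "ham"),
]
arff_raw = build_string_arff(docs)
print(arff_raw)
```

Keeping the raw text in a single string attribute leaves the choice of word-level features to a later filtering step inside Weka, instead of fixing the vocabulary when the file is written.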
Then you could apply two filters:
You may find more info and other approaches to transform your data in this Weka wiki page: http://weka.wikispaces.com/Text+categorization+with+WEKA