weka batch filtering StringToWordVector

Posted 2019-08-06 10:18

Question:

I'm trying to use Weka for text classification. I have two ARFF files:

One for the training set (example of a data row):

"mouse",no,no,no,no,no,yes,no

and another for the test set (example of a data row):

"cat",?,?,?,?,?,?,?

They have the same attribute declarations, but when I use batch filtering, Weka reports "Input file formats differ". Why?

Here is the command that I use:

C:\Programmi\Weka-3-6>java -cp C:\Programmi\Weka-3-6\weka.jar 
  weka.filters.unsupervised.attribute.StringToWordVector -b -i test1.arff
  -o output_training.arff -c last -r tent.arff -s output_tent.arff
  -R -O -C -T -I -N 0 -M 1

Here are the headers:

1) training

@RELATION tent

@Attribute text                 string
@Attribute politica             {yes,no}
@Attribute sports               {yes,no}
@Attribute cinema/tv/musica     {yes,no}
@Attribute stato_personale      {yes,no}
@Attribute moda/stile           {yes,no}
@Attribute conversazione        {yes,no}
@Attribute attualità            {yes,no}

2) test

@RELATION test

@Attribute text                 string
@Attribute politica             {yes,no}
@Attribute sports               {yes,no}
@Attribute cinema/tv/musica     {yes,no}
@Attribute stato_personale      {yes,no}
@Attribute moda/stile           {yes,no}
@Attribute conversazione        {yes,no}
@Attribute attualità            {yes,no}

I also tried setting the same @RELATION name in both files, but I get the same error. Separately, the two files work fine and I can apply StringToWordVector to each of them correctly. Thanks again.
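
To rule out a subtle mismatch between the two headers (stray whitespace, a reordered nominal value, a differently typed attribute), the declarations can be compared programmatically. The following is only a minimal sketch: `parse_arff_header` and `header_differences` are hypothetical helper names, not part of Weka, and the parser assumes unquoted attribute names without spaces, as in the headers above. It does not reproduce Weka's own header check, so an empty result only shows that the declarations look identical as text.

```python
# Sketch: compare the @attribute declarations of two ARFF headers.
# Hypothetical helpers; assumes attribute names are unquoted and contain no spaces.

TRAIN_HEADER = """@RELATION tent

@Attribute text                 string
@Attribute politica             {yes,no}
@Attribute sports               {yes,no}
@Attribute cinema/tv/musica     {yes,no}
@Attribute stato_personale      {yes,no}
@Attribute moda/stile           {yes,no}
@Attribute conversazione        {yes,no}
@Attribute attualità            {yes,no}
"""

# The test header from the question differs only in its relation name.
TEST_HEADER = TRAIN_HEADER.replace("@RELATION tent", "@RELATION test")


def parse_arff_header(text):
    """Return a list of (name, type) pairs, one per @attribute line, in order."""
    attributes = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        lowered = line.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if not lowered.startswith("@attribute"):
            continue  # skip @relation, comments, and blank lines
        parts = line.split(None, 2)  # keyword, name, type specification
        if len(parts) < 3:
            continue  # malformed declaration; ignored in this sketch
        _, name, spec = parts
        spec = spec.strip()
        if spec.startswith("{") and spec.endswith("}"):
            # Nominal attribute: keep the values in their declared order.
            attr_type = tuple(v.strip() for v in spec[1:-1].split(","))
        else:
            attr_type = spec.lower()  # e.g. "string", "numeric"
        attributes.append((name, attr_type))
    return attributes


def header_differences(first, second):
    """Return human-readable differences between two ARFF headers."""
    a = parse_arff_header(first)
    b = parse_arff_header(second)
    diffs = []
    if len(a) != len(b):
        diffs.append(f"attribute count differs: {len(a)} vs {len(b)}")
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            diffs.append(f"attribute {index}: {left!r} vs {right!r}")
    return diffs


if __name__ == "__main__":
    print(header_differences(TRAIN_HEADER, TEST_HEADER))
```

For real files, the header strings would be read from disk with the encoding the files were actually saved in; an encoding mismatch on a name such as `attualità` would then show up as a difference in the attribute name.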