Adding an instance to Instances in weka

Posted 2019-02-07 13:46

Question:

I have a few ARFF files that I would like to read sequentially and combine into one large dataset. Instances.add(Instance inst) does not carry string values over to the target dataset, which is why I tried calling setDataset() first, but that fails as well. Is there a way to get the intuitively correct behavior for string attributes?

                ArffLoader arffLoader = new ArffLoader();
                arffLoader.setFile(new File(fName));
                Instances newData = arffLoader.getDataSet();
                for (int i = 0; i < newData.numInstances(); i++) {
                        Instance one = newData.instance(i);
                        one.setDataset(data);
                        data.add(one);
                }

Answer 1:

This is from the Weka mailing list; I saved it a while ago.

How do I merge two data files, a.arff and b.arff, into one dataset?

It depends on what kind of merge you mean. Do you just want to append the second file to the first (both have the same attributes), or do you want to merge the attributes (both have the same number of instances)?

In the first case ("append"): 
java weka.core.Instances append filename1 filename2 > output-file 

and the latter case ("merge"): 
java weka.core.Instances merge filename1 filename2 > output-file 

Here's the relevant Javadoc: http://weka.sourceforge.net/doc.dev/weka/core/Instances.html#main(java.lang.String[])

Use mergeInstances to merge two datasets.

 public static Instances mergeInstances(Instances first,
                                   Instances second)

When both datasets have the same number of instances (the "merge" case), your code would look something like this:

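import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

// Both files must contain the same number of instances.
ArffLoader arffLoader = new ArffLoader();
arffLoader.setFile(new File(fName1));
Instances newData1 = arffLoader.getDataSet();
arffLoader.setFile(new File(fName2));
Instances newData2 = arffLoader.getDataSet();
Instances mergedData = Instances.mergeInstances(newData1, newData2);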

When both datasets have the same attributes (the "append" case), I do not see a dedicated Java method for it in Weka. If you read the source of Instances.main, however, the "append" branch looks like this:

// Instances.java
// public static void main(String[] args) {
// read two files, append them and print result to stdout
else if ((args.length == 3) && (args[0].toLowerCase().equals("append"))) {
  DataSource source1 = new DataSource(args[1]);
  DataSource source2 = new DataSource(args[2]);
  String msg = source1.getStructure().equalHeadersMsg(source2.getStructure());
  if (msg != null)
    throw new Exception("The two datasets have different headers:\n" + msg);
  Instances structure = source1.getStructure();
  System.out.println(source1.getStructure());
  while (source1.hasMoreElements(structure))
    System.out.println(source1.nextElement(structure));
  structure = source2.getStructure();
  while (source2.hasMoreElements(structure))
    System.out.println(source2.nextElement(structure));
}
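
The append logic in that excerpt boils down to three steps: check that the two headers match, emit the header once, then emit the rows of the first file followed by the rows of the second. As a rough illustration only, here is a dependency-free sketch of those steps working on ARFF text lines. It does not use Weka; the class ArffAppend and its methods are hypothetical names made up for this example. It also assumes that headers can be compared as literal text, so headers that differ only in spacing or keyword case would be rejected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka): appends two ARFF documents held as text lines. */
class ArffAppend {

    /** Returns the index of the "@data" line (case-insensitive), or -1 if absent. */
    static int dataLineIndex(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).trim().equalsIgnoreCase("@data")) {
                return i;
            }
        }
        return -1;
    }

    /** Trims each line and drops blank lines and '%' comment lines. */
    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty() && !t.startsWith("%")) {
                out.add(t);
            }
        }
        return out;
    }

    /** Header declarations: everything before the "@data" line. */
    static List<String> header(List<String> lines) {
        int end = dataLineIndex(lines);
        if (end < 0) {
            throw new IllegalArgumentException("no @data line found");
        }
        return clean(lines.subList(0, end));
    }

    /** Data rows: everything after the "@data" line. */
    static List<String> rows(List<String> lines) {
        int start = dataLineIndex(lines);
        if (start < 0) {
            throw new IllegalArgumentException("no @data line found");
        }
        return clean(lines.subList(start + 1, lines.size()));
    }

    /** Checks that the headers match, then returns header + rows of first + rows of second. */
    static List<String> append(List<String> first, List<String> second) {
        List<String> h = header(first);
        if (!h.equals(header(second))) {
            throw new IllegalArgumentException("The two datasets have different headers");
        }
        List<String> out = new ArrayList<>(h);
        out.add("@data");
        out.addAll(rows(first));
        out.addAll(rows(second));
        return out;
    }
}
```

Because the rows are carried over as text, quoted string values survive unchanged, which is also why the command-line append avoids the string problem described in the question. For real data, the command-line tool shown earlier is likely the safer route.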


Tags: java weka