How to create an ARFF file from an array in Java?

Posted 2019-04-15 17:36

Question:

I want to get the coefficients of a weighted linear regression for x-y pairs stored in two arrays in Java. I have settled on Weka, but its 'LinearRegression' class requires an 'Instances' object. As far as I can tell, creating an 'Instances' object requires an ARFF file containing the data. The solutions I have come across use the FastVector class, which is deprecated in the latest Weka version. How do I create an ARFF file for the x-y pairs and their corresponding weights, all of which are stored in Java arrays?

Here's my code, based on Baz's answer. It throws an exception on the last line, "lr.buildClassifier(newDataset)":

Thread [main] (Suspended (exception UnassignedClassException))
Capabilities.testWithFail(Instances) line: 1302

Here's the code:

public static void test() throws Exception
{
    double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0}, {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}};

    int numInstances = data[0].length;

    ArrayList<Attribute> atts = new ArrayList<Attribute>();
    List<Instance> instances = new ArrayList<Instance>();
    for(int dim = 0; dim < 2; dim++)
    {
        Attribute current = new Attribute("Attribute" + dim, dim);

        if(dim == 0)
        {
            for(int obj = 0; obj < numInstances; obj++)
            {
                instances.add(new SparseInstance(numInstances));
            }
        }

        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.get(obj).setValue(current, data[dim][obj]);
            //instances.get(obj).setWeight(weights[obj]);
        }
        atts.add(current);
    }

    Instances newDataset = new Instances("Dataset", atts, instances.size());

    for(Instance inst : instances)
        newDataset.add(inst);

    LinearRegression lr = new LinearRegression();

    lr.buildClassifier(newDataset);             
}
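
A note on the exception: Weka raises UnassignedClassException when a dataset has no class attribute, so a call such as newDataset.setClassIndex(1) before buildClassifier is probably what is missing here. The instances are also created with numInstances attributes rather than 2, which looks unintended. To sanity-check whatever coefficients Weka then reports, the weighted least-squares line for a single predictor can be computed directly. The following is a minimal sketch in plain Java, not Weka's implementation; the class name WeightedFit and the method fit are hypothetical, and it assumes positive weights and at least two distinct x values.

```java
class WeightedFit {
    /**
     * Returns {intercept, slope} minimizing
     * sum_i w[i] * (y[i] - intercept - slope * x[i])^2.
     * Hypothetical helper: assumes equal-length arrays, positive weights,
     * and at least two distinct x values (otherwise sxx is zero).
     */
    static double[] fit(double[] x, double[] y, double[] w) {
        double sw = 0, swx = 0, swy = 0;
        for (int i = 0; i < x.length; i++) {
            sw += w[i];
            swx += w[i] * x[i];
            swy += w[i] * y[i];
        }
        // Weighted means of x and y
        double meanX = swx / sw;
        double meanY = swy / sw;

        double sxx = 0, sxy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxx += w[i] * dx * dx;
            sxy += w[i] * dx * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] { intercept, slope };
    }

    public static void main(String[] args) {
        double[] x = { 0, 1, 2 };
        double[] y = { 1, 3, 5 };
        double[] w = { 1, 2, 3 };
        double[] c = fit(x, y, w);
        System.out.println("intercept=" + c[0] + " slope=" + c[1]);
    }
}
```

With the arrays from the question, this would be called as WeightedFit.fit(data[0], data[1], weights), where weights is the array referred to in the commented-out setWeight line.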

Answer 1:

I think this might help you:

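import java.util.ArrayList;
import java.util.List;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

// Assumed inputs (not shown here): double[][] data indexed as data[dim][obj],
// numDimensions = data.length, numInstances = data[0].length.
FastVector atts = new FastVector();
List<Instance> instances = new ArrayList<Instance>();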
for(int dim = 0; dim < numDimensions; dim++)
{
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);
    // Create an instance for each data object
    if(dim == 0)
    {
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.add(new SparseInstance(numDimensions));
        }
    }

    // Fill the value of dimension "dim" into each object
    for(int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, data[dim][obj]);
    }

    // Add attribute to total attributes
    atts.addElement(current);
}

// Create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());

// Fill in data objects
for(Instance inst : instances)
    newDataset.add(inst);

Afterwards, newDataset is your dataset.

Note: The current version (3.6.8) of Weka did not complain, even though I used FastVector.

However, for the Developer version (3.7.7), use this:

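import java.util.ArrayList;
import java.util.List;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

// Assumed inputs (not shown here): double[][] data indexed as data[dim][obj],
// numDimensions = data.length, numInstances = data[0].length.
ArrayList<Attribute> atts = new ArrayList<Attribute>();
List<Instance> instances = new ArrayList<Instance>();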
for(int dim = 0; dim < numDimensions; dim++)
{
    Attribute current = new Attribute("Attribute" + dim, dim);
    if(dim == 0)
    {
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.add(new SparseInstance(numDimensions));
        }
    }

    for(int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, data[dim][obj]);
    }

    atts.add(current);
}

Instances newDataset = new Instances("Dataset", atts, instances.size());

for(Instance inst : instances)
    newDataset.add(inst);


Answer 2:

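You want to construct an Instances object; that class overrides toString() to output the data in ARFF format, so you do not need to write the file by hand. If FastVector is deprecated in your version, use a standard collection instead (the 3.7 constructor shown above takes an ArrayList<Attribute>).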
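
To make the ARFF layout concrete, here is a minimal sketch that builds the text by hand from the same data[dim][obj] arrays, without Weka. It is only meant to show roughly what Instances.toString() is expected to emit for numeric attributes. The class ArffSketch and the method toArff are hypothetical, and the trailing {weight} syntax for instance weights is an assumption about what newer Weka ARFF readers accept; check it against your Weka version.

```java
class ArffSketch {
    /**
     * Builds ARFF text for purely numeric attributes.
     * data is indexed as data[dim][obj]; names has one entry per dimension.
     * weights may be null, in which case no weights are written.
     */
    static String toArff(String relation, String[] names,
                         double[][] data, double[] weights) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : names) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");

        int numInstances = data[0].length;
        for (int obj = 0; obj < numInstances; obj++) {
            // One comma-separated row per data object
            for (int dim = 0; dim < data.length; dim++) {
                if (dim > 0) {
                    sb.append(',');
                }
                sb.append(data[dim][obj]);
            }
            if (weights != null) {
                // Assumption: instance weight appended in curly braces
                sb.append(",{").append(weights[obj]).append('}');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] data = { { 1.0, 2.0 }, { 3.0, 4.0 } };
        System.out.print(toArff("Dataset", new String[] { "x", "y" }, data, null));
    }
}
```

The returned string can be written to a .arff file with any standard writer, although building the Instances object directly, as in Answer 1, avoids the round trip through a file.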