I am using InstanceQuery , SQL queries, to construct my Instances. But my query results does not come in the same order always as it is normal in SQL. Beacuse of this Instances constucted from different SQL has different headers. A simple example can be seen below. I suspect my results changes because of this behavior.
Header 1
@attribute duration numeric
@attribute protocol_type {tcp,udp}
@attribute service {http,domain_u}
@attribute flag {SF}
Header 2
@attribute duration numeric
@attribute protocol_type {tcp}
@attribute service {pm_dump,pop_2,pop_3}
@attribute flag {SF,S0,SH}
My question is : How can I give correct header information to Instance construction.
Is something like below workflow is possible?
- get pre-prepared header information from arff file or another place.
- give instance construction this header information
- call sql function and get Instances (header + data)
I am using following sql function to get instances from database.
public static Instances getInstanceDataFromDatabase(String pSql
,String pInstanceRelationName){
try {
DatabaseUtils utils = new DatabaseUtils();
InstanceQuery query = new InstanceQuery();
query.setUsername(username);
query.setPassword(password);
query.setQuery(pSql);
Instances data = query.retrieveInstances();
data.setRelationName(pInstanceRelationName);
if (data.classIndex() == -1)
{
data.setClassIndex(data.numAttributes() - 1);
}
return data;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
I tried various approaches to my problem. But it seems that weka internal API does not allow solution to this problem right now. I modified weka.core.Instances append command line code for my purposes. This code is also given in this answer
According to this, here is my solution. I created a SampleWithKnownHeader.arff file , which contains correct header values. I read this file with following code.
After that , I use following code to create instances. I had to use StringBuilder and string values of instance, then I save corresponding string to file.
My save instances to file code is as below.
This solves my problem but I wonder if more elegant solution exists.
I solved a similar problem with the
Add
filter that allows adding attributes toInstances
. You need to add a correctAttibute
with proper list of values to both datasets (in my case - to test dataset only):Load train and test data:
Set a new attribute with
Add
filter:As a result, the headers of both "train" and "test" are the same (compatible for Weka machine learning)