I need to parse the following YAML file.
arguments:
- Database
- Fold
- MetaFeature
- Algorithm
- Config
processes:
- id: MetaFeatureCalculator
  command: "python metaFeatCalc.py {Database} folds/{Fold} de/{MetaFeature}/{Algorithm}.csv"
  in: [Database, Fold]
  out: [MetaFeature, Algorithm]
  log: "mf/{Fold}/{MetaFeature}.out"
- id: Tunner
  command: "java -jar tunner.jar {MetaFeature} alg/{Algorithm} {config}"
  in: [Metafeature, Algorithm]
  out: [Config]
  log: "mf/{Metafeature}/{Algorithm}.out"
recipeDefaults:
- Database: ["D1"]
recipes:
- id: Ex1
  uses:
  - Database: ["D1", "D2"]
  - MetaFeature: ["M1", "M2"]
  - Algorithm: ["A1", "A2"]
  - Config: ["C1", "C4"]
- id: Ex2
  uses:
  - Folds: ["F1", "F2", "F5"]
  - MetaFeature: ["M1", "M2"]
  - Algorithm: ["A1", "A2"]
  - Config: ["C1", "C4"]
And I created the following POJOs to receive this data.
Repo: https://github.com/Pacheco95/ExperimentLoader
@Data
public class Experiment {
    private HashSet<String> arguments;
    private HashSet<Process> processes;
    private HashSet<HashMap<String, HashSet<String>>> recipeDefaults;
    private HashSet<Recipe> recipes;
}

@Data
public class Process {
    private String id;
    private String command;
    private HashSet<String> in;
    private HashSet<String> out;
    private String log;
}

@Data
public class Recipe {
    private String id;
    private HashSet<HashMap<String, HashSet>> uses;
}
And this class to test the parser:
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import org.yaml.snakeyaml.Yaml;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExperimentLoader {
    public static void main(String[] args) throws IOException {
        InputStream is = args.length == 0 ? System.in : Files.newInputStream(Paths.get(args[0]));
        Yaml yaml = new Yaml();
        Experiment experiment = yaml.loadAs(is, Experiment.class);
        Gson gson = new GsonBuilder().setPrettyPrinting().serializeNulls().create();
        System.out.println(gson.toJson(experiment));
    }
}
The parser seems to work well, but when running this code in debug mode I noticed that some fields were instantiated with the correct type (HashSet) while others were not: they were instantiated as ArrayLists instead (I have no idea what kind of magic happened here).
Output from my test class:
{
  "arguments": [
    "Fold",
    "MetaFeature",
    "Config",
    "Database",
    "Algorithm"
  ],
  "processes": [
    {
      "id": "MetaFeatureCalculator",
      "command": "python metaFeatCalc.py {Database} folds/{Fold} de/{MetaFeature}/{Algorithm}.csv",
      "in": [
        "Fold",
        "Database"
      ],
      "out": [
        "MetaFeature",
        "Algorithm"
      ],
      "log": "mf/{Fold}/{MetaFeature}.out"
    },
    {
      "id": "Tunner",
      "command": "java -jar tunner.jar {MetaFeature} alg/{Algorithm} {config}",
      "in": [
        "Metafeature",
        "Algorithm"
      ],
      "out": [
        "Config"
      ],
      "log": "mf/{Metafeature}/{Algorithm}.out"
    }
  ],
  "recipeDefaults": [
    {
      "Database": [
        "D1"
      ]
    }
  ],
  "recipes": [
    {
      "id": "Ex2",
      "uses": [
        {
          "MetaFeature": [
            "M1",
            "M2"
          ]
        },
        {
          "Folds": [
            "F1",
            "F2",
            "F5"
          ]
        },
        {
          "Config": [
            "C1",
            "C4"
          ]
        },
        {
          "Algorithm": [
            "A1",
            "A2"
          ]
        }
      ]
    },
    {
      "id": "Ex1",
      "uses": [
        {
          "MetaFeature": [
            "M1",
            "M2"
          ]
        },
        {
          "Config": [
            "C1",
            "C4"
          ]
        },
        {
          "Database": [
            "D1",
            "D2"
          ]
        },
        {
          "Algorithm": [
            "A1",
            "A2"
          ]
        }
      ]
    }
  ]
}
My dependencies:
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.8</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.yaml</groupId>
        <artifactId>snakeyaml</artifactId>
        <version>1.24</version>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.8.5</version>
    </dependency>
</dependencies>
Has anyone had this problem? I can't find a solution.
Your problem is probably type erasure: while you do not use abstract classes or interfaces, I assume that SnakeYAML has trouble discovering the nested generic types of HashSet<HashMap<String, HashSet>>. The documentation suggests adding a TypeDescription; however, that does not solve your problem, because the interface is designed so that you can only specify the type inside the outer HashSet, not inside the inner HashMap. The fact that the interface does not expect nested containers also hints that this is your problem.

A workaround would be to add explicit tags inside your YAML to the sets that fail to load properly:
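For example (my own untested sketch, assuming SnakeYAML's convention that a !!fully.qualified.ClassName tag selects the concrete Java class to instantiate for a node), the recipe entries could be tagged like this:

```yaml
recipes:
- id: Ex1
  uses:
  # The explicit tag asks for a HashSet instead of the default list type;
  # the enclosing mapping could likewise be tagged !!java.util.HashMap.
  - Database: !!java.util.HashSet ["D1", "D2"]
  - MetaFeature: !!java.util.HashSet ["M1", "M2"]
```

The obvious downside is that the tags clutter the data file and every recipe entry has to repeat them.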
If you don't want to do that, you basically have two other options: Patch this feature into SnakeYAML, or use the low-level API and generate your types manually from the parser events.
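To see the erasure at play (a stdlib-only sketch, not part of the original answer): at runtime every HashSet instance carries only its raw class, so nothing a loader can inspect on a collection value distinguishes the nested element types.

```java
import java.util.HashMap;
import java.util.HashSet;

public class ErasureDemo {
    public static void main(String[] args) {
        // The generic type parameters exist only at compile time;
        // both variables below share the exact same runtime class.
        HashSet<String> flat = new HashSet<>();
        HashSet<HashMap<String, HashSet<String>>> nested = new HashSet<>();
        System.out.println(flat.getClass() == nested.getClass()); // prints: true
    }
}
```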