I have scratched my head over this problem for a while now. I am basically trying to generate a tree hierarchy from a set of CSV data. The CSV data is not necessarily ordered. This is like something as follows:
Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98
I am trying to make the way the grouping is performed as flexible as possible. The simplest for of grouping would to do it for Record1 and Record2 with the Value1 and Value2 stored under Record2 so that we get the following output:
Record1
Record2
Value1 Value2
Which would be:
A
XX
22,33
777,888
YY
33,11
13,23
B
XX
12,0
YY
44,98
I am storing my group settings in a List at present - which I don't know if this is hindering my thoughts. This list contains a hierarchy of the groups for example:
Record1 (SchemaGroup)
.column = Record1
.columns = null
.childGroups =
Record2 (SchemaGroup)
.column = Record1
.columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
.childGroups = null
The code for this looks like as follows:
private class SchemaGroup {
private SchemaGroupType type = SchemaGroupType.StaticText; // default to text
private String text;
private CSVColumnInformation column = null;
private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}
private enum SchemaGroupType {
/** Allow fixed text groups to be added */
StaticText,
/** Related to a column with common value */
ColumnGroup
}
I am stuggling producing an algorithm for this, trying to think of the underlying structure to use. At present I am parsing the CSV top to bottom, using my own wrapper class:
CSVParser csv = new CSVParser(content);
String[] line;
while((line = csv.readLine()) != null ) {
...
}
I am just trying to kick start my coding brain.
Any thoughts?
The basic idea isn't difficult: group by the first record, then by the second record, etc. until you get something like this:
and then work backwards to build the trees.
However, there is a recursive component that makes it somewhat hard to reason about this problem, or show it step by step, so it's actually easier to write pseudocode.
I'll assume that every row in your csv is represented like a tuple. Each tuple has "records" and "values", using the same terms you use in your question. "Records" are the things that must be put into a hierarchic structure. "Values" will be the leaves of the tree. I'll use quotations when I use these terms with these specific meanings.
I also assume that all "records" come before all "values".
Without further ado, the code:
Now you'd have to think about the specific data structures to use, but hopefully this shouldn't be too difficult if you understand the algorithm (as you mention, I think deciding on a data structure early on may have been hindering your thoughts).
Here is the basic working solution in the form of junit (no assertions though) simplified by using google-guava collections. The code is self-explanatory and instead of file io you use csv libraries for reading the csv. This should give you the basic idea.
If you know you'll have just two levels of
Record
s, I would use something likeWhen you read new line, you look into the outer map to check whether that value for
Record1
already exists and if not, create new empty innerMap
for it.Then check the inner map whether a value for that
Record2
exists. If not, create newList
.Then read the values and add them to the list.
I recently had a need to do pretty much the same thing and wrote tree-builder.com to accomplish the task. The only difference is that as you have your CSV laid out, the last two parameters will be parent and child instead of peers. Also, my version doesn't accept a header row.
The code is all in JavaScript; it uses jstree to build the tree. You can use firebug or just view the source on the page to see how it's done. It would probably be pretty easy to tweak it to escape the comma in your CSV in order to keep the last two parameters is a single child.
Based upon how this problem is posed, I would do the following:
Does that help?