Pentaho DI - JSON Nested File Output

2020-03-26 13:34发布

I have a requirement where I need to fetch records from multiple tables. The primary table is having one-to-many relationship to other tables.

My data source is Oracle DB. Oracle db is having the specified tables. One called Student other one is Subjects.

For sample, I have a Student Table where "Student_Id" is the Primary Key and other columns like firstname, lastName etc. Each student have registered for multiple subjects so we have student_id is the foreign key to the Subjects table. Subjects table is having subject name, Status, Teacher Name etc i.e. a student can have multiple subjects. In the Student table, I have students phone numbers like his home phone, cell phone and parent's contact number. These 3 numbers should come as one object under student node as given below.

So requirement is to show all the students from the student table and their corresponding subjects for each student as an array and the Phone Numbers for each student. The output should be in Json format.

I have given the structure below. Please let me know how to achieve this using Pentaho data integration tool. I am very much new to this technology.

    {
  "data": [
    {
      "Student_ID": "1",
      "FirstName": "fname1",
      "LastName": "lname1",
      "subjects": [
        {
          "Name": "Physics",
          "Status": "Active",
          "Teacher": "Teacher1"
        },
        {
          "Name": "History",
          "Status": "InActive",
          "Teacher": "Teacher2"
        }
      ],
      "Phone": {
        "Home": "123456",
        "Cell": "3456790",
      }
    },
    {
      "Student_ID": "2",
      "FirstName": "fname2",
      "LastName": "lname2",
      "subjects": [
        {
          "Name": "Geography",
          "Status": "Active",
          "Teacher": "Teacher1"
        },
        {
          "Name": "English",
          "Status": "InActive",
          "Teacher": "Teacher2"
        }
      ],
      "Phone": {
        "Home": "123456",
        "Cell": "3456790",
      }
    }
  ]
}

1条回答
戒情不戒烟
2楼-- · 2020-03-26 14:12

In Pentaho DI, JSON Output Step doesn't support nested dataset. In order to achieve a nested JSON structure, you need to use Javascript step to build the nested structure and eventually pass onto the output.

The usual flow follows: Input -> Group By -> Modified Javascript (JSON.Stringfy) -> Text file output (stored as .js extension without header)

A sample screenshot of the flow is as:

enter image description here

Note: This is not an exact solution but gives a clearer idea of the steps and flow.

Modified JS Step to build the nested structure. Note this is based on a default working database from MySQL (sakila). I am using actor -> film(movie) relation; similar dataset as your student -> subject.

//Initialization Steps -> Change this according to your data set
var json = {};
json.movie={};
Child_Accounts = []; 
var split_film_id   =[];
var split_title     =[];
var split_descr     =[];

//Non Nested Structure -> In your scenario these would be Student ID and Names
json.movie.actor_id     = actor_id;
json.movie.first_name   = first_name;
json.movie.last_name    = last_name;

//splitting the data
split_film_id   = film_id.split(';');
split_title     = title.split(';');
split_descr     = description.split(';');

// Loop through the splitted data and build the child structure
for(i=0; i<split_film_id.length; i++){
    var childCol = {};
    childCol.film_id=split_film_id[i];
    childCol.title=split_title[i];
    childCol.description=split_descr[i];
    Child_Accounts.push(childCol);
 }
 json.movie.films=Child_Accounts;

 // JSON Stringify the data object
 var JsonOutput = JSON.stringify(json);

Rest of the steps are straightforward.

Sample Output

enter image description here

Hope this helps :)

Gist is uploaded here.

查看更多
登录 后发表回答