
How to deploy scheduled Kettle jobs on Pentaho BI

Published 2019-05-26 06:22

Question:

I have a server running Pentaho BI server v6 Community Edition. We've developed a Kettle job, exported as a KJB file, that extracts data from one database into another. I would like to run this job every 12 hours or so.

I noticed that the BI server already includes Kettle and has the ability to upload and schedule jobs. Do I need to install the DI server if the BI server already has Kettle installed?

If not, how can I publish the KJB file to the BI server? I'd like to use a file-system repository. If I upload the file directly through the user console, the log shows that the import succeeded, but I cannot select or run the job anywhere.

Answer 1:

I use Pentaho BI server 5, but it should work the same on Pentaho BI 6.

My Kettle job runs many sub-transformations. The transformation files are stored in a file-system directory, e.g. /opt/etl.

So let's say I have one job (daily_job.kjb) with two sub-transformations.

To run a Kettle job on Pentaho BI CE I use these steps:

  1. set the transformation locations properly in the job file
  2. upload the sub-transformations to the proper directory on the server (/opt/etl)
  3. create an xaction file which executes the Kettle job on the BI server (daily.xaction)
  4. upload the daily.xaction and daily_job.kjb files to the Pentaho BI server (same folder)
  5. schedule the daily.xaction file on the Pentaho BI server

Job settings in daily_job.kjb:
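The job settings amount to pointing each Transformation entry at its sub-transformation by server-side path. A hedged, illustrative fragment of such an entry (the entry name and .ktr file names are assumptions; only the /opt/etl location comes from this answer):

```xml
<!-- Illustrative fragment of one Transformation entry inside daily_job.kjb. -->
<!-- Entry and file names are placeholders; Spoon generates more settings    -->
<!-- than are shown here.                                                    -->
<entry>
  <name>run_sub_transformation_1</name>
  <type>TRANS</type>
  <filename>/opt/etl/sub_transformation_1.ktr</filename>
  <!-- ... remaining entry settings as generated by Spoon ... -->
</entry>
```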

Xaction code for daily.xaction (it simply executes daily_job.kjb, located in the same BI server folder as the xaction itself):

<?xml version="1.0" encoding="UTF-8"?>
<action-sequence> 
  <title>My scheduled job</title>
  <version>1</version>
  <logging-level>ERROR</logging-level>
  <documentation> 
    <author>mzy</author>  
    <description>Sequence for running daily job.</description>  
    <help/>  
    <result-type/>  
    <icon/> 
  </documentation>

  <inputs> 
  </inputs>

  <outputs> 
    <logResult type="string">
      <destinations>
        <response>content</response>
      </destinations>
    </logResult>
  </outputs>

  <resources>
    <job-file>
      <solution-file> 
        <location>daily_job.kjb</location>  
        <mime-type>text/xml</mime-type> 
      </solution-file>     
    </job-file>
  </resources>

  <actions> 
    <action-definition>
      <component-name>KettleComponent</component-name>
      <action-type>Pentaho Data Integration Job</action-type>
      <action-inputs>   
      </action-inputs>
      <action-resources>
        <job-file type="resource"/>
      </action-resources>
      <action-outputs> 
        <kettle-execution-log type="string" mapping="logResult"/>  
        <kettle-execution-status type="string" mapping="statusResult"/> 
      </action-outputs>   
      <component-definition>
        <kettle-logging-level><![CDATA[info]]></kettle-logging-level>           
      </component-definition>
    </action-definition>

  </actions> 
</action-sequence>

Finally, schedule the daily.xaction file on Pentaho BI CE the same way you would any other content file: browse to it in the User Console and use the Schedule dialog to set the recurrence.



Answer 2:

You can deploy the .kjb file as a Kettle endpoint as part of a Sparkl plugin and then call it with a simple API request. This should help:

http://fcorti.com/pentaho-sparkl/kettle-endpoint-sparkl-pentaho/

There are probably other ways to do this, but that's the one I'm most familiar with. As for scheduling, you could simply set up a cron job that makes the API request.
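A minimal sketch of that cron approach. The endpoint URL, plugin name, and credentials below are purely illustrative; the real path depends on your Sparkl plugin and endpoint names:

```shell
#!/bin/sh
# Build the crontab line that would call a hypothetical Sparkl endpoint
# every 12 hours. URL, plugin name, and credentials are placeholders.
PENTAHO_URL="http://localhost:8080/pentaho/plugin/my-etl-plugin/api/daily_job"
CRON_LINE="0 */12 * * * curl -s -u admin:password $PENTAHO_URL"

# Print the line so it can be reviewed before adding it via 'crontab -e'.
echo "$CRON_LINE"
```

The `0 */12 * * *` field pattern fires at minute 0 of every 12th hour (00:00 and 12:00), matching the "every 12 or so hours" requirement from the question.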



Answer 3:

Proceed with the following steps:

  1. Start the Pentaho BI server and log in to the Pentaho console as Administrator.

  2. Click the Browse Files button; a new page will open. On this page, select a folder under the Folders section, then click Upload in the right-hand pane.

  3. Select your file and click OK.

  4. Refresh the page and the file will appear in the folder you chose.

  5. Now schedule the job: click your folder in the left pane, select your main job file in the middle pane, then click Schedule in the right pane.

  6. In the pop-up, confirm the generated file path and click Next. Select the recurrence schedule, the job time, and the job start date.

  7. Select Yes in the next pop-up and you will be redirected to the Manage Schedules page, where you can see the job you just scheduled. It will run at the scheduled time.

  8. You can check your job's logs in the pentaho.log file in the pentaho-server/tomcat/logs directory:

    tail -1000f /Users/kv/pentaho-server/tomcat/logs/pentaho.log
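Since that log mixes entries from every job on the server, piping it through grep narrows it to one job. The sample log lines below are fabricated solely to make the snippet self-contained; in practice you would pipe from `tail -f` against the real pentaho.log:

```shell
#!/bin/sh
# Sketch: filter a Pentaho log for lines from one job (here: daily_job).
# SAMPLE_LOG is made-up content standing in for pentaho.log; in practice:
#   tail -1000f .../pentaho.log | grep "daily_job"
SAMPLE_LOG='2019/05/26 06:00:00 - daily_job - Start of job execution
2019/05/26 06:00:01 - other_job - Start of job execution
2019/05/26 06:00:09 - daily_job - Job execution finished'

MATCHES=$(echo "$SAMPLE_LOG" | grep "daily_job")
echo "$MATCHES"
```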