How to schedule Pentaho Kettle transformations?

Posted 2019-04-02 08:59

Question:

I've set up four transformations in Kettle. Now I would like to schedule them so that they run daily at a certain time, one after another. For example,

transformation1 -> transformation2 -> transformation3 -> transformation4

should run daily at 8.00 am. How can I do that?

Answer 1:

You can execute a transformation from the command line using the Pan tool:

Pan.bat /file:transform.ktr /param:name=value

The syntax might differ depending on your system; check the Pan documentation for details. Once you have a batch file that executes your transformation, you can schedule it with whatever scheduling tool your system provides, such as Windows Task Scheduler or cron on Linux.
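As a minimal sketch for Linux or macOS (where Pan is started with pan.sh rather than Pan.bat), the script below runs several transformations strictly one after another. The install path /opt/pentaho/data-integration and the function name run_transformations are hypothetical; the only behavior relied on is that Pan exits with a non-zero status when a transformation fails, which lets the chain stop at the first failure instead of running later transformations on bad data.

```shell
#!/bin/sh
# Hypothetical install location; override PDI_HOME to match your setup.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"

# Run each .ktr file given as an argument with Pan, in order.
# Pan exits non-zero when a transformation fails, so the loop stops there
# and the remaining transformations are skipped.
run_transformations() {
  for ktr in "$@"; do
    echo "Starting $ktr"
    if ! "$PDI_HOME/pan.sh" -file:"$ktr"; then
      echo "Transformation failed: $ktr" >&2
      return 1
    fi
  done
  echo "All transformations finished"
}
```

Calling run_transformations transformation1.ktr transformation2.ktr transformation3.ktr transformation4.ktr at the end of the script gives the order from the question, and the script itself is what you hand to cron or Task Scheduler.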

Alternatively, you could put all the transformations in a job and execute that job from the command line with Kitchen.
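If the four transformations are chained inside one job, only that job needs to be scheduled. A rough sketch of a cron-friendly Kitchen wrapper follows; the job file daily_chain.kjb, the paths, and the function name run_job are hypothetical, and the crontab line in the comments shows one way to run it daily at 8:00 AM. It assumes only that Kitchen, like Pan, exits non-zero when the job fails, so the wrapper passes that status on to cron or whatever monitors it.

```shell
#!/bin/sh
# Hypothetical paths; adjust to your installation and job file.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"
JOB_FILE="${JOB_FILE:-/opt/etl/daily_chain.kjb}"

# Example crontab entry (installed with crontab -e) that runs this script
# every day at 8:00 AM and appends its output to a log file:
#   0 8 * * * /opt/etl/run_daily_job.sh >> /var/log/pdi_daily.log 2>&1

# Run the job with Kitchen, print timestamped start/finish lines,
# and return Kitchen's exit status (0 on success, non-zero on failure).
run_job() {
  echo "$(date '+%Y-%m-%d %H:%M:%S') starting $JOB_FILE"
  "$PDI_HOME/kitchen.sh" -file:"$JOB_FILE"
  rc=$?
  echo "$(date '+%Y-%m-%d %H:%M:%S') finished with exit status $rc"
  return "$rc"
}
```

Chaining inside a job keeps the ordering logic in PDI itself, where the job's hops can also route failures to a notification entry, while cron only has to start one command.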



Answer 2:

There are basically two ways of scheduling jobs in PDI.

1. You can use the command line (as correctly written by Anders):

for transformation scheduling: <pentaho-installation directory>/pan.sh -file:"your-transformation.ktr"

for job scheduling: <pentaho-installation directory>/kitchen.sh -file:"your-job.kjb"
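Since the two commands differ only in the runner and the file type, a small helper can choose between them from the file extension. This is a sketch with PDI_HOME standing in for <pentaho-installation directory>; run_pdi_file is an illustrative name, not part of PDI.

```shell
#!/bin/sh
# PDI_HOME stands in for <pentaho-installation directory> (hypothetical default).
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"

# Pick the PDI runner from the file extension:
# .ktr (transformation) -> pan.sh, .kjb (job) -> kitchen.sh.
# The runner's exit status is returned; unknown extensions return 2.
run_pdi_file() {
  case "$1" in
    *.ktr) "$PDI_HOME/pan.sh" -file:"$1" ;;
    *.kjb) "$PDI_HOME/kitchen.sh" -file:"$1" ;;
    *) echo "Not a .ktr or .kjb file: $1" >&2; return 2 ;;
  esac
}
```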

2. You can also use the built-in scheduler in Pentaho Spoon.

If you are using the Enterprise Edition (EE) of PDI, Spoon itself includes a built-in scheduler. It's a UI you can use to schedule jobs easily. See the scheduling section of the PDI documentation for more.



Answer 3:

I'd like to add another answer covering something many first-time Spoon users miss. Say you have a transformation exampleTrafo.ktr that you want to run at a certain interval. What you can do is create a job, exampleJob.kjb, that merely runs the transformation: a START entry connected to a Transformation entry that points to exampleTrafo.ktr.

The START entry is the important part: right-click it and choose Edit..., and you'll be presented with a job scheduling window where you can specify the desired schedule. Then save and run the job, either locally or remotely on a slave server via PDI's Carte server. What you end up with is an indefinitely running job called exampleJob that executes exampleTrafo at the desired intervals.
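To run such a job locally from a terminal without keeping the session open, one option is to start it with Kitchen under nohup. The sketch below assumes a Unix-like system, a hypothetical PDI_HOME path, and the illustrative function name start_job_in_background; it writes the job's output to a log file and records the process ID of the background process.

```shell
#!/bin/sh
# Hypothetical install location; override PDI_HOME to match your setup.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"

# Start a job (for example one whose START entry repeats forever) in the
# background with nohup so it is not stopped when the terminal closes.
# Arguments: job file, log file, file in which to save the background PID.
start_job_in_background() {
  job_file="$1"
  log_file="$2"
  pid_file="$3"
  nohup "$PDI_HOME/kitchen.sh" -file:"$job_file" > "$log_file" 2>&1 &
  echo $! > "$pid_file"
}
```

For example, start_job_in_background exampleJob.kjb exampleJob.log exampleJob.pid starts exampleJob and leaves its output in exampleJob.log.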