I am trying to learn, in a very simple way, how luigi works. As a newbie, I came up with this code:

    import luigi

    class class1(luigi.Task):
        def requires(self):
            return class2()

        def output(self):
            return luigi.LocalTarget('class1.txt')

        def run(self):
            print('IN class A')

    class class2(luigi.Task):
        def requires(self):
            return []

        def output(self):
            return luigi.LocalTarget('class2.txt')

    if __name__ == '__main__':
        luigi.run()

Running this from the command prompt gives an error saying:
raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ',
This happens because you define an output for class2 but never create it: its output method points to class2.txt, yet the task has no run method that writes that file.
Let's break it down...
When running

    python file.py class2 --local-scheduler

luigi will ask:
- is the output of class2 already on disk? NO
- check dependencies of class2: NONE
- execute the run method (by default it's an empty method: pass)
- run method didn't return errors, so the job finishes successfully.
However, when running

    python file.py class1 --local-scheduler

luigi will ask:
- is the output of class1 already on disk? NO
- check task dependencies: YES: class2
- pause to check status of class2
  - is the output of class2 on disk? NO
  - run class2 -> running -> done without errors
  - is the output of class2 on disk? NO -> raise error
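
The sequence of checks above can be mimicked with a small toy model in plain Python. This is only a sketch of the logic described in the list, not luigi's actual code: ToyTask, toy_build and demo are hypothetical names invented for illustration, and the error message is a simplified imitation.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a task; this is not luigi's real class."""

    def __init__(self, name, path, deps=(), writes_output=False):
        self.name = name
        self.path = path
        self.deps = list(deps)
        self.writes_output = writes_output

    def complete(self):
        # A task counts as done only when its output file exists.
        return os.path.exists(self.path)

    def run(self):
        # Only write the file when asked to; otherwise behave like an empty run.
        if self.writes_output:
            with open(self.path, "w") as handle:
                handle.write(self.name + " done\n")


def toy_build(task, log):
    """Walk the same questions as the list above (toy logic, an assumption)."""
    if task.complete():
        log.append(task.name + ": output already on disk, skipping")
        return
    for dep in task.deps:
        toy_build(dep, log)
    # Re-check the dependencies' outputs before running the task itself.
    missing = [dep.name for dep in task.deps if not dep.complete()]
    if missing:
        raise RuntimeError("Unfulfilled dependencies at run time: " + ", ".join(missing))
    task.run()
    log.append(task.name + ": run finished without errors")


def demo(second_task_writes_output):
    """Build class1 -> class2 in a fresh temp directory and return the log."""
    workdir = tempfile.mkdtemp()
    second = ToyTask("class2", os.path.join(workdir, "class2.txt"),
                     writes_output=second_task_writes_output)
    first = ToyTask("class1", os.path.join(workdir, "class1.txt"), deps=[second])
    log = []
    try:
        toy_build(first, log)
    except RuntimeError as error:
        log.append("error: " + str(error))
    return log


if __name__ == "__main__":
    print(demo(second_task_writes_output=False))
    print(demo(second_task_writes_output=True))
```

With second_task_writes_output=False the toy class2 runs without errors but leaves no file behind, so the re-check fails, just like in the question. With True, the dependency is satisfied and class1 gets to run.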
luigi never runs a task unless all of its dependencies are met, i.e. their outputs exist on the file system.