Distributed job scheduling, management, and reporting

Posted 2019-02-09 05:57

I recently had a play around with Hadoop and was impressed with its scheduling, management, and reporting of MapReduce jobs. It appears to make the distribution and execution of new jobs quite seamless, allowing the developer to concentrate on the implementation of their jobs.

I am wondering if anything exists in the Java domain for the distributed execution of jobs that are not easily expressed as MapReduce problems? For example:

  • Jobs that require task co-ordination and synchronization. For example, they may involve sequential execution of tasks yet it is feasible to execute some tasks concurrently:

                   .-- B --.
            .--A --|       |--.
            |      '-- C --'  |
    Start --|                 |-- Done
            |                 |
            '--D -------------'
    
  • CPU-intensive tasks that you'd like to distribute but that don't produce any outputs to reduce - image conversion/resizing, for example.

So is there a Java framework/platform that provides such a distributed computing environment? Or is this sort of thing acceptable/achievable using Hadoop - and if so are there any patterns/guidelines for these sorts of jobs?

6 answers
我命由我不由天
#2 · 2019-02-09 06:21

Try the Redisson framework. It provides an easy API to execute and schedule java.util.concurrent.Callable and java.lang.Runnable tasks. See its documentation on the distributed Executor service and Scheduler service.

欢心
#3 · 2019-02-09 06:22

I have since found Spring Batch and Spring Batch Integration which appear to address many of my requirements. I will let you know how I get on.

做个烂人
#4 · 2019-02-09 06:24

I believe quite a few problems can be expressed as map-reduce problems.

For problems that you can't reshape to fit that structure, you can look at building your own solution using Java's ExecutorService. It is limited to a single JVM and is quite low level, but it does allow for easy coordination and synchronization.
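
As a rough illustration of the single-JVM approach, the workflow from the question (A followed by B and C in parallel, with D running alongside) might be sketched with an ExecutorService and CompletableFuture. The names WorkflowSketch, task, and runWorkflow are hypothetical, and each task only records its name in place of real work; this is a sketch under those assumptions, not a distributed solution.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkflowSketch {

    // Hypothetical stand-in for real work: records the task name when it runs.
    static Runnable task(String name, List<String> log) {
        return () -> log.add(name);
    }

    // Runs the graph from the question and returns the order in which tasks ran.
    static List<String> runWorkflow() {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // A and D have no prerequisites, so both start immediately.
            CompletableFuture<Void> a = CompletableFuture.runAsync(task("A", log), pool);
            CompletableFuture<Void> d = CompletableFuture.runAsync(task("D", log), pool);

            // B and C each wait for A, then may run concurrently with each other.
            CompletableFuture<Void> b = a.thenRunAsync(task("B", log), pool);
            CompletableFuture<Void> c = a.thenRunAsync(task("C", log), pool);

            // "Done" runs only after B, C, and D have all completed.
            CompletableFuture<Void> done =
                    CompletableFuture.allOf(b, c, d).thenRun(task("Done", log));

            done.join(); // block until the whole workflow has finished
            return log;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The exact order varies between runs; only the dependencies are guaranteed.
        System.out.println(runWorkflow());
    }
}
```

The dependency edges are expressed by chaining futures rather than by explicit locks or latches, which keeps the coordination readable. Everything still runs inside one JVM, so it does not address distribution across machines.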

我命由我不由天
#5 · 2019-02-09 06:28

I guess you are looking for a workflow engine for CPU-intensive tasks (also known as a "scientific workflow", e.g. http://www.extreme.indiana.edu/swf-survey). But I'm not sure how distributed you want it to be. Usually all workflow engines have a "single point of failure".

狗以群分
#6 · 2019-02-09 06:34

ProActive Scheduler seems to fit your requirements, especially the complex workflows with task coordination that you mentioned. It is open source and Java based. You can use it to run anything: Hadoop jobs, scripts, Java code, and so on.

Disclaimer: I work for the company behind it

▲ chillily
#7 · 2019-02-09 06:36

Take a look at Quartz. I think it supports stuff like managing jobs remotely and clustering several machines to run jobs.
