I will be setting up a Mesos cluster to run single-use Docker jobs, e.g. long RapidMiner computations. Of course I want to get the result of the computation, so I think I should use Docker volumes for that.
Now, when I send a Docker job to a cluster and specify the volume, for example in a JSON job file for Marathon or Chronos, where does the result of my computation land?
I am guessing that it is put into the respective directory on the slave node, but do I really have to go into the Mesos interface, look up which node executed my job, ssh into that node and copy my resulting file out? This seems very counterintuitive to the whole idea of Mesos of abstracting from single computers.
What would be the elegant solution for this scenario? I am very new to cluster management, so the only good solution I could think of was a distributed filesystem, although I don't know if this would be supported in the jobfile of Marathon or Chronos.
The other answers from rukletsov and js84 are both good options, but I'd like to point out an easy alternative. When using Mesos' Docker containerizer, the task sandbox is mounted as a volume in $MESOS_SANDBOX, by default
/mnt/mesos/sandbox/
inside the container, so you could store your results there or just write to stdout/stderr, which are also redirected there. No need to create your own volume. Then you could use the mesos-cli to run
mesos tail --follow task-id file
or
mesos cat task-id file [file]
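As a rough sketch of the sandbox approach, a task written in Python could write its result like this. The helper names and the file name `result.txt` are made up for illustration; the fallback path is the default mentioned above, and the code assumes the sandbox directory already exists, as it does inside a container started by the Docker containerizer.

```python
import os
from pathlib import Path


def sandbox_dir() -> Path:
    # MESOS_SANDBOX is expected to be set inside the container; fall back to
    # the default mount point mentioned above if it is missing.
    return Path(os.environ.get("MESOS_SANDBOX", "/mnt/mesos/sandbox"))


def write_result(text: str, filename: str = "result.txt") -> Path:
    # "result.txt" is a hypothetical file name chosen for this example.
    target = sandbox_dir() / filename
    target.write_text(text)
    return target
```

Calling `write_result("computation finished\n")` at the end of the job should leave `result.txt` in the task sandbox, where it can be read the same way as stdout and stderr.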
It is safe to say that Mesos assumes all your final data is stored somewhere by the time your task finishes, and it is your responsibility (or, if you prefer, your task's or your framework's) to ensure this. If you want to persist intermediate results or share results between tasks, you can look at persistent volumes, which are currently under development and will hopefully land in the next Mesos release. Be advised that they are considered part of a node's resources and are not replicated, so they will be lost if the node fails.
As an alternative to a distributed file system, you can modify your task so that it sends the result of the computation to some external storage, e.g. a database or an FTP server.
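For the FTP case, a minimal sketch using Python's standard `ftplib` might look like the following. The function name `upload_result` and the file names are assumptions made for this example, not something Mesos provides.

```python
from ftplib import FTP
from pathlib import Path


def upload_result(ftp: FTP, local_path: str, remote_name: str) -> None:
    # STOR uploads the file in binary mode and stores it as remote_name.
    with Path(local_path).open("rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)
```

The task would open a connection as its last step, for instance `with FTP("ftp.example.org") as ftp:` followed by `ftp.login(user, password)` and `upload_result(ftp, "result.txt", "job-result.txt")`, where the host, credentials and file names are placeholders.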
There is ongoing work to support distributed file systems better in Mesos. As of right now, one potential solution is to use HDFS and write your output there.
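One way to do that from inside the task, sketched here under the assumption that the `hdfs` command-line client is installed in the Docker image and configured for your cluster, is to copy the result file as the last step of the job. The function names and paths are illustrative only.

```python
import subprocess
from typing import List


def hdfs_put_command(local_path: str, hdfs_path: str) -> List[str]:
    # "hdfs dfs -put -f" copies a local file into HDFS, overwriting the
    # destination if it already exists.
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_path]


def copy_to_hdfs(local_path: str, hdfs_path: str) -> None:
    # check=True raises CalledProcessError when the copy fails, so the task
    # exits with an error instead of silently losing its output.
    subprocess.run(hdfs_put_command(local_path, hdfs_path), check=True)
```

For example, `copy_to_hdfs("result.txt", "/results/job-1/result.txt")` would run the copy; the destination path is a placeholder.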
Hope this helps!