I have a project that uses Java, Scala, and Apache Spark to do distributed computations on genomic data. Using py4j and mimicking the PySpark model, we expose a Python API that calls into the JVM. Our goal has been to bring this model into Jupyter notebooks, which has been pretty easy so far, with one lingering problem: logging.
The problem
We (and Spark) use log4j to write log messages to a log file and to stderr. That stderr belongs to the JVM process, not to the Python kernel, so if I run two commands from the Jupyter notebook:
print('foo')
info('bar') # calls log4j logger.info in JVM
I see 'foo' written to the Jupyter cell, but 'bar' is written to the terminal running the Jupyter process.
My goal
Connect log4j to the Jupyter notebook so that log4j messages are written to Jupyter cells instead of the terminal.
What I've tried
The log4j ConsoleAppender is writing to the JVM's stderr, so we need to route the JVM's stderr through Jupyter somehow, right? This may involve calling System.setErr(...) with a PrintStream object hooked up to the Jupyter process, but I'm not yet sure how to do that.
We solved this by using a separate socket to communicate between Java and Python. Here's the commit diff: https://github.com/hail-is/hail/commit/93d7e95a82ab39501eede7ecb301538bcd013ea8
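For readers who want the general shape of the socket approach without reading the diff, here is a minimal sketch of the Python side only. It is not the code from the commit. It assumes the JVM side connects to a local TCP port and writes newline-terminated UTF-8 log lines; the names LogLineHandler and start_log_server are hypothetical.

```python
import socket
import socketserver
import sys
import threading


class LogLineHandler(socketserver.StreamRequestHandler):
    """Copy each line received on the connection to the server's output stream."""

    def handle(self):
        # self.rfile yields one bytes object per newline-terminated line
        # until the peer closes the connection.
        for raw in self.rfile:
            self.server.out.write(raw.decode("utf-8", errors="replace"))
            self.server.out.flush()


def start_log_server(out=None, host="127.0.0.1"):
    """Start a background TCP server (hypothetical helper).

    Returns (server, port). Port 0 lets the OS pick a free port, which
    would then have to be passed to the JVM so its appender can connect.
    """
    server = socketserver.ThreadingTCPServer((host, 0), LogLineHandler)
    server.daemon_threads = True
    # Assumption: in a notebook, sys.stdout is the kernel's replacement
    # stream, so lines written here show up as cell output.
    server.out = out if out is not None else sys.stdout
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

On the Python side you would call start_log_server() once when the session is created and hand the returned port to the JVM through py4j. The JVM side then needs a log4j appender that opens a socket to that port and writes each formatted log line followed by a newline. A dedicated socket keeps the log stream separate from the py4j channel and avoids touching the JVM's global stderr.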