Spark Streaming with Neo4j hangs while running wit

I have built a Docker image of my application. When I run the application directly from the bash script, it works properly. However, when I run it as a service in a docker-compose file, it hangs at this message:

18/06/27 13:17:18 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint

If I keep waiting, the streaming heartbeat eventually times out. What could cause a Spark Streaming + Neo4j application to behave this way under Docker, and how can it be fixed?

The docker-compose file for my application:

version: '3.3'
services:
  consumer-demo:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - ARG_CLASS=consumer
        - HOST=neo4jdb
    volumes:
      - ./:/workdir
    working_dir: /workdir
    restart: always

The docker-compose file for all the other services:

version: '3.3'
services:
  kafka:
    image: spotify/kafka
    ports:
     - "9092:9092"
    networks:
      - docker_elk
    environment:
    - ADVERTISED_HOST=localhost
  neo4jdb:
    image: neo4j:latest
    container_name: neo4jdb
    ports:
      - "7474:7474"
      - "7473:7473"
      - "7687:7687"
    networks:
      - docker_elk
    volumes:
      - /var/lib/neo4j/import:/var/lib/neo4j/import
      - /var/lib/neo4j/data:/data
      - /var/lib/neo4j/conf:/conf
    environment:
      - NEO4J_dbms_active__database=graphImport.db
  elasticsearch:
    image: elasticsearch:latest
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - docker_elk
    volumes:
        - esdata1:/usr/share/elasticsearch/data
  kibana:
    image: kibana:latest
    ports:
      - "5601:5601"
    networks:
      - docker_elk
volumes:
  esdata1:
    driver: local

networks:
  docker_elk:
    driver: bridge

The bash script with which the application works properly:

#!/usr/bin/env bash
if [ "$1" = "consumer" ]
then
    java -cp "jars/spark_consumer.jar" consumer.SparkConsumer 
else
    echo "Wrong parameter. It should be consumer or producer, but it is $1"
fi

The application's Dockerfile, which may be what slows the application down:

FROM java:8
ARG ARG_CLASS
ARG HOST
ENV MAIN_CLASS $ARG_CLASS
ENV SCALA_VERSION 2.11.8
ENV SBT_VERSION 1.1.1
ENV SPARK_VERSION 2.2.0
ENV SPARK_DIST spark-$SPARK_VERSION-bin-hadoop2.6
ENV SPARK_ARCH $SPARK_DIST.tgz
ENV HOSTNAME bolt://$HOST:7687
VOLUME /workdir

WORKDIR /opt

# Install Scala
RUN \
  cd /root && \
  curl -o scala-$SCALA_VERSION.tgz http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz && \
  tar -xf scala-$SCALA_VERSION.tgz && \
  rm scala-$SCALA_VERSION.tgz && \
  echo >> /root/.bashrc && \
  echo 'export PATH=~/scala-$SCALA_VERSION/bin:$PATH' >> /root/.bashrc

# Install SBT
RUN \
  curl -L -o sbt-$SBT_VERSION.deb https://dl.bintray.com/sbt/debian/sbt-$SBT_VERSION.deb && \
  dpkg -i sbt-$SBT_VERSION.deb && \
  rm sbt-$SBT_VERSION.deb


# Install Spark
RUN \
    cd /opt && \
    curl -o $SPARK_ARCH http://d3kbcqa49mib13.cloudfront.net/$SPARK_ARCH && \
    tar xvfz $SPARK_ARCH && \
    rm $SPARK_ARCH && \
    echo 'export PATH=$SPARK_DIST/bin:$PATH' >> /root/.bashrc

EXPOSE 9851 9852 4040 9092 9200 9300 5601 7474 7687 7473

CMD /workdir/runDemo.sh "$MAIN_CLASS"

Tags: apache-spark docker neo4j docker-compose

1 answer

唯我独甜

Reply #2 · 2019-01-20 19:08

The problem was that another Spark process was already running on the machine and was blocking Spark data streaming. I listed the processes with ps aux | grep spark and found it. Killing that process and restarting the Spark Streaming application solved the problem.
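
The check described above could be scripted roughly as follows. This is only a sketch: the function names filter_spark_processes and spark_pids are made up for illustration, and it assumes the usual ps aux layout in which the PID is the second column. The [s]park pattern keeps grep from matching its own command line.

```shell
#!/usr/bin/env bash
# Sketch: find leftover Spark-related processes before restarting the job.
# Function names are hypothetical; the PID column assumes `ps aux` layout.

# Keep only the lines on stdin that mention "spark" (case-insensitive).
# `|| true` keeps the exit status at 0 when nothing matches.
filter_spark_processes() {
    grep -i '[s]park' || true
}

# Print the PID (second column of `ps aux`) of every matching line.
spark_pids() {
    filter_spark_processes | awk '{ print $2 }'
}

# Usage (commented out so that sourcing this file has no side effects):
#   ps aux | spark_pids
```

Inspect each reported PID before killing anything, since the match is only on the word "spark" and may include processes you want to keep.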
