Neo4j in Docker - Max Heap Size Causes Hard Crash

Published 2019-02-24 09:54

Question:

I'm trying to spin up a Neo4j 3.1 instance in a Docker container (through Docker Compose), running on OS X (El Capitan). All is well unless I try to increase the max heap space available to Neo4j above the default of 512 MB.

According to the docs, this can be achieved by adding the environment variable NEO4J_dbms_memory_heap_maxSize, which then causes the server wrapper script to update the neo4j.conf file accordingly. I've checked and it is being updated as one would expect.
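
For reference, the end result should be equivalent to setting the option directly in neo4j.conf (using the 3.1 setting name; the 4G value matches the compose file below):

dbms.memory.heap.max_size=4G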

The problem is, when I run docker-compose up to spin up the container, the Neo4j instance crashes out with a 137 status code. A little research tells me this is a Linux hard crash, triggered by hitting a maximum memory limit.

$ docker-compose up
Starting elasticsearch
Recreating neo4j31
Attaching to elasticsearch, neo4j31
neo4j31          | Starting Neo4j.
neo4j31 exited with code 137
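
(137 is 128 + 9, i.e. the process was killed with SIGKILL, which on Linux usually points to the kernel's OOM killer. Assuming the stopped container is still around, this confirms whether it was OOM-killed:)

$ docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' neo4j31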

My questions:

  1. Is this due to a Docker or an OSX limitation?
  2. Is there a way I can modify these limits? If I drop the requested limit to 1GB, it will spin up, but still crashes once I run my heavy query (which is what caused the need for increased Heap space anyway).
  3. The query that I'm running is a large-scale update across a lot of nodes (>150k) containing full-text attributes, so that they can be synchronised to ElasticSearch using the plug-in. Is there a way I can get Neo4j to step through doing, say, 500 nodes at a time, using only Cypher (I'd rather avoid writing a script if I can; it feels a little dirty for this)?

My docker-compose.yml is as follows:

---
version: '2'
services:
 # ---<SNIP>

  neo4j:
    image: neo4j:3.1
    container_name: neo4j31
    volumes:
      - ./docker/neo4j/conf:/var/lib/neo4j/conf
      - ./docker/neo4j/mnt:/var/lib/neo4j/import
      - ./docker/neo4j/plugins:/plugins 
      - ./docker/neo4j/data:/data
      - ./docker/neo4j/logs:/var/lib/neo4j/logs
    ports:
        - "7474:7474"
        - "7687:7687"
    environment:
        - NEO4J_dbms_memory_heap_maxSize=4G

 # ---<SNIP>

Answer 1:

  1. Is this due to a Docker or an OSX limitation?

    No. Increase the amount of RAM available to Docker to resolve this issue (see the quick check after this list).

  2. Is there a way I can modify these limits? If I drop the requested limit to 1GB, it will spin up, but still crashes once I run my heavy query (which is what caused the need for increased Heap space anyway).

  3. The query that I'm running is a large-scale update across a lot of nodes (>150k) containing full-text attributes, so that they can be synchronised to ElasticSearch using the plug-in. Is there a way I can get Neo4j to step through doing, say, 500 nodes at a time, using only Cypher (I'd rather avoid writing a script if I can; it feels a little dirty for this)?

    N/A. This is a Neo4j-specific question; it might be better to separate it from the Docker questions listed above.
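
A quick way to see how much RAM the Docker for Mac VM actually has (the allocation itself is changed in Docker's Preferences, under the Advanced tab):

$ docker info | grep 'Total Memory'

If that reports less than the heap you are asking Neo4j for (4G here), the kernel will kill the JVM as soon as it tries to use it.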



Answer 2:

3. The query that I'm running is a large-scale update across a lot of nodes (>150k) containing full-text attributes, so that they can be synchronised to ElasticSearch using the plug-in. Is there a way I can get Neo4j to step through doing, say, 500 nodes at a time, using only Cypher (I'd rather avoid writing a script if I can; it feels a little dirty for this)?

You can do this with the help of the APOC plugin for Neo4j, more specifically apoc.periodic.iterate or apoc.periodic.commit.

If you use apoc.periodic.commit, your first MATCH should be specific: mark which nodes you have already synced, as in the example below, because otherwise the call can fall into an endless loop over the same nodes:

CALL apoc.periodic.commit("
// only touch nodes that still need syncing, so each pass makes progress
MATCH (user:User) WHERE user.synced = false
WITH user LIMIT {limit}
MERGE (city:City {name: user.city})
MERGE (user)-[:LIVES_IN]->(city)
SET user.synced = true
// the call repeats until this count reaches 0
RETURN count(*)
", {limit: 10000})

If you use apoc.periodic.iterate, you can run it in parallel mode:

CALL apoc.periodic.iterate(
"MATCH (o:Order) WHERE o.date > '2016-10-13' RETURN o",
"WITH {o} AS o
// o is handed over from the first statement; aggregate its item values
MATCH (o)-[:HAS_ITEM]->(i) WITH o, sum(i.value) AS value
// push the aggregate to Elasticsearch
CALL apoc.es.post(host-or-port, index-or-null, type-or-null,
query-or-null, payload-or-null) YIELD value RETURN *",
{batchSize: 100, parallel: true})

Note that o is passed straight into the second statement, so there is no need for a second MATCH on the Order node itself. apoc.es.post is an APOC procedure that sends POST requests to Elasticsearch; see the APOC documentation for more info.
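
For illustration, here is the same iterate call with the placeholders filled in; the host 'localhost:9200', the index/type names 'orders'/'order', and the payload shape are all assumptions, not values from the question:

CALL apoc.periodic.iterate(
"MATCH (o:Order) WHERE o.date > '2016-10-13' RETURN o",
"WITH {o} AS o
MATCH (o)-[:HAS_ITEM]->(i) WITH o, sum(i.value) AS value
// illustrative host, index, type and document body
CALL apoc.es.post('localhost:9200', 'orders', 'order', null,
{orderId: id(o), value: value}) YIELD value AS response
RETURN count(*)",
{batchSize: 100, parallel: true})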