Titan 1.0[Berkeley+ES] - Delayed update of ES inde

2019-07-28 01:36发布

问题:

Titan 1.0 [Berkeley+ Remote Elastic Search] we are using this combination of Titan. Following is the properties file -

storage.backend=berkeleyje
storage.directory=D:/other-projects/graph-db/titan/enron-tk3/db/berkeley
index.search.backend=elasticsearch
index.search.index-name=akshayatitan
index.search.hostname=localhost
index.search.elasticsearch.client-only=true
index.search.elasticsearch.local-mode=false

We are only using Mixed Index.
Now we add a node with few properties through Java code and then fetch it.

We query by a property on which a mixed index is created.
When we query the node back by the key (one which mixed index is created) , we don't get the node immediately. However it get available after a delay.
What are we doing wrong ? Or is this delayed update of ES instance is expected?
Here is the Java code.

public static void main(String[] args) throws Exception {
    GraphTest test = new GraphTest();
    test.init();
    Thread.sleep(10000);

    String emailId = "emailId" + System.nanoTime();
    test.createNode(emailId);
    System.out.println("Create " + emailId);
    System.out.println("First time " + test.getNode(emailId));
    Thread.sleep(2000);
    System.out.println("After a delay of 2 sec " + test.getNode(emailId));
}

public void createNode(String emailid) {
    Vertex vertex = graph.addVertex("person");
    vertex.property("emailId", emailid);
    vertex.property("firstName", "First Name");
    vertex.property("lastName", "Last Name");
    vertex.property("address", "Address");
    vertex.property("hometown", "Noida");
    vertex.property("city", "Noida");
    vertex.property("spousename", "Preeti");

    graph.tx().commit();

}

public Object getNode(String emailId) {
    Vertex vert = null;
    String reString = null;
    try {
        vert = graph.traversal().V().has("emailId", emailId).next();
        reString = vert.value("emailId");
    } catch (NoSuchElementException e) {
        e.printStackTrace();
    } finally {
        graph.tx().close();
    }

    return reString;
}

The code to create index is -

    private void createMixedIndexForVertexProperty(String indexName, String propertyKeyName, Class<?> propertyType) {

    TitanManagement mgmt = ((TitanGraph) graph).openManagement();
    try {
        PropertyKey propertyKey = makePropertyKey(propertyKeyName, propertyType, mgmt);
        TitanGraphIndex graphIndex = mgmt.getGraphIndex(indexName);
        if (graphIndex == null) {
            graphIndex = mgmt.buildIndex(indexName, Vertex.class)
                    .addKey(propertyKey, Parameter.of("mapping", Mapping.STRING)).buildMixedIndex("search");
        } else {
            mgmt.addIndexKey(graphIndex, propertyKey);
        }
        mgmt.commit();
    } catch (Exception e) {
        mgmt.rollback();
    } finally {
    }

}

public PropertyKey makePropertyKey(String propertyKeyName, Class<?> propertyType, TitanManagement mgmt) {

    PropertyKey propertyKey = mgmt.getPropertyKey(propertyKeyName);
    if (propertyKey == null) {
        propertyKey = mgmt.makePropertyKey(propertyKeyName).dataType(String.class).make();
    }
    return propertyKey;
}

public void init() throws Exception {
    graph = TitanFactory
            .open(new PropertiesConfiguration(new File("src/test/resources/titan-berkeleydb-es.properties")));
    createMixedIndexForVertexProperty("personnode", "emailId", String.class);
    createMixedIndexForVertexProperty("personnode", "firstName", String.class);
    createMixedIndexForVertexProperty("personnode", "lastName", String.class);
    createMixedIndexForVertexProperty("personnode", "address", String.class);
    createMixedIndexForVertexProperty("personnode", "hometown", String.class);
    createMixedIndexForVertexProperty("personnode", "city", String.class);
    createMixedIndexForVertexProperty("personnode", "spousename", String.class);

}

回答1:

This has been discussed before on the Titan mailing list.

note that there's a latency in ES indexing (index.refresh_interval). You can't update / insert a value and immediately query it. Wait at least 1 second before you query the value, otherwise the result may be empty.

You should also read up on the Elasticsearch documentation (modifying your data and near real-time search) on the indexing behavior:

Elasticsearch provides data manipulation and search capabilities in near real time. By default, you can expect a one second delay (refresh interval) from the time you index/update/delete your data until the time that it appears in your search results. This is an important distinction from other platforms like SQL wherein data is immediately available after a transaction is completed.

CAUTION The refresh_interval expects a duration such as 1s (1 second) or 2m (2 minutes). An absolute number like 1 means 1 millisecond--a sure way to bring your cluster to its knees.