How to use elasticsearch to get JOIN functionality

2019-01-20 14:56发布

问题:

I have an SQL query that does JOIN operations over 3-4 tables to get the data. Now we are shifting to elasticsearch for better performance. How can I replicate the same JOIN query using elasticsearch? I have read about parent/child documents but my data doesn't have any strict parent/child kind of data.

回答1:

Elasticsearch does not support JOINs, that's the whole purpose of NoSQL technologies in the first place. There are ways to create some relationships between your data using parent/child relationships (as you've noticed), nested objects and also using a special terms lookup optimization, but that's about it.

However, in order to take the most out of Elasticsearch, the main idea is to denormalize your data as much as possible and store self-contained documents. This means that you can take a single document and it contains all the info it needs. You don't care about the fact that data is going to be duplicated.

For instance, suppose you have the following JOIN query in your SQL database for retrieving all people with their address, city and country (i.e. 4 tables):

SELECT p.first_name, p.last_name, 
       a.street_name, a.street_num,
       c.name, c2.name
  FROM person p
  JOIN address a ON a.id = p.addr_id
  JOIN city c ON c.id = p.city_id
  JOIN country c2 ON c2.id = p.country_id

In Elasticsearch, you would create a document containing exactly the fields that are returned by the above query, i.e.

 {
     "first_name": "John",
     "last_name": "Doe",
     "street_num": 34,
     "street_name": "Main Street",
     "city": "San Francisco",
     "country": "United States"
 }

So, one way to see this, is that in Elasticsearch you're going to store the same (or very similar) set of fields that you would get as a result of running your query in your relational database.

Taking the step from RDBMS to Elasticsearch is a paradigm shift. If you are ever going to take that step, you need to think different in terms of your data.