What are the options to store relational data in elasticsearch. I know the following approaches
Nested object :- I don't want to store data in nested format because I want to update the one document without changing the other document and if I use nested object then there will be repetition of child data in parent documents.
Parent-child :- I don't want to store data in single index, but for using Parent-child data needs to be present in one index(different types). I know this restriction will be removed in future release as mentioned in https://github.com/elastic/elasticsearch/issues/15613 issue, but I want a solution that should work with 5.5 version.
Is there any other approach other then above.
Nested Object is a perfect approach for it. There will be no repetition of child objects in parent document if you update the child objects correctly.
I'm using the same approach for one of my use case where I need to maintain relational data of Master-Child One-to-Many relationship.
I've written a Painless script for Update API to Add & Update existing nested child objects inside parent document without creating duplicates or repetitive entries.
Updated Answer:
Below is the structure of Parent-Child Nested Type document with embedded nested type documents "childs".
{
"parent_id": 1,
"parent_name": "ABC",
"parent_number": 123,
"parent_addr": "123 6th St. Melbourne, FL 32904"
"childs": [
{
"child_id": 1,
"child_name": "PQR",
"child_number": 456,
"child_age": 10
},
{
"child_id": 2,
"child_name": "XYZ",
"child_number": 789,
"child_age": 12
},
{
"child_id": 3,
"child_name": "QWE",
"child_number": 234,
"child_age": 16
}
]
}
Mapping would be as below:
PUT parent/
{
"parent": {
"mappings": {
"parent": {
"properties": {
"parent_id": {
"type": "long"
},
"parent_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"parent_number": {
"type": "long"
},
"parent_addr": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"child_tickets": {
"type": "nested",
"properties": {
"child_id": {
"type": "long"
},
"child_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"child_number": {
"type": "long"
},
"child_age": {
"type": "long"
}
}
}
}
}
}
}
}
In RDMS, these both entities(parent, child) are two different tables with One to Many relation between Parent -> Child.
Parent's id is foreign key for Child's row. (id is must for both tables)
Now in Elasticsearch, to index the parent document we must have id to index it, in this case it is parent_id.
Index Parent document Query(parent_id is the id which i was talking about and have index the document with id(_id) = 1):
POST parent/parent/1
{
"parent_id": 1,
"parent_name": "ABC",
"parent_number": 123,
"parent_addr": "123 6th St. Melbourne, FL 32904"
}
Now, adding child(s) to the parent. For that you will require child document which should have child id plus parent id.
To add a child, parent id is must. Below is the update query to add new childs or update already present childs.
POST parent/parent/1/_update
{
"script":{
"lang":"painless",
"inline":"if (!ctx._source.containsKey(\"childs\")) {
ctx._source.childs = [];
ctx._source.childs.add(params.child);
} else {
int flag=0;
for(int i=0;i<ctx._source.childs.size();i++){
if(ctx._source.childs[i].child_id==params.child.child_id){
ctx._source.childs[i]=params.child;
flag++;
}
}
if(flag==0){
ctx._source.childs.add(params.child);
}
}",
"params":{
"child":{
"child_id": 1,
"child_name": "PQR",
"child_number": 456,
"child_age": 10
}
}
}
}
Give it a shot. Cheers!
Let me know if you need anything else.
There are two more approaches: Denormalization and running multiple queries for joins.
Denormalization will eat up some more space and increase your write time, but you will just need to run one query to retrieve your data, hence, your read time will improve. Since you don't want to store data in a single index, so joining might help you out.