Given a huge array of integers, optimize the functions sum(i,j) and update(i,value), so that both the functions take less than O(n).
Update
Its an interview question. I have tried O(n) sum(i,j) and O(1) update(i, value). Other solution is preprocess the input array into 2-d array to give O(1) answer for sum(i,j). But that makes the update function of O(n).
For example, given an array:
A[] = {4,5,4,3,6,8,9,1,23,34,45,56,67,8,9,898,64,34,67,67,56,...}
Operations are to be defined are sum(i,j)
and update(i,value)
.
sum(i,j)
gives sum of numbers from indexi
toj
.update(i, value)
updates the value at indexi
with the givenvalue
.
The very straight answer is that sum(i,j)
can be calculated in O(n)
time and update(i,value)
in O(1)
time.
The second approach is that precompute the sum(i,j)
and store it in a 2-d array SUM[n][n]
and when queried for, give the answer in O(1)
time. But then the update function update(i,value)
becomes of order O(n)
as an entire row/column corresponding to index i
has to be updated.
The interviewer gave me hint to do some preprocessing and use some data structure, but I couldn't think of.
What you need is a Segment Tree. A Segment tree can perform
sum(i, j)
andupdate(i, value)
inO(log(n))
time.Quote from Wikipedia:
The leaves of the tree will be the initial array elements. Their parents will be the sum of their children. For example: Suppose
data[] = {2, 5, 4, 7, 8, 9, 5}
, then our tree will be as follows:This tree structure is represented using arrays. Let's call this array
seg_tree
. So the root of the seg_tree[] will be stored at index 1 of the array. It's two children will be stored at indexes 2 and 3. The general trend for1-indexed
representation will be:i
is at index2*i
.i
is at index2*i+1
.i
is at indexi/2
.and for
0-indexed
representation:i
is at index2*i+1
.i
is at index2*i+2
.i
is at index(i-1)/2
.Each interval
[i, j]
in the above picture denotes the sum of all elements in the intervaldata[i, j]
. The root node will denote the sum of the whole data[], i.e.,sum(0, 6)
. Its two children will denote thesum(0, 3)
andsum(4, 6)
and so on.The length of seg_tree[], MAX_LEN, will be (if
n = length of data[]
):2*n-1
when n is a power of 22*(2^(log_2(n)+1) - 1
when n is not a power of 2.Construction of seg_tree[] from data[]:
We will assume a
0-indexed
construction in this case. Index 0 will be the root of the tree and the elements of the initial array will be stored in the leaves.data[0...(n-1)]
is the initial array andseg_tree[0...MAX_LEN]
is the segment tree representation of data[]. It will be easier to understand how to construct the tree from the pseudo code:Here,
[start, end]
denotes the interval in data[] for which segment tree representation is to be formed. Initially, this is(0, n-1)
.We start the building process by calling
build(0, 0, n-1)
. The first argument denotes the position of the root in seg_tree[]. Second and third argument denotes the interval in data[] for which segment tree representation is to be formed. In each subsequent call node will represent the index of seg_tree[] and (start, end) will denote the interval for which seg_tree[node] will store the sum.There are three cases:
start > end
is an invalid interval and we simply return from this call.start == end
, represents the leaf of the seg_tree[] and hence, we initializetree[node] = data[start]
build(node, start, (start + end)/2)
, then the right subtree by callingbuild(node, 1+(start+end)/2, end)
. Then we initialize the current index in seg_tree[] by the sum of its child nodes.For sum(i, j):
We need to check whether the intervals at nodes overlap (partial/complete) with the given interval (i, j) or they do not overlap at all. These are the three cases:
Suppose we need to find the value of
sum(1, 5)
. We proceed as follows:Let us take an empty container (Q) which will store the intervals of interest. Eventually all these ranges will be replaced by the values that they return. Initially, Q =
{(0, 6)}
.We notice that (1, 5) does not completely overlap (0, 6), so we remove this range and add its children ranges. Q =
{(0, 3), (4, 6)}
Now, (1, 5) partially overlaps (0, 3). So we remove (0, 3) and insert its two children. Q =
{(0, 1), (2, 3), (4, 6)}
(1, 5) partially overlaps (0, 1), so we remove this and insert its two children range. Q =
{(0, 0), (1, 1), (2, 3), (4, 6)}
Now (1, 5) does not overlap (0, 0), so we replace (0, 0) with the value that it will return (which is 0 because of no overlap). Q =
{(0, (1, 1), (2, 3), (4, 6)}
Next, (1, 5) completely overlaps (1, 1), so we return the value stored in the node that represents this range (i.e., 5). Q =
{0, 5, (2, 3), (4, 6)}
Next, (1, 5) again completely overlaps (2, 3), so we return the value 11. Q =
{0, 5, 11, (4, 6)}
Next, (1, 5) partially overlaps (4, 6) so we replace this range by its two children. Q =
{0, 5, 11, (4, 5), (6, 6)}
Fast forwarding the operations, we notice that (1, 5) completely overlaps (4, 5) so we replace this by 17 and (1, 5) does not overlap (6, 6), so we replace it with 0. Finally,
Q = {0, 5, 11, 17, 0}
. The answer of the query is the sum of all the elements in Q, which is 33.For update(i, value):
For
update(i, value)
, the process is somewhat similar. First we will search for the range (i, i). All the nodes that we encounter in this path will also need to be updated. Letchange = (new_value - old_value)
. Then while traversing the tree in the search of the range (i, i), we will add this change to all those nodes except the last node which will simply be replaced by the new value. For example, let the query beupdate(5, 8)
.change = 8-9 = -1
.The path encountered will be
(0, 6) -> (4, 6) -> (4, 5) -> (5, 5)
.Final Value of
(0, 6) = 40 + change = 40 - 1 = 39
.Final Value of
(4, 6) = 22 + change = 22 - 1 = 21
.Final Value of
(4, 5) = 17 + change = 17 - 1 = 16
.Final Value of
(5, 5) = 8
. The final tree will look like this:We can create a segment tree representation using arrays in
O(n)
time and both of these operations have a time complexity ofO(log(n))
.In general Segment Trees can perform the following operations efficiently:
Another data structure Interval Tree can also be used to solve this problem. Suggested Readings: