Optimize sum(i,j) and update(i,value) for an arrya

Given a huge array of integers, optimize the functions sum(i,j) and update(i,value), so that both the functions take less than O(n).

Update

Its an interview question. I have tried O(n) sum(i,j) and O(1) update(i, value). Other solution is preprocess the input array into 2-d array to give O(1) answer for sum(i,j). But that makes the update function of O(n).

For example, given an array:

A[] = {4,5,4,3,6,8,9,1,23,34,45,56,67,8,9,898,64,34,67,67,56,...}

Operations are to be defined are sum(i,j) and update(i,value).

sum(i,j) gives sum of numbers from index i to j.
update(i, value) updates the value at index i with the given value.

The very straight answer is that sum(i,j) can be calculated in O(n) time and update(i,value) in O(1) time.

The second approach is that precompute the sum(i,j) and store it in a 2-d array SUM[n][n] and when queried for, give the answer in O(1) time. But then the update function update(i,value) becomes of order O(n) as an entire row/column corresponding to index i has to be updated.

The interviewer gave me hint to do some preprocessing and use some data structure, but I couldn't think of.

标签： optimization segment-tree

1条回答

等我变得足够好

2楼-- · 2019-06-15 00:06

What you need is a Segment Tree. A Segment tree can perform sum(i, j) and update(i, value) in O(log(n)) time.

Quote from Wikipedia:

In computer science, a segment tree is a tree data structure for storing intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, its structure cannot be modified once it is built.

The leaves of the tree will be the initial array elements. Their parents will be the sum of their children. For example: Suppose data[] = {2, 5, 4, 7, 8, 9, 5}, then our tree will be as follows:

This tree structure is represented using arrays. Let's call this array seg_tree. So the root of the seg_tree[] will be stored at index 1 of the array. It's two children will be stored at indexes 2 and 3. The general trend for 1-indexed representation will be:

Left child of index i is at index 2*i.
Right child of index i is at index 2*i+1.
Parent of index i is at index i/2.

and for 0-indexed representation:

Left child of index i is at index 2*i+1.
Right child of index i is at index 2*i+2.
Parent of index i is at index (i-1)/2.

Each interval [i, j] in the above picture denotes the sum of all elements in the interval data[i, j]. The root node will denote the sum of the whole data[], i.e., sum(0, 6) . Its two children will denote the sum(0, 3) and sum(4, 6) and so on.

The length of seg_tree[], MAX_LEN, will be (if n = length of data[]):

2*n-1 when n is a power of 2
2*(2^(log_2(n)+1) - 1 when n is not a power of 2.

Construction of seg_tree[] from data[]:

We will assume a 0-indexed construction in this case. Index 0 will be the root of the tree and the elements of the initial array will be stored in the leaves.

data[0...(n-1)] is the initial array and seg_tree[0...MAX_LEN] is the segment tree representation of data[]. It will be easier to understand how to construct the tree from the pseudo code:

build(node, start, end) {

    // invalid interval
    if start > end:
        return

    // leaf nodes
    if start == end:
        tree[node] = data[start]
        return

    // build left and right subtrees
    build(2*node+1, start, (start + end)/2);
    build(2*node+2, 1+(start+end)/2, end);

    // initialize the parent with the sum of its children
    tree[node] = tree[2*node+1] + tree[2*node+2]
}

Here,

[start, end] denotes the interval in data[] for which segment tree representation is to be formed. Initially, this is (0, n-1).
node represents the current index in the seg_tree[].

We start the building process by calling build(0, 0, n-1). The first argument denotes the position of the root in seg_tree[]. Second and third argument denotes the interval in data[] for which segment tree representation is to be formed. In each subsequent call node will represent the index of seg_tree[] and (start, end) will denote the interval for which seg_tree[node] will store the sum.

There are three cases:

start > end is an invalid interval and we simply return from this call.
if start == end, represents the leaf of the seg_tree[] and hence, we initialize tree[node] = data[start]
Otherwise, we are in a valid interval which is not a leaf. So we first build the left child of this node by calling build(node, start, (start + end)/2), then the right subtree by calling build(node, 1+(start+end)/2, end). Then we initialize the current index in seg_tree[] by the sum of its child nodes.

For sum(i, j):

We need to check whether the intervals at nodes overlap (partial/complete) with the given interval (i, j) or they do not overlap at all. These are the three cases:

For no overlap we can simply return 0.
For complete overlap, we will return the value stored at that node.
For partial overlap, we will visit both the children and continue this check recursively.

Suppose we need to find the value of sum(1, 5). We proceed as follows:

Let us take an empty container (Q) which will store the intervals of interest. Eventually all these ranges will be replaced by the values that they return. Initially, Q = {(0, 6)}.

We notice that (1, 5) does not completely overlap (0, 6), so we remove this range and add its children ranges. Q = {(0, 3), (4, 6)}

Now, (1, 5) partially overlaps (0, 3). So we remove (0, 3) and insert its two children. Q = {(0, 1), (2, 3), (4, 6)}

(1, 5) partially overlaps (0, 1), so we remove this and insert its two children range. Q = {(0, 0), (1, 1), (2, 3), (4, 6)}

Now (1, 5) does not overlap (0, 0), so we replace (0, 0) with the value that it will return (which is 0 because of no overlap). Q = {(0, (1, 1), (2, 3), (4, 6)}

Next, (1, 5) completely overlaps (1, 1), so we return the value stored in the node that represents this range (i.e., 5). Q = {0, 5, (2, 3), (4, 6)}

Next, (1, 5) again completely overlaps (2, 3), so we return the value 11. Q = {0, 5, 11, (4, 6)}

Next, (1, 5) partially overlaps (4, 6) so we replace this range by its two children. Q = {0, 5, 11, (4, 5), (6, 6)}

Fast forwarding the operations, we notice that (1, 5) completely overlaps (4, 5) so we replace this by 17 and (1, 5) does not overlap (6, 6), so we replace it with 0. Finally, Q = {0, 5, 11, 17, 0}. The answer of the query is the sum of all the elements in Q, which is 33.

For update(i, value):

For update(i, value), the process is somewhat similar. First we will search for the range (i, i). All the nodes that we encounter in this path will also need to be updated. Let change = (new_value - old_value). Then while traversing the tree in the search of the range (i, i), we will add this change to all those nodes except the last node which will simply be replaced by the new value. For example, let the query be update(5, 8).

change = 8-9 = -1.

The path encountered will be (0, 6) -> (4, 6) -> (4, 5) -> (5, 5).

Final Value of (0, 6) = 40 + change = 40 - 1 = 39.

Final Value of (4, 6) = 22 + change = 22 - 1 = 21.

Final Value of (4, 5) = 17 + change = 17 - 1 = 16.

Final Value of (5, 5) = 8. The final tree will look like this:

We can create a segment tree representation using arrays in O(n) time and both of these operations have a time complexity of O(log(n)).

In general Segment Trees can perform the following operations efficiently:

Update: It can update:
- an element at a given index.
- all the elements in an interval to a given value. Lazy Propagation technique is generally employed in this case to achieve efficiency.
Query: We query for some value in a given interval. The few basic types of queries are:
- Minimum element in an interval
- Maximum element in an interval
- Sum/Product of all elements in an interval

Another data structure Interval Tree can also be used to solve this problem. Suggested Readings:

0人赞添加讨论(0) 举报