Is it possible to query the number of distinct integers in a range in O(lg N)?

Posted 2020-06-01 19:19

I have read through some tutorials about two common data structures that support range update and query in O(lg N): the segment tree and the Binary Indexed Tree (BIT / Fenwick tree).

Most of the examples I have found are about associative and commutative operations such as "sum of integers in a range" or "XOR of integers in a range".

I wonder whether these two data structures (or any other data structure or algorithm, please propose one) can answer the query below in O(lg N). If not, how about O(sqrt N)?

Given an array of integers A, query the number of distinct integers in a range [l, r].

PS: Assume the number of possible integer values is ~10^5, so a used[color] = true table or a bitmask is not feasible.

For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range indices are 0-based (the range covers 3, 2, 4, 3).

5 answers
Fickle 薄情
#2 · 2020-06-01 19:43

The given problem can also be solved offline with Mo's algorithm, which is based on square root decomposition of the queries.

The overall time complexity is O((N + Q) * sqrt(N)) for N elements and Q queries.

Refer to mos-algorithm for a detailed explanation; it includes a complexity analysis and a SPOJ problem that can be solved with this approach.
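As a rough illustration of the idea, here is a minimal Python sketch of Mo's algorithm for this problem. The function name `distinct_counts_mo`, the block size of about sqrt(N) and the dictionary-based frequency table are assumptions made for the example, not something taken from the linked article.

```python
from math import isqrt


def distinct_counts_mo(a, queries):
    """Answer 0-based inclusive (l, r) distinct-count queries offline."""
    n = len(a)
    block = max(1, isqrt(n))  # assumed block size, roughly sqrt(n)
    # Process queries grouped by the block of l, and by r inside a block.
    order = sorted(
        range(len(queries)),
        key=lambda i: (queries[i][0] // block, queries[i][1]),
    )
    freq = {}              # value -> occurrences inside the current window
    distinct = 0           # number of values with freq > 0
    cur_l, cur_r = 0, -1   # current window a[cur_l..cur_r], initially empty
    answers = [0] * len(queries)

    def add(v):
        nonlocal distinct
        freq[v] = freq.get(v, 0) + 1
        if freq[v] == 1:
            distinct += 1

    def remove(v):
        nonlocal distinct
        freq[v] -= 1
        if freq[v] == 0:
            distinct -= 1

    for qi in order:
        l, r = queries[qi]
        # Grow the window first, then shrink it, so counts never go negative.
        while cur_l > l:
            cur_l -= 1
            add(a[cur_l])
        while cur_r < r:
            cur_r += 1
            add(a[cur_r])
        while cur_l < l:
            remove(a[cur_l])
            cur_l += 1
        while cur_r > r:
            remove(a[cur_r])
            cur_r -= 1
        answers[qi] = distinct
    return answers
```

With the array from the question, `distinct_counts_mo([1, 2, 3, 2, 4, 3, 1], [(2, 5)])` returns `[3]`. The answers come back in the original query order even though the queries are processed in sorted order.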

Luminary・发光体
#3 · 2020-06-01 19:45

Yes, this can be done in O(log n) per query, even if you have to answer queries online. However, this requires some rather complex techniques.

First, let's solve the following problem: given an array, answer queries of the form "how many numbers <= x are there within indices [l, r]". This is done with a segment-tree-like structure that is sometimes called a merge sort tree. It is basically a segment tree where each node stores its subarray in sorted order. This structure requires O(n log n) memory (because there are log n layers and each of them stores n numbers). It is built in O(n log n) as well: you go bottom-up and, for each inner vertex, merge the sorted lists of its children.

Here is an example. Let 1 5 2 6 8 4 7 1 be the original array.

|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|

Now you can answer those queries in O(log^2 n) time: make a regular segment tree query (visiting O(log n) nodes) and, in each visited node, binary search for how many numbers <= x it contains (an additional O(log n) factor).
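The O(log^2 n) query can be sketched roughly as follows. This is a minimal Python version that assumes an iterative bottom-up segment tree layout and 0-based inclusive ranges; the class name `MergeSortTree` and the method `count_leq` are made up for the example, and no fractional cascading is attempted.

```python
from bisect import bisect_right
from heapq import merge


class MergeSortTree:
    """Segment tree whose nodes store their elements in sorted order."""

    def __init__(self, data):
        self.n = len(data)
        self.tree = [[] for _ in range(2 * self.n)]
        # Leaves live at positions n .. 2n-1.
        for i, v in enumerate(data):
            self.tree[self.n + i] = [v]
        # Build bottom-up: each inner node merges its children's sorted lists.
        for node in range(self.n - 1, 0, -1):
            self.tree[node] = list(
                merge(self.tree[2 * node], self.tree[2 * node + 1])
            )

    def count_leq(self, l, r, x):
        """Count elements <= x among data[l..r] (0-based, inclusive)."""
        res = 0
        lo, hi = l + self.n, r + self.n + 1  # half-open range of leaves
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move right
                res += bisect_right(self.tree[lo], x)
                lo += 1
            if hi & 1:  # hi is a right boundary: step left and take it
                hi -= 1
                res += bisect_right(self.tree[hi], x)
            lo >>= 1
            hi >>= 1
        return res
```

For the example array 1 5 2 6 8 4 7 1, `MergeSortTree([1, 5, 2, 6, 8, 4, 7, 1]).count_leq(2, 5, 5)` returns 2, since only 2 and 4 among 2, 6, 8, 4 are <= 5.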

This can be sped up to O(log n) using the fractional cascading technique, which basically lets you do the binary search only in the root instead of in every node. However, it is too complex to be described in this post.

Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i = index of the next occurrence of a_i in the array, or ∞ if it is the last occurrence.

Example (1-indexed):

a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞

Now let's count the distinct numbers in [l, r]. For each distinct number we count only its last occurrence in the segment. With the b_i notation, the occurrence at index i in [l, r] is the last one in the segment if and only if b_i > r. So the problem boils down to "how many indices i in [l, r] have b_i > r", which reduces to the query described above: it equals (r - l + 1) minus the number of b_i <= r within [l, r].
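To make the reduction concrete, here is a small Python sketch that builds the next-occurrence array and checks the "b_i > r" rule on the example above. The function names are hypothetical, and the count uses a plain linear scan for clarity; in the real solution that scan would be replaced by the merge sort tree query.

```python
import math


def next_occurrence(a):
    """Return b where b[i] is the 1-based index of the next occurrence
    of a[i], or math.inf if a[i] does not occur again."""
    b = [math.inf] * len(a)
    nxt = {}  # value -> 1-based index of its nearest occurrence to the right
    for i in range(len(a) - 1, -1, -1):
        b[i] = nxt.get(a[i], math.inf)
        nxt[a[i]] = i + 1
    return b


def count_distinct(b, l, r):
    """Distinct values in positions l..r (1-based, inclusive).

    Counts indices whose next occurrence lies beyond r. The linear scan
    is only for illustration, not the O(log n) query."""
    return sum(1 for i in range(l, r + 1) if b[i - 1] > r)
```

For a = 1 3 1 2 2 1 4 1, `next_occurrence` yields 3 ∞ 6 5 ∞ 8 ∞ ∞ as in the example, and `count_distinct(b, 3, 6)` gives 2 (the values 1 and 2).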

Hope it helps.

疯言疯语
#4 · 2020-06-01 19:55

kd-trees answer orthogonal range queries in O(n^(1-1/d) + k) time, which is O(sqrt(n) + k) in two dimensions.

If you want faster queries than a kd-tree and are willing to pay the memory cost, then range trees are your friends, offering a query time of:

O(log^d n + k)

In both bounds, n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.


Bentley is an important name when it comes to this field. :)

倾城 Initia
#5 · 2020-06-01 19:55

There is a well-known offline method to solve this problem. If you have an array of size n and q queries on it, and each query asks for the number of distinct values in a range, you can solve the whole thing in O(n log n + q log n) time, which amounts to O(log n) per query.

Let's solve the problem using the range sum query (RSQ) technique. For RSQ you can use a segment tree or a BIT. Let's discuss the segment tree version.

To solve this problem you need an offline technique and a segment tree. What is an offline technique? It means you read all the queries first, reorder them in a way that lets you answer them correctly and easily, and finally output the answers in the original input order.

Solution Idea:

1. Read the input for a test case and store the given n numbers in an array; call it array[]. Then read the q queries and store them in a vector v, where every element of v holds three fields: l, r and idx. Here l is the start point of the query, r is its end point and idx is the position of the query in the input (for example, "this is the n-th query").

2. Sort the vector v by the end point r of each query.

3. Prepare a segment tree that can hold at least 10^5 elements and an array last[100005] that stores the last seen position of each number in array[]. Initially, all elements of the tree are zero and all elements of last[] are -1.

4. Loop over array[]. For every index i, check whether last[array[i]] is -1. If it is not -1, call update() on the segment tree to add -1 at position last[array[i]]. In both cases, then store the current position for future use by setting last[array[i]] = i, and call update() to add +1 at position last[array[i]].

5. Still inside the loop, check whether a query ends at the current index. While v[current].r == i (several queries may end at the same index), call query() on the segment tree to get the sum over the range v[current].l to v[current].r, store the result at index v[current].idx of the answer[] array, and increment current by 1.

6. Print the answer[] array, which contains the final answers in the given input order.

The complexity of the algorithm is O((n + q) log n).
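The steps above can be sketched in Python roughly as follows. This version makes a few assumptions that are not in the original description: it uses a BIT (Fenwick tree) instead of a segment tree, a dictionary instead of the fixed-size last[] array, and 0-based inclusive (l, r) queries. The function name is hypothetical.

```python
def distinct_counts_offline(a, queries):
    """Answer 0-based inclusive (l, r) distinct-count queries offline."""
    n = len(a)
    bit = [0] * (n + 1)  # Fenwick tree over positions, 1-based internally

    def update(pos, delta):
        """Add delta at 0-based position pos."""
        pos += 1
        while pos <= n:
            bit[pos] += delta
            pos += pos & -pos

    def prefix_sum(pos):
        """Sum of marks at 0-based positions 0..pos (pos may be -1)."""
        pos += 1
        total = 0
        while pos > 0:
            total += bit[pos]
            pos -= pos & -pos
        return total

    # Query indices sorted by right endpoint; answers keep the input order.
    order = sorted(range(len(queries)), key=lambda i: queries[i][1])
    answers = [0] * len(queries)
    last = {}      # value -> last position where it was seen
    current = 0    # next query (in sorted order) waiting for an answer

    for i, v in enumerate(a):
        if v in last:
            update(last[v], -1)  # unmark the previous occurrence
        last[v] = i
        update(i, +1)            # mark the latest occurrence
        # Several queries may share the same right endpoint.
        while current < len(order) and queries[order[current]][1] == i:
            l, r = queries[order[current]]
            answers[order[current]] = prefix_sum(r) - prefix_sum(l - 1)
            current += 1
    return answers
```

For the array from the question, `distinct_counts_offline([1, 2, 3, 2, 4, 3, 1], [(2, 5), (0, 6)])` returns `[3, 4]`.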

男人必须洒脱
#6 · 2020-06-01 19:57

If you're willing to answer queries offline, then plain old segment trees / BITs can still help.

  • Sort the queries by their r values.
  • Build a segment tree for range sum queries over indices 0..n-1.
  • For each value in the input array, from left to right:

    1. Increment by 1 at the current index i in the segment tree.
    2. If the current element has been seen before, decrement by 1 in the
      segment tree at its previous position.
    3. Answer the queries ending at the current index i by querying the sum over [l, r], where r == i.

The idea, in short, is to keep a mark only at the latest (rightmost) occurrence of each element and to reset earlier occurrences back to 0. The sum over a range then gives the count of distinct elements.

The overall time complexity is again O((n + q) log n).
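To see why the range sum equals the distinct count, it may help to look at the marks themselves. The Python sketch below keeps them in a plain list instead of a segment tree (an assumption made purely for illustration) and records a snapshot after each element is processed; `marker_snapshots` is a hypothetical helper name.

```python
def marker_snapshots(a):
    """Return the 0/1 mark array after processing each prefix of a.

    snapshots[i][j] == 1 exactly when j is the latest occurrence of
    a[j] within a[0..i]."""
    marks = [0] * len(a)
    last = {}  # value -> position of its latest occurrence so far
    snapshots = []
    for i, v in enumerate(a):
        if v in last:
            marks[last[v]] = 0  # the earlier occurrence is no longer the latest
        marks[i] = 1
        last[v] = i
        snapshots.append(marks.copy())
    return snapshots
```

For A = [1,2,3,2,4,3,1], the snapshot after index 5 is [1,0,0,1,1,1,0], and the sum of its entries at positions 2..5 is 3, which matches query([2,5]) from the question.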
