Here is another SPOJ problem that asks for the number of distinct subsequences of a string.
For example,
Input
AAA
ABCDEFG
CODECRAFT
Output
4
128
496
How can I solve this problem?
Here is my CODE:
Explanation:
I scan the string from the back, i.e. from the last character to the first, so each recursive call passes the first n-1 characters on for further scanning.
Once n == -1 (equivalently, n < 0), I have reached the empty string and return 1, because the empty string has exactly one subsequence.
On returning from the recursion, adding the current non-duplicate character to the previous string doubles the number of subsequences. Doubling happens because this character can be appended to every one of the previous subsequences, so counting each of them with and without this character gives twice as many. So, assuming that the current character is not a duplicate, I multiply the previous number of subsequences by 2.
After the total number of subsequences of the first n-1 characters has been computed, we double it for the first n characters.
But suppose the character currently encountered (the nth character) has already appeared among the first n-1 characters, i.e. it is found within s[0....n-1] (note: s[n] is the current character). Then we have to subtract the number of subsequences of the part of s up to, but excluding, the position where this character was last encountered. That number has already been computed and is stored in L at the index of that previous occurrence.
For example, take BACA. When we reach the final A, an A has already been encountered before (while returning from the recursion we first encounter B, then A, then C and at last A), so we deduct the number of subsequences calculated up to (excluding) the earlier A, which is 2 (only B precedes it, giving the empty string and B).
So, every time we have calculated the number of subsequences for the first n-1 characters, we store it in the array L.
Notice: L[k] stores the number of subsequences before the kth index.
I use the visited array to check whether the character I am currently at has already been scanned.
After processing the current character, I update the visited array with its position, n. This needs to be done so that the duplicate subsequences can be excluded later.
Note: visited[] is initialized to all -1 because the position of any character in the string s is non-negative (0-based indexing).
Summary:
How do you arrive at the number of duplicates? Let's say the current character is at position i and its last occurrence was at position j. Then we get duplicate subsequences: any subsequence of [0, j-1] followed by the i'th character is the same string as that subsequence followed by the j'th character. To eliminate these, subtract the number of subsequences of the part up to (excluding) j. L[0] = 1 means that up to (excluding) index 0 there is 1 subsequence (the empty string has one subsequence).
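The recursion described above can be sketched in Python as follows. This is only a minimal sketch, not the original listing: the function name count_recursive is hypothetical, and a dictionary with a default of -1 stands in for the visited array. L and visited keep the meanings given in the explanation.

```python
import sys


def count_recursive(s: str) -> int:
    """Count distinct subsequences of s, the empty one included."""
    n = len(s)
    L = [0] * (n + 1)   # L[k]: number of subsequences of s[0..k-1]
    visited = {}        # visited[c]: last index where c was seen (-1 if never)

    def count(i: int) -> int:
        # Number of distinct subsequences of the prefix s[0..i].
        if i < 0:
            return 1                 # the empty string has one subsequence
        L[i] = count(i - 1)          # store the count for s[0..i-1]
        total = 2 * L[i]             # with and without s[i]
        j = visited.get(s[i], -1)
        if j >= 0:
            total -= L[j]            # remove duplicates from the earlier s[i]
        visited[s[i]] = i
        return total

    # Assumption: inputs are short enough for plain recursion;
    # the limit is raised slightly to allow one frame per character.
    sys.setrecursionlimit(max(sys.getrecursionlimit(), n + 100))
    return count(n - 1)
```

Calling count_recursive("BACA") follows the walk-through above: the count reaches 8 after BAC, doubles to 16 for the final A, and then loses L[1] = 2.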
It's a classic dynamic programming problem.
Let:
A null string has one subsequence, so dp[0] = 1.
Explanation
Initially, we assume we can append a[i] to all subsequences ending on previous characters, but this might violate the condition that the counted subsequences need to be distinct. Remember that last[a[i]] gives us the last position a[i] appeared on until now. The only subsequences we overcount are those that the previous a[i] was appended to, so we subtract those.
Update these values as per their definition.
If your indexing starts from 0, use a[i - 1] wherever I used a[i]. Also remember to wrap your computations in a mod function if you're going to submit code. It should be implemented so that negative values are handled correctly in some languages (such as C/C++), where the remainder of a negative number can itself be negative.
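One way to read the recurrence described above is sketched below in Python. The names dp and last come from the text; everything else is an assumption made for illustration: the reading that dp[i] counts the distinct subsequences ending at a[i], the prefix-sum array total, the modulus 1,000,000,007, and the function names mod and count_prefix_sums.

```python
MOD = 1_000_000_007  # assumed modulus; check the problem statement


def mod(x: int) -> int:
    # Python's % is already non-negative for a positive modulus; the
    # double-mod form mirrors what C/C++ needs after a subtraction.
    return (x % MOD + MOD) % MOD


def count_prefix_sums(a: str) -> int:
    """Count distinct subsequences of a (empty one included), modulo MOD."""
    n = len(a)
    dp = [0] * (n + 1)     # dp[i]: distinct subsequences ending at a[i] (1-based)
    total = [0] * (n + 1)  # total[i] = dp[0] + dp[1] + ... + dp[i]
    last = {}              # last[c]: last 1-based position of c so far
    dp[0] = 1              # the null string
    total[0] = 1
    for i in range(1, n + 1):
        c = a[i - 1]       # 0-based string, so a[i - 1] plays the role of a[i]
        j = last.get(c, 0)
        # Subsequences that the previous occurrence of c was appended to.
        overcount = total[j - 1] if j >= 1 else 0
        dp[i] = mod(total[i - 1] - overcount)
        total[i] = mod(total[i - 1] + dp[i])
        last[c] = i
    return total[n]
```

For AAA every dp[i] is 1, so the running total goes 1, 2, 3, 4, which matches the expected output.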
There exists an easier solution to this problem.
The idea is: if all characters of the string are distinct, the total number of subsequences is 2^n.
Now, if we find a character that has already occurred before, we should consider its last occurrence only (otherwise the subsequences won't be distinct). So we have to subtract the number of subsequences due to its previous occurrence.
My implementation is like this:
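A minimal sketch of this doubling idea in Python, not the original listing. The names count_doubling, dp and last are hypothetical; dp[i] is assumed to hold the number of distinct subsequences of the first i characters, and no modulus is applied because Python integers do not overflow.

```python
def count_doubling(s: str) -> int:
    """Count distinct subsequences of s, the empty one included."""
    n = len(s)
    dp = [1] * (n + 1)  # dp[0] = 1 for the empty string
    last = {}           # last[c]: 1-based position of the previous c
    for i, c in enumerate(s, start=1):
        dp[i] = 2 * dp[i - 1]          # every subsequence, with and without c
        if c in last:
            # Subsequences already produced by the previous occurrence of c.
            dp[i] -= dp[last[c] - 1]
        last[c] = i
    return dp[n]
```

On CODECRAFT the count doubles up to 16 after CODE, becomes 2 * 16 - 1 = 31 at the second C, and then doubles four more times to 496.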