Remove all duplicates from a string and select the lexicographical smallest string possible. For example, the string cbacdcbc would return acdb, not adcb.
So this has a relatively simple solution if we don't have to select the string that's lexicographical smallest, but considering that fact, I'm not sure how to come to an efficient solution. Here's what I have so far:
string removeDuplicateLetters(string s)
{
vector<bool> v(26,0);
for(int i = 0; i < s.size(); i++) {
v[s[i]-'a'] = 1;
}
string ss = "";
for(int i = 0; i < s.size(); i++) {
if(v[s[i]-'a']) {
ss += s[i];
v[s[i]-'a'] = 0;
}
}
return ss;
}
Algorithm
a,b,c,d
.a
that has all ofb,c,d
after it.Or if that's not possible, find the first
b
that has all ofa,c,d
after it.Or if that's not possible, find the first
c
that has all ofa,b,d
after it.Or if that's not possible, find the first
d
.Code example
(in Javascript; my C++ is rusty). It creates a bit pattern
chars
to store which characters are still to be found, and an arrayafter
of bit patterns to store which characters are still available after each position.Sketch of an algorithm.
Pass over the string, build a map of how many times each character appears, and the position of the rightmost (and possibly the only) occurrence of every character.
Find the smallest character that can appear in the first position. To do this, go left to right, noting the smallest character encountered; stop when you hit the rightmost occurrence of any character. Remove all characters that precede the smallest one, and all other copies of the smallest one; update the map accordingly.
Repeat starting from the character that immediately follows the smallest one from step #2.
Can terminate early once all counters in the map reach 1. Removal of other copies can be combined with normal iteration (just mark characters to be removed with 0 in the map-of-counters, skip them over in the normal search, remove them when removing the prefix).
This algorithm is quadratic in the worst case, at least in the size of the alphabet (the worst case is
abc...zabc...
; the algorithm examines half the string for each character, only to decide to keep it). I think this can be fixed by keeping track not just of the smallest, but also second smallest and third smallest and so on, in a kind of priority queue structure (details are left as n exercise for the reader).m69's javascript in c++: