A string is named 2-consistent if each word has at least 2 letters in common with the next word.
For example
"Atom another era" [
atom
hasa
andt
in common withanother
andanother
hase
anda
in common withera
(the answer is not unique).
First of all I need a data structure which takes 2 words and answers in constant time at the question "Do these words have at least 2 letters in common?"
Now, given a string of n
words I need to find the longest 2-consistent substring.
I can't figure out what data structure to use. I thought to radix tree
or prefix tree
, but I could not find the answer. Can you help me?
Part 1: Are
wordOne
andwordTwo
2-consistent ?Part 2: Longest 2-consistent substring
Assuming unaccented letters and ignoring capitalization, for each word you can store a bit-field in a 32-bit integer where bits 0-25 are set to 1 if the corresponding letter from a-z is present.
The integer can be computed in linear time like this:
If the words are assumed to be words in English or some other language, with a maximum word length then the difference between constant and linear time is fairly meaningless because (for the sake of argument) all words less than the maximum length can be padded out with non-matching characters, which will result in a constant time algorithm.
Once you have the bit fields for two words you can test if they are 2-consistent in constant time by ANDing them together and checking if the result is not zero (which would indicate no letters in common) and not a power of 2 (which would indicate only one letter in common as only a single bit is set). You can test for a power of 2 by ANDing a number with itself minus 1.
This won't work if you intend to define words like 'meet' and 'beef' which have repeated letters as 2-consistent.
If you wanted to test for 3-consistency, you just need to add an extra line to the function:
ANDing an integer with itself minus one just removes the least significant bit, so you could apply it an arbitrary number of times to test for 4-consistency, 5-consistency etc.
Easy. First compute the adjacency matrix for the dictionary you are using where 'adjacent' is defined to mean 'having at least two letters in common'. I disagree with the comments above, storing even a comprehensive English dictionary isn't very much data these days. Storing the full adjacency matrix might take too much space for your liking, so use sparse array facilities.
Now, bear in mind that an English word is just a number in base-26 (or base-52 if you insist on distinguishing capital letters) so looking up the row and column for a pair of words is a constant-time operation and you have the solution to your question.
Oh sure, this consumes space and takes a fair amount of pre-computation but OP asks about a data structure for answering the question in constant time.