I have a program that shows you whether two words are anagrams of one another. There are a few examples that will not work properly and I would appreciate any help, although if it were not advanced that would be great, as I am a 1st year programmer. "schoolmaster" and "theclassroom" are anagrams of one another, however when I change "theclassroom" to "theclafsroom" it still says they are anagrams, what am I doing wrong?
import java.util.ArrayList;
public class AnagramCheck
{
public static void main(String args[])
{
String phrase1 = "tbeclassroom";
phrase1 = (phrase1.toLowerCase()).trim();
char[] phrase1Arr = phrase1.toCharArray();
String phrase2 = "schoolmaster";
phrase2 = (phrase2.toLowerCase()).trim();
ArrayList<Character> phrase2ArrList = convertStringToArraylist(phrase2);
if (phrase1.length() != phrase2.length())
{
System.out.print("There is no anagram present.");
}
else
{
boolean isFound = true;
for (int i=0; i<phrase1Arr.length; i++)
{
for(int j = 0; j < phrase2ArrList.size(); j++)
{
if(phrase1Arr[i] == phrase2ArrList.get(j))
{
System.out.print("There is a common element.\n");
isFound = ;
phrase2ArrList.remove(j);
}
}
if(isFound == false)
{
System.out.print("There are no anagrams present.");
return;
}
}
System.out.printf("%s is an anagram of %s", phrase1, phrase2);
}
}
public static ArrayList<Character> convertStringToArraylist(String str) {
ArrayList<Character> charList = new ArrayList<Character>();
for(int i = 0; i<str.length();i++){
charList.add(str.charAt(i));
}
return charList;
}
}
Lots of people have presented solutions, but I just want to talk about the algorithmic complexity of some of the common approaches:
The simple "sort the characters using
Arrays.sort()
" approach is going to beO(N log N)
.If you use radix sorting, that reduces to
O(N)
withO(M)
space, whereM
is the number of distinct characters in the alphabet. (That is 26 in English ... but in theory we ought to consider multi-lingual anagrams.)The "count the characters" using an array of counts is also
O(N)
... and faster than radix sort because you don't need to reconstruct the sorted string. Space usage will beO(M)
.A "count the characters" using a dictionary, hashmap, treemap, or equivalent will be slower that the array approach, unless the alphabet is huge.
The elegant "product-of-primes" approach is unfortunately
O(N^2)
in the worst case This is because for long-enough words or phrases, the product of the primes won't fit into along
. That means that you'd need to useBigInteger
, and N times multiplying aBigInteger
by a small constant isO(N^2)
.For a hypothetical large alphabet, the scaling factor is going to be large. The worst-case space usage to hold the product of the primes as a
BigInteger
is (I think)O(N*logM)
.A
hashcode
based approach is usuallyO(N)
if the words are not anagrams. If the hashcodes are equal, then you still need to do a proper anagram test. So this is not a complete solution.A simple method to figure out whether the testString is an anagram of the baseString.
Some other solution without sorting.
I am a C++ developer and the code below is in C++. I believe the fastest and easiest way to go about it would be the following:
Create a vector of ints of size 26, with all slots initialized to 0, and place each character of the string into the appropriate position in the vector. Remember, the vector is in alphabetical order and so if the first letter in the string is z, it would go in myvector[26]. Note: This can be done using ASCII characters so essentially your code will look something like this:
So inserting all the elements would take O(n) time as you would only traverse the list once. You can now do the exact same thing for the second string and that too would take O(n) time. You can then compare the two vectors by checking to see if the counters in each slot are the same. If they are, that means you had the same number of EACH character in both the strings and thus they are anagrams. The comparing of the two vectors should also take O(n) time as you are only traversing through it once.
Note: The code only works for a single word of characters. If you have spaces, and numbers and symbols, you can just create a vector of size 96 (ASCII characters 32-127) and instead of saying - 'a' you would say - ' ' as the space character is the first one in the ASCII list of characters.
I hope that helps. If i have made a mistake somewhere, please leave a comment.
A similar answer may have been posted in C++, here it is again in Java. Note that the most elegant way would be to use a Trie to store the characters in sorted order, however, that's a more complex solution. One way is to use a hashset to store all the words we are comparing and then compare them one by one. To compare them, make an array of characters with the index representing the ANCII value of the characters (using a normalizer since ie. ANCII value of 'a' is 97) and the value representing the occurrence count of that character. This will run in O(n) time and use O(m*z) space where m is the size of the currentWord and z the size for the storedWord, both for which we create a Char[].