I am trying to compare strings like "PRABHAKAR SHARMA" and "SHARMA KUMAR PRABHAKAR". The intention is to check whether all the characters of the shorter string exist in the other string. If they do, I should get a 100% match; otherwise, I want the percentage of characters that matched.
I tried using levenshteinSim from the RecordLinkage package, but it returns a similarity score based on the number of edits required to change one string into the other.
install.packages("RecordLinkage")
require(RecordLinkage)
levenshteinSim("PRABHAKAR SHARMA","SHARMA KUMAR PRABHAKAR")
#[1] 0.3636364
I want a 100% match in such a case. Also, this has to be replicated for over 1,000,000 records.
If the characters to be considered are only letters you could use:
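A minimal sketch of what that could look like, assuming a "match" means each letter of the shorter string can be paired with the same letter in the longer one (so repeated letters count). The name `letter_match` is made up for illustration and is not part of RecordLinkage:

```r
# Hypothetical helper, not from any package: percentage of the shorter
# string's letters that can be paired with a letter in the longer string.
letter_match <- function(a, b) {
  # Count occurrences of each letter A-Z; spaces, digits and punctuation
  # are dropped because they are not levels of the factor.
  counts <- function(s) {
    chars <- strsplit(toupper(s), "")[[1]]
    as.vector(table(factor(chars, levels = LETTERS)))
  }
  ca <- counts(a)
  cb <- counts(b)
  shorter <- min(sum(ca), sum(cb))  # letter count of the shorter string
  if (shorter == 0) return(NA_real_)
  # pmin() pairs up repeated letters, so "AA" needs two As on the other side.
  100 * sum(pmin(ca, cb)) / shorter
}

letter_match("PRABHAKAR SHARMA", "SHARMA KUMAR PRABHAKAR")
```

Because it only compares 26 counts per string, the order of the words does not matter, which seems to be what the question is after.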
Here is one approach:
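The original snippet is not reproduced here, so the following is only a sketch of the idea, assuming "match" means that a character of the shorter string occurs anywhere in the longer one. `char_match` is a hypothetical name:

```r
# Hypothetical helper: percentage of the shorter string's characters
# (every character is compared, spaces included) that appear in the longer string.
char_match <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # Make x the shorter of the two character vectors.
  if (length(x) > length(y)) {
    tmp <- x
    x <- y
    y <- tmp
  }
  if (length(x) == 0) return(NA_real_)
  # %in% tests presence only; repeated characters are not counted separately.
  100 * mean(x %in% y)
}

char_match("PRABHAKAR SHARMA", "SHARMA KUMAR PRABHAKAR")
# [1] 100
```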
It may be a little slow, though, and it treats the space as a character, too. Use Vectorize to apply it to a column:
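A sketch of the column-wise version, assuming a scalar helper along the lines of the hypothetical `char_match` defined in the block and a data frame with made-up columns `name1` and `name2`:

```r
# Scalar helper (hypothetical): share of the shorter string's characters
# that occur in the longer string, as a percentage.
char_match <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  if (length(x) > length(y)) {
    tmp <- x
    x <- y
    y <- tmp
  }
  if (length(x) == 0) return(NA_real_)
  100 * mean(x %in% y)
}

# Vectorize() wraps the helper in mapply(), so it accepts whole columns.
char_match_v <- Vectorize(char_match, USE.NAMES = FALSE)

# Example data; the column names are assumptions for illustration.
df <- data.frame(
  name1 = c("PRABHAKAR SHARMA", "ABCD"),
  name2 = c("SHARMA KUMAR PRABHAKAR", "ABXYZ"),
  stringsAsFactors = FALSE
)
df$match <- char_match_v(df$name1, df$name2)
df
```

Note that Vectorize still loops over the rows in R, so on 1,000,000 records it is convenient rather than fast.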