Splitting a full name into first and last names is an unsolvable problem because names are really, really complicated. As a result, my model, which represents authors and other contributors to a book, includes both `name` and `filingName` fields, where `filingName` should usually be "Last, First" (for Western names).
However, as a convenience for my users, I'd like to have my app make a reasonable guess at the filing name when the user fills in the regular name. The user can edit the filing name if the guess is wrong, of course, but if I guess right, I'll have saved them some time. Currently I'm simply assuming the last space-separated "word" is the last name and moving it to the front with a comma:
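    // Split on whitespace, then drop the empty strings left by repeated or trailing spaces.
    NSMutableArray<NSString *> *parts = [[self.name componentsSeparatedByCharactersInSet:NSCharacterSet.whitespaceCharacterSet] mutableCopy];
    [parts removeObject:@""];
    if (parts.count < 2) {
        return self.name;
    }
    // Treat the last word as the surname and move it to the front.
    NSString *lastName = parts.lastObject;
    [parts removeLastObject];
    return [NSString stringWithFormat:@"%@, %@", lastName, [parts componentsJoinedByString:@" "]];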
I can immediately think of one case where this will lead me astray: suffixes like "Jr". But I'm sure there are many others. Are there any good resources explaining common naming caveats, or good examples of code tackling this problem, that I can use to improve my heuristic? I'm using Objective-C on the Mac (in case there's some obscure corner of a framework that could help me), but I'm willing to learn from code written in any language.
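To make the suffix case concrete, here is a minimal sketch of one way the heuristic above might be extended. The function name `GuessFilingName`, the short suffix list, and the "Last, First, Suffix" output order are all assumptions made for illustration; none of them come from a framework, and the list is deliberately incomplete.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of any framework: guesses "Last, First[, Suffix]".
static NSString *GuessFilingName(NSString *name) {
    // Illustrative, deliberately incomplete list of generational suffixes.
    NSSet<NSString *> *suffixes = [NSSet setWithObjects:@"jr", @"sr", @"ii", @"iii", @"iv", nil];

    NSMutableArray<NSString *> *parts =
        [[name componentsSeparatedByCharactersInSet:NSCharacterSet.whitespaceCharacterSet] mutableCopy];
    [parts removeObject:@""];  // repeated spaces produce empty components

    // Peel off a trailing suffix, ignoring case and punctuation ("Jr" or "Jr.").
    // Requiring more than two words keeps a two-word name intact.
    NSString *suffix = nil;
    if (parts.count > 2) {
        NSString *bare = [[parts.lastObject
            stringByTrimmingCharactersInSet:NSCharacterSet.punctuationCharacterSet] lowercaseString];
        if ([suffixes containsObject:bare]) {
            suffix = parts.lastObject;
            [parts removeLastObject];
        }
    }
    if (parts.count < 2) {
        return name;
    }

    // A comma before the suffix ("Davis, Jr.") would otherwise stick to the surname.
    NSString *lastName = [parts.lastObject
        stringByTrimmingCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@","]];
    [parts removeLastObject];
    NSString *given = [parts componentsJoinedByString:@" "];

    return suffix ? [NSString stringWithFormat:@"%@, %@, %@", lastName, given, suffix]
                  : [NSString stringWithFormat:@"%@, %@", lastName, given];
}
```

With this sketch, "Martin Luther King Jr." becomes "King, Martin Luther, Jr." and "John Smith" still becomes "Smith, John". It does nothing for multi-word surnames such as "van Gogh", which would need a separate list of particles.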
This sort of question has been asked before, but most answers either focus on the mechanics of splitting apart a string, or devolve into "design your model differently". I am designing my model differently; I'm just looking to let the computer do most of my users' work for them.
As I said earlier, this code is mainly handling the names of authors and other contributors to books. Some of the specific ramifications of that include:
- There should only be one name in `name`, because I support attaching multiple authors to a book.
- Most names will not have titles, but professional titles like "Dr." could show up. Ideally these would be discarded, not treated as part of the first name.
- The names will usually be of people, but could sometimes be of organizations. I'm perfectly willing to risk mangling organization names to get better person name handling.
- I expect I will mostly be handling European names, although detecting the orthography of the name should not be difficult.
- The code should not be particularly sensitive to the user's locale.
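The title requirement in the list above could be handled as a separate pre-processing step before the name is split. The following is only a sketch under stated assumptions: `StripLeadingTitles` is a hypothetical helper, and its title list is a small illustrative sample, not a vetted one.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: drops leading professional titles such as "Dr." or "Prof.".
static NSString *StripLeadingTitles(NSString *name) {
    // Illustrative list only; a real one would need to be much longer.
    NSSet<NSString *> *titles = [NSSet setWithObjects:@"dr", @"prof", @"rev", @"sir", nil];

    NSMutableArray<NSString *> *parts =
        [[name componentsSeparatedByCharactersInSet:NSCharacterSet.whitespaceCharacterSet] mutableCopy];
    [parts removeObject:@""];  // repeated spaces produce empty components

    // Only strip while at least two words would remain, so a two-word
    // pen name like "Dr. Seuss" is left alone.
    while (parts.count > 2) {
        NSString *bare = [[parts.firstObject
            stringByTrimmingCharactersInSet:NSCharacterSet.punctuationCharacterSet] lowercaseString];
        if (![titles containsObject:bare]) {
            break;
        }
        [parts removeObjectAtIndex:0];
    }
    return [parts componentsJoinedByString:@" "];
}
```

For example, "Dr. Jane Goodall" becomes "Jane Goodall" and "Prof. Dr. Hans Müller" becomes "Hans Müller", while "Dr. Seuss" is returned unchanged. The trade-off is that "Dr. Smith" also keeps its title, which is one more reason to leave the guessed filing name editable.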