Find and replace long words in an NSString?

Posted 2020-02-13 08:09

Question:

I'm trying to write a method that will search an NSString, determine if an individual word within the string is over 6 characters long and replace that word with some other word (something arbitrary like 'hello').

I am starting with a long paragraph and I need to end up with a single NSString object whose format and spacing have not been affected by the find and replace.

Answer 1:

Why another answer?

There are a couple of subtle problems with the simple solutions using componentsSeparatedByString::

  1. Punctuation is not treated as a word delimiter.
  2. Whitespace other than the space character (newline, tab) is simply dropped.
  3. On long strings a lot of memory is wasted.
  4. It's slow.

Example

Assuming a substitution word of "–", a string like ...

“Essentially,” the D.H.C. concluded,
”bokanovskification consists of a series of arrests of development.”

... would result in ...

– the D.H.C. – – of a series of – of –

... while the correct output would be:

“–,” the D.H.C. –,
”– – of a series of – of –.”

Solution

Fortunately there's a much better, yet simple solution in Cocoa: -[NSString enumerateSubstringsInRange:options:usingBlock:]

It provides fast iteration over substrings defined by the options argument. One option is NSStringEnumerationByWords, which enumerates all substrings that are actual words (in the current locale). It even detects individual words in languages that don't use delimiters (spaces) to separate words, like Japanese.

Comparing Solutions

Here's a simple demo project that works on the jargon file (1.6 MB, 237,239 words). It compares three different solutions:

  1. componentsSeparatedByString: 270 ms
  2. enumerateSubstringsInRange: 125 ms
  3. stringByReplacingOccurrencesOfString, as described by @Monolo: 200 ms

Implementation

The core of it is the replacement loop:

// originalString, maxChar and replaceWord are assumed to be defined by the caller.
NSMutableString *result = [NSMutableString stringWithCapacity:[originalString length]];
// Index just past the last replaced word; everything before it is already in result.
__block NSUInteger location = 0;
[originalString enumerateSubstringsInRange:(NSRange){0, [originalString length]}
                                   options:NSStringEnumerationByWords | NSStringEnumerationLocalized | NSStringEnumerationSubstringNotRequired
                                usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {

                                    if (substringRange.length > maxChar) {
                                        // Copy the text since the previous long word verbatim, then the replacement.
                                        NSString *charactersBetweenLongWords = [originalString substringWithRange:(NSRange){ location, substringRange.location - location }];
                                        [result appendString:charactersBetweenLongWords];
                                        [result appendString:replaceWord];
                                        location = substringRange.location + substringRange.length;
                                    }

                                }];
// Copy whatever follows the last long word.
[result appendString:[originalString substringFromIndex:location]];
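
To try the loop on its own, it can be wrapped in a function. The following is only a sketch: the name ReplaceLongWords and its parameters are hypothetical (they stand in for originalString, maxChar and replaceWord above), and the sample text is made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the demo project: replaces every word
// longer than maxLength UTF-16 code units with replacement and copies
// everything between the words (spaces, tabs, newlines, punctuation) as is.
static NSString *ReplaceLongWords(NSString *original, NSUInteger maxLength, NSString *replacement) {
    NSMutableString *result = [NSMutableString stringWithCapacity:original.length];
    __block NSUInteger location = 0; // index just past the last replaced word

    [original enumerateSubstringsInRange:NSMakeRange(0, original.length)
                                 options:NSStringEnumerationByWords | NSStringEnumerationLocalized | NSStringEnumerationSubstringNotRequired
                              usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        // substring is nil here because of NSStringEnumerationSubstringNotRequired;
        // only the ranges are used.
        if (substringRange.length > maxLength) {
            NSRange gap = NSMakeRange(location, substringRange.location - location);
            [result appendString:[original substringWithRange:gap]];
            [result appendString:replacement];
            location = NSMaxRange(substringRange);
        }
    }];

    // Copy the tail after the last long word.
    [result appendString:[original substringFromIndex:location]];
    return result;
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"A short line,\n\tthen something considerably longer.";
        // "longer" has exactly 6 characters, so it is kept.
        NSLog(@"%@", ReplaceLongWords(text, 6, @"hello"));
    }
    return 0;
}
```

The newline, the tab and the punctuation in the sample text come through unchanged, because only the ranges of the long words are swapped out.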

Caveat

As pointed out by Monolo, the proposed code uses NSString's length to determine the number of characters in a word. That's a questionable approach, to say the least. In fact, a string's length is the number of UTF-16 code units used to encode the string, a value that often differs from what a human would consider the number of characters.

As the term "character" has different meanings in various contexts, and the OP didn't specify which kind of character count to use, I left the code as it was. If you want a different count, please refer to the documentation that discusses the topic:

  • Apple's String Programming Guide, Characters and Grapheme Clusters
  • Unicode FAQ: How are characters counted when measuring the length or position of a character in a string?
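
If user-perceived characters are what you want, one possible approach is to count composed character sequences within the word's range instead of using its length. This is a sketch only; the helper name ComposedCharacterCount is hypothetical and not part of the code above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts composed character sequences (roughly what a
// reader would call characters) inside the given range of a string.
static NSUInteger ComposedCharacterCount(NSString *string, NSRange range) {
    __block NSUInteger count = 0;
    [string enumerateSubstringsInRange:range
                               options:NSStringEnumerationByComposedCharacterSequences | NSStringEnumerationSubstringNotRequired
                            usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        count++;
    }];
    return count;
}

int main(void) {
    @autoreleasepool {
        // "café" written as "cafe" followed by U+0301 COMBINING ACUTE ACCENT.
        NSString *word = @"cafe\u0301";
        NSLog(@"length: %lu", (unsigned long)word.length); // 5 UTF-16 code units
        NSLog(@"composed: %lu",
              (unsigned long)ComposedCharacterCount(word, NSMakeRange(0, word.length))); // 4
    }
    return 0;
}
```

In the replacement loop, the test substringRange.length > maxChar could then be swapped for ComposedCharacterCount(originalString, substringRange) > maxChar, at the cost of an extra pass over each word.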


Answer 2:

As you can see from the answers, there are several ways to accomplish what you are after, but personally I prefer to use the NSString class's stringByReplacingOccurrencesOfString:withString:options:range: method, which is designed precisely for replacing substrings with another string.

In your case we need to use the NSRegularExpressionSearch option, which allows us to identify words with 7 or more letters (i.e., more than 6 letters, as you state it).

If you use the \w character class (see the note marked * at the end of this answer), you automatically get Unicode support, so it works with as many languages as Apple (actually, ICU) supports.

It goes like this:

NSString *stringWithLongWords = @"There are some words of extended length in this text. One of them is Escher's. They will be identified with a regular expression and changed for some arbitrary word.";

NSString *overSixCharsPattern = @"(?w)\\b[\\w]{7,}\\b";
NSString *replacementString   = @"hello";

NSString *result = [stringWithLongWords stringByReplacingOccurrencesOfString: overSixCharsPattern
                                                                  withString: replacementString
                                                                     options: NSRegularExpressionSearch
                                                                       range: NSMakeRange(0, stringWithLongWords.length)];

The \b expressions denote a word boundary, which ensures that the whole word is matched and substituted. The w modifier makes \b use a more natural definition of word boundaries. Specifically, it handles the string "Escher's", the example mentioned by @NikolaiRuhe. Docs here, with a specific discussion of boundary detection here.

Also notice that a literal NSString (i.e., one you type directly in your Objective-C source file) needs two backslashes in the source code to produce one in the generated string.

There is more information in the NSString documentation.

* Technically \w matches word characters, which also includes numbers in the definition used by regexes.



Answer 3:

Try this.

NSString *str  = @"Do any additional setup after loading the view, typically from a nib.";
NSMutableArray *array = [[str componentsSeparatedByString:@" "] mutableCopy];
for (NSUInteger i = 0; i < [array count]; i++) {
    NSString *str_ = [array objectAtIndex:i];
    if ([str_ length] > 6)
        [array replaceObjectAtIndex:i withObject:@"Hello"];
}

Then join them again:

str = [array componentsJoinedByString:@" "];