How to swap two characters in a string?

2019-06-17 05:50发布

问题:

I want to write a function as follows:

  • Input: String A, int i, 0 < i < len(A)
  • Output: String A with character at (i - 1) swapped with character at i.

What is a clean solution that will achieve this? My current solution is:

let mut swapped = input_str[0..i].to_string();
swapped.push(input_str.char_at(i));
swapped.push(input_str.char_at(i - 1));
swapped.push_str(&query[i..input_str.len()]);

But that only works for ASCII strings. I can think of other solutions as converting to a vector in UTF-32, swapping there and converting back to a string, but it seems like a lot of extra work.

回答1:

Here's a pretty solution:

use std::str::CharRange;

fn swap_chars_at(input_str: &str, i: usize) -> String {
    // Pre-allocate a string of the correct size
    let mut swapped = String::with_capacity(input_str.len());
    // Pluck the previous character
    let CharRange { ch: prev_ch, next: prev } = input_str.char_range_at_reverse(i);
    // Pluck the current character
    let CharRange { ch, next } = input_str.char_range_at(i);
    // Put them back
    swapped.push_str(&input_str[..prev]);
    swapped.push(ch);
    swapped.push(prev_ch);
    swapped.push_str(&input_str[next..]);
    // Done!
    swapped
}

#[test]
fn smoke_test() {
    let s = swap_chars_at("lyra", 2);
    assert_eq!(s, "lrya");
}

#[test]
fn unicode() {
    // 'ç' takes up 2 bytes in UTF-8
    let s = swap_chars_at("ça va?", 2);
    assert_eq!(s, "aç va?");
}

From the documentation:

  • fn char_range_at(&self, start: usize) -> CharRange
    • Pluck a character out of a string and return the index of the next character.
  • fn char_range_at_reverse(&self, start: usize) -> CharRange
    • Given a byte position and a str, return the previous char and its position.

Together, these two methods let us peek backwards and forwards in the string—which is exactly what we want.


But wait, there's more! DK pointed out a corner case with the above code. If the input contains any combining characters, they may become separated from the characters they combine with.

Now, this question is about Rust, not Unicode, so I won't go into the details of how exactly that works. All you need to know for now is that Rust provides this method:

  • fn grapheme_indices(&self, is_extended: bool) -> GraphemeIndices
    • Returns an iterator over the grapheme clusters of self and their byte offsets.

With a healthy application of .find() and .rev(), we arrive at this (hopefully) correct solution:

#![allow(unstable)]  // `GraphemeIndices` is unstable

fn swap_graphemes_at(input_str: &str, i: usize) -> String {
    // Pre-allocate a string of the correct size
    let mut swapped = String::with_capacity(input_str.len());
    // Find the grapheme at index i
    let (_, gr) = input_str.grapheme_indices(true)
        .find(|&(index, _)| index == i)
        .expect("index does not point to a valid grapheme");
    // Find the grapheme just before it
    let (prev, prev_gr) = input_str.grapheme_indices(true).rev()
        .find(|&(index, _)| index < i)
        .expect("no graphemes to swap with");
    // Put it all back together
    swapped.push_str(&input_str[..prev]);
    swapped.push_str(gr);
    swapped.push_str(prev_gr);
    swapped.push_str(&input_str[i+gr.len()..]);
    // Done!
    swapped
}

#[test]
fn combining() {
    // Ensure that "c\u{327}" is treated as a single unit
    let s = swap_graphemes_at("c\u{327}a va?", 3);
    assert_eq!(s, "ac\u{327} va?");
}

Admittedly it's a bit convoluted. First it iterates through the input, plucking out the grapheme cluster at i. Then it iterates backward (.rev()) through the input, picking the rightmost cluster with index < i (i.e. the previous cluster). Finally it goes and puts everything back together.

If you're being really pedantic, there are still more special cases to deal with. For example, if the string contains Windows newlines ("\r\n"), then we probably don't want to swap them around. And in Greek, the letter sigma (σ) is written differently when it's at the end of a word (ς), so a better algorithm should translate between them as necessary. And don't forget those bidirectional control characters...

But for the sake of our sanity, we'll stop here.



标签: string rust