I want to write a function as follows:
- Input: String A, int i, 0 < i < len(A)
- Output: String A with character at (i - 1) swapped with character at i.
What is a clean solution that will achieve this? My current solution is:
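```rust
// Works only for ASCII input, where every char is exactly one byte.
let bytes = input_str.as_bytes();
let mut swapped = input_str[..i - 1].to_string();
swapped.push(bytes[i] as char);
swapped.push(bytes[i - 1] as char);
swapped.push_str(&input_str[i + 1..]);
```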
But that only works for ASCII strings. I can think of other solutions, such as converting to a vector of UTF-32 code points, swapping there, and converting back to a string, but that seems like a lot of extra work.
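For comparison, the UTF-32 route is shorter than it sounds. A minimal sketch, assuming `i` counts chars (Unicode scalar values) as in the question; `swap_with_previous` is a hypothetical name. It costs one allocation for the `Vec<char>` and, like any char-level swap, it treats each scalar value as a unit.

```rust
/// Swaps the chars at positions `i - 1` and `i` (positions count chars, not bytes).
/// Panics if `i == 0` or `i` is not less than the number of chars.
fn swap_with_previous(s: &str, i: usize) -> String {
    // Decode UTF-8 into Unicode scalar values (effectively UTF-32).
    let mut chars: Vec<char> = s.chars().collect();
    chars.swap(i - 1, i);
    // Re-encode the chars as a UTF-8 String.
    chars.into_iter().collect()
}

fn main() {
    println!("{}", swap_with_previous("héllo", 2)); // prints "hlélo"
}
```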
Here's a pretty solution, built on two methods from the documentation:

```rust
fn char_range_at(&self, start: usize) -> CharRange
fn char_range_at_reverse(&self, start: usize) -> CharRange
```
Together, these two methods let us peek backwards and forwards from a byte position in the string, which is exactly what we want.
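`char_range_at` and `char_range_at_reverse` are not available in current stable Rust. A minimal sketch of the same peek-backwards/peek-forwards idea with today's API slices at byte offset `i` and takes the last char of `s[..i]` and the first char of `s[i..]`. This is not the answer's original code: `swap_chars_at` is a hypothetical name, and `i` is assumed to be a byte offset on a char boundary, not a char count.

```rust
/// Swaps the char that starts at byte offset `i` with the char that ends there.
/// Assumes `i` is a char boundary with at least one char on each side;
/// otherwise the slicing or the `expect` calls panic.
fn swap_chars_at(s: &str, i: usize) -> String {
    // Peek backwards: the last char of s[..i] (the role of char_range_at_reverse).
    let prev = s[..i].chars().next_back().expect("no char before i");
    // Peek forwards: the first char of s[i..] (the role of char_range_at).
    let next = s[i..].chars().next().expect("no char at i");
    // Byte range covered by the two chars.
    let start = i - prev.len_utf8();
    let end = i + next.len_utf8();
    let mut out = String::with_capacity(s.len());
    out.push_str(&s[..start]);
    out.push(next);
    out.push(prev);
    out.push_str(&s[end..]);
    out
}

fn main() {
    // 'ñ' takes two bytes in UTF-8, so byte offset 3 is the boundary before 'b'.
    println!("{}", swap_chars_at("añb", 3)); // prints "abñ"
}
```

Working on byte offsets avoids decoding the whole string; only the two chars around `i` are examined, and the rest is copied as slices.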
But wait, there's more! DK pointed out a corner case with this approach. If the input contains any combining characters, they may become separated from the characters they combine with.
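To see the corner case concretely, here is a small std-only demonstration. It assumes the text uses a decomposed accent (an `e` followed by U+0301 COMBINING ACUTE ACCENT) rather than the precomposed `é`; `swap_first_two_chars` is a hypothetical helper.

```rust
/// Swaps the first two chars, treating every char as an independent unit.
/// Panics if `s` has fewer than two chars.
fn swap_first_two_chars(s: &str) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    chars.swap(0, 1);
    chars.into_iter().collect()
}

fn main() {
    // "a" + "e" + U+0301: three chars, displayed as the two glyphs "aé".
    let s = "ae\u{301}";
    assert_eq!(s.chars().count(), 3);
    // The accent stays at the end, so it now combines with 'a': "eá", not "éa".
    let swapped = swap_first_two_chars(s);
    assert_eq!(swapped, "ea\u{301}");
    println!("{}", swapped);
}
```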
Now, this question is about Rust, not Unicode, so I won't go into the details of how exactly that works. All you need to know for now is that Rust provides this method:
```rust
fn grapheme_indices(&self, is_extended: bool) -> GraphemeIndices
```
With a healthy application of `.find()` and `.rev()`, we arrive at a (hopefully) correct solution.

Admittedly it's a bit convoluted. First it iterates through the input, plucking out the grapheme cluster at `i`. Then it iterates backward (`.rev()`) through the input, picking the rightmost cluster with index < `i` (i.e. the previous cluster). Finally it puts everything back together.

If you're being really pedantic, there are still more special cases to deal with. For example, if the string contains Windows newlines (`"\r\n"`), then we probably don't want to swap them around. And in Greek, the letter sigma (σ) is written differently when it's at the end of a word (ς), so a better algorithm should translate between them as necessary. And don't forget those bidirectional control characters... But for the sake of our sanity, we'll stop here.
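The grapheme-aware steps described earlier (a forward `find` for the cluster at `i`, a backward `.rev().find()` for the rightmost cluster before it, then reassembly) can be sketched without dependencies. Because `grapheme_indices` is not part of today's standard library (full segmentation is provided by `UnicodeSegmentation::grapheme_indices` in the `unicode-segmentation` crate), this sketch substitutes a deliberately crude, hypothetical `rough_clusters` helper that only attaches combining diacritical marks (U+0300 to U+036F) to the preceding char. That is an assumption for illustration, not real Unicode segmentation, and it ignores `"\r\n"` and the other special cases above. Here `i` is a byte offset.

```rust
/// Hypothetical stand-in for grapheme segmentation (NOT real Unicode rules):
/// a cluster is one char plus any following U+0300..=U+036F combining marks.
/// Returns (byte_offset, cluster) pairs, like grapheme_indices does.
fn rough_clusters(s: &str) -> Vec<(usize, &str)> {
    let mut clusters = Vec::new();
    let mut start = 0;
    for (idx, c) in s.char_indices() {
        let is_combining = ('\u{300}'..='\u{36F}').contains(&c);
        // Any non-combining char after the first one begins a new cluster.
        if idx > 0 && !is_combining {
            clusters.push((start, &s[start..idx]));
            start = idx;
        }
    }
    if !s.is_empty() {
        clusters.push((start, &s[start..]));
    }
    clusters
}

/// Swaps the cluster starting at byte offset `i` with the cluster before it.
/// Returns None if no cluster starts at `i` or no cluster precedes it.
fn swap_clusters_at(s: &str, i: usize) -> Option<String> {
    let clusters = rough_clusters(s);
    // Forward pass: the cluster that starts exactly at `i`.
    let &(_, current) = clusters.iter().find(|&&(idx, _)| idx == i)?;
    // Backward pass with .rev(): the rightmost cluster starting before `i`.
    let &(prev_start, previous) = clusters.iter().rev().find(|&&(idx, _)| idx < i)?;
    // Reassemble: prefix, current, previous, suffix.
    let end = i + current.len();
    Some(format!("{}{}{}{}", &s[..prev_start], current, previous, &s[end..]))
}

fn main() {
    // The accent moves together with the 'e' it modifies.
    let s = "ae\u{301}x";
    if let Some(swapped) = swap_clusters_at(s, 1) {
        println!("{}", swapped);
    }
}
```

With the real crate, `rough_clusters(s)` would be replaced by `s.grapheme_indices(true).collect::<Vec<_>>()` after `use unicode_segmentation::UnicodeSegmentation;`, since that iterator also yields `(usize, &str)` pairs.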