How to print a u8 slice as text if I don't car

Posted 2019-07-24 23:48

Printing a u8 slice in Rust with println!("{:?}", some_u8_slice); shows the numeric values (as it should).

What is the most direct way to format the bytes as characters, as-is, into a string without assuming any particular encoding?

Something like iterating over the byte string and writing each character to stdout (without too much hassle).

Can this be done using Rust's format!?

Otherwise, what's the most convenient way to print a u8 slice?

4 answers
手持菜刀,她持情操
Reply #2 · 2019-07-25 00:05

If I can't assume a particular encoding, the way I normally do it is with the std::ascii::escape_default function. Basically, it shows most ASCII characters as they are and escapes everything else. The downside is that you won't see every possible Unicode codepoint even if portions of your string are correct UTF-8, but it does the job for most uses:

use std::ascii::escape_default;
use std::str;

fn show(bs: &[u8]) -> String {
    let mut visible = String::new();
    for &b in bs {
        let part: Vec<u8> = escape_default(b).collect();
        visible.push_str(str::from_utf8(&part).unwrap());
    }
    visible
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show(bytes));
}

Output: foo\xe2\x98\x83bar\xffbaz
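
On a newer toolchain there may be a shorter route: byte slices have an escape_ascii method (stabilized in Rust 1.60, as far as I know) whose return value implements Display, so it can go straight into format! or println!. A minimal sketch under that assumption; show_escaped is just a name made up for illustration:

```rust
// Sketch: assumes Rust 1.60 or later, where <[u8]>::escape_ascii is available.
// `show_escaped` is a hypothetical helper name, not part of std.
fn show_escaped(bs: &[u8]) -> String {
    // escape_ascii() applies the same escaping rules as
    // std::ascii::escape_default, and its return value implements Display.
    format!("{}", bs.escape_ascii())
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show_escaped(bytes));
}
```

This should print the same escaped text as the loop above, without building the intermediate Vec<u8> for each byte.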

Another approach is to lossily decode the contents into a string and print that. If there's any invalid UTF-8, you'll get a Unicode replacement character instead of hex escapes of the raw bytes, but you will get to see all valid UTF-8 encoded Unicode codepoints:

fn show(bs: &[u8]) -> String {
    String::from_utf8_lossy(bs).into_owned()
}

fn main() {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz";
    println!("{}", show(bytes));
}

Output: foo☃bar�baz

放我归山
Reply #3 · 2019-07-25 00:15

The variant using escape_default():

use std::ascii::escape_default;

pub fn show_buf<B: AsRef<[u8]>>(buf: B) -> String {
    // escape_default only emits ASCII bytes, so from_utf8 cannot fail here.
    String::from_utf8(
        buf.as_ref()
            .iter()
            .flat_map(|&b| escape_default(b))
            .collect(),
    )
    .unwrap()
}
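
Because show_buf is generic over AsRef<[u8]>, it should accept byte string literals, Vec<u8>, and &str alike. A quick usage sketch (the function is repeated here only so the example stands alone; the inputs are made up):

```rust
use std::ascii::escape_default;

pub fn show_buf<B: AsRef<[u8]>>(buf: B) -> String {
    let escaped: Vec<u8> = buf
        .as_ref()
        .iter()
        .flat_map(|&b| escape_default(b))
        .collect();
    // escape_default only emits ASCII bytes, so this cannot fail.
    String::from_utf8(escaped).unwrap()
}

fn main() {
    // Byte string literal (&[u8; N]).
    println!("{}", show_buf(b"tab\there"));
    // Owned Vec<u8>.
    println!("{}", show_buf(vec![0u8, 255]));
    // &str, viewed as its UTF-8 bytes.
    println!("{}", show_buf("snow \u{2603}"));
}
```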
劫难
Reply #4 · 2019-07-25 00:17

If you just want to shovel the raw bytes unescaped to stdout, which can be especially useful when the output is redirected to a pipe or a file, then the following should do the job:

use std::io::Write; // brings write_all and flush into scope

let mut out = std::io::stdout();
out.write_all(slice)?;
out.flush()?;

The flush matters because stdout is buffered: if write_all is immediately followed by a program exit, the bytes may not have reached the underlying file descriptor yet.
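
For a version that compiles on its own, here is a minimal sketch of the write-then-flush idea. dump_raw is a hypothetical helper name, and making it generic over Write is my own choice so the same code can target stdout, a file, or an in-memory buffer:

```rust
use std::io::{self, Write};

// `dump_raw` is a hypothetical helper name, not a std function.
fn dump_raw<W: Write>(mut out: W, bytes: &[u8]) -> io::Result<()> {
    // The bytes go out unmodified: no escaping, no decoding.
    out.write_all(bytes)?;
    // Push anything still sitting in a buffer to the underlying sink.
    out.flush()
}

fn main() -> io::Result<()> {
    let bytes = b"foo\xE2\x98\x83bar\xFFbaz\n";
    let stdout = io::stdout();
    // Locking once avoids re-acquiring the stdout lock on every write.
    dump_raw(stdout.lock(), bytes)
}
```

Returning io::Result from main lets the ? operator and the final call report a broken pipe or full disk instead of panicking.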

Rolldiameter
Reply #5 · 2019-07-25 00:26

The simplest way is stdout().write_all(some_u8_slice). This will simply output the bytes, with no regard for their encoding. This is useful for binary data, or text in some unknown encoding where you want to preserve the original encoding.

If you want to treat the data as a string and you know that the encoding is UTF-8 (or a UTF-8 subset like ASCII) then you can do this:

use std::str;

fn main() {
    let some_utf8_slice = &[104, 101, 0xFF, 108, 111];
    if let Ok(s) = str::from_utf8(some_utf8_slice) {
        println!("{}", s);
    }
}

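This checks that the data is valid UTF-8 before printing it. In this example the 0xFF byte makes the slice invalid, so from_utf8 returns Err and nothing is printed.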
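
If silently printing nothing on invalid input is not what you want, the error returned by from_utf8 reports how far validation got. A small sketch, assuming you would like to recover the valid prefix; valid_prefix is a hypothetical helper name:

```rust
use std::str;

// Hypothetical helper: returns the longest prefix of `bytes` that is valid UTF-8.
fn valid_prefix(bytes: &[u8]) -> &str {
    match str::from_utf8(bytes) {
        Ok(s) => s,
        // valid_up_to() is the index of the first byte that failed validation,
        // so everything before it is guaranteed to be valid UTF-8.
        Err(e) => str::from_utf8(&bytes[..e.valid_up_to()])
            .expect("prefix up to valid_up_to() is valid UTF-8"),
    }
}

fn main() {
    let some_utf8_slice: &[u8] = &[104, 101, 0xFF, 108, 111];
    match str::from_utf8(some_utf8_slice) {
        Ok(s) => println!("{}", s),
        Err(e) => eprintln!(
            "invalid UTF-8 after {} bytes; valid prefix: {:?}",
            e.valid_up_to(),
            valid_prefix(some_utf8_slice)
        ),
    }
}
```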
