Convert unicode code point to literal character in

Posted 2020-02-02 02:13

Question:

Let's say I have a text file like this.

\u0053
\u0075
\u006E

Is there a way I can convert that to this?

S
u
n

Currently, I'm using ioutil.ReadFile("data.txt"), but when I print the data, I get the escape sequences (\u0053 and so on) instead of the characters they stand for. I realize this is the correct behavior for ReadFile; it's just not what I want.

I want to replace each escaped code point with its literal character.

Answer 1:

You can use the strconv.Unquote() and strconv.UnquoteChar() functions to do the conversion.

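One thing to be aware of is that strconv.Unquote() only accepts input that is itself quoted: it must start and end with a quote character (a double quote " or a back quote `). Escape sequences such as \u0053 are interpreted only inside double quotes; a back-quoted string is returned verbatim. Because the lines in the file carry no quotes, we have to wrap each one in double quotes manually.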

See this example:

lines := []string{
    `\u0053`,
    `\u0075`,
    `\u006E`,
}
fmt.Println(lines)

for i, v := range lines {
    var err error
    lines[i], err = strconv.Unquote(`"` + v + `"`)
    if err != nil {
        fmt.Println(err)
    }
}
fmt.Println(lines)

fmt.Println(strconv.Unquote(`"Go\u0070\x68\x65\x72"`))

Output:

[\u0053 \u0075 \u006E]
[S u n]
Gopher <nil>
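
The answer mentions strconv.UnquoteChar() but does not show it. As a rough sketch, it can decode escapes one at a time without wrapping the input in quotes first. The helper name decodeEscapes is made up for illustration, and passing 0 as the quote argument assumes the input has no quote characters that need special treatment:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// decodeEscapes is a hypothetical helper (not part of the original answer).
// It decodes Go-style escape sequences such as \u0053 in s by calling
// strconv.UnquoteChar repeatedly until the input is consumed.
func decodeEscapes(s string) (string, error) {
	var b strings.Builder
	for len(s) > 0 {
		// quote == 0: neither ' nor " gets special treatment.
		r, multibyte, tail, err := strconv.UnquoteChar(s, 0)
		if err != nil {
			return "", err
		}
		if r < utf8.RuneSelf || !multibyte {
			// ASCII, or a single raw byte from an escape like \xff.
			b.WriteByte(byte(r))
		} else {
			b.WriteRune(r)
		}
		s = tail
	}
	return b.String(), nil
}

func main() {
	for _, line := range []string{`\u0053`, `\u0075`, `\u006E`} {
		decoded, err := decodeEscapes(line)
		if err != nil {
			fmt.Println("skipping", line, "-", err)
			continue
		}
		fmt.Println(decoded) // prints S, u, n on separate lines
	}
}
```

Unlike the Unquote() loop above, this version leaves the decision about what to do with a malformed line to the caller instead of overwriting it with an empty string.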


Answer 2:

A slightly different approach is to use strconv.ParseInt. It generates less garbage and involves less internal logic than Unquote, which performs many other checks, when parsing the lines:

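for i, v := range lines {
    // Only handle lines of the exact form \uXXXX.
    if len(v) != 6 || v[:2] != `\u` {
        continue
    }

    // Parse the four hex digits that follow the \u prefix.
    if r, err := strconv.ParseInt(v[2:], 16, 32); err == nil {
        // Convert via rune: string(r) on an int64 is flagged by go vet.
        lines[i] = string(rune(r))
    }
}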


Tags: unicode go