How to isolate non-English words separated by spaces

Posted 2019-07-19 09:28

Question:

I have this string:

"Hello there, this is some line-aa."

How can I slice it into an array like this?

Hello
there,
this
is
some
line-aa.

This is what I have tried so far:

function sliceSpaces(arg)
  local list = {}
  for k in arg:gmatch("%w+") do
    print(k)
    table.insert(list, k)
  end
  return list
end

local sentence = "مرحبا يا اخوتي"
print("sliceSpaces")
print(sliceSpaces(sentence))

This code works for English text, but not for Arabic. How can I make it work for Arabic too?

Answer 1:

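Lua strings are sequences of bytes, not Unicode characters. The pattern %w matches alphanumeric characters as defined by the current C locale, which by default means ASCII letters and digits only. Arabic text encoded as UTF-8 consists entirely of bytes outside the ASCII range, so %w+ never matches any of it. Note that %w+ also drops punctuation, so even for the English example it would return "there" and "line", "aa" rather than "there," and "line-aa.".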
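
A small sketch of why the original pattern finds nothing, assuming the source file is saved as UTF-8 and the script runs under the default "C" locale (the question states neither, so both are assumptions). Each Arabic letter is stored as two bytes, and every one of those bytes has a value above 127:

```lua
local word = "مرحبا"  -- 5 Arabic letters, assumed to be UTF-8 encoded

-- # counts bytes, not letters, so the result is larger than 5 here
print(#word)

-- check that every byte of the word lies outside the ASCII range
local allNonAscii = true
for i = 1, #word do
  if word:byte(i) <= 127 then
    allNonAscii = false
  end
end
print(allNonAscii)

-- under the default "C" locale no byte here counts as alphanumeric,
-- so find returns nil
print(word:find("%w"))
```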

Instead, use %S, which matches any non-whitespace character, so that %S+ captures each run of bytes between spaces:

for k in arg:gmatch("%S+") do
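
Putting that one-line change into the function from the question might look like the sketch below. The function name and the test sentences are taken from the question. The loop that prints the results is an addition, since print on a table shows only its address rather than its contents.

```lua
-- Split a string on whitespace. Because %S only excludes whitespace bytes,
-- this works for any byte sequence, including UTF-8 encoded text.
local function sliceSpaces(arg)
  local list = {}
  -- %S+ matches a maximal run of non-whitespace bytes
  for k in arg:gmatch("%S+") do
    table.insert(list, k)
  end
  return list
end

-- Arabic sentence from the question
local words = sliceSpaces("مرحبا يا اخوتي")
for i, w in ipairs(words) do
  print(i, w)
end

-- English sentence from the question; punctuation stays attached to words
local english = sliceSpaces("Hello there, this is some line-aa.")
print(table.concat(english, "|"))
```

One limitation of this approach: %S treats only the whitespace characters known to the C library (space, tab, newline and so on) as separators. Unicode-only spaces such as the no-break space are multi-byte sequences in UTF-8 and will not split words.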