I would like to write a function to check if the input is a string or not like this:
is_string(Input) ->
case check_if_string(Input) of
true -> {ok, Input};
false -> error
end.
But I found it is tricky to check whether the input is a string in Erlang. The string definition in Erlang is here: http://erlang.org/doc/man/string.html.
Any suggestions?
Thanks in advance.
In erlang, string can be represented by list or binary.
If string is used as list then you can use following function to check:
If string is used as binary in code then is_binary(Term) in build function can be used.
In Erlang a string can be actually quite a few things, so there are a few ways to do this depending on exactly what you mean by "a string". It is worth bearing in mind that every sort of string in Erlang is a list of character or lexeme values of some sort.
Encodings are not simple things, particularly when Unicode is involved. Characters can be almost arbitrarily high values, lexemes are globbed together in deep lists of integers, and Erlang
iolist()
s (which are super useful) are deep lists of mixed integer and binary values that get automatically flattened and converted during certain operations. If you are dealing with anything other than flat lists of printable ASCII values then I strongly recommend you read these:So... this is not a very simple question.
What to do about all the confusion?
Quick answer that always works: Consider the origin of the data.
You should know what kind of data you are dealing with, whether it is coming over a socket or from a file, or especially if you are generating it yourself. On the edges of your system you may need some help purifying data, though, because network clients send all sorts of random trash from time to time.
Some helper functions for the most common cases live in the io_lib module:
true
if the input is a list of characters in the unicode range.true
if the input is a deep list of legal chars.true
if the input is a deep list of Latin-1 (your basic printable ASCII values from 32 to 126).true
if the input is a flat list of Latin-1 characters (90% of the time this is what you're looking for)true
if the input is a list of printable Latin-1 (If the above isn't what you wanted, 9% of the time this is the one you want)true
if the input is a flat list of printable chars.true
if the input is a flat list of printable unicode chars (for that 1% of the time that this is your problem -- except that for some of us, myself included here in Japan, this covers 99% of my input checking cases).For more particular cases you can either use a regex from the re module or write your own recursive function that zips through a string for those special cases where a regex either doesn't fit, is impossible, or could make you vulnerable to regex attacks.