Erlang: parsing a string into data types using a regular expression

Posted 2019-10-30 06:38

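I am trying to write a parser in Erlang that recognizes the data types of the values in a string. After searching, I could not find an existing question that matches mine: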

  • Original string: atom1,"string2,,\"\",",{tuple3, "s pa ces \"", {[test]},"_#",test},<<"binary4\",,>>">>, #{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>

  • The same string, escaped as an Erlang string literal, which is what needs to be parsed: "atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", {[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, #{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>"

  • Expected output:

  + number of params: 7
  + value ------> type
    - atom1 ------> Atom
    - "string2,,\"\"," ------> String
    - {tuple3, "s pa ces \"", {[test]},"_#",test} ------> Tuple
    - <<"binary4\",,>>">> ------> Binary
    - #{map5=>5, element=>{e1,e2}} ------> Map
    - #record6{r1 = 1, r2 = 2} ------> Record
    - <<300:16>> ------> Binary

However, my current code does not work as expected. Here it is:

comma_parser(Params) ->
  {ok, R} = re:compile("(\".*?\"|[^\",\\s]+)(?=\\s*,|\\s*$)"),
  {match, Matches} = re:run(Params, R, [{capture, [1], list}, global]),
  ?DEBUG("truonggv1 - comma_parser: Matches: ~p~n", [Matches]),
  [M || [M] <- Matches].

Current output:

  + number of params: 14
  + value ------> type
    - atom1 ------> Atom
    - "string2,,\"\" ------> String
    - ",{tuple3, "s pa ces \"" ------> String
    - {[test]} ------> Tuple
    - "_#" ------> String
    - test} ------> Atom
    - "binary4\" ------> String
    - >> ------> Atom
    - #{map5=>5 ------> Map
    - element=>{e1 ------> Atom
    - e2}} ------> Atom
    - 1 ------> Atom
    - 2} ------> Atom
    - <<300:16>> ------> Binary

Does anyone know how to solve this?

Update: here is my code. It is called with the escaped string shown above (the one that needs to be parsed) as its argument:

check_params_by_comma(Params) ->
  case string:str(Params, ",") of
     0 ->
       Result = Params;
     1 ->
       Result = "param starts with character ',' ~n";
     _Comma_Pos ->
       Parse_String = comma_parser(Params),
       Result = "number of params: " ++ integer_to_list(length(Parse_String))
                ++ "\n\n\r\t value ------> type \n\r"
                ++ "\t*********************\n\r"
                ++ ["\t" ++ X ++ " ------> " ++ check_type(X) ++ "\n\r"|| X <- Parse_String]
  end,
  Result.

check_type(X) ->
  Binary = string:str(X, "<<"),
  String = string:str(X, "\""),
  Tuple = string:str(X, "{"),
  List = string:str(X, "["),
  Map = string:str(X, "#{"),
  case X of
    _ when 1 == Binary -> "Binary";
    _ when 1 == String -> "String";
    _ when 1 == Tuple -> "Tuple";
    _ when 1 == List -> "List";
    _ when 1 == Map -> "Map";
    _ -> "Atom"
  end.

comma_parser(Params) ->
  {ok, R} = re:compile("(\".*?\"|[^\",\\s]+)(?=\\s*,|\\s*$)"),
  {match, Matches} = re:run(Params, R, [{capture, [1], list}, global]),
  [M || [M] <- Matches].

Answer 1:

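I'm not entirely sure I understand what you are trying to achieve, but let me show you what I did with your input and see whether it helps. Your case seems to be crying out for erl_scan:string and erl_parse:parse_exprs, so that was the first thing I thought of.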

This is my first version of the parser:

-module(x).

-export([test/0, check_params_by_comma/1]).

test() ->
  Input =
    "atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", "
    "{[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, "
    "#{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>",
  io:format("~p~n", [check_params_by_comma(Input)]).

check_params_by_comma(Params) ->
  {ok, Tokens, _} = erl_scan:string(Params ++ "."),
  {ok, Exprs} = erl_parse:parse_exprs(Tokens),
  Exprs.
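
To make the shape of that return value more concrete, here is a minimal sketch that parses a shorter input and lists the tag of each top-level expression. The module name forms_demo and the function tags/1 are made up for illustration. It relies on the fact that every node in the abstract format produced by erl_parse is a tuple whose first element names the kind of node.

```erlang
-module(forms_demo).
-export([tags/1]).

%% Hypothetical helper: return the abstract-format tag of every
%% top-level, comma-separated expression in Params.
tags(Params) ->
  %% erl_parse expects the token list to end with a dot, hence ++ ".".
  {ok, Tokens, _EndLocation} = erl_scan:string(Params ++ "."),
  {ok, Exprs} = erl_parse:parse_exprs(Tokens),
  %% Every abstract form is a tuple whose first element is its tag.
  [element(1, Expr) || Expr <- Exprs].
```

For example, forms_demo:tags("a, \"b,c\", {d, e}, <<1>>, #{k => v}, #r{f = 1}") returns [atom,string,tuple,bin,map,record]. Commas inside the string and the tuple do not split the input, because the tokenizer and parser understand Erlang syntax, which the regular expression does not.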

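Of course, that is not all, since you want a different kind of output, but we are almost there. Copying the presentation code from your original question, I had to use erl_prettypr:format/1 to render the terms, and I ended up with something like: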

-module(x).

-export([test/0, check_params_by_comma/1]).

test() ->
  Input =
    "atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", "
    "{[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, "
    "#{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>",
  io:format("~s~n", [check_params_by_comma(Input)]).

check_params_by_comma(Params) ->
  Parse_String = comma_parser(Params),
  "number of params: " ++ integer_to_list(length(Parse_String))
  ++ "\n\n\r\t value ------> type \n\r"
  ++ "\t*********************\n\r"
  ++ ["\t" ++ erl_prettypr:format(X) ++ " ------> " ++ check_type(X) ++ "\n\r"|| X <- Parse_String].

comma_parser(Params) ->
  {ok, Tokens, _} = erl_scan:string(Params ++ "."),
  {ok, Exprs} = erl_parse:parse_exprs(Tokens),
  Exprs.

check_type({Type, _, _}) -> atom_to_list(Type);
check_type({Type, _, _, _}) -> atom_to_list(Type).
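
The code above pattern-matches on {ok, ...} directly, so malformed input crashes with a badmatch error. If the parameter string comes from an untrusted source, a wrapper along the following lines may be preferable. This is only a sketch: the module name safe_parse, the function parse_params/1, and the scan_error and parse_error tags are assumptions introduced here, not part of the answer above.

```erlang
-module(safe_parse).
-export([parse_params/1]).

%% Hypothetical wrapper: parse Params into abstract forms, returning
%% {error, Reason} instead of crashing with badmatch on malformed input.
parse_params(Params) ->
  case erl_scan:string(Params ++ ".") of
    {ok, Tokens, _EndLocation} ->
      case erl_parse:parse_exprs(Tokens) of
        {ok, Exprs} ->
          {ok, Exprs};
        {error, ErrorInfo} ->
          %% The tokens were valid but do not form expressions.
          {error, {parse_error, ErrorInfo}}
      end;
    {error, ErrorInfo, _ErrorLocation} ->
      %% The input could not even be tokenized.
      {error, {scan_error, ErrorInfo}}
  end.
```

With this wrapper, an unbalanced input such as "{a, b" yields {error, {parse_error, _}}, and an unterminated string such as "\"abc" yields {error, {scan_error, _}}.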

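I think this should be enough to solve your problem but, as a bonus track, let me refactor it using iolists to get almost exactly the expected output you asked for: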

-module(x).

-export([test/0, check_params_by_comma/1]).

test() ->
  Input =
    "atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", "
    "{[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, "
    "#{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>",
  io:format("~s~n", [check_params_by_comma(Input)]).

check_params_by_comma(Params) ->
  {ok, Tokens, _} = erl_scan:string(Params ++ "."),
  {ok, Exprs} = erl_parse:parse_exprs(Tokens),
  [
    io_lib:format("+ number of params: ~p~n", [length(Exprs)]),
    "+ value ------> type \n"
  | lists:map(fun format_expr/1, Exprs)
  ].

format_expr(Expr) ->
  io_lib:format(
    "\t- ~s ------> ~s~n",
    [erl_prettypr:format(Expr), string:titlecase(type(Expr))]
  ).

%% or you can do type(Expr) -> atom_to_list(hd(tuple_to_list(Expr))).
type({Type, _, _}) -> atom_to_list(Type);
type({Type, _, _, _}) -> atom_to_list(Type).
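
One reason the output is only almost what was asked for: the tags come straight from the parser, so a binary is reported as "Bin" rather than "Binary". If the exact labels from the question matter, a lookup such as the following could replace string:titlecase(type(Expr)). This is a sketch under assumptions: the module name type_names and the function type_name/1 are hypothetical, and only the tags that occur in the sample input plus a few common literals are mapped.

```erlang
-module(type_names).
-export([type_name/1]).

%% Hypothetical mapping from erl_parse node tags to the labels used in
%% the question. Unknown tags fall through and are returned as-is.
type_name(Expr) when is_tuple(Expr) ->
  case element(1, Expr) of
    atom    -> "Atom";
    string  -> "String";
    tuple   -> "Tuple";
    bin     -> "Binary";
    map     -> "Map";
    record  -> "Record";
    integer -> "Integer";
    float   -> "Float";
    nil     -> "List";   % the empty list []
    cons    -> "List";   % a non-empty list [H | T]
    Other   -> atom_to_list(Other)
  end.
```

In format_expr/1 the second argument of io_lib:format would then become [erl_prettypr:format(Expr), type_names:type_name(Expr)].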

Hope this helps :)



Source: erlang parse string to data types using regex
Tags: regex erlang