How to parse a comma delimited string when comma a

Posted 2019-02-26 01:39

I have this string in C#

adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO

I want to use a RegEx to parse it to get the following:

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO

In addition to the above example, I tested with the following, but am still unable to parse it correctly.

"%exc.uns: 8 hours let  @ = ABC, DEF", "exc_it = 1 day"  , " summ=graffe ", " a,b,(c,d)" 

The new text will be in one string

string mystr = @"""%exc.uns: 8 hours let  @ = ABC, DEF"", ""exc_it = 1 day""  , "" summ=graffe "", "" a,b,(c,d)"""; 

9 answers
甜甜的少女心
#2 · 2019-02-26 01:56

Assuming non-nested, matching parentheses, you can easily match the tokens you want instead of splitting the string:

MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");
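
A minimal sketch of how that pattern might be used end to end. The TokenMatcher class and Tokenize method are names made up for illustration, and trimming each match is an added assumption, since the pattern keeps the space that follows a comma.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TokenMatcher
{
    // A token is a run of characters that are neither commas nor parentheses,
    // mixed with flat (non-nested) parenthesized groups.
    private static readonly Regex TokenPattern =
        new Regex(@"(?:[^(),]|\([^)]*\))+");

    public static List<string> Tokenize(string data)
    {
        return TokenPattern.Matches(data)
            .Cast<Match>()
            .Select(m => m.Value.Trim()) // assumption: surrounding whitespace is unwanted
            .ToList();
    }
}
```

Calling TokenMatcher.Tokenize on the string from the question should give the six fields listed there.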
唯我独甜
#3 · 2019-02-26 01:58
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int start = 0;      // index where the current field begins
int scopeLevel = 0; // current parenthesis nesting depth
for (int i = 0; i < str.Length; i++)
{
    if (str[i] == ',' && scopeLevel == 0)
    {
        // A comma outside all parentheses ends the current field.
        resultStrings.Add(str.Substring(start, i - start));
        start = i + 1;
    }
    else if (str[i] == '(') scopeLevel++;
    else if (str[i] == ')') scopeLevel--;
}
// Whatever follows the last top-level comma is the final field.
resultStrings.Add(str.Substring(start));
混吃等死
#4 · 2019-02-26 01:59

The TextFieldParser class (msdn), in the Microsoft.VisualBasic.FileIO namespace, seems to have this functionality built in:

TextFieldParser Class: Provides methods and properties for parsing structured text files.

Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.

The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.

See the article which helped me find that
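
A rough sketch of how TextFieldParser might be applied to a single in-memory string. QuotedFieldReader and ReadQuotedFields are hypothetical names, and the sketch assumes the project can reference the Microsoft.VisualBasic assembly. It targets the quoted second example from the question; as far as I know the class understands quotes, not parentheses, so it would still split the first example inside the parentheses.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of the original answer.
public static class QuotedFieldReader
{
    public static string[] ReadQuotedFields(string line)
    {
        // TextFieldParser accepts any TextReader, so a StringReader
        // lets it parse a string instead of a file.
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." stay in the field

            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

With the question's mystr, the call would be QuotedFieldReader.ReadQuotedFields(mystr).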

我命由我不由天
#5 · 2019-02-26 02:01

Just this regex:

[^,()]+(\([^()]*\))?

A test example:

var s= "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(@"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
    .Cast<Match>()
    .Select(m => m.Value);

returns

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
 NG/CL
 5 value of CL(JK)
 HO
We Are One
#6 · 2019-02-26 02:02

Another way to implement what Snowbear was doing:

    public static string[] SplitNest(this string s, char src, string nest, string trg)
    {
        int scope = 0;
        if (trg == null || nest == null) return null;
        if (trg.Length == 0 || nest.Length < 2) return null;
        if (trg.IndexOf(src) >= 0) return null;
        if (nest.IndexOf(src) >= 0) return null;

        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == src && scope == 0)
            {
                s = s.Remove(i, 1).Insert(i, trg);
            }
            else if (s[i] == nest[0]) scope++;
            else if (s[i] == nest[1]) scope--;
        }

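        // The string[] overload compiles on .NET Framework as well as newer runtimes.
        return s.Split(new[] { trg }, StringSplitOptions.None);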
    }

The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use

string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");

The function would first turn your string into

adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO

then split on the ~, ignoring the nested commas.

劫难
#7 · 2019-02-26 02:03

If you simply must use Regex, then you can split the string on the following pattern (it is written with comments, so it needs RegexOptions.IgnorePatternWhitespace):

,                # match a comma
(?=              # that is followed by
  (?:            # either
    [^\(\)]*     #  no parens at all
    |            # or
    (?:          #  
      [^\(\)]*   #  ...
      \(         #  (
      [^\(\)]*   #     stuff in parens
      \)         #  )
      [^\(\)]*   #  ...
    )+           #  any number of times
  )$             # until the end of the string
)

It breaks your input into the following (shown with the whitespace after each comma trimmed):

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
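
One way the split might be wired up, as a sketch only. LookaheadSplitter is a hypothetical name, the pattern is the one above with the redundant escapes inside the character classes dropped, and the Trim call is an added assumption to remove the space left after each comma.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LookaheadSplitter
{
    // Commented pattern; only valid with RegexOptions.IgnorePatternWhitespace.
    private const string Pattern = @"
        ,                                # a comma
        (?=                              # followed, up to the end of the string, by
          (?:
            [^()]*                       #   no parentheses at all
            |
            (?:[^()]*\([^()]*\)[^()]*)+  #   or only complete, flat groups
          )$
        )";

    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace)
            .Select(part => part.Trim()) // assumption: surrounding whitespace is unwanted
            .ToArray();
    }
}
```

Because the lookahead rescans to the end of the string for every comma, this is likely to be slower on long inputs than a single pass with a depth counter.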

You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.
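
For reference, a sketch of what a balancing-group version could look like. The class name, method name, and the group name depth are all chosen for illustration; this is one possible pattern, not the one the answer had in mind. It matches tokens rather than splitting, and treats a token as plain text mixed with parenthesized groups whose inner parentheses are balanced.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class NestedTokenMatcher
{
    private static readonly Regex TokenPattern = new Regex(
        @"(?:" +
        @"[^,()]+" +             // plain text outside parentheses
        @"|" +
        @"\(" +                  // outermost opening parenthesis
        @"(?>" +
        @"[^()]+" +              //   anything that is not a parenthesis
        @"|(?<depth>\()" +       //   nested '(' pushes onto the depth stack
        @"|(?<-depth>\))" +      //   nested ')' pops; fails if the stack is empty
        @")*" +
        @"(?(depth)(?!))" +      // fail if a nested '(' was never closed
        @"\)" +                  // outermost closing parenthesis
        @")+");

    public static List<string> Tokenize(string data)
    {
        return TokenPattern.Matches(data)
            .Cast<Match>()
            .Select(m => m.Value.Trim()) // assumption: surrounding whitespace is unwanted
            .ToList();
    }
}
```

For example, NestedTokenMatcher.Tokenize("f(a,(b,c)),g, h(i)") should yield f(a,(b,c)), g and h(i).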
