How can I fix this regex expression?

Posted 2020-05-07 09:01

Preface: This question is a derivative of this question.


Here is my code:

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass {
  public static void Main (string[] args) {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        var parsedLines = Regex.Split(rawLine, "(\".*? \"(?:,\".*? \")*)");
        parsedLines.ToList().ForEach(Console.WriteLine);

        Console.WriteLine("Press [ENTER] to exit.");
        Console.ReadLine();
  }
}

Here is my output:

"TeamName","PlayerName","Position"  "
Chargers
","Philip Rivers","QB"  "
Colts
","Peyton Manning","QB"  "
Patriots","Tom Brady","QB"
Press [ENTER] to exit.

And here is my desired output:

"TeamName","PlayerName","Position"
"Chargers","Philip Rivers","QB"
"Colts","Peyton Manning","QB"
"Patriots","Tom Brady","QB"
Press [ENTER] to exit.

How can I fix the regex to generate my desired output?


3 answers
何必那么认真
#2 · 2020-05-07 09:28

As Amy has already mentioned, your string looks like CSV. If it really is valid CSV, use a dedicated CSV library.

If CsvHelper isn't applicable in this case and you really need a regex, try something like this:

(?<=(?:^|  ))(.*?)(?=(?:  \")|$)

I don't write C#, so the regex may need adjusting for C#-specific syntax.

Edit. Code example.

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass {
  public static void Main (string[] args) {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        // Original regex, kept for reference:
        //var parsedLines = Regex.Split(rawLine, "(?<=(?:^|  ))(.*?)(?=(?:  \")|$)");
        var parsedLines = Regex.Split(rawLine, "(?<=^)(.*?)(?=(?:  \")|$)|(?<=  )(.*?)(?=(?:  \")|$)");
        parsedLines.ToList().ForEach(Console.WriteLine);

        Console.WriteLine("Press [ENTER] to exit.");
        Console.ReadLine();
  }
}

This code includes a "dirty" workaround for a lookbehind assertion error, although I can't reproduce that error with an online tool :) The original regex is commented out in this example.

I hope this helps. But I must say again: if you are working with CSV, it is better to use dedicated tools, not regex :)
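The pattern above describes the records themselves rather than the separators between them, so it may be a more natural fit for Regex.Matches than for Regex.Split (Split also returns the text between matches alongside any captured groups). The following is a minimal sketch under that assumption, not the answerer's own code; the class and method names (RecordMatcher, MatchRecords) are hypothetical, and it assumes records are separated by exactly two spaces as in the sample input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RecordMatcher
{
    // Hypothetical helper: returns every record described by the lookaround pattern.
    // (?<=^|  )  the record starts at the beginning of the input or after two spaces
    // (.*?)      the record text, matched lazily
    // (?=  "|$)  the record ends before two spaces and a quote, or at the end of the input
    public static string[] MatchRecords(string rawLine)
    {
        return Regex.Matches(rawLine, "(?<=^|  )(.*?)(?=  \"|$)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Where(v => v.Length > 0) // ignore any empty matches
                    .ToArray();
    }

    public static void Main()
    {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        foreach (var record in MatchRecords(rawLine))
            Console.WriteLine(record);
    }
}
```

Because each match is a whole record, no filtering of separator strings is needed afterwards.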

我只想做你的唯一
#3 · 2020-05-07 09:30

Use a negative lookbehind, a positive lookbehind, a character class with a quantifier, a positive lookahead, and a negative lookahead.

Working Demo

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass {
  public static void Main (string[] args) {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        var parsedLines = Regex.Split(rawLine, "(?<![,])(?<=[\"])[ ]{2}(?=[\"])(?![,])");
        parsedLines.ToList().ForEach(Console.WriteLine);

        Console.WriteLine("Press [ENTER] to exit.");
        Console.ReadLine();
  }
}
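To make the role of each lookaround easier to see, here is a reduced sketch of the same splitting idea. It keeps only the two positive assertions: once the character before the separator must be a quote, the checks that it is not a comma add nothing. The class and method names (RecordSplitter, SplitRecords) are hypothetical, and the sketch assumes the separator is always exactly two spaces sitting between a closing and an opening quote.

```csharp
using System;
using System.Text.RegularExpressions;

static class RecordSplitter
{
    // Hypothetical helper: splits on two spaces that sit between two double quotes.
    // (?<=")  the separator must be preceded by a closing quote
    //  {2}    the separator itself: exactly two spaces
    // (?=")   the separator must be followed by an opening quote
    public static string[] SplitRecords(string rawLine)
    {
        return Regex.Split(rawLine, "(?<=\") {2}(?=\")");
    }

    public static void Main()
    {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        foreach (var record in SplitRecords(rawLine))
            Console.WriteLine(record);
    }
}
```

Since the pattern contains no capturing groups, Regex.Split returns only the records and none of the separators.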
霸刀☆藐视天下
#4 · 2020-05-07 09:36

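There are good comments throughout the thread, and I would strongly suggest pursuing one of those options, so I won't repeat them. Here's an alternative that uses Matches with a regex pattern: match every field, then for each record skip the fields already consumed (records × columns) and take one record's worth of fields.

I'm using the pattern ("(.*?)[^,]"), written with doubled quotes inside the C# verbatim string below: an opening quote, any characters matched lazily, one character that is not a comma, and a closing quote. An explanation of what it means can be found here.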

const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
const int fieldsPerRecord = 3; // number of columns in each record
// each match is one quoted field, e.g. "TeamName"
var matches = new Regex(@"(\""(.*?)[^,]"")").Matches(rawLine).Cast<Match>().ToList();
// step through the matches one record at a time
for (int i = 0; i < matches.Count; i += fieldsPerRecord)
{
    // join the fields of this record with commas
    string str = string.Join(",", matches.Skip(i).Take(fieldsPerRecord));
    Console.WriteLine(str);
}
Console.WriteLine("Press [ENTER] to exit.");
Console.ReadLine();

Please note that there is no error checking at all and this can be improved, but it does produce the output you need. Also make sure you import System.Linq if it is not already there.
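As one possible way to add the missing error checking, the sketch below groups the matched fields into records and rejects input whose field count is not a multiple of the column count. It is an illustration under stated assumptions rather than part of the answer: the names FieldGrouper and GroupFields are hypothetical, and it swaps in a simpler field pattern, "[^"]*", which also accepts empty fields but assumes fields never contain escaped quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class FieldGrouper
{
    // Hypothetical helper: matches quoted fields and groups them into records.
    public static List<string> GroupFields(string rawLine, int fieldsPerRecord)
    {
        if (fieldsPerRecord <= 0)
            throw new ArgumentOutOfRangeException(nameof(fieldsPerRecord));

        // Assumed field pattern: a quote, any run of non-quote characters, a quote.
        var fields = Regex.Matches(rawLine, "\"[^\"]*\"")
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToList();

        // Reject input that cannot be divided evenly into records.
        if (fields.Count % fieldsPerRecord != 0)
            throw new FormatException(
                $"Found {fields.Count} fields, which is not a multiple of {fieldsPerRecord}.");

        var records = new List<string>();
        for (int i = 0; i < fields.Count; i += fieldsPerRecord)
            records.Add(string.Join(",", fields.Skip(i).Take(fieldsPerRecord)));
        return records;
    }

    public static void Main()
    {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        foreach (var record in GroupFields(rawLine, 3))
            Console.WriteLine(record);
    }
}
```

Throwing on a mismatched field count is only one design choice; depending on the data, skipping or logging the incomplete trailing record might be preferable.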

