可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I know how to get substrings from a string which are coma seperated but here's a complication: what if substring contains a coma.
If a substring contains a coma, new line or double quotes the entire substring is encapsulated with double quotes.
If a substring contains a double quote the double quote is escaped with another double quote.
Worst case scenario would be if I have something like this:
first,"second, second","""third"" third","""fourth"", fourth"
In this case substrings are:
- first
- second, second
- "third" third
- "fourth", fourth
second, second is encapsulated with double quotes, I don't want those double quotes in a list/array.
"third" third is encapsulated with double quotes because it contains double quotes and those are escaped with aditional double quotes. Again I don't want the encapsulating double quotes in a list/array and i don't want the double quotes that escape double quotes, but I want original double quotes which are a part of the substring.
回答1:
One way using TextFieldParser
:
using (var reader = new StringReader("first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\""))
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader))
{
parser.Delimiters = new[] { "," };
parser.HasFieldsEnclosedInQuotes = true;
while (!parser.EndOfData)
{
foreach (var field in parser.ReadFields())
Console.WriteLine(field);
}
}
For
first
second, second
"third" third
"fourth", fourth
回答2:
Try this
string input = "first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\"";
string[] output = input.Split(new string[] {"\",\""}, StringSplitOptions.RemoveEmptyEntries);
回答3:
I would suggest you to construct a small state machine for this problem. You would have states like:
- Out - before the first field is reached
- InQuoted - you were Out and " arrived; now you're in and the field is quoted
- InQuotedMaybeOut - you were InQuoted and " arrived; now you wait for the next character to figure whether it is another " or something else; if else, then select the next valid state (character could be space, new line, comma, so you decide the next state); otherwise, if " arrived, you push " to the output and step back to InQuoted
- In - after Out, when any character has arrived except , and ", you are automatically inside a new field which is not quoted.
This will certainly read CSV correctly. You can also make the separator configurable, so that you support TSV or semicolon-separated format.
Also keep in mind one very important case in CSV format: Quoted field may contain new line! Another special case to keep an eye on: empty field (like: ,,).
回答4:
This is not the most elegant solution but it might help you. I would loop through the characters and do an odd-even count of the quotes. For example you have a bool that is true if you have encountered an odd number of quotes and false for an even number of quotes.
Any comma encountered while this bool value is true should not be considered as a separator. If you know it is a separator you can do several things with that information. Below I replaced the delimiter with something more manageable (not very efficient though):
bool odd = false;
char replacementDelimiter = "|"; // Or some very unlikely character
for(int i = 0; i < str.len; ++i)
{
if(str[i] == '\"')
odd = !odd;
else if (str[i] == ',')
{
if(!odd)
str[i] = replacementDelimiter;
}
}
string[] commaSeparatedTokens = str.Split(replacementDelimiter);
At this point you should have an array of strings that are separated on the commas that you have intended. From here on it will be simpler to handle the quotes.
I hope this can help you.
回答5:
Mini parser
using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApp
{
class Program
{
private static IEnumerable<string> Parse(string input)
{
if (string.IsNullOrWhiteSpace(input))
{
// empty string => nothing to do
yield break;
}
int count = input.Length;
StringBuilder sb = new StringBuilder();
int j;
for (int i = 0; i < count; i++)
{
char c = input[i];
if (c == ',')
{
yield return sb.ToString();
sb.Clear();
}
else if (c == '"')
{
// begin quoted string
sb.Clear();
for (j = i + 1; j < count; j++)
{
if (input[j] == '"')
{
// quote
if (j < count - 1 && input[j + 1] == '"')
{
// double quote
sb.Append('"');
j++;
}
else
{
break;
}
}
else
{
sb.Append(input[j]);
}
}
yield return sb.ToString();
// clear buffer and skip to next comma
sb.Clear();
for (i = j + 1; i < count && input[i] != ','; i++) ;
}
else
{
sb.Append(c);
}
}
}
[STAThread]
static void Main(string[] args)
{
foreach (string str in Parse("first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\""))
{
Console.WriteLine(str);
}
Console.WriteLine();
Console.WriteLine("Press any key to continue...");
Console.ReadKey();
}
}
}
Result
- first
- second, second
- "third" third
- "fourth", fourth
回答6:
Thank you for your answers, but before I got to see them I wrote this solution, it's not pretty but it works for me.
string line = "first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\"";
var substringArray = new List<string>();
string substring = null;
var doubleQuotesCount = 0;
for (var i = 0; i < line.Length; i++)
{
if (line[i] == ',' && (doubleQuotesCount % 2) == 0)
{
substringArray.Add(substring);
substring = null;
doubleQuotesCount = 0;
continue;
}
else
{
if (line[i] == '"')
doubleQuotesCount++;
substring += line[i];
//If it is a last character
if (i == line.Length - 1)
{
substringArray.Add(substring);
substring = null;
doubleQuotesCount = 0;
}
}
}
for(var i = 0; i < substringArray.Count; i++)
{
if (substringArray[i] != null)
{
//remove first double quote
if (substringArray[i][0] == '"')
{
substringArray[i] = substringArray[i].Substring(1);
}
//remove last double quote
if (substringArray[i][substringArray[i].Length - 1] == '"')
{
substringArray[i] = substringArray[i].Remove(substringArray[i].Length - 1);
}
//Replace double double quotes with single double quote
substringArray[i] = substringArray[i].Replace("\"\"", "\"");
}
}