I'm reading clipboard data coming from excel using
var stream = (System.IO.Stream) ( Forms.Clipboard.GetDataObject() ).GetData( Forms.DataFormats.CommaSeparatedValue );
,
but unfortunately, excel is passing cell text instead of cell values. When the cells are using special formatting (such as the thousands seperator), the clipboard data for a series of cells in columns that looks like this:
1,234,123.00 2,345.00 342.00 12,345.00
is stored as this:
\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"
when what I really want is this:
1234123.00, 2345.00, 342.00, 12345.00
I had been previously using the clipData.Split(new string[] { "," }, StringSllitOptions.None))
function to turn my CSV clipboard data into a series of cells, but this fails when there is escaped formatted text containing commas.
I'm asking if anyone can think of a way to split this string into a set of cells, ignoring the commas escaped within the \"
bits, since this is how Excel is choosing to escape cells containing commas.
In short, how can I turn a single string containing this:
\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"
into an array of strings containing this:
{ "1,234,123.00", "2,345.00", "342.00", "12,345.00" }
Without ruining my ability to parse a simple comma delimited string.
*****edit***
Follow up question (formulated as a DFA) here: Split a string based on each time a Deterministic Finite Automata reaches a final state?
You could try to use a bit of LINQ:
Alternatively, if you don't like this suggestion, try to implement a similar pattern with RegEx.
From your input example, we can see that there are three "unwanted" sequences of characters:
So, add all these sequences to the input array for the
Split
method:This will give you an array containing a few empty elements. If that is a problem, use
StringSplitOptions.RemoveEmptyEntries
instead ofStringSplitOptions.None
:First off I've dealt with data from Excel before and what you typically see is comma separated values and if the value is considered to be a string it will have double quotes around it (and can contain commas and double quotes). If it is considered to be numeric then there are not double quotes. Additionally if the data contains a double quote that will be delimited by a double quote like
""
. So assuming all of that here's how I've dealt with this in the pastOf course this assumes the data coming in is valid so if you have something like
"fo,o"b,ar","bar""foo"
this will not work. Additionally if your data contains"
then it will be turned into a " which may or may not be desirable.I agree with Kyle regarding your string probably not being consistent.
Instead of Kyle's first step you could use
There are a lot of ways to do this. One inelegant way that would work is:
Now you have what you wanted in first place