C# isolate elements of an array in a string with e

Posted 2019-08-09 07:41

Question:

I've several arrays such as:

string[] sArTrigFunctions = {"sin", "cos", "tan", "sinh", "cosh", "tanh", "cot", "sec", "csc", "arcsin", "arccos", "arctan", "coth", "sech", "csch"};
string[] sArGreek = { "alpha", "beta", "chi", "delta", "Delta", "epsi", "varepsilon", "eta", "gamma", "Gamma", "iota", "kappa", "lambda", "Lambda", "lamda", "Lamda", "mu", "nu", "omega", "Omega", "phi", "varphi", "Phi", "pi", "Pi", "psi", "Psi", "rho", "sigma", "Sigma", "tau", "theta", "vartheta", "Theta", "upsilon", "xi", "Xi", "zeta" };
string[] sArBinOp = {"lt","gt","eq","neq",.....}; etc.

These array elements appear in a text file, mixed with each other or with other content of the file. For example: sintheta, altc. I want to escape these array elements in the file with \ so that sintheta becomes \sin\theta and altc becomes a\ltc. A simple String.Replace(...) does not work. For example, if I run the following foreach loop on the sArTrigFunctions array and then on the sArGreek array, it changes sintheta in the file to \sinth\eta, because eta is replaced before theta. If I sort the sArGreek elements by descending length so that theta comes before eta, the code first changes sintheta to \sin\theta and then to \sin\th\eta. Likewise, running the code on the sArBinOp array changes sindelta to sinde\lta, and running it first on sArTrigFunctions and sArGreek and then on sArBinOp changes sindelta to \sin\de\lta:

foreach (string s in sArGreek)
{
    strfileContent = strfileContent.Replace(s, "\\" + s);
}

Question: How can we programmatically make sure that, during the replace process, an array element that lies inside another element of any array is not escaped with \? For example, don't escape eta in sintheta, but do escape it in sineta. Likewise, don't escape lt in sindelta, but do escape it in altc. Note: The array elements in the file are not necessarily separated by spaces, i.e. sintheta is not written as sin theta; otherwise we could use the C# regex word boundary (\b) with code like the following:

foreach (string s in sArGreek)
{
    strfileContent = Regex.Replace(strfileContent, "\\b" + s + "\\b", "\\" + s + " ");
}

Answer 1:

You can do this with a regular expression replace.

First you need to construct your Regex from the input arrays. The structure of the expression is:

term1|term2|term3|t4|t5

That is, all the terms go into a single string, separated by "|" (regex OR) and sorted by descending term length. The order is important because the regex engine tries the alternatives from left to right at each position: we want to capture longer terms when possible and fall back to shorter terms only when needed.

To do that, a little LINQ query comes in handy:
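To see why the term order matters, here is a minimal sketch comparing two orderings of the same alternatives. The class name AlternationOrderDemo and the helper FirstMatch are hypothetical names made up for this illustration; only Regex.Match comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Hypothetical helper: returns the text of the first match of pattern in input.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Alternatives are tried left to right, so the shorter term wins here.
        Console.WriteLine(FirstMatch("sinh", "sin|sinh")); // prints "sin"

        // With the longer term first, the whole term is captured.
        Console.WriteLine(FirstMatch("sinh", "sinh|sin")); // prints "sinh"
    }
}
```

The ordering only changes the result for terms that start at the same position, such as sin and sinh. A term that starts earlier in the text is always matched first, regardless of its place in the pattern.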

Regex re = new Regex(String.Join("|", (
    from s in sArTrigFunctions.Union(sArGreek).Union(sArBinOp)
    orderby s.Length descending
    select s).ToArray()));

We're creating a single enumerable from all our arrays, then sorting by length, and joining to a single string. This is used to create a Regex object.

Then it's a simple replace:

re.Replace("sintheta altc", "\\$&");

"\\$&" means replace the entire match (single term at a time) with itself prefixed with a backslash.

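Putting the pieces together, the approach above might look like the following self-contained sketch. Several things here are assumptions and not part of the original answer: the class name EscapeTermsDemo and the helper BuildTermRegex are hypothetical, sArBinOp contains only the four entries shown in the question (the rest of that array is elided there), and Regex.Escape is added as a precaution in case a term ever contains a regex metacharacter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeTermsDemo
{
    // Hypothetical helper: merges the term arrays into one alternation, longest term first.
    public static Regex BuildTermRegex(params string[][] termArrays)
    {
        string pattern = string.Join("|",
            termArrays.SelectMany(a => a)
                      .Distinct()
                      .OrderByDescending(s => s.Length)
                      .Select(Regex.Escape)); // assumption: guard against metacharacters in terms
        return new Regex(pattern);
    }

    public static void Main()
    {
        string[] sArTrigFunctions = { "sin", "cos", "tan", "sinh", "cosh", "tanh", "cot", "sec", "csc",
                                      "arcsin", "arccos", "arctan", "coth", "sech", "csch" };
        string[] sArGreek = { "alpha", "beta", "chi", "delta", "Delta", "epsi", "varepsilon", "eta",
                              "gamma", "Gamma", "iota", "kappa", "lambda", "Lambda", "lamda", "Lamda",
                              "mu", "nu", "omega", "Omega", "phi", "varphi", "Phi", "pi", "Pi", "psi",
                              "Psi", "rho", "sigma", "Sigma", "tau", "theta", "vartheta", "Theta",
                              "upsilon", "xi", "Xi", "zeta" };
        // Only the entries visible in the question; the full array is not shown there.
        string[] sArBinOp = { "lt", "gt", "eq", "neq" };

        Regex re = BuildTermRegex(sArTrigFunctions, sArGreek, sArBinOp);

        // "$&" is the whole match; the verbatim string @"\$&" is the same as "\\$&".
        string result = re.Replace("sintheta altc sineta sindelta", @"\$&");
        Console.WriteLine(result); // prints: \sin\theta a\ltc \sin\eta \sin\delta
    }
}
```

Because the whole text is scanned in a single pass, a term that has already been consumed as part of a longer match (eta inside theta, lt inside delta) is never visited again, which is what the sequential Replace loops could not guarantee.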