XML Clean up (remove invalid characters from attri

Posted 2019-07-23 16:03

Question:

How can I remove invalid characters from XML while keeping the markup itself intact? For example, I want to remove all < and " characters that appear inside attribute values:

<log>
  <data id="1" name="No Error"  value="0" />
  <data id="2" name="Error "1" between text" value="0" />
  <data id="3" name="Error <2> between text"  value="0"  />
</log>

How can I dynamically remove the quotes surrounding "1" and the <> surrounding 2?

The final output should be:

<log>
  <data id="1" name="No Error"  value="0"  />
  <data id="2" name="Error 1 between text" value="0" />
  <data id="3" name="Error 2 between text"  value="0"  />
</log>

Thanks for the support.

I was thinking of the following solution:

  1. Read the file as text
  2. Take each string that starts with name= and ends with value=
  3. Remove all ", < and > characters from it
  4. Add " after name= and add " before value=

If this is correct, how can I do it in C#? A plain Replace call will not work, because it would also strip the quotes and angle brackets that belong to the markup.

Thanks

Answer 1:

For your information, I found two different ways:

1-

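// Required namespaces
using System;
using System.IO;
using System.Text;

// Rewrites the file line by line, cleaning the text between startElement and nextElement.
public static void ReplaceInvalidCharFromAttribute(string filePath, string startElement, string nextElement, string[] removeStrings)
{
    string tempFile = Path.GetTempFileName();

    using (var sr = new StreamReader(filePath))
    using (var sw = new StreamWriter(tempFile))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            // Lines without both markers come back as null and are copied unchanged.
            string temp = RemoveInvalidCharFromAttribute(line, startElement, nextElement, removeStrings);
            sw.WriteLine(temp ?? line);
        }
    }

    // Replace the original file with the cleaned copy.
    File.Delete(filePath);
    File.Move(tempFile, filePath);
}

// Returns the cleaned line, or null when the line does not contain both markers.
// invalidChars must include "\"" because the quotes are re-added at the end.
public static string RemoveInvalidCharFromAttribute(string input, string startElement, string nextElement, string[] invalidChars)
{
    int startMarker = input.IndexOf(startElement, StringComparison.Ordinal);
    if (startMarker < 0) return null;
    int start = startMarker + startElement.Length;

    // Search for the end marker only after the start marker.
    int end = input.IndexOf(nextElement, start, StringComparison.Ordinal);
    if (end < 0) return null;

    // Text between the two markers, including the attribute's own quotes.
    StringBuilder res = new StringBuilder(input.Substring(start, end - start));
    foreach (string inv in invalidChars)
        if (!string.IsNullOrEmpty(inv)) res.Replace(inv, "");

    // Rebuild the line, surrounding the cleaned text with double quotes.
    // Slicing by index avoids replacing an identical substring elsewhere in the line.
    return input.Substring(0, start) + "\"" + res.ToString().Trim() + "\" " + input.Substring(end);
}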
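
The same start/end idea could probably be written more compactly with a regular expression. The following is only a sketch under the assumptions of the sample in the question: each element sits on one line and name is directly followed by value. AttributeCleaner and Clean are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeCleaner
{
    // Matches the text between name=" and the closing quote that directly
    // precedes value= (assumption: name is immediately followed by value).
    private static readonly Regex NameValue =
        new Regex("(?<=\\bname=\")(.*?)(?=\"\\s+value=)");

    // Strips ", < and > from inside every matched name attribute value.
    public static string Clean(string xmlText)
    {
        return NameValue.Replace(xmlText, m =>
            m.Value.Replace("\"", "").Replace("<", "").Replace(">", ""));
    }
}
```

Calling AttributeCleaner.Clean(File.ReadAllText(path)) on the sample log should leave the first data line untouched and strip the stray characters from the other two. It would break if the attribute order changed, so treat it as a one-off repair rather than a general parser.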

2- http://support.microsoft.com/kb/316063

So far so good. Thanks.



Answer 2:

In PHP I use the following to encode the data before it goes into the XML:

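function xml_encode($string)
{
    // "&" must be replaced first, otherwise the entities added below get double-encoded.
    $string=preg_replace("/&/", "&amp;", $string);
    $string=preg_replace("/</", "&lt;", $string);
    $string=preg_replace("/>/", "&gt;", $string);
    $string=preg_replace("/\"/", "&quot;", $string);
    $string=preg_replace("/%/", "&#37;", $string);

    // utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2.
    return utf8_encode($string);
}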

It will look like you suggest in a browser, until you actually look at the source.

At this point, when reading the data back, you would need to check for "&amp;" and for numeric (hex/decimal) character codes.
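
Since the question asks about C#, the same "encode before it goes into the XML" idea can be sketched there by letting System.Xml.Linq do the escaping when the file is produced. This only helps if you control the code that writes the log. EscapeDemo and BuildData are hypothetical names used for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class EscapeDemo
{
    // Builds a <data> element. XAttribute escapes characters such as
    // " and < in the values when the element is serialized.
    public static string BuildData(int id, string name, string value)
    {
        var element = new XElement("data",
            new XAttribute("id", id),
            new XAttribute("name", name),
            new XAttribute("value", value));
        return element.ToString();
    }
}
```

A name such as Error "1" between text is then stored with &quot; entities, and XElement.Parse gives back the original string, so nothing has to be stripped afterwards.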

Hope that helps a little.