Using C# Regular expression to replace XML element

2019-03-30 23:25发布

I'm writing some code that handles logging xml data and I would like to be able to replace the content of certain elements (eg passwords) in the document. I'd rather not serialize and parse the document as my code will be handling a variety of schemas.

Sample input documents:

doc #1:

   <user>
       <userid>jsmith</userid>
       <password>myPword</password>
    </user>

doc #2:

<secinfo>
       <ns:username>jsmith</ns:username>
       <ns:password>myPword</ns:password>
 </secinfo>

What I'd like my output to be:

output doc #1:

<user>
       <userid>jsmith</userid>
       <password>XXXXX</password>
 </user>

output doc #2:

<secinfo>
       <ns:username>jsmith</ns:username>
       <ns:password>XXXXX</ns:password>
 </secinfo>

Since the documents I'll be processing could have a variety of schemas, I was hoping to come up with a nice generic regular expression solution that could find elements with password in them and mask the content accordingly.

Can I solve this using regular expressions and C# or is there a more efficient way?

7条回答
孤傲高冷的网名
2楼-- · 2019-03-31 00:06

Regex is the wrong approach for this, I've seen it go so badly wrong when you least expect it.

XDocument is way more fun anyway:

XDocument doc = XDocument.Parse(@"
            <user>
                <userid>jsmith</userid>
                <password>password</password>
            </user>");

doc.Element("user").Element("password").Value = "XXXX";

// Temp namespace just for the purposes of the example -
XDocument doc2 = XDocument.Parse(@"
            <secinfo xmlns:ns='http://tempuru.org/users'>
                <ns:userid>jsmith</ns:userid>
                <ns:password>password</ns:password>
            </secinfo>");

doc2.Element("secinfo").Element("{http://tempuru.org/users}password").Value = "XXXXX";
查看更多
一夜七次
3楼-- · 2019-03-31 00:06

The main reason that XSLT exist is to be able to transform XML-structures, this means that an XSLT is a type of stylesheet that can be used to alter the order of elements och change content of elements. Therefore this is a typical situation where it´s highly recommended to use XSLT instead of parsing as Andrew Hare said in a previous post.

查看更多
【Aperson】
4楼-- · 2019-03-31 00:07

This problem is best solved with XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="//password">
        <xsl:copy>
            <xsl:text>XXXXX</xsl:text>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

This will work for both inputs as long as you handle the namespaces properly.

Edit : Clarification of what I mean by "handle namespaces properly"

Make sure your source document that has the ns name prefix has as namespace defined for the document like so:

<?xml version="1.0" encoding="utf-8"?>
<secinfo xmlns:ns="urn:foo">
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>
查看更多
forever°为你锁心
5楼-- · 2019-03-31 00:08

From experience with systems that try to parse and/or modify XML without proper parsers, let me say: DON'T DO IT. Use an XML parser (There are other answers here that have ways to do that quickly and easily).

Using non-xml methods to parse and/or modify an XML stream will ALWAYS lead you to pain at some point in the future. I know, because I have felt that pain.

I know that it seems like it would be quicker-at-runtime/simpler-to-code/easier-to-understand/whatever if you use the regex solution. But you're just going to make someone's life miserable later.

查看更多
ら.Afraid
6楼-- · 2019-03-31 00:14

You can use regular expressions if you know enough about what you are trying to match. For example if you are looking for any tag that has the word "password" in it with no inner tags this regex expression would work:

(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)

You could use the same C# replace statement in zowat's answer as well but for the replace string you would want to use "$1XXXXX$4" instead.

查看更多
forever°为你锁心
7楼-- · 2019-03-31 00:15

I'd say you're better off parsing the content with a .NET XmlDocument object and finding password elements using XPath, then changing their innerXML properties. It has the advantage of being more correct (since XML isn't regular in the first place), and it's conceptually easy to understand.

查看更多
登录 后发表回答