Using C# Regular expression to replace XML element

I'm writing some code that handles logging XML data, and I would like to be able to replace the content of certain elements (e.g. passwords) in the document. I'd rather not serialize and parse the document, as my code will be handling a variety of schemas.

Sample input documents:

doc #1:

<user>
    <userid>jsmith</userid>
    <password>myPword</password>
</user>

doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>myPword</ns:password>
</secinfo>

What I'd like my output to be:

output doc #1:

<user>
    <userid>jsmith</userid>
    <password>XXXXX</password>
</user>

output doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>

Since the documents I'll be processing could have a variety of schemas, I was hoping to come up with a nice generic regular expression solution that could find elements with password in them and mask the content accordingly.

Can I solve this using regular expressions and C# or is there a more efficient way?

Tags: c# .net xml regex parsing

7 answers

孤傲高冷的网名

Reply #2 · 2019-03-31 00:06

Regex is the wrong approach for this; I've seen it go badly wrong when you least expect it.

XDocument is way more fun anyway:

using System.Xml.Linq;

// No namespaces: navigate by element name and overwrite the value.
XDocument doc = XDocument.Parse(@"
            <user>
                <userid>jsmith</userid>
                <password>password</password>
            </user>");

doc.Element("user").Element("password").Value = "XXXXX";

// Temp namespace just for the purposes of the example.
XDocument doc2 = XDocument.Parse(@"
            <secinfo xmlns:ns='http://tempuru.org/users'>
                <ns:userid>jsmith</ns:userid>
                <ns:password>password</ns:password>
            </secinfo>");

// A namespaced element is addressed by its expanded name, {namespace}localName.
doc2.Element("secinfo").Element("{http://tempuru.org/users}password").Value = "XXXXX";
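
The two snippets above hard-code each document's structure, while the question asks for something that works across schemas. A minimal sketch of a schema-independent variant follows. It assumes the input is well-formed XML with every prefix declared, and that any leaf element whose local name contains "password" should be masked. The class name XmlMasker and the method MaskPasswords are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlMasker
{
    // Hypothetical helper: masks the text of every leaf element whose local
    // name contains "password", whatever namespace or schema it belongs to.
    public static string MaskPasswords(string xml, string mask = "XXXXX")
    {
        XDocument doc = XDocument.Parse(xml);

        // Materialize the matches first so the tree is not modified
        // while Descendants() is still being enumerated.
        var targets = doc.Descendants()
            .Where(e => !e.HasElements &&
                        e.Name.LocalName.IndexOf("password", StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();

        foreach (XElement element in targets)
        {
            element.Value = mask;
        }

        return doc.ToString();
    }
}
```

Matching on Name.LocalName ignores the namespace, so password and ns:password are handled by the same query. Note that XDocument.Parse throws an XmlException if a prefix such as ns is used without a declaration.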


一夜七次

Reply #3 · 2019-03-31 00:06

The main reason XSLT exists is to transform XML structures: an XSLT stylesheet can be used to reorder elements or change their content. This is therefore a typical situation where it's highly recommended to use XSLT instead of parsing, as Andrew Hare suggested in a previous post.


【Aperson】

Reply #4 · 2019-03-31 00:07

This problem is best solved with XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="//password">
        <xsl:copy>
            <xsl:text>XXXXX</xsl:text>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

This will work for both inputs as long as you handle the namespaces properly.

Edit: Clarification of what I mean by "handle namespaces properly"

Make sure that a source document using the ns prefix declares a namespace for that prefix, like so:

<?xml version="1.0" encoding="utf-8"?>
<secinfo xmlns:ns="urn:foo">
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>
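
For completeness, here is a hedged sketch of running such a stylesheet from C# with XslCompiledTransform. One caveat: in XSLT 1.0 a pattern like //password matches only elements in no namespace, so this sketch swaps in *[local-name() = 'password'] to cover prefixed elements as well. That substitution, the xsl:output line, and the names XsltMasker and MaskPasswords are assumptions made for illustration, not part of the answer above.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltMasker
{
    // Identity transform plus one template that overwrites password content.
    // local-name() ignores the namespace, so password and ns:password both match.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
    <xsl:output omit-xml-declaration=""yes""/>
    <xsl:template match=""@* | node()"">
        <xsl:copy>
            <xsl:apply-templates select=""@* | node()""/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match=""*[local-name() = 'password']"">
        <xsl:copy>
            <xsl:text>XXXXX</xsl:text>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to an XML string.
    public static string MaskPasswords(string xml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(input, writer);
        }

        return output.ToString();
    }
}
```

In real logging code the compiled transform would normally be loaded once and reused, since compiling the stylesheet is the expensive step.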


forever°为你锁心

Reply #5 · 2019-03-31 00:08

From experience with systems that try to parse and/or modify XML without proper parsers, let me say: DON'T DO IT. Use an XML parser (other answers here show ways to do that quickly and easily).

Using non-XML methods to parse and/or modify an XML stream will ALWAYS lead you to pain at some point in the future. I know, because I have felt that pain.

I know that it seems like it would be quicker-at-runtime/simpler-to-code/easier-to-understand/whatever if you use the regex solution. But you're just going to make someone's life miserable later.


ら.Afraid

Reply #6 · 2019-03-31 00:14

You can use regular expressions if you know enough about what you are trying to match. For example, if you are looking for any tag that has the word "password" in it and no inner tags, this regular expression would work:

(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)

You could use the same C# replace statement as in zowat's answer, but with "$1XXXXX$4" as the replacement string.
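
Since zowat's answer is not reproduced here, the following is a minimal sketch of how the pattern and replacement string could be wired together with Regex.Replace. The class name RegexMasker and the method MaskPasswords are hypothetical. The pattern is case-sensitive as written, and it will not match a password element that carries attributes, because the closing tag must repeat the full text of the opening tag.

```csharp
using System.Text.RegularExpressions;

public static class RegexMasker
{
    // Group 1: the opening tag. Group 2: the tag name, reused by \2 in the
    // closing tag. Group 3: the text content. Group 4: the closing tag.
    private static readonly Regex PasswordElement =
        new Regex(@"(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)");

    // Hypothetical helper: keeps both tags and swaps the content for XXXXX.
    public static string MaskPasswords(string xml)
    {
        return PasswordElement.Replace(xml, "$1XXXXX$4");
    }
}
```

Because the tag name is captured as a whole, the prefixed form ns:password is handled by the same pattern without any namespace declaration being present.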


forever°为你锁心

Reply #7 · 2019-03-31 00:15

I'd say you're better off parsing the content with a .NET XmlDocument object, finding password elements using XPath, and then changing their InnerXml properties. It has the advantage of being more correct (since XML isn't a regular language in the first place), and it's conceptually easy to understand.
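
A rough sketch of that approach, under a few assumptions: the XPath //*[local-name() = 'password'] is one way to find password elements regardless of namespace, InnerText is used in place of InnerXml so the mask is always written as plain text, and the names XmlDocumentMasker and MaskPasswords are hypothetical.

```csharp
using System.Linq;
using System.Xml;

public static class XmlDocumentMasker
{
    // Hypothetical helper: loads the XML, masks every element whose local
    // name is exactly "password", and returns the modified markup.
    public static string MaskPasswords(string xml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the original layout for logging
        doc.LoadXml(xml);

        // Copy the matches into a list before changing the document.
        var matches = doc.SelectNodes("//*[local-name() = 'password']")
            .Cast<XmlNode>()
            .ToList();

        foreach (XmlNode node in matches)
        {
            node.InnerText = "XXXXX";
        }

        return doc.OuterXml;
    }
}
```

As with any parser-based approach, LoadXml throws an XmlException when the input is not well-formed, for example when the ns prefix is never declared.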
