I have a string like this:
"<root><text>My test is > & < </text></root>"
This is well-formed XML except for the unescaped &, <, > characters in the text.
I need to convert it to <root><text>My test is &gt; &amp; &lt; </text></root>
before I parse it with XElement.Parse(str).
How can I make this conversion?
This is nigh-on impossible to achieve reliably. You should correct this issue at the source. If you control the system that inserts the "My test is > & < " string, then you should escape this string before inserting it. HttpUtility.HtmlEncode is a reasonable way of doing that.
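A minimal sketch of escaping at the source, assuming you are the one assembling the string. The XmlSource class and its BuildDocument method are hypothetical names used only for illustration; HttpUtility.HtmlEncode is the real call, from the System.Web namespace.

```csharp
using System.Web;
using System.Xml.Linq;

public static class XmlSource
{
    // Hypothetical helper: escapes the raw text before it is placed
    // inside the markup, so the result is well-formed XML.
    public static string BuildDocument(string rawText)
    {
        string escaped = HttpUtility.HtmlEncode(rawText);
        return "<root><text>" + escaped + "</text></root>";
    }

    // Parsing now succeeds because &, < and > arrive as entities.
    public static XElement BuildAndParse(string rawText)
    {
        return XElement.Parse(BuildDocument(rawText));
    }
}
```

Calling XmlSource.BuildAndParse("My test is > & < ") should parse without an exception, and the Value of the text element should be the original unescaped text.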
XElement will automatically escape the text if you use new XElement rather than XElement.Parse():
LINQPad snippet:
var str = "<root><text>My test is > & < </text></root>";
var element = new XElement("element", str);
element.Dump();
Output:
<element>&lt;root&gt;&lt;text&gt;My test is &gt; &amp; &lt; &lt;/text&gt;&lt;/root&gt;</element>
Edit: I've just re-read the question and realised that this doesn't produce the desired output.
The problem you have is that your incoming XML string is fundamentally invalid. If you can control the source then you should fix it there. If not, there's no easy way of fixing it.
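If you do control the source, one way to fix it there is to build the tree with XElement constructors instead of concatenating strings, so the text is escaped when the element is serialized. This is only a sketch; the XmlBuilder class and Build method are hypothetical names, and the element names are taken from the question.

```csharp
using System.Xml.Linq;

public static class XmlBuilder
{
    // Hypothetical helper: rawText is stored as text content, never parsed
    // as markup, so &, < and > cannot break the document structure.
    public static XElement Build(string rawText)
    {
        return new XElement("root",
            new XElement("text", rawText));
    }
}
```

XmlBuilder.Build("My test is > & < ").ToString(SaveOptions.DisableFormatting) yields a string with the special characters written as entities, which XElement.Parse accepts on the receiving side.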
Don't replace the variables with raw user text (that is XML injection: buggy and unsafe). Replace them with escaped text. SecurityElement.Escape is an XML escape function: http://msdn.microsoft.com/en-us/library/system.security.securityelement.escape%28VS.80%29.aspx
This is just like you would do it with HTML too.
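A rough sketch of that substitution, assuming the XML comes from a template with a {0} placeholder. The XmlTemplate class, the Fill method, and the template format are assumptions made for illustration; SecurityElement.Escape is the function linked above.

```csharp
using System.Security;

public static class XmlTemplate
{
    // Hypothetical helper: escapes the user text, then substitutes it
    // for the {0} placeholder in the template.
    public static string Fill(string template, string userText)
    {
        string escaped = SecurityElement.Escape(userText);
        return string.Format(template, escaped);
    }
}
```

For example, XmlTemplate.Fill("<root><text>{0}</text></root>", "My test is > & < ") returns a string that XElement.Parse can read, with the user text preserved as the element's value.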
The idea of this being "XML except for xyz" perhaps needs examining more closely. To tackle this properly, you need to define a grammar for the language that you call "XML except for xyz", and then you need to write a parser that analyzes documents conforming to that grammar; the output of this parser can be an XML representation of the input. This is all quite doable. Not easy, but doable. Of course, the benefit of using a standard like XML is that you can get a parser off-the-shelf, whereas if you invent your own grammar then you have to write your own parser.
Writing a good parser for your language is time-consuming, not least because of the extensive testing required. Writing a bad parser that's badly tested is probably quite easy, and this is what a lot of bad programmers would do. A good software engineer in this situation would recognize the benefits of conforming to standards.