getting the current position from an XmlReader

2019-02-05 18:14发布

Is there a way to get the current position in the stream of the node under examination by the XmlReader?

I'd like to use the XmlReader to parse a document and save the position of certain elements so that I can seek to them later.

Addendum:

I'm getting Xaml generated by a WPF control. The Xaml should not change frequently. There are placeholders in the Xaml where I need to replace items, sometimes looping. I thought it might be easier to do in code rather than a transform (I might be wrong about this). My idea was to parse it to a simple data structure of what needs to be replace and where it is, then use a StringBuilder to produce the final output by copying chunks from the xaml string.

5条回答
祖国的老花朵
2楼-- · 2019-02-05 18:40

As Jon Skeet says, XmlTextReader implements IXmlLineInfo but XmlTextReader was deprecated since .NET 2.0 and the question is about XmlReader only. I found this solution:

XmlReader xr = XmlReader.Create( // MSDN recommends to use Create() instead of ctor()
    new StringReader("<some><xml><string><data>"),
    someSettings // furthermore, can't set XmlSettings on XmlTextReader
);
IXmlLineInfo xli = (IXmlLineInfo)xr;

while (xr.Read())
{
    // ... some read actions ...

    // current position in StringReader can be accessed through
    int line = xli.LineNumber;
    int pos  = xli.LinePosition;
}

P.S. Tested for .NET Compact Framework 3.5, but should work for others too.

查看更多
唯我独甜
3楼-- · 2019-02-05 18:41

I have worked on a solution for this, and while it may not work in every scenario and uses reflection against private members of .NET Framework classes, I am able to calculate the correct position of the XmlReader with the extension method shown below.

Your XmlReader must be created from a StreamReader using an underlying FileStream (I haven't tried other Streams, and they may work as well so long as they report their position).

I've posted details here: http://g-m-a-c.blogspot.com/2013/11/determine-exact-position-of-xmlreader.html

public static class XmlReaderExtensions
{
    private const long DefaultStreamReaderBufferSize = 1024;

    public static long GetPosition(this XmlReader xr, StreamReader underlyingStreamReader)
    {
        // Get the position of the FileStream
        long fileStreamPos = underlyingStreamReader.BaseStream.Position;

        // Get current XmlReader state
        long xmlReaderBufferLength = GetXmlReaderBufferLength(xr);
        long xmlReaderBufferPos = GetXmlReaderBufferPosition(xr);

        // Get current StreamReader state
        long streamReaderBufferLength = GetStreamReaderBufferLength(underlyingStreamReader);
        int streamReaderBufferPos = GetStreamReaderBufferPos(underlyingStreamReader);
        long preambleSize = GetStreamReaderPreambleSize(underlyingStreamReader);

        // Calculate the actual file position
        long pos = fileStreamPos 
            - (streamReaderBufferLength == DefaultStreamReaderBufferSize ? DefaultStreamReaderBufferSize : 0) 
            - xmlReaderBufferLength 
            + xmlReaderBufferPos + streamReaderBufferPos - preambleSize;

        return pos;
    }

    #region Supporting methods

    private static PropertyInfo _xmlReaderBufferSizeProperty;

    private static long GetXmlReaderBufferLength(XmlReader xr)
    {
        if (_xmlReaderBufferSizeProperty == null)
        {
            _xmlReaderBufferSizeProperty = xr.GetType()
                                             .GetProperty("DtdParserProxy_ParsingBufferLength",
                                                          BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _xmlReaderBufferSizeProperty.GetValue(xr);
    }

    private static PropertyInfo _xmlReaderBufferPositionProperty;

    private static int GetXmlReaderBufferPosition(XmlReader xr)
    {
        if (_xmlReaderBufferPositionProperty == null)
        {
            _xmlReaderBufferPositionProperty = xr.GetType()
                                                 .GetProperty("DtdParserProxy_CurrentPosition",
                                                              BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _xmlReaderBufferPositionProperty.GetValue(xr);
    }

    private static PropertyInfo _streamReaderPreambleProperty;

    private static long GetStreamReaderPreambleSize(StreamReader sr)
    {
        if (_streamReaderPreambleProperty == null)
        {
            _streamReaderPreambleProperty = sr.GetType()
                                              .GetProperty("Preamble_Prop",
                                                           BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return ((byte[]) _streamReaderPreambleProperty.GetValue(sr)).Length;
    }

    private static PropertyInfo _streamReaderByteLenProperty;

    private static long GetStreamReaderBufferLength(StreamReader sr)
    {
        if (_streamReaderByteLenProperty == null)
        {
            _streamReaderByteLenProperty = sr.GetType()
                                             .GetProperty("ByteLen_Prop",
                                                          BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _streamReaderByteLenProperty.GetValue(sr);
    }

    private static PropertyInfo _streamReaderBufferPositionProperty;

    private static int GetStreamReaderBufferPos(StreamReader sr)
    {
        if (_streamReaderBufferPositionProperty == null)
        {
            _streamReaderBufferPositionProperty = sr.GetType()
                                                    .GetProperty("CharPos_Prop",
                                                                 BindingFlags.Instance | BindingFlags.NonPublic);
        }

        return (int) _streamReaderBufferPositionProperty.GetValue(sr);
    }

    #endregion
}
查看更多
来,给爷笑一个
4楼-- · 2019-02-05 18:42

I have the same problem and apparently there is no simple solution.

So I decided to manipulate two read-only FileStream : one for the XmlReader, the other to get the position of each line :

private void ReadXmlWithLineOffset()
{
    string malformedXml = "<test>\n<test2>\r   <test3><test4>\r\n<test5>Thi is\r\ra\ntest</test5></test4></test3></test2>";
    string fileName = "test.xml";
    File.WriteAllText(fileName, malformedXml);

    XmlTextReader xr = new XmlTextReader(new FileStream(fileName, FileMode.Open, FileAccess.Read));
    FileStream fs2 = new FileStream(fileName, FileMode.Open, FileAccess.Read);

    try
    {
        int currentLine = 1;
        while(xr.Read())
        {
            if (!string.IsNullOrEmpty(xr.Name))
            {
                for (;currentLine < xr.LineNumber; currentLine++)
                    ReadLine(fs2);
                Console.WriteLine("{0} : LineNum={1}, FileOffset={2}", xr.Name, xr.LineNumber, fs2.Position);
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine("Exception : " + ex.Message);
    }
    finally
    {
        xr.Close();
        fs2.Dispose();
    }
}

private void ReadLine(FileStream fs)
{
    int b;
    while ((b = fs.ReadByte()) >= 0)
    {
        if (b == 10) // \n
            return;
        if (b == 13) // \r
        {
            if (fs.ReadByte() != 10) // if not \r\n, go back one byte
                fs.Seek(-1, SeekOrigin.Current);
            return;
        }
    }            
}

This is not the best way of doing this because it uses two readers. To avoid this, we could rewrite a new FileReader shared between the XmlReader and the line counter. But it simply gives you the offset of the line you're interested in. To get the exact offset of the tag, we should use LinePosition, but this can be tricky because of the Encoding.

查看更多
一纸荒年 Trace。
5楼-- · 2019-02-05 18:44

Just to head off one suggestion before it's made: you could keep a reference to the underlying stream you pass into XmlReader, and make a note of its position - but that will give you the wrong results, as the reader will almost certainly be buffering its input (i.e. it'll read the first 1024 characters or whatever - so your first node might "appear" to be at character 1024).

If you use XmlTextReader instead of just XmlReader, then that implements IXmlLineInfo, which means you can ask for the LineNumber and LinePosition at any time - is that good enough for you? (You should probably check HasLineInfo() first, admittedly.)

EDIT: I've just noticed that you want to be able to seek to that position later... in that case line information may not be terribly helpful. It's great for finding something in a text editor, but not so great for moving a file pointer. Could you give some more information about what you're trying to do? There may be a better way of approaching the problem.

查看更多
女痞
6楼-- · 2019-02-05 18:55

Thanks Geoff for the answer. It worked perfectly on windows 7. But somehow with .net 4 version on windows server 2003 of mscorlib.dll, i had to change following 2 functions to work.

private long GetStreamReaderBufferLength(StreamReader sr)
    {
        FieldInfo _streamReaderByteLenField = sr.GetType()
                                            .GetField("charLen",
                                                        BindingFlags.Instance | BindingFlags.NonPublic);

        var fValue = (int)_streamReaderByteLenField.GetValue(sr);

        return fValue;
    }

    private int GetStreamReaderBufferPos(StreamReader sr)
    {
        FieldInfo _streamReaderBufferPositionField = sr.GetType()
                                            .GetField("charPos",
                                                        BindingFlags.Instance | BindingFlags.NonPublic);
        int fvalue = (int)_streamReaderBufferPositionField.GetValue(sr);

        return fvalue;
    }

Also underlyingStreamReader in GetPosition method should be peek to advance the pointer.

private long GetPosition(XmlReader xr, StreamReader underlyingStreamReader)
    {
        long pos = -1;
        while (pos < 0)
        {
            // Get the position of the FileStream
             underlyingStreamReader.Peek();
            long fileStreamPos = underlyingStreamReader.BaseStream.Position;

            //            long fileStreamPos = GetStreamReaderBasePosition(underlyingStreamReader);
            // Get current XmlReader state
            long xmlReaderBufferLength = GetXmlReaderBufferLength(xr);
            long xmlReaderBufferPos = GetXmlReaderBufferPosition(xr);

            // Get current StreamReader state
            long streamReaderBufferLength = GetStreamReaderBufferLength(underlyingStreamReader);
            long streamReaderBufferPos = GetStreamReaderBufferPos(underlyingStreamReader);
            long preambleSize = GetStreamReaderPreambleSize(underlyingStreamReader);


            // Calculate the actual file position
            pos = fileStreamPos
                - (streamReaderBufferLength == DefaultStreamReaderBufferSize ? DefaultStreamReaderBufferSize : 0)
                - xmlReaderBufferLength
                + xmlReaderBufferPos + streamReaderBufferPos;// -preambleSize;
        }
        return pos;
    }
查看更多
登录 后发表回答