Need documentation for extracting text from an image using OneNote Interop

Posted 2019-02-25 01:03

I need to write a simple program that extracts text from an image using the OneNote Interop. Could anyone point me to the appropriate documentation for this, please?

1 answer
(This account has been banned)
Reply #2 · 2019-02-25 01:20

Text recognized by OneNote's OCR is stored in the one:OCRText element of the page's XML. For example:

<one:Page ...>
    ...
    <one:Image ...>
        ...
        <one:OCRData lang="en-US">
            <one:OCRText><![CDATA[This is some sample text]]></one:OCRText>
        </one:OCRData>
    </one:Image>
</one:Page>

You can inspect this XML with OMSpy (OneNote Spy), a tool that shows the XML behind OneNote pages: http://blogs.msdn.com/b/johnguin/archive/2011/07/28/onenote-spy-omspy-for-onenote-2010.aspx

To extract the text, use the OneNote COM interop (as you pointed out). For example:

using System.Linq;
using System.Xml.Linq;
using Microsoft.Office.Interop.OneNote;

// Instantiate OneNote. Use Application rather than ApplicationClass:
// ApplicationClass does not compile when interop types are embedded.
Application onApp = new Application();

// Get the XML of the page with the given ID
string xml;
onApp.GetPageContent("put the page id here", out xml);

// Load it into an XDocument (from System.Xml.Linq)
XDocument xDoc = XDocument.Parse(xml);

// OneNote's XML namespace for OneNote 2010; it must match the schema version of the returned XML
XNamespace one = "http://schemas.microsoft.com/office/onenote/2010/onenote";

// Get all the OCRText values from the page
string[] ocrText = xDoc.Descendants(one + "OCRText").Select(x => x.Value).ToArray();
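
If you want to try the lookup without OneNote installed, here is a minimal sketch that runs the same kind of query against a hard-coded XML string. The XML is a trimmed-down, hypothetical page modeled on the sample earlier in this answer, and the class name OcrTextDemo is only for illustration. Real page XML returned by GetPageContent contains many more elements and attributes.

```csharp
using System;
using System.Xml.Linq;

class OcrTextDemo
{
    static void Main()
    {
        // Hypothetical, trimmed-down page XML (assumption: OneNote 2010 schema).
        string xml =
            "<one:Page xmlns:one=\"http://schemas.microsoft.com/office/onenote/2010/onenote\">" +
            "<one:Image>" +
            "<one:OCRData lang=\"en-US\">" +
            "<one:OCRText><![CDATA[This is some sample text]]></one:OCRText>" +
            "</one:OCRData>" +
            "</one:Image>" +
            "</one:Page>";

        XDocument doc = XDocument.Parse(xml);
        XNamespace one = "http://schemas.microsoft.com/office/onenote/2010/onenote";

        // Walk every OCRData block and print its language with the recognized text.
        foreach (XElement ocrData in doc.Descendants(one + "OCRData"))
        {
            string lang = (string)ocrData.Attribute("lang");
            string text = (string)ocrData.Element(one + "OCRText");
            Console.WriteLine(lang + ": " + text);
        }
        // Prints: en-US: This is some sample text
    }
}
```

Reading the text through the parent one:OCRData element, rather than one:OCRText directly, also gives you the lang attribute, which helps if a page mixes images in several languages.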

See the "Application Interface" docs on MSDN for more info - http://msdn.microsoft.com/en-us/library/gg649853.aspx
