I read Excel files using OpenXml. all work fine but if the spreadsheet contains one cell that has an address mail and after it a space and another word, such as:
abc@abc.com abc
It throws an exception immediately at the opening of the spreadsheet:
var _doc = SpreadsheetDocument.Open(_filePath, false);
exception:
DocumentFormat.OpenXml.Packaging.OpenXmlPackageException
Additional information:
Invalid Hyperlink: Malformed URI is embedded as a hyperlink in the document.
Unfortunately solution where you have to open file as zip and replace broken hyperlink would not help me.
I just was wondering how it is posible that it works fine when your target framework is 4.0 even if your only installed .Net Framework has version 4.7.2. I have found out that there is private static field inside
System.UriParser
that selects version of URI's RFC specification. So it is possible to set it to V2 as it is set for .net 4.0 and lower versions of .Net Framework. Only problem that it isprivate static readonly
.Maybe someone will want to set it globally for whole application. But I wrote
UriQuirksVersionPatcher
that will update this version and restore it back in Dispose method. It is obviously not thread-safe but it is acceptable for my purpose.Usage:
P.S. Later I found that someone already implemented this pathcher: https://github.com/google/google-api-dotnet-client/blob/master/Src/Support/Google.Apis.Core/Util/UriPatcher.cs
There is an open issue on the OpenXml forum related to this problem: Malformed Hyperlink causes exception
In the post they talk about encountering this issue with a malformed "mailto:" hyperlink within a Word document.
They propose a work-around here: Workaround for malformed hyperlink exception
The workaround is essentially a small console application which locates the invalid URL and replaces it with a hard-coded value; here is the code snippet from their sample that does the replacement; you could augment this code to attempt to correct the passed brokenUri:
The problem I had was actually with an Excel document (like you) and it had to do with a malformed http URL; I was pleasantly surprised to find that their code worked just fine with my Excel file.
Here is the entire work-around source code, just in case one of these links goes away in the future:
I haven't use OpenXml but if there's no specific reason for using it then I highly recommend LinqToExcel from LinqToExcel. Example of code is here: