I am having some very strange issues with OLE DB connecting to Excel workbooks.
Our system has a large Excel macro enabled template (we have Excel 2010 and Excel 2016). Occasionally a user will add pictures, charts, tabs, etc that give the error External table is not in the expected format when attempting to read a hidden tab in the workbook that the users do not access.
Normally, we have the user download a new template and redo the work without adding pictures.
Recently, there was many occurrences of this behavior and I attempted to investigate on my development computer. I found that I could not open a connection to the "corrupted" file via the website during the file upload process where the file is read into a byte stream and a temporary .xlsm file is created and opened using an OleDbConnection and data is read from a hidden tab.
Nothing I did to the file would open it to be opened via the code hosted in IIS. This includes the following:
- Remove all pictures
- Remove all tabs but the hidden tab
- Unhide the tab we read from
- save the workbook as an xlsx to remove macros
- save the workbook as a 2003 - 2007 workbook and then save back to either an xlsx or xlsm
The website is running under .NET Framework 4.0 and is running under IIS.
For my investigation, I wrote the following code and hosted it on the same development computer in a page in a test web application hosted in IIS Express under .NET Framework 4.0 and it successfully opened and read the data from the original "corrupted" file.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
public partial class ReadExcelTabToDataSet : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
string szSheetName = @"C:\Temp\Test.xlsm";
string szConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=\"Excel 12.0;HDR=YES\";Data Source=" + szSheetName;
string szQuery = "Select 'Configuration$' as Sheet, * From [Configuration$B1:S2]";
string szExcelTableName = "ValidateFlag";
DataSet ds;
using (OleDbConnection conn = new OleDbConnection(szConnection))
{
using (OleDbDataAdapter da = new OleDbDataAdapter(szQuery, conn))
{
conn.Open();
ds = new DataSet();
da.Fill(ds, szExcelTableName);
}
}
}
}
This raises lots of warning bells and has stumped me. This test seems to rule out everything but how OleDb performs when hosted in IIS. When this page is copied to our site, it fails on the conn.Open().
Does anyone understand why this is happening and how to fix it? I do not want to penalize our users for strange Microsoft issues like this.
Thanks,
EDIT 1
If the file is marked "read-only" and it is in a directory that has full access, then the file can be opened and the data may be read successfully.
This still presents an issue since we are opening the file, checking information and finally, making changes on the tab.
I began investigating using ClosedXml (a .NET solution built around OpenXML) to get past many of the OleDb issues that I am encountering. When attempting to open the "corrupted" workbooks using ClosedXml, I received an error message that I was able to duplicate using the Microsoft Open XML SDK.
The cause of the issue is a form button that executes VBA code to copy data within the template from 1 tab to another. The text in the form button contained a carriage return (i.e. a br). When the size of the template gets large and the user saves their work, Excel corrupts the HTML by not terminating the br. While the ACE OleDB command provides no indication, the Open XML SDK provides the following message:
Cannot open the file: Part /xl/drawings/vmlDrawing4.vml: The 'br' start tag on line 19 position 29 does not match the end tag of 'font'. Line 20, position 9.
If the extension of the template is renamed from .xlsm to .zip, the actual file can be inspected and the cause can be fixed. In this case, I had to remove the break between words on the button.
It does concern me that Excel becomes unstable as the file size is increased and does not save workbooks correctly at that point, but I am able to work around this one case.
I have experienced the same error message when using ADO connection from one Excel workbook (A) to another Excel Workbook (B). Workbook A is open by a user and workbook B is closed, but connected to by ADO in Read/Write mode.
The "External table is not in the expected format" error occurs when workbook B is updated and saved by ADO but when the disk is full.
Disk Full errors should not be common, and they are fairly simply overcome if the workbook is open by a user because the user is warned and can save the workbook to another area. However, when open through an ADO connection there is no warning and thus the workbook (B in this case) does not save correctly and becomes corrupted - at least this is what I have concluded.
When workbook B is subsequently opened by a user there is a warning that the workbook is corrupted. After attempted recovery the workbook appears empty. However, closing workbook B and then running a SQL query against it (ADO connection in Read Only mode) does sometimes return data (depending upon the extent of the corruption), but the data is incomplete.
I post this in the hope it may be helpful because it's the same error and it causes corruption, albeit by a different cause to Lee Z's issue. Alas, it's not a cure, but I hope it's informative.
"External table is not in the expected format" Is an generic error which because of (sadly) far to many reason, In my case it was because I didn't decrypt the file properly.
My best bet would be to check your program, test it with a brand new file, and locate the error that way.
Best of luck!