I am trying to remove all extra blank rows and columns from an Excel file using the Interop library.
I followed the question "Fastest method to remove Empty rows and Columns From Excel Files using Interop" and found it helpful.
However, I have Excel files that contain a small set of data but a lot of empty rows and columns (from the last non-empty row or column to the end of the worksheet).
I tried looping over the rows and columns, but the loop takes hours.
I am trying to get the index of the last non-empty row and column so I can delete the whole empty range in one line:
XlWks.Range("...").EntireRow.Delete(xlShiftUp)
Note: I am trying to get the last row containing data so that I can remove all the extra blanks after that row (or column).
Any suggestions?
Several years ago I created an MSDN code sample that lets a developer get the last used row and column from a worksheet. I modified it and placed all the needed code into a class library, with a Windows Forms front end to demo the operation.
The underlying code uses Microsoft.Office.Interop.Excel.
Location on Microsoft OneDrive: https://1drv.ms/u/s!AtGAgKKpqdWjiEGdBzWDCSCZAMaM
Here I get the first sheet in an Excel file, get the last used row and column, and present them as a valid cell address.
Within the demo project I also get all sheets for an Excel file and present them in a ListBox. Select a sheet name from the list box to get that sheet's last row and column as a valid cell address.
At first glance, when opening the solution from the link above, you will note there is a lot of code. The code is optimal and will release all objects immediately.
I'm using ClosedXML, which has useful 'LastRowUsed' and 'LastColumnUsed' methods.
This simple loop deleted 5000 out of 10000 rows in 38 seconds. Not fast, but a lot better than 'hours'. That depends, of course, on how many rows/columns you're dealing with, which you don't say. However, in further tests with 25000 empty rows out of 50000, deleting the empty rows in a loop took about 30 minutes. Clearly, deleting rows isn't an efficient process.
A better solution is to create a new sheet and then copy the rows you want to keep.
Step 1 - create sheet with 50000 rows and 20 columns, every other row and column is empty.
Step 2 - copy the rows with data to a new sheet. This takes 10 seconds.
Step 3 - this would be to do the same operation for the columns.
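The copy-instead-of-delete idea from the steps above can be sketched independently of any Excel library. This is a minimal Python illustration that treats the sheet as a plain list of lists; `is_blank` and `compact` are hypothetical helper names, not ClosedXML or Interop calls, and "blank" is assumed to mean None or a whitespace-only string.

```python
def is_blank(value):
    """Assumption: a cell is blank if it is None or a whitespace-only string."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def compact(grid):
    """Build a new grid that holds only the rows and columns containing data."""
    # Pass 1: keep rows that have at least one non-blank cell.
    rows = [row for row in grid if any(not is_blank(v) for v in row)]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pass 2: keep column positions that are non-blank in at least one kept row.
    keep = [c for c in range(width)
            if any(c < len(row) and not is_blank(row[c]) for row in rows)]
    return [[row[c] if c < len(row) else None for c in keep] for row in rows]


sheet = [
    ["id", None, "name"],
    [None, None, None],
    [1, None, "Ann"],
    [None, None, ""],
    [2, None, "Bob"],
]
print(compact(sheet))
```

Nothing is deleted in place: each kept row is written once to the new grid, which is why this style of copy tends to scale better than row-by-row deletion.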
Let's say the last corner cell with data is J16, so there is no data in columns K onwards or in rows 17 downwards. Why are you actually deleting them? What is the scenario and what are you trying to achieve? Is it clearing out formatting? Is it clearing out formulas that show an empty string?
In any case, looping is not the way.
The code below shows a way to use the Clear() method of the Range object to clear all contents, formulas and formatting from a range. Alternatively, if you really want to delete them, you can use the Delete() method to delete a whole rectangular Range in one hit. That will be much faster than looping.
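To delete everything past J16 in one hit, you need the addresses of the trailing rows and columns. The following is a small Python sketch of how those address strings could be built. The helper names are hypothetical, and the limits are an assumption that the workbook is in the modern .xlsx format (1,048,576 rows, 16,384 columns); older .xls sheets are smaller.

```python
MAX_ROWS = 1_048_576  # assumption: .xlsx worksheet row limit
MAX_COLS = 16_384     # assumption: .xlsx worksheet column limit (column XFD)


def column_letters(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def trailing_ranges(last_row, last_col):
    """Return (row_address, column_address) covering everything after the data.

    Either element is None when the data already reaches the sheet limit.
    """
    rows = f"{last_row + 1}:{MAX_ROWS}" if last_row < MAX_ROWS else None
    cols = (f"{column_letters(last_col + 1)}:{column_letters(MAX_COLS)}"
            if last_col < MAX_COLS else None)
    return rows, cols


print(trailing_ranges(16, 10))  # ('17:1048576', 'K:XFD')
```

Each string can then be passed to a single Range call and cleared or deleted as one block.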
You should be able to find the last non-empty row and column with something similar to this:
That's VB.NET, but it should more or less work. That will return Row 16 and Column 10 (based on your picture above). Then you can use that to find the range you want to delete all in one line.
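For readers who want to see the logic without Interop, here is a minimal Python sketch of "last non-empty row and column" on an in-memory grid. It is not the VB.NET code referred to above; `last_used_cell` is a hypothetical name, and a cell is assumed to be empty when it is None or a whitespace-only string.

```python
def last_used_cell(grid):
    """Return (last_row, last_col), both 1-based, or (0, 0) if no cell has data."""
    last_row = last_col = 0
    for r, row in enumerate(grid, start=1):
        for c, value in enumerate(row, start=1):
            # Assumption: None and whitespace-only strings count as empty.
            if value is not None and str(value).strip() != "":
                last_row = max(last_row, r)
                last_col = max(last_col, c)
    return last_row, last_col


# A 20 x 15 grid whose data sits inside A1:J16.
grid = [[None] * 15 for _ in range(20)]
grid[0][0] = "header"  # A1
grid[15][3] = 42       # D16
grid[7][9] = "x"       # J8
print(last_used_cell(grid))  # (16, 10)
```

Note that the last row and the last column come from different cells here (D16 and J8), so the corner J16 itself does not need to hold data.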
It seems that Microsoft has already solved your problem. Take a look at the Range.CurrentRegion property, which returns a range bounded by any combination of blank rows and blank columns. There's one inconvenience: this property cannot be used on a protected worksheet.
For further details, please see: How to Find Current Region, Used Range, Last Row and Last Column in Excel with VBA Macro
Some SO members have mentioned the UsedRange property, which might be useful too, but it differs from CurrentRegion in that UsedRange returns a range that includes any cell that has ever been used. So, if you would like to get the last row and last column occupied by data, you have to use the End property with XlDirection: xlToLeft and/or xlUp.
Note #1:
If your data is in a tabular format, you can simply find the last cell by using:
Note #2:
If your data isn't in a tabular format, you need to loop through the collection of rows and columns to find the last non-blank cell.
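The difference between the two notes can be illustrated with a small Python sketch on a list-of-lists grid. `end_up` only mimics the effect of walking up a single column from the bottom of the sheet; it is a hypothetical helper, not the Excel End property, and empty cells are assumed to be None or "".

```python
def end_up(grid, col):
    """1-based row of the last non-blank cell in column `col` (1-based), 0 if none."""
    for r in range(len(grid), 0, -1):
        row = grid[r - 1]
        if col <= len(row) and row[col - 1] not in (None, ""):
            return r
    return 0


def last_row_any_column(grid):
    """Last non-blank row across all columns, for data that is not tabular."""
    width = max((len(row) for row in grid), default=0)
    return max((end_up(grid, c) for c in range(1, width + 1)), default=0)


ragged = [
    ["a", None, None],
    ["b", None, "x"],
    [None, None, "y"],
    [None, None, None],
]
print(end_up(ragged, 1))            # 2: column A alone stops too early
print(last_row_any_column(ragged))  # 3: every column has to be checked
```

With tabular data, any fully populated column gives the right answer on its own; with ragged data, a single column can under-report the last row.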
Good luck!
Update 1
If your goal is to import the Excel data using C#, and assuming that you have identified the highest used indexes in your worksheet (in the image you posted, Col = 10 and Row = 16), you can convert the maximum used indexes to a cell address, here J16, and select only the used range using an OLEDBCommand.
Otherwise, I don't think it is easy to find a faster method.
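As a rough illustration of the conversion described in this update, the Python sketch below turns Row = 16, Col = 10 into J16 and builds a SELECT statement over that range, using the bracketed sheet-and-range form that the Excel OLEDB providers accept. The sheet name Sheet1 and the function names are assumptions made for the example.

```python
def column_letters(index):
    """Convert a 1-based column index to Excel letters (10 -> J, 28 -> AB)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def used_range_address(last_row, last_col):
    """Address of the block from A1 to the last used cell, e.g. A1:J16."""
    return f"A1:{column_letters(last_col)}{last_row}"


def select_statement(sheet_name, last_row, last_col):
    """Query text that reads only the used block of the given sheet."""
    address = used_range_address(last_row, last_col)
    return f"SELECT * FROM [{sheet_name}${address}]"


# "Sheet1" is a hypothetical sheet name.
print(select_statement("Sheet1", 16, 10))  # SELECT * FROM [Sheet1$A1:J16]
```

The resulting string would be used as the command text, so the trailing empty rows and columns are never read.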
You can refer to these articles on converting indexes into letters and on connecting to Excel using OLEDB:
Initial Answer
As you said you started from the following question:
And you are trying to "get the last row containing data to remove all extra blanks (after this row , or column)".
So, assuming that you are working with the accepted answer (provided by @JohnG), you can add a few lines of code to get the last used row and column.
Empty rows are stored in a list of integers, rowsToDelete.
You can use the following code to get the last non-empty rows with an index smaller than the last empty row:
If NonEmptyRows.Max() < rowsToDelete.Max(), the last non-empty row is NonEmptyRows.Max(). Otherwise it is worksheet.Rows.Count, and there are no empty rows after the last used one.
The same thing can be done to get the last non-empty column.
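The rule above can be sketched in a few lines of Python. This is one interpretation, not the code from the answer: it assumes rowsToDelete holds 1-based indexes of the empty rows and that `total` plays the role of worksheet.Rows.Count; `last_non_empty` is a hypothetical name.

```python
def last_non_empty(empty_indexes, total):
    """1-based index of the last non-empty row (or column); 0 if all are empty.

    empty_indexes: 1-based indexes of the empty rows (rowsToDelete in the text).
    total: number of rows in the range (worksheet.Rows.Count in the text).
    """
    empty = set(empty_indexes)
    # No empty rows at all, or the last empty row comes before the final row:
    # the final row holds data, so nothing trails the used area.
    if not empty or max(empty) < total:
        return total
    # Otherwise the range ends in empty rows; take the largest non-empty index.
    non_empty = [i for i in range(1, total + 1) if i not in empty]
    return max(non_empty, default=0)


rows_to_delete = [3, 5, 8, 9, 10]
print(last_non_empty(rows_to_delete, 10))  # 7: rows 8-10 are trailing blanks
print(last_non_empty([3, 5], 10))          # 10: no blanks after the last row
```

Everything after the returned index can then be removed as one range, and only the remaining scattered empty rows need individual handling.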
The code is edited in the DeleteCols and DeleteRows functions: