Summary
This question is somehow the follow-up to this question:
How to implement column self-naming from its index?
Having tested the code provided in this above-linked question's answers, I finally encountered a serious performance issue.
Performance issue
The performance issue occurs upon a Sheet initialization, that is, when I initialize the Sheet's Cells.
''' <summary>
''' Initialize an instance of the Company.Project.Sheet class.
''' </summary>
''' <param name="nativeSheet">The native worksheet from which to initialize.</param>
Friend Sub New(ByVal nativeSheet As Microsoft.Office.Interop.Excel.Worksheet)
_nativeSheet = nativeSheet
Dim cells As IDictionary(Of String, ICell) = New Dictionary(Of String, ICell)()
'These iterations hurt the performance of the API...'
For rowIndex As Integer = 1 To _nativeSheet.Rows.Count Step 1
For colIndex As Integer = 1 To _nativeSheet.Columns.Count Step 1
Dim c As ICell = New Cell(_nativeSheet.Cells(rowIndex, colIndex))
cellules.Add(c.Name, c)
Next
Next
_cellules = New ReadOnlyDictionary(Of String, ICell)(cells)
End Sub
- ReadOnlyDictionary(Of TKey, TValue) :
A custom read-only dictionary that simply wraps a IDictionary(Of TKey, TValue) to prevent modifications.
Discussion
I'm working this way since each cell in an underlying spreadsheet worksheet is initialized from the initialization of the worksheet until the end, that is, when the worksheet is disposed or finalized. Hence, the same way I wish to initialize the cells of a Sheet, but I also wish to keep the performance boost of using the indexed cells over the named ("A1") cells, while keeping the ease of use to the API user to refer to a cell with its name, that is how I intend to use the dictionary, so that when I refer to cell "A1", I access this key into my dictionary and address the cell (1, 1) accordingly.
Aside, I know of an even faster way to read from a worksheet using the Worksheet.UsedRange property that returns all of the used cells into a 2D matrix.
If there was anyhow the same or about the same for the set of cells with which I could initialize multiple instances of my Cell class with, this would be great, and performant!I also thought of initializing like only a 100 x 100 matrix cells in memory while mapping them with my dictionary, as one will rarely use the whole sheet's cells. As such, I am still thinking of a way where I would have to access a not yet initialized cell, let's say Cells(120, 120). Ideally, I think, the program would have to initialize all the cells between the maximum initially initialized Cell(100, 100) until Cell (120, 120). Am I clear enough here? Feel free to ask for clarification! =)
Another option could be that I only initialize the cells' names into the dictionary and keeping there row and column index in memory, not initializing a Cell instance with its nativeCell, say a Range. Here's the code of my Cell class to illustrate what I mean.
''' ''' Represents a cell in a worksheet. ''' ''' Friend Class Cell Implements ICell
Private _nativeCell As Microsoft.Office.Interop.Excel.Range Private _name As String ''' <summary> ''' Initializes a new instance of the Company.Project.Cell class. ''' </summary> ''' <param name="nativeCell">The Microsoft.Office.Interop.Excel.Range to wrap.</param> Friend Sub New(ByVal nativeCell As Microsoft.Office.Interop.Excel.Range) _nativeCell = nativeCell End Sub Public ReadOnly Property NativeCell() As Microsoft.Office.Interop.Excel.Range Implements ICellule.NativeCell Get Return _nativeCell End Get End Property Public ReadOnly Property Column() As Integer Implements ICell.Column Get Return _nativeCell.Column End Get End Property Public ReadOnly Property Row() As Integer Implements ICell.Row Get Return _nativeCell.Row End Get End Property Public ReadOnly Property Name() As String Implements ICellule.Name Get If (String.IsNullOrEmpty(_name) OrElse _name.Trim().Length = 0) Then _ _name = GetColumnName() Return _nom End Get End Property Public Property Value() As Object Implements ICellule.Value Get Return _nativeCell.Value2 End Get Set(ByVal value As Object) _nativeCell.Value2 = value End Set End Property Public ReadOnly Property FormattedValue() As String Implements ICellule.FormattedValue Get Return _nativeCell.Text End Get End Property Public ReadOnly Property NumericValue() As Double? Implements ICellule.NumericValue Get Return Value End Get End Property
Questions
What are my other options?
Are there any other ways to walk through?
Is there a way I can make the actual approach viable as for performance concerns?
For your information, this issue timed out on testing, so the test never ended within an acceptable time range which actually take centuries...
Any thoughts are welcome! I'm open minded to other solutions or approach that will help me achieve this objective while addressing this performance issue.
Thanks to you all! =)
EDIT #1
Thanks to Maxim Gueivandov, his solution solves the issue I have addressed in this question.
Aside, there's another problem that arose from this solution: SystemOutOfMemoryException
, and that will be addressed in another question.
My Sincerest Thanks to Maxim Gueivandov.