I have two Excel files.
The first file contains a Person Name column and a Total Days Present column.
Ex.
PersonName TotalDays
xyz
abcd
The second Excel file contains Person Name, Date and Status (Present/Absent).
PersonName Date Status
xyz 1/1/2011 Present
xyz 1/1/2011 Present
I need to group duplicate date/status rows for a person as one, count them, and write the count back into the first file.
There are about 100 rows in the first file and around 20,000 rows in the second file that I need to check against. To make this faster, I loaded all the rows from the second file into an array and read from that while calculating each entry, which works correctly.
The issue is that it takes a lot of memory; under Windows, so many applications automatically open up that the system almost hangs.
Is there an alternative way to implement this that avoids the memory issue and processes quickly? I came across Scripting.Dictionary, but I'm not sure whether it will take less memory.
EDIT
I tried using ReDim Preserve and also a static array of size 20,000; in both cases the same problem happens.
EDIT
lblStatus.Caption = "Loading to memory"
Dim ArrAuditData() As AData
Dim TotalLookUpCount As Integer
For J = 1 To 50000
    If lookUpRange.Cells(J, cmbChoice.ListIndex) = "Fail" Then
        ReDim Preserve ArrAuditData(J) As AData
        ArrAuditData(TotalLookUpCount).AuditType = lookUpRange.Cells(J, cmdAudit2.ListIndex)
        ArrAuditData(TotalLookUpCount).TransTime = lookUpRange.Cells(J, cmbChoice.ListIndex - 1)
        ArrAuditData(TotalLookUpCount).AuditValue = lookUpRange.Cells(J, cmbChoice.ListIndex)
        ArrAuditData(TotalLookUpCount).Slno = lookUpRange.Cells(J, 0)
        TotalLookUpCount = TotalLookUpCount + 1
    ElseIf lookUpRange.Cells(J, cmbChoice.ListIndex) = "" And J > 4 Then
        Exit For
    End If
    DoEvents
Next
An array of 20,000 elements containing 4 Variants each will take up less than 2 MB of RAM (a Variant is 16 bytes, so 20,000 × 4 × 16 bytes ≈ 1.2 MB, plus the string contents). I don't think memory has anything to do with your problem -- unless you happen to be using an old computer with 2 MB of RAM or something like that.
A more likely reason why your code is so heavy is that you are looping through cells. There is significant overhead to each communication between VBA and Excel sheet data, and this adds up when you refer to many cells one at a time. In your case, your loop does up to 200,000 separate cell references.
Instead, you should load all your data at once into a Variant array, and then loop through that array, as shown below. This is significantly faster (even though it uses more memory, not less; but again, I don't think memory is your issue).
lblStatus.Caption = "Loading to memory"
Dim ArrAuditData() As AData
Dim varTemp As Variant
Dim J As Long
Dim TotalLookUpCount As Long  ' Long, not Integer: row counts can exceed 32,767
' Load everything into a Variant array in one shot.
varTemp = lookUpRange.Value
' Size the array once, up front, for the largest conceivable number of elements.
ReDim ArrAuditData(1 To UBound(varTemp, 1)) As AData
For J = 1 To UBound(varTemp, 1)
    If varTemp(J, cmbChoice.ListIndex) = "Fail" Then
        TotalLookUpCount = TotalLookUpCount + 1
        ArrAuditData(TotalLookUpCount).AuditType = varTemp(J, cmdAudit2.ListIndex)
        ArrAuditData(TotalLookUpCount).TransTime = varTemp(J, cmbChoice.ListIndex - 1)
        ArrAuditData(TotalLookUpCount).AuditValue = varTemp(J, cmbChoice.ListIndex)
        ' A Variant array read from a Range is 1-based, so your Cells(J, 0)
        ' has no equivalent here; include that column in lookUpRange
        ' (assumed below to be column 1) and index it accordingly.
        ArrAuditData(TotalLookUpCount).Slno = varTemp(J, 1)
    ElseIf varTemp(J, cmbChoice.ListIndex) = "" And J > 4 Then
        Exit For
    End If
    DoEvents
Next
' Trim off the unused elements -- once, outside the loop.
If TotalLookUpCount > 0 Then ReDim Preserve ArrAuditData(1 To TotalLookUpCount) As AData
For further reading, have a look at this old but still relevant article: http://www.avdf.com/apr98/art_ot003.html
If you still think RAM is the issue, then please show us the AData type declaration.
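For what it's worth, the code above assumes a user-defined type along these lines (member names taken from your code, member types guessed):

Private Type AData
    Slno As Variant
    AuditType As String
    TransTime As Date
    AuditValue As String
End Type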
EDIT: Also, never ReDim Preserve inside a loop like that! ReDim Preserve is a very expensive operation and rarely needs to be done more than once on any given array; doing it 20,000 times will slow down your code. Here I take it out of the loop and use it just once at the end, to trim off the unused elements. (Notice how I initially ReDim'ed the array to fit the largest conceivable number of elements.)
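As for Scripting.Dictionary: I wouldn't expect it to take meaningfully less memory, but it does make your "group duplicate person/date rows as one, then count them" step straightforward. A minimal sketch, assuming file2.xls is open with its data in Sheet1, columns A:C (Name, Date, Status), rows 2 to 20001:

Sub CountPresentDays()
    Dim seen As Object, counts As Object
    Set seen = CreateObject("Scripting.Dictionary")
    Set counts = CreateObject("Scripting.Dictionary")

    ' One-shot read into a Variant array, as above.
    Dim data As Variant
    data = Workbooks("file2.xls").Sheets(1).Range("A2:C20001").Value

    Dim J As Long, key As String, nm As String
    For J = 1 To UBound(data, 1)
        If data(J, 3) = "Present" Then
            nm = data(J, 1)
            key = nm & "|" & data(J, 2)      ' person + date
            If Not seen.Exists(key) Then     ' count each person/date pair once
                seen.Add key, True
                counts(nm) = counts(nm) + 1  ' missing keys start as Empty, i.e. 0
            End If
        End If
    Next J
    ' counts("xyz") now holds the number of distinct "Present" days for xyz,
    ' ready to be written back into file1's TotalDays column.
End Sub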
I would suggest a different approach.
If I interpret the question correctly:
- you want to get a count of days each person is "Present" or "Absent"
- the first file (call it file1) contains one row per person (about 100 people)
- the second file (call it file2) contains one row per person per day (100 people and 200 days = 20000 rows)
- the desired output is two extra columns in file1, being a count of "Present" and a count of "Absent"
The approach I would use is the COUNTIF worksheet function (or COUNTIFS, if you have Excel 2007 or later).
Assume
- file1 contains a Table on Sheet1 called StatusReport, columns A = Name, B = Present, C = Absent
- one row for each unique name
- file2 contains a Table on Sheet1 called StatusData, columns A = Name, B = Date, C = Status
- one row for each name for each date
Solution for Excel 2007 or 2010
- file1 cell B2
=COUNTIFS(file2.xlsx!StatusData[Name],[Name],file2.xlsx!StatusData[Status],StatusReport[[#Headers],[Present]])
- file1 cell C2
=COUNTIFS(file2.xlsx!StatusData[Name],[Name],file2.xlsx!StatusData[Status],StatusReport[[#Headers],[Absent]])
Solution for Excel 2003
Add an extra column D to the file2 StatusData table (call it Code) combining Name and Status into a single key, since COUNTIF can only test one criterion:
=Sheet1!$A2&"_"&Sheet1!$C2
- file1 cell B2
=COUNTIF([file2.xls]Sheet1!$D:$D,Sheet2!$A2&"_"&Sheet2!$B$1)
- file1 cell C2
=COUNTIF([file2.xls]Sheet1!$D:$D,Sheet2!$A2&"_"&Sheet2!$C$1)
Note: while these formulas give the same result, the COUNTIFS + Table references version in 2010 is so much faster it's not funny (my test on about 300,000 rows updates in a few seconds).