VBA Excel large data manipulation taking forever

Posted 2020-01-29 12:24

I have two Excel files.

The first file contains the Person Name and Total Days Present columns, e.g.:

PersonName       TotalDays
xyz               
abcd             

The second file contains Person Name, Date and Status (Present/Absent):

PersonName      Date      Status
xyz           1/1/2011    Present
xyz           1/1/2011    Present

I need to group identical PersonName/Date/Status rows as one, count the Present days per person, and write that count into the first file.

I have around 100 rows in the first file, whereas the second file, which I need to check against, has about 20,000 rows. To make it faster, I loaded all the rows from the second file into an array and read from that array while calculating each entry, which works correctly.

The issue is that it takes a lot of memory: in Windows, many applications start opening by themselves and the system almost hangs.

Is there an alternative way to implement this without the memory issue and with fast processing? I came across Scripting.Dictionary but am not sure whether it will take less memory.
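
For reference, here is a minimal sketch of the Scripting.Dictionary idea (the workbook and sheet names, and the Name/Date/Status layout in columns A:C, are assumptions); it counts each person/date "Present" pair only once and then writes one total per person:

' Minimal sketch, assuming file2 has Name/Date/Status in columns A:C from row 2
' and file1 has the names in column A with the total going into column B.
Sub CountPresentDays()
    Dim wsDetail As Worksheet, wsSummary As Worksheet
    Set wsDetail = Workbooks("file2.xlsx").Worksheets(1)    ' assumed workbook/sheet
    Set wsSummary = Workbooks("file1.xlsx").Worksheets(1)   ' assumed workbook/sheet

    ' One read of the whole detail range into a Variant array.
    Dim lastDetail As Long, data As Variant
    lastDetail = wsDetail.Cells(wsDetail.Rows.Count, "A").End(xlUp).Row
    data = wsDetail.Range("A2:C" & lastDetail).Value

    ' seen de-duplicates person+date pairs; counts holds one total per person.
    Dim seen As Object, counts As Object
    Set seen = CreateObject("Scripting.Dictionary")
    Set counts = CreateObject("Scripting.Dictionary")

    Dim r As Long, key As String
    For r = 1 To UBound(data, 1)
        If data(r, 3) = "Present" Then
            key = CStr(data(r, 1)) & "|" & CStr(data(r, 2))
            If Not seen.Exists(key) Then
                seen.Add key, True
                counts(CStr(data(r, 1))) = counts(CStr(data(r, 1))) + 1
            End If
        End If
    Next r

    ' Write the totals next to the ~100 names in the summary file.
    Dim lastSummary As Long
    lastSummary = wsSummary.Cells(wsSummary.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastSummary
        wsSummary.Cells(r, "B").Value = counts(CStr(wsSummary.Cells(r, "A").Value))
    Next r
End Sub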

EDIT: I tried using ReDim Preserve and a static array of size 20,000; in both cases the same problem happens.

EDIT: here is the code I am using.

lblStatus.Caption = "Loading to memory"
Dim ArrAuditData() As AData
Dim TotalLookUpCount As Integer

For J = 1 To 50000
    If lookUpRange.Cells(J, cmbChoice.ListIndex) = "Fail" Then
        ReDim Preserve ArrAuditData(J) As AData
        ArrAuditData(TotalLookUpCount).AuditType = lookUpRange.Cells(J, cmdAudit2.ListIndex)
        ArrAuditData(TotalLookUpCount).TransTime = lookUpRange.Cells(J, cmbChoice.ListIndex - 1)
        ArrAuditData(TotalLookUpCount).AuditValue = lookUpRange.Cells(J, cmbChoice.ListIndex)
        ArrAuditData(TotalLookUpCount).Slno = lookUpRange.Cells(J, 0)
        TotalLookUpCount = TotalLookUpCount + 1
    ElseIf lookUpRange.Cells(J, cmbChoice.ListIndex) = "" And J > 4 Then
        Exit For
    End If
    DoEvents
Next

2 Answers
Fickle 薄情
Answered 2020-01-29 13:17

An array of 20,000 elements containing 4 Variants each will take up less than 2 MB of RAM. I don't think memory has anything to do with your problem -- unless you happen to be using an old computer with 2 MB of RAM or something like that.
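(As a rough check, assuming 16 bytes per Variant: 20,000 rows × 4 Variants × 16 bytes ≈ 1.3 MB.)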

A more likely reason why your code is so heavy is that you are looping through cells. There is significant overhead to each communication between VBA and Excel sheet data, and this adds up when you refer to many cells one at a time. In your case, your loop does up to 200,000 separate cell references.

Instead, you should load all your data at once into a Variant array, and then loop through that array, as shown below. This is significantly faster (even though this uses more memory, not less; but again, I don't think memory is your issue).

lblStatus.Caption = "Loading to memory"
Dim ArrAuditData() As AData
Dim varTemp As Variant
Dim TotalLookUpCount As Long   ' Long rather than Integer, to be safe with large row counts
Dim J As Long

' Load everything into a Variant array with a single read from the sheet.
varTemp = lookUpRange

' Size the array once, to the largest conceivable number of elements.
ReDim ArrAuditData(1 To UBound(varTemp, 1)) As AData

For J = 1 To UBound(varTemp, 1)

    If varTemp(J, cmbChoice.ListIndex) = "Fail" Then

        TotalLookUpCount = TotalLookUpCount + 1
        ArrAuditData(TotalLookUpCount).AuditType = varTemp(J, cmdAudit2.ListIndex)
        ArrAuditData(TotalLookUpCount).TransTime = varTemp(J, cmbChoice.ListIndex - 1)
        ArrAuditData(TotalLookUpCount).AuditValue = varTemp(J, cmbChoice.ListIndex)
        ' The Variant array is 1-based and covers only lookUpRange, so the cell
        ' your Cells(J, 0) read (one column left of the range) is not in varTemp;
        ' read that single cell directly instead.
        ArrAuditData(TotalLookUpCount).Slno = lookUpRange.Cells(J, 0)

    ElseIf varTemp(J, cmbChoice.ListIndex) = "" And J > 4 Then
        Exit For

    End If

    DoEvents
Next

' Trim off the unused elements -- one ReDim Preserve, outside the loop.
If TotalLookUpCount > 0 Then ReDim Preserve ArrAuditData(1 To TotalLookUpCount) As AData

For further reading, have a look at this old but still relevant article: http://www.avdf.com/apr98/art_ot003.html

If you still think RAM is the issue, then please show us the AData type declaration.

EDIT: Also, never ReDim Preserve inside a loop like that! ReDim Preserve is a very expensive operation and rarely needs to be done more than once on any given array. Doing it 20,000 times will slow down your code. Here I take it out of the loop, and just use it once at the end to trim off the unused elements. (Notice how I initially ReDim'ed the array to fit the largest conceivable number of elements.)
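
For instance, the pattern looks like this (illustrative values only): allocate to an upper bound once, fill, then trim once at the end.

Dim arr() As Long, n As Long, i As Long
ReDim arr(1 To 50000)               ' one allocation up front, sized to the maximum possible
For i = 1 To 50000
    If i Mod 2 = 0 Then             ' stand-in for whatever the "keep this row" test is
        n = n + 1
        arr(n) = i
    End If
Next i
ReDim Preserve arr(1 To n)          ' one copy at the end, instead of one per kept row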

何必那么认真
Answered 2020-01-29 13:25

I would suggest a different approach.

If I interpret the question correctly:

  • you want to get a count of days each person is "Present" or "Absent"
  • the first file (call it file1) contains one row per person (about 100 people)
  • the second file (call it file2) contains one row per person per day (100 people × 200 days = 20,000 rows)
  • the desired output is two extra columns in file1: a count of "Present" and a count of "Absent"

The approach I would use is the COUNTIF function (or, if you have Excel 2007 or later, COUNTIFS).

Assume

  • file1 contains a Table on Sheet1 called StatusReport, columns A = Name, B = Present, C = Absent
  • one row for each unique name
  • file2 contains a Table on Sheet1 called StatusData, columns A = Name, B = Date, C = Status
  • one row for each name for each date

Solution for Excel 2007 or 2010

  • file1 cell B2
    =COUNTIFS(file2.xlsx!StatusData[Name],[Name],file2.xlsx!StatusData[Status],StatusReport[[#Headers],[Present]])
  • file1 cell C2
    =COUNTIFS(file2.xlsx!StatusData[Name],[Name],file2.xlsx!StatusData[Status],StatusReport[[#Headers],[Absent]])

Solution for Excel 2003

  • Add an extra column D to file2 StatusData table (call it Code)
    =Sheet1!$A2&"_"&Sheet1!$C2

  • file1 cell B2
    =COUNTIF([file2.xls]Sheet1!$D:$D,Sheet1!$A2&"_"&Sheet1!$B$1)

  • file1 cell C2
    =COUNTIF([file2.xls]Sheet1!$D:$D,Sheet1!$A2&"_"&Sheet1!$C$1)

Note: while these formulas give the same result, the COUNTIFS + Table references version in 2010 is so much faster it's not funny (my test on about 300,000 rows updates in a few seconds).
