VBA - Find all numbered lines in VBE Modules via p

2020-03-05 03:46发布

问题:

Task:

My goal is to find all numbered lines in procedures of my Code Modules. The CodeModule.Find method can be used to check for search terms (target parameter).

Syntax:

object.Find(target, startline, startcol, endline, endcol [, wholeword] [, matchcase] [, patternsearch])

The referring help site https://msdn.microsoft.com/en-us/library/aa443952(v=vs.60).aspx states: parameter patternsearch: Optional. A Boolean value specifying whether or not the target string is a regular expression pattern. If True, the target string is a regular expression pattern. False is the default.

As explained above the find method allows a regex pattern search, which I would like to use in order to identify numbered lines in a precise way: digits followed by a tab. The example below therefore defines a search string s and sets the last parameter PatternSearch in the .Find method to True.

Problem AFAIK a valid regex definition could be

s = "[0-9]{1,4}[ \t]"

but that doesn't show anything, not even an error.

In order to show at least any results, I defined the search term

s = "[0-9]*[ \t]*)"

in the calling example procedure ListNumberedLines showing erratic results.

Question

Is there any possibility to use a valid regex patternsearch in the CodeModule.Find method?

Example code

Option Explicit

' ==============
' Example Search
' ==============
Sub ListNumberedLines()
'  Declare search pattern string s
   Dim S As String
10  S = "[0-9]*[ \t]*)"     
20  Debug.Print "Search Term: " & S
30  Call findWordInModules(S)

End Sub

Public Sub findWordInModules(ByVal sSearchTerm As String)
' Purpose: find modules ('components') with lines containing a search term
' Method:  .CodeModule.Find with last parameter patternsearch set to True
' Based on https://www.devhut.net/2016/02/24/vba-find-term-in-vba-modulescode/

' VBComponent requires reference to Microsoft Visual Basic for Applications Extensibility
'             or keep it as is and use Late Binding instead
' Declare module variable oComponent
  Dim oComponent            As Object    'VBComponent

  For Each oComponent In Application.VBE.ActiveVBProject.VBComponents
    If oComponent.CodeModule.Find(sSearchTerm, 1, 1, -1, -1, False, False, True) = True Then
        Debug.Print "Module: " & oComponent.Name  'Name of the current module in which the term was found (at least once)
        'Need to execute a recursive listing of where it is found in the module since it could be found more than once
        Call listLinesinModuleWhereFound(oComponent, sSearchTerm)
    End If
  Next oComponent
End Sub

Sub listLinesinModuleWhereFound(ByVal oComponent As Object, ByVal sSearchTerm As String)
' Purpose: list module lines containing a search term
' Method:  .CodeModule.Find with last parameter patternsearch set to True
  Dim lTotalNoLines         As Long   'total number of lines within the module being examined
  Dim lLineNo               As Long   'will return the line no where the term is found
    lLineNo = 1
  With oComponent         ' Module
    lTotalNoLines = .CodeModule.CountOfLines
    Do While .CodeModule.Find(sSearchTerm, lLineNo, 1, -1, -1, False, False, True) = True
        Debug.Print vbTab & "Zl. " & lLineNo & "|" & _
                    Trim(.CodeModule.Lines(lLineNo, 1))  'Remove any padding spaces
        lLineNo = lLineNo + 1    'Restart the search at the next line looking for the next occurence
    Loop
  End With
End Sub

回答1:

As @MatsMug says, parsing VBA with Regex is hard impossible, but line-numbers are a simpler case, and should be findable with regex alone.

Fortunately, line numbers can only appear within a procedure body (including before the End Sub/Function/Property statement), so we know they'll never be the first line of your code.

Unfortunately, you can prefix a line-label with 0 or more line continuations:

Sub Foo()
 _
 _
10 Beep
End Sub

Furthermore, a line number isn't always followed by a space - it can be followed by an instruction separator, giving the line-number the appearance of a line-label:

Sub foo()
10: Beep
End Sub

And if you're code is evil, you might encounter a negative line-number (entered by using hex notation - which VBE dutifully pretty prints back to the code-pane with a leading space and a negative number):

Sub foo()
10 Beep
 -1 Beep
End Sub

And we also need to be able to identify numbers that appear on a continued line, that aren't line-numbers:

Sub foo()
  Debug.Print _
5 & "is not a line-number"
End Sub

So, here's some evil line-numbering, with a mix of all of those edge-cases:

Option Explicit

Sub foo()

5: Beep

 _
 _
 _
10 Beep

20 _
'Debug.Print _
30

50: Beep

40 Beep

 _
 -1 _
 Beep 'The "-1" line number is achieved by entering "&HFFFFFFFF"

Debug.Print _
2 & "is not a line-number"

60 End Sub

And here's some regex that identifies the line-numbers:

(?<! _)\n( _\n)* ?(?<line_number>(?:\-)?\d+)[: ]

And here's a syntax highlight from regex101:



回答2:

For the longest time, Rubberduck was struggling with properly/formally parsing line numbers - our work-around was to remove them (replacing them with spaces) before feeding the code module contents to our parser.

Recently we've managed to formally define line numbers:

// lineNumberLabel should actually be "statement-label" according to MS VBAL but they only allow lineNumberLabels:
// A <statement-label> that occurs as the first element of a <list-or-label> element has the effect 
// as if the <statement-label> was replaced with a <goto-statement> containing the same 
// <statement-label>. This <goto-statement> takes the place of <line-number-label> in 
// <statement-list>.  
listOrLabel :
    lineNumberLabel (whiteSpace? COLON whiteSpace? sameLineStatement?)*
    | (COLON whiteSpace?)? sameLineStatement (whiteSpace? COLON whiteSpace? sameLineStatement?)*
;
sameLineStatement : blockStmt;

And lineNumberLabel is defined as:

//Statement labels can only appear at the start of a line.
statementLabelDefinition : {_input.La(-1) == NEWLINE}? (combinedLabels | identifierStatementLabel | standaloneLineNumberLabel);
identifierStatementLabel : unrestrictedIdentifier whiteSpace? COLON; 

standaloneLineNumberLabel : 
    lineNumberLabel whiteSpace? COLON
    | lineNumberLabel;
combinedLabels : lineNumberLabel whiteSpace identifierStatementLabel;
lineNumberLabel : numberLiteral;

(full Antlr4 grammar here)

Notice the predicate {_input.La(-1) == NEWLINE}?, which force the parser rule to only match a statementLabelDefinition at the start of a line - a logical line of code.

You see VBA code has physical code lines, like what you're getting from the CodeModule's contents. But VBA code also has a concept of logical code lines, and it turns out that is all the parser cares about.

This would trip any typical regex:

Sub DoSomething()
    Debug.Print _
42
End Sub

There's only 1 logical line of code between the signature and the End Sub token, but a simple Find will happily consider that 42 as a "line number" ...which it isn't - it's the argument passed to Debug.Print, in the same instruction, on the same logical code line, but on the next physical code line.

And you can't be dealing with logical code lines without first pre-processing your input, to take line continuation tokens into account.

And in order to do that, you need to actually parse the instructions you're seeing - at least know where they start and where they end... and that's no small undertaking! see ThunderFrame's answer

The VBIDE API is extremely limited, and won't be helpful for that.


TL;DR: You can't parse VBA code with regular expressions alone. So, nope. Sorry! you need a much more complex regex pattern than that - see ThunderFrame's answer.



回答3:

Conclusion regarding CodeModule.Find via search pattern

Firstly, CodeModule.Find doesn't help via search pattern and its possible use is intransparent. I agree that the VBIDE API is extremely limited and that there exist excellent professional tools which I highly recommand for any programmer :-)

Consequence: Work around via XML

Secondly I prefer household remedies if possible, so I tried to find an alternative solution using only the helpful parts of VBIDE.

Method That is why I tried a simple xml conversation of the CodeModule.Lines allowing a flexible search within logical lines. Instead of using regular expressions in requesting the xml data, I demonstrate a method to find leading numbers via a well defined XPath search (loop thru node list), thus resolving most problems shown by @ThunderFrame. The search string in function showErls is defined as "line[substring(translate(.,'0123456789','¹¹¹¹¹¹¹¹¹¹'),1,1)="¹"]"

Furthermore function 'lineNumber' returns the logical line number within the module. Note: To keep it simple, the search is restrained to one module only (user defined constant MYMODULE) and code avoids any regex.

Work around code - main sub

Option Explicit
' ==========================================
' User defined name of module to be analyzed
' ==========================================
  Const MYMODULE = "modThunderFrame"    ' << change to existing module name or userform
' Declare xml file as object
  Dim xCMods As Object            ' Late Binding; instead of Early Bd: Dim xCMods As MSXML2.DOMDocument6

Public Sub TestLineNumbers()
' =================
' A. Load/refresh code into xml
' =================
' set xml into memory - contains code module(s) lines
  Set xCMods = CreateObject("MSXML2.Domdocument.6.0") ' L.Bd.; instead of E.Bd: Set xCMods = New MSXML2.DOMDocument60
      xCMods.async = False
      xCMods.validateOnParse = False
' read in user defined code module and load xml, if failed show error message
  refreshCM MYMODULE
  If xCMods Is Nothing Then Exit Sub

' ======================
' B. search line numbers
' ======================
  showERLs

' =============================
' C. Save xml if needed
' =============================
  ' xCMods.Save ThisWorkbook.Path & "\VBE(" & MYMODULE & ").xml"
  ' MsgBox "Successfully exported Excel data to " & ThisWorkbook.Path & "\VBE(" & MYMODULE & ").XML!", _
  '        vbInformation, "Module " & MYMODULE & " to xml"

' =================
' D. terminate xml
' =================
  Set xCMods = Nothing

End Sub

Sub procedures

Private Sub showERLs()
' Purpose: [B.] declare XPath search string and define special translate character
  Dim s  As String
  Dim S1 As String: S1 = Chr(185)   ' superior number 1 (hex B9) replaces any digit
' declare node and node list
  Dim line    As Object
  Dim lines   As Object
' define XPath search string for first digit in line (usual case)
  s = "line[substring(translate(.,'0123456789','" & String(10, S1) & "'),1,1)=""" & _
                  S1 & _
                  """]"
' start debugging
  Debug.Print "**search string=""" & s & """" & vbNewLine & String(50, "-")
  Debug.Print "Line #|Line Content" & vbNewLine & String(50, "-"); ""
' set node list
  Set lines = xCMods.DocumentElement.SelectNodes(s)
' -------------------
' loop thru node list
' -------------------
  For Each line In lines
      Debug.Print Format(lineNumber(line), "00000") & "|" & line.Text      ' return logical line number plus line content
  Next line

End Sub

Private Sub refreshCM(sModName As String)
' Purpose: [A.] load xml string via LoadXML method
  Dim sErrTxt As String
  Dim line    As Object
  Dim lines   As Object
  Dim xpe     As Object
  Dim s       As String  ' xpath expression
  Dim pos     As Integer ' position of line number prefix
  ' ======================================
  ' 1. Read code module lines and load xml
  ' ======================================
    If Not xCMods.LoadXML(readCM(sModName)) Then
    ' set ParseError object
      Set xpe = xCMods.parseError
      With xpe
        sErrTxt = sErrTxt & vbNewLine & String(20, "-") & vbNewLine & _
          "Loading Error No " & .ErrorCode & " of xml file " & vbCrLf & _
          Replace(" " & Replace(.URL, "file:///", "") & " ", "  ", "[No file found]") & vbCrLf & vbCrLf & _
          xpe.reason & vbCrLf & _
          "Source Text:    " & .srcText & vbCrLf & _
          "char?:  " & """" & Mid(.srcText, .linepos, 1) & """" & vbCrLf & vbCrLf & _
          "Line no:    " & .line & vbCrLf & _
          "Line pos: " & .linepos & vbCrLf & _
          "File pos.:  " & .filepos & vbCrLf & vbCrLf
      End With
      MsgBox sErrTxt, vbExclamation, "XML Loading Error"
      Set xCMods = Nothing
      Exit Sub
    End If

' 2. resolve hex input problem of negative line numbers with leading space (thx @Thunderframe)
    s = "line"
    Set lines = xCMods.DocumentElement.SelectNodes(s)
  ' loop thru all logical lines
    For Each line In lines
        pos = ErlPosInLine(line.Text)
        If pos <= Len(line.Text) Then
           ' to do: add attribute to line node, if wanted

           ' correct line content
             line.Text = Mid(line.Text, pos)
        End If
    Next
End Sub

Private Function lineNumber(node As Object) As Long
' Purpose: [B.] return logical line number within code module lines
' Param.:  IXMLDomNode
' Method:  XPath via preceding-sibling count plus one
Dim tag As String: tag = "line"
lineNumber = node.SelectNodes("preceding-sibling::" & tag).Length + 1

End Function


Private Function readCM(Optional modName = "*") As String
' Purpose: return code module line string (VBIDE) of a user defined module to be read into xml
' Call:    called from [A.] refreshCM
'          xCMods.LoadXML(readCM(sModName))
' Declare variable
  Dim s     As String
  Dim md As CodeModule
  If modName = "*" Then Exit Function
  On Error GoTo OOPS
' get code module lines into string
  Set md = Application.VBE.ActiveVBProject.VBComponents(modName).CodeModule   ' MSAccess: Modules("modVBELines")
' change to xml tags
  s = getTags(md.lines(1, md.CountOfLines))
' return
  readCM = s
OOPS:
End Function

Private Function getTags(ByVal s As String, Optional mode = False) As String
' Purpose: prepares xml string to be loaded
' define constant
  Const HEAD = "<?xml version=""1.0"" encoding=""utf-8""?>" & vbCrLf & "<cm>" & vbCrLf
' 1. change tag characters
  s = Replace(Replace(s, "<", "&lt;"), ">", "&gt;")
' 2. change special characters (ampersand)
  s = Replace(s, "&", "&amp;")
' 3. change "_" points
  s = Replace(s, "_" & vbCrLf, Chr(133) & vbLf)
' 4. define logical line entities
  If Right(s, 2) = vbCrLf Then s = Left(s, Len(s) - 2)
  s = HEAD & "  <line>" & Replace(s, vbCrLf, "</line>" & vbCrLf & "  <line>") & "</line>" & vbCrLf & "</cm>"

' debug xml tags if second function parameter is true (mode = True)
  If mode Then Debug.Print s

' return
  getTags = s
End Function

Sub testErlPosInLine()
' Purpose: Test Thunderframe's problem with ERL prefixes (underscores, " ",..) and hex inputs
Dim s As String
s = " _" & vbLf & " -1 xx"
MsgBox "|" & Mid(s, ErlPosInLine(s)) & "|" & vbNewLine & _
       "prefix = |" & Mid(s, 1, ErlPosInLine(s) - 1) & "|"

End Sub
Private Function ErlPosInLine(ByVal s As String) As Integer
' Purpose: remove prefix (underscore, tab, " ",.. ) from numbered line
' cf:      http://stackoverflow.com/questions/42716936/vba-to-remove-numbers-from-start-of-string-cell
  Dim i As Long
  For i = 1 To Len(s)                 ' loop each char
    Select Case Mid$(s, i, 1)       ' examine current char
        Case " "                    ' permitted chars
        Case "_"
        Case vbLf, Chr(133), Chr(34)
        Case "0" To "9": Exit For   ' cut off point
        Case Else: Exit For         ' i is the cut off point
    End Select
  Next
  If Mid$(s, i, 1) = "-" And Len(s) > 1 Then
   If IsNumeric(Mid$(s, i + 1, 1)) Then i = i + 1
  End If
' return
ErlPosInLine = i
' debug.print Mid$(s, i) '//strip lead
End Function