Feeds:
Posts
Comments

Archive for March, 2011

Don’t Forget the RegEx

String manipulation and parsing tasks are just as common in Excel/VBA development as they are in apps developed in any other programming language. C, C# and C++ have powerful built-in string classes that greatly reduce the amount of code that must be written to accomplish these tasks. While VBA string manipulation code will always be more “wordy” than it’s C++ counterparts, there’s no reason that it has to be any less capable or powerful.

The task at hand was to write a utility that would parse lines in a CSV file. However, commas that appeared inside quotation marks should be ignored. For example, the string A,B,C,”D,E”,F,”G,H,I”,J,K, should be split into 8 elements:
A
B
C
D,E
F
G,H,I
J
K

Many beginner to intermediate VBA programmers may not be aware that Microsoft provides a powerful regex component that can be integrated into any Excel/VBA project. As soon as I understood the dimensions of the task, I immediately thought of Microsoft’s RegEx component. I’m sure there are many ways of doing this, but I think the RegEx object provides the most efficient method. Let’s get right to it.

Using RegEx in Excel/VBA

1. Start a new Excel workbook, open the VBA editor, and add a reference to “Microsoft VBScript Regular Expressions 5.5” (the current version as of this writing).

2. Add an ActiveX command button to the worksheet, name it “btnShowForm”, and change the Caption to “Show Regex Form”.

3. Back in the VBA editor, insert a new UserForm into the project, name it “frmParse”, and change the Caption to “Parse Special”.

4. Add the following controls to the new UserForm: one Label named “lblInput” (Caption = “String to Parse”), one Textbox named “txtInput”, one Listbox named “lstOutput” and two Command Buttons. The first command button should be named “btnParse” (Caption = “Parse”) and the second button should be named “btnExit” (Caption = “Exit”).

Your UserForm should look something like this:

New UserForm

New UserForm

5. Next, add the following code to the btnExit_Click event:

Me.Hide

6. Next, add the following code to the btnParse_Click event (don’t worry, we’ll create the clsRegEx class module next):


Dim arrData() As String
Dim intWord As Integer
Dim objRegex As clsRegex

Set objRegex = New clsRegex
arrData = Split(objRegex.RegexParse(txtInput.Text), ",")
lstOutput.Clear
For intWord = 0 To UBound(arrData)
lstOutput.AddItem Replace(arrData(intWord), "?", ",")
Next intWord

Set objRegex = Nothing

7. Next, add a new Class module to the project, name it “clsRegex” and add the following function to the new class:


Public Function RegexParse(ByVal ToParse As String) As String
'Use regular expressions to find and replace quoted text in the CSV input.
'NOTE: Must add a reference to "Microsoft VBScript Regular Expressions 5.5".
Dim strPat As String, strReturn As String
Dim objRegExp As VBScript_RegExp_55.RegExp
Dim objMatch As VBScript_RegExp_55.Match

On Error GoTo RPError

strReturn = ToParse

If InStr(ToParse, Chr$(34)) > 0 Then
Set objRegExp = New VBScript_RegExp_55.RegExp
With objRegExp
.IgnoreCase = True
.Global = True 'Identify ALL instances that match regex pattern, not just first one.

        'NOTE: "[^"]+" regex code means "Match a double quote, match one or more chars that
        ' are not double quotes, then match a second double quote".
strPat = Chr$(34) & "[^" & Chr$(34) & "]+" & Chr$(34)
.Pattern = strPat
End With

For Each objMatch In objRegExp.Execute(strReturn)
strReturn = Replace(strReturn, objMatch.Value, _
Replace(Replace(objMatch.Value, Chr$(34), vbNullString), ",", "?"))
Next objMatch
End If

RPResume:
RegexParse = strReturn
Set objMatch = Nothing
Set objRegExp = Nothing
Exit Function

RPError:
MsgBox "Error: " & Err.Number & vbCrLf & vbCrLf & "Description: " & _
Err.Description, vbCritical, "Regex Parsing Error"
strReturn = ToParse 'Set the return value back to the original string
Resume RPResume

End Function

8. Finally, go back to the worksheet and add the following code to the cmdShowForm_Click event:


frmParse.Show vbModal
Unload frmParse

That’s it for the coding. Now save and open your new workbook. Click the “Show Form” button on the worksheet. After the UserForm appears, enter the following in the input text box:

A,B,C,”D,E”,F,”G,H,I”,J,K

Click the “Parse” button and you should see the Listbox fill with eight rows. The string in the text box should be broken out exactly as required.

RegEx form with Listbox filled

RegEx form with Listbox filled

Notice the way the “RegExp” and “Match” classes of the VBScript_RegExp_55 module are used to accomplish the task. What actually happens is that the RegExp module is used to identify the pattern where a comma appears within quotation marks. It then changes those quotes to question marks, so the calling routine (cmdParse_Click) can use the Split function to push the CSV line into an array correctly. The btnParse code then switches the question marks back to commas before outputting the data to the Listbox.

So, don’t forget the “RegEx” the next time you’re faced with a pattern-matching problem.

Advertisements

Read Full Post »