I'm trying to create an Excel macro that formats dimension values into our company's format. That way we can easily import the data into our systems without having to do thousands of dimensions manually. There are a few problems I'm running into though:
- There are many variations of dimensions that suppliers send us, which makes it difficult for me to come up with some sort of regex to capture all of the values.
- Even if I were able to come up with some sort of regex to process the values, I'm not sure how I'd replace the values with the correct form since I'm not sure if it's possible to replace a regex match with a captured regex group value. If it is, I have not clue how I'd go about it with the situation.
Our company's standard format for Dimensions is as follows:
Can have up to 3 parameters per value
Attribute1:Value1:Unit1;Attribute2:Value2:Unit2;Attribute3:Value3:Unit3
Example: 1" L x 2" W x 3" H Translates to Length:1:in;Width:2:in;Height:3:in
Possible Values that are able to be used with it are:
- Length
- Width
- Height
- Arc
- Area
- Circumference
- Depth
- Dia
- Thickness
Some of the variations I've noted in the past year include:
- Length - L or L.
- Width - W or W.
- Height - H, H., Heigth
- Circumference - Round
- Depth - D, D., Deep
- Dia - Diameter or Dia.
- Thickness - Thick
- in - inch, inches, in., ", ''(2 apostrophes)
- ft - feet
A small sample of Product Dimensions (Note the inconsistencies):
3 3/4" Width x 2 1/2" Height
L 4 3/4" x W 1 1/2" x H 3"
3 1/2" W x 2 1/8" H x 2 7/8" D
3 5/8" W x 2 1/2" H x 5/8" Depth
3 3/4" W x 1" H
1 1/4" W x 3 1/4" H
2 3/8" Diameter
3" W x 2 1/2" H
2" W x 3" H
2 1/2" W x 2" H
1 3/8" W x 2 1/8" H
3 1/2" W x 3 1/2" H
1 1/2" W x 3" H
2" W x 1 7/8" H x 1 1/2" D
4 3/4" W x 3 1/2" H
4 3/4" W x 4" H x 1 1/4" D
3 1/2" W x 3 1/2" H x 3 1/2" D
3-1/2" W x 2-3/4" H
3.5" W x 4" H
3" H
3 1/4" W x 2 1/4" H
4 7/16" W x 6 1/4" H
3 1/4" W x 3 1/4" H
5" W x 7" H
So far, I've come up with the regex (\d+(.| |/|)\d+((/)\d+|)|\d+)
, which appears to pick up all the numbers, but I'm not quite sure how I would go about finding all the different variations of the attributes and units. The only thing I thought might work would be lookbehinds & lookaheads, but I'm not well-versed enough with this flavor of regex to figure it.
Question 1: Is regex the best way to go about this task or is there a better way?
Question 2: The ultimate question, How can I accomplish this complex task or is it even remotely possible using vba?
You can create a parser / render, the below example shows how that may be implemented in EBNF parser based on RegEx, put the code into standard VBA module:
Save a sample as ANSI or Unicode to text file
Source.txt
in the same folder as Excel file, and runTestParserRender()
. The output will be saved to text fileResult.txt
. Processing starts from parsing. Variations in attributes and units are cast by RegEx replacement first. Then matched to the RegEx patterns parts are folded into tokens. Wrong Value + Attribute sequences are rectified with RegEx sub-matches swapping by replacement. At the end of the parsing the single top level Structure token should left, otherwise the error is raised. If parsing is failed, unrecognized parts are put into braces in output. If it's success then the reverse process of content retrieving with rendering continues up to the last token.The parsing algorithm in outline can be represented by the EBNF grammar below (simplified, replacement not shown):
and related diagram:
The output for the sample you provided is as follows:
BTW I used the same approach in VBA JSON parser.