I'm making a C# program that will dynamically read an IBM host copybook written in COBOL and generate an SQL table from it. Once the table is generated, I can upload a file into my program, which will read it, convert it from IBM-037 (EBCDIC), and insert it into that SQL table. So far I can handle almost anything, although I'm running into some issues with REDEFINES.
For example:
10 SOME-FIELD PIC 9(3) COMP-3.
10 SOME-OTHER-FIELD REDEFINES SOME-FIELD PIC X(2).
I understand that the REDEFINES takes the place of the field above it in this case, but what I don't understand is how the compiler knows whether it should use the redefinition or not. I'm assuming that in this case it is because the first field is a number and the second is a character, but in the example below both definitions use characters.
05 STREET-ADDRESS.
10 ADDRESS-LINE-1 PIC X(20).
10 ADDRESS-LINE-2 PIC X(20).
05 PO-BOX REDEFINES STREET-ADDRESS PIC X(40).
I have tried just ignoring the REDEFINES, since it will always take the same amount of space, but where the original field is packed and the redefining one is not, I need to know when to unpack the field.
Any help with this would be amazing guys!
I can maybe help you, as two years ago I accomplished exactly what you are doing now.
I had to design a MySQL data warehouse, including the ETL system, based exclusively on files from an RM COBOL ERP application running on Linux. The application had more than 600 files, and it was still unclear how many of them would finally end up in the database. Most of the important files were indexed (on COMP fields, to make it harder), and one of the obvious requirements was that all relationships between files and their indexed keys could be reproduced in the database. So I potentially needed every field of every file.
Given the number of files, treating them all manually, one by one, was out of the question.
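So my idea was to code a VB.NET application that takes the COBOL copybooks as input and generates the rest of the pipeline from them: the COBOL programs that convert the indexed files into flat files, and the VBA code that imports those flat files into MySQL.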
At the beginning of the project, I ran into exactly the same issues as you are facing now, notably those damn REDEFINES. I found the task of listing and coding every copybook possibility, if not impossible, at least risky. So I looked for another way, and found this:
CB2XML
COBOL copybook to XML converter: SourceForge
This saved me weeks of hard work on copybook parsing and interpreting. It parses COBOL copybooks into an XML file that describes every PICTURE precisely, with a lot of useful attributes such as length or type. It fully supports the COBOL-85 standard.
Example:
Turns into this:
List of all XML attributes
With the help of this XML structure I achieved all my goals and more.
The generated COBOL programs that convert the indexed files (readable only with the RM COBOL runtime) into flat files deal with every field, ARRAYS and REDEFINES included.
Not all the fields serve a purpose once they are in the database, but at least everything is available all the time.
With the file above, the SEQUENTIAL text file copybook becomes this:
Auto generated COBOL
MOVE instructions
Once the flat files are written, they can be loaded into MySQL by the VBA code, which is also generated by the VB.NET application.
Auto generated VBA
Type def declaration to deal with the text file importation
Note the original PICTURE in comments next to each field
Create table procedure
Each field has become an object (from a custom class I created), and the method SQLtypeFull used below returns the MySQL datatype of each field.
Final SQL statement
I have much more in the generated VBA modules, and the level of detail and accuracy of the generated XML helped a lot with all of them:
I have probably shown enough to give you some ideas so I will stop there.
Most importantly: across several hundred thousand records, I have not lost a single digit in computations. When I SUM() over all rows using SQL in the database, I get exactly the same numbers as those returned by the original COBOL application.
If you wonder why I used Access/VBA and not .NET for the import: it was a non-negotiable requirement -_-
On a last note: I am not affiliated in any way with CB2XML and this is not an advertisement for it. It's just a great and helpful piece of software, and deserves love and attention.
REDEFINES is going to make your task more difficult. It is not that the "compiler" intuitively knows which particular field to use; it is that the code in the existing COBOL system knows which field to use. There will be some indication, some value in another field, which will indicate which of the fields to use at which particular time.
Taking your second example, as the first is devoid of context:
That field will be interrogated before the data is used. Either directly (you can find lots of horrible code out there) or with an 88-level Condition Name:
Your first example will be dealt with in a similar manner.
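To make that concrete, here is a minimal sketch in Python (the logic ports directly to C#). It assumes, purely for illustration, that a one-byte ADDRESS-TYPE field sits in front of the 40-byte area and that the value "P" means PO box. Neither that field name nor that value comes from the copybook in the question; the real indicator has to be found in the existing COBOL code.

```python
# Hypothetical layout (an assumption, not taken from the question):
#   byte 0      ADDRESS-TYPE   PIC X      ("P" = PO box, anything else = street)
#   bytes 1-40  STREET-ADDRESS (2 x 20) redefined by PO-BOX (1 x 40)
EBCDIC = "cp037"  # Python's built-in codec for IBM code page 037


def decode_address(record: bytes) -> dict:
    """Pick the view of the shared 40 bytes based on the indicator field."""
    addr_type = record[0:1].decode(EBCDIC)
    area = record[1:41]
    if addr_type == "P":
        # PO-BOX view: one 40-character field
        return {"po_box": area.decode(EBCDIC).rstrip()}
    # STREET-ADDRESS view: two 20-character fields
    return {
        "address_line_1": area[:20].decode(EBCDIC).rstrip(),
        "address_line_2": area[20:].decode(EBCDIC).rstrip(),
    }
```

The point is only that the choice between the two views is driven by data elsewhere in the record, not by anything in the REDEFINES clause itself.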
It is an "old style" use of REDEFINES, to use the same storage locations on a record for mutually-exclusive situations. Saves storage, which was expensive. The system you are working with is either "old", or the design of it was infected by false "experience".
You have two broad choices: to replicate all the conditional selection of data (so that you have two sets of business-logic to keep in step); to get the file changed so that each field occupies its own storage.
The presence of COMP-3 (or PACKED-DECIMAL) or COMP/COMP-4/COMP-5/BINARY data-types also complicates things for you. You'd then need to do your EBCDIC-to-ASCII conversion at the field level, for actual EBCDIC data, and do whatever is necessary to convert or simply acquire the "computational" data.
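As a rough illustration of what "acquiring" a COMP-3 field involves, here is a sketch in Python. The function name and the `scale` parameter (number of implied decimal places, from the V in the PICTURE) are my own choices, not part of any library. Packed decimal stores two digits per byte, with the sign in the low nibble of the last byte (B or D negative; A, C, E, F positive or unsigned).

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field taken from the raw record bytes."""
    digits = []
    for b in raw[:-1]:
        digits.append(b >> 4)      # high nibble = one digit
        digits.append(b & 0x0F)    # low nibble = next digit
    digits.append(raw[-1] >> 4)    # last byte: high nibble is the final digit
    sign_nibble = raw[-1] & 0x0F   # last byte: low nibble is the sign
    if any(d > 9 for d in digits) or sign_nibble < 0x0A:
        raise ValueError("not valid packed-decimal data")
    value = int("".join(map(str, digits)))
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    # Apply implied decimal places without going through floating point
    return Decimal(value).scaleb(-scale)
```

For the first example in the question, `PIC 9(3) COMP-3` occupies two bytes, the same as the `PIC X(2)` that redefines it, so the value 123 arrives as the bytes `0x12 0x3F`. Those bytes must not be passed through an EBCDIC-to-ASCII translation before unpacking.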
Also be aware that any signed-DISPLAY-numeric fields (numeric fields with a PICture beginning with an S but without an explicit "computational" usage) will apparently contain "character" data in the final byte, as the sign is held as an "overpunch" of the final byte.
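A sketch of decoding such a field in Python, working on the raw EBCDIC bytes. It assumes the usual mainframe representation with a trailing embedded sign (no SIGN IS LEADING or SEPARATE clause): every byte carries a digit in its low nibble, and the high nibble of the last byte is C for positive, D for negative, or F for unsigned. The function name and `scale` parameter are illustrative.

```python
from decimal import Decimal


def decode_zoned(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a signed DISPLAY numeric field with a trailing overpunched sign."""
    digits = [b & 0x0F for b in raw]   # low nibble of every byte is a digit
    if any(d > 9 for d in digits):
        raise ValueError("not valid zoned-decimal data")
    zone = raw[-1] >> 4                # high nibble of the last byte is the sign
    value = int("".join(map(str, digits)))
    if zone in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)
```

This is why a blanket character conversion shows the field as "character" data: the bytes `F1 F2 D3` hold -123, but decoded as code page 037 text they read `12L`.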
Note that the binary data-types will be Big Endian.
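In Python the byte order is a single argument; a minimal sketch (the function name is mine) for a COMP/BINARY field read from the raw record:

```python
def decode_binary(raw: bytes, signed: bool = True) -> int:
    """Decode a COMP/BINARY field: big-endian, two's complement when signed."""
    return int.from_bytes(raw, byteorder="big", signed=signed)
```

For instance, the two bytes `00 7B` are 123 when read big-endian, but 31488 if mistakenly read little-endian, which is the native order on the x86 machines a C# program typically runs on.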
It will be massively simpler for you if you receive files which have no REDEFINES, no "computational" fields, and no embedded signs (or implicit decimal-places). All your data would be character, and you can EBCDIC-to-ASCII at the record-level (or at the file level, with your file-transfer mechanism).
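To show how little is left to do in that case, here is a sketch in Python, assuming fixed-width records in code page 037 and a list of field widths taken from the copybook (the function name and the widths in the usage note are made up for illustration):

```python
def split_text_record(raw: bytes, widths):
    """Convert a whole all-character record at once, then slice it into fields."""
    text = raw.decode("cp037")   # safe only because every byte is character data
    fields, pos = [], 0
    for width in widths:
        fields.append(text[pos:pos + width].rstrip())
        pos += width
    return fields
```

With widths `[10, 5]`, a 15-byte record holding a name and a five-digit number comes back as two strings, ready to insert. No per-field type handling is needed.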
If you look at questions here tagged COMP-3, you'll find further discussion of this, and if you decide that the ridiculous route (your program understanding native Mainframe COBOL data-items rather than plain "text") is the only possible way to go, then there are a number of things in the discussions you may find useful and be able to use or apply.
If your company is "regulated" externally, then ensure your Compliance, Audit and Accounting departments are happy with your design before you code one line. Whoops. Late for that. Let's hope it is manufacturing.