I am working with some old data imports and came across a batch of data from an external source that reports financial numbers with a signed overpunch. I've seen a lot, but this is before my time. Before I write a function to parse these strangers, I wanted to check whether there is a standard way to handle them.
I guess my question is: does the .NET Framework provide a standard facility for converting signed overpunch strings? If not, are there any third-party tools I can use so I don't reinvent the wheel?
Over-punched numeric (zoned decimal in Cobol) comes from the old punched cards, where the sign was over-punched on the last digit of a number. The format is commonly used in Cobol.
As there are both ASCII and EBCDIC Cobol compilers, there are both ASCII and EBCDIC versions of zoned decimal. To make it even more complicated, the +0 and -0 characters differ between EBCDIC code pages: they are { and } in US EBCDIC (IBM037) but ä and ü in German EBCDIC (IBM273), and different again in other EBCDIC language versions.
To process the data successfully, you need to know:
- Did the data originate on an EBCDIC or an ASCII system?
- If EBCDIC, which language (US, German, etc.)?
If the data is still in the original character set, you can calculate the sign from the high-order nibble of the last byte.
For EBCDIC the numeric hex codes are:
Digit      0      1      2      ..  9      Characters
Unsigned:  x'F0'  x'F1'  x'F2'  ..  x'F9'  0 1 2 .. 9
Negative:  x'D0'  x'D1'  x'D2'  ..  x'D9'  } J K .. R
Positive:  x'C0'  x'C1'  x'C2'  ..  x'C9'  { A B .. I
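For US EBCDIC zoned, this is the Java code to convert a string:
// ret holds the zoned text, e.g. "0012C"
String sign = "";
int positiveDiff = 'A' - '1';
int negativeDiff = 'J' - '1';
char lastChar = ret.substring(ret.length() - 1).toUpperCase().charAt(0);
switch (lastChar) {
case '}' : sign = "-"; // negative zero; deliberately falls through to set the digit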
case '{' :
lastChar = '0';
break;
case 'A':
case 'B':
case 'C':
case 'D':
case 'E':
case 'F':
case 'G':
case 'H':
case 'I':
lastChar = (char) (lastChar - positiveDiff);
break;
case 'J':
case 'K':
case 'L':
case 'M':
case 'N':
case 'O':
case 'P':
case 'Q':
case 'R':
sign = "-";
lastChar = (char) (lastChar - negativeDiff);
default:
}
ret = sign + ret.substring(0, ret.length() - 1) + lastChar;
For German EBCDIC, { and } become ä and ü; for other EBCDIC languages you would need to look up the appropriate code page.
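One way to cope with the code-page differences is to pass the two zero characters in as parameters. The sketch below is only a variant of the code above under some assumptions: the class name ZonedText and its method are made up for illustration, the text is assumed to have been decoded with the correct code page already, and ä is taken as +0 and ü as -0 for German EBCDIC, following the order given above.

```java
final class ZonedText {
    private ZonedText() {}

    /**
     * Turns zoned text such as "0012C" into a plain signed digit string.
     * positiveZero / negativeZero are the code-page specific characters
     * for +0 and -0 (for example '{' and '}' for US EBCDIC).
     */
    static String decode(String text, char positiveZero, char negativeZero) {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("empty field");
        }
        char raw = text.charAt(text.length() - 1);
        String body = text.substring(0, text.length() - 1);

        // Compare the zero characters before upper-casing, because
        // upper-casing would change characters such as 'ä'.
        if (raw == positiveZero) return body + '0';
        if (raw == negativeZero) return "-" + body + '0';

        char c = Character.toUpperCase(raw);
        if (c >= 'A' && c <= 'I') return body + (char) ('1' + (c - 'A'));
        if (c >= 'J' && c <= 'R') return "-" + body + (char) ('1' + (c - 'J'));
        if (c >= '0' && c <= '9') return text; // unsigned field, nothing to do
        throw new IllegalArgumentException("unexpected sign character: " + raw);
    }
}
```

For US data you would call ZonedText.decode(field, '{', '}'), and for German data ZonedText.decode(field, '\u00E4', '\u00FC').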
For ASCII zoned, this is the Java code:
// ret holds the zoned text; sign stays empty for positive values
String sign = "";
int positiveFjDiff = '@' - '0';
int negativeFjDiff = 'P' - '0';
char lastChar = ret.substring(ret.length() - 1).toUpperCase().charAt(0);
switch (lastChar) {
case '@':
case 'A':
case 'B':
case 'C':
case 'D':
case 'E':
case 'F':
case 'G':
case 'H':
case 'I':
lastChar = (char) (lastChar - positiveFjDiff);
break;
case 'P':
case 'Q':
case 'R':
case 'S':
case 'T':
case 'U':
case 'V':
case 'W':
case 'X':
case 'Y':
sign = "-";
lastChar = (char) (lastChar - negativeFjDiff);
default:
}
ret = sign + ret.substring(0, ret.length() - 1) + lastChar;
Finally, if you are working in EBCDIC, you can calculate it like this:
sign = '+'
if ((last_digit & x'F0') == x'D0') {
sign = '-'
}
last_digit = last_digit | x'F0'
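The nibble test above could be sketched in Java roughly as follows, assuming the field is still in its original EBCDIC bytes (not yet translated to ASCII or Unicode). The class and method names are made up for illustration, only the X'D' zone is treated as negative (as in the pseudocode), and overflow on very long fields is not handled.

```java
final class EbcdicZoned {
    private EbcdicZoned() {}

    /** Decodes an untranslated EBCDIC zoned-decimal field into a long. */
    static long decode(byte[] field) {
        if (field == null || field.length == 0) {
            throw new IllegalArgumentException("empty field");
        }
        long value = 0;
        for (byte b : field) {
            int digit = b & 0x0F; // the low nibble of every byte is the digit
            if (digit > 9) {
                throw new IllegalArgumentException("not a zoned digit: " + (b & 0xFF));
            }
            value = value * 10 + digit;
        }
        // The high nibble (zone) of the last byte carries the sign.
        int zone = field[field.length - 1] & 0xF0;
        return zone == 0xD0 ? -value : value;
    }
}
```

With this, the bytes F1 F2 D3 come out as -123, and F0 F0 F1 F2 C3 as 123.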
One last problem: decimal points are not stored in a zoned decimal; they are assumed. You need to look at the Cobol copybook.
e.g.
If the Cobol copybook is
03 fld pic s99999.
123 is stored as 0012C (EBCDIC source)
but if the copybook is (v stands for assumed decimal point)
03 fld pic s999v99.
then 123 is stored as 1230{
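Once the sign has been resolved, applying the assumed decimal point is just a matter of scale. A small Java sketch, assuming you have already turned the field into a plain signed digit string (e.g. 0012C into 00123, 1230{ into 12300) and have read the number of digits after the v from the copybook; ImpliedDecimal is a made-up name:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class ImpliedDecimal {
    private ImpliedDecimal() {}

    /**
     * signedDigits: digits with an optional leading sign, e.g. "12300" or "-12300".
     * scale: number of digits after the assumed decimal point (the 9s after the v).
     */
    static BigDecimal apply(String signedDigits, int scale) {
        // BigInteger ignores leading zeros; BigDecimal keeps the value exact.
        return new BigDecimal(new BigInteger(signedDigits), scale);
    }
}
```

So ImpliedDecimal.apply("00123", 0) is 123 for pic s99999, and ImpliedDecimal.apply("12300", 2) is 123.00 for pic s999v99.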
It would be best to do the translation in Cobol, or to use a Cobol translation package.
There are several commercial packages for handling Cobol data; they tend to be expensive.
There are also some open-source Java packages that can deal with mainframe Cobol data.
Presumably in the specification for the file or your program you are told how to deal with this? No?
As Bruce Martin has said, a true Overpunch goes back to the days of punched-cards. You punched the final digit of a number, then re-punched (overpunched) the same position on the card.
The link to the Wiki that you included in your question is fine for that. But I'm pretty sure the source of your data is not punched-cards.
Although part of this answer presumes you are using a Mainframe, the solution proposed is machine-independent.
The source of your data is a Mainframe? We don't know, although it is important information. For the moment, let's assume it is so.
Unless it is very old data which is unchanging, it has been processed on the Mainframe in the last 20 years. Unless the compiler used (assuming it has come from a COBOL program) is very, very old, you need to know the setting of the compiler option NUMPROC. Here's why: http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/igy3pg50/2.4.36?DT=20090820210412
Default is: NUMPROC(NOPFD)
Abbreviations are: None
The compiler accepts any valid sign configuration: X'A', X'B', X'C',
X'D', X'E', or X'F'. NUMPROC(NOPFD) is the recommended option in most
cases.
NUMPROC(PFD) improves the performance of processing numeric internal
decimal and zoned decimal data. Use this option only if your program
data agrees exactly with the following IBM system standards:
Zoned decimal, unsigned: High-order 4 bits of the sign byte contain
X'F'.
Zoned decimal, signed overpunch: High-order 4 bits of the sign byte
contain X'C' if the number is positive or 0, and X'D' if it is not.
Zoned decimal, separate sign: Separate sign contains the character '+'
if the number is positive or 0, and '-' if it is not.
Internal decimal, unsigned: Low-order 4 bits of the low-order byte
contain X'F'.
Internal decimal, signed: Low-order 4 bits of the low-order byte
contain X'C' if the number is positive or 0, and X'D' if it is not.
Data produced by COBOL arithmetic statements conforms to the above IBM
system standards. However, using REDEFINES and group moves could
change data so that it no longer conforms. If you use NUMPROC(PFD),
use the INITIALIZE statement to initialize data fields, rather than
using group moves.
Using NUMPROC(PFD) can affect class tests for numeric data. You should
use NUMPROC(NOPFD) or NUMPROC(MIG) if a COBOL program calls programs
written in PL/I or FORTRAN.
Sign representation is affected not only by the NUMPROC option, but
also by the installation-time option NUMCLS.
Use NUMPROC(MIG) to aid in migrating OS/VS COBOL programs to
Enterprise COBOL. When NUMPROC(MIG) is in effect, the following
processing occurs:
Preferred signs are created only on the output of MOVE statements and arithmetic operations.
No explicit sign repair is done on input.
Some implicit sign repair might occur during conversion.
Numeric comparisons are performed by a decimal comparison, not a logical comparison.
What does that mean to you? If NUMPROC(NOPFD) is being used, you may see X'A' through X'F' in the high-order nybble of the final byte of the field. If NUMPROC(PFD) is being used, you shouldn't see anything other than X'C' or X'D' in that position.
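If you do have to accept NUMPROC(NOPFD) data, the sign test has to cover all six codes. A minimal Java sketch, assuming the usual IBM convention that X'A', X'C', X'E' and X'F' are positive and X'B' and X'D' are negative (the quoted text lists the valid codes but not their polarity); the class and method names are hypothetical:

```java
final class ZonedSign {
    private ZonedSign() {}

    /**
     * Inspects the high-order nibble of the final byte of an untranslated
     * EBCDIC zoned-decimal field and reports whether the value is negative.
     */
    static boolean isNegative(byte lastByte) {
        int nibble = (lastByte >> 4) & 0x0F;
        switch (nibble) {
            case 0xB:
            case 0xD:
                return true;  // negative sign codes
            case 0xA:
            case 0xC:
            case 0xE:
            case 0xF:
                return false; // positive (X'F' is also the unsigned zone)
            default:
                throw new IllegalArgumentException(
                        "invalid sign nibble: " + Integer.toHexString(nibble));
        }
    }
}
```

Rejecting anything outside X'A' to X'F' at least makes bad input visible instead of quietly treating it as positive.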
Note that if the file you are receiving has been generated by the installed Mainframe SORT product, you have the same potential issue.
"May" and "shouldn't" are not good things to see in a specification.
Is your data remotely business-critical in a financial environment? Then you almost certainly have issues of audit and compliance. It runs something like this:
Auditor, "What do you do with the data when you receive it?"
You, "The first thing I do is change it"
Auditor, "Really? How do you verify the data once you have changed it?"
You, "Errr..."
You might get lucky and never have an auditor look at it.
All those non-deterministic words aren't very good for programming.
So how do you get around it?
There should be no fields in the data that you receive which have embedded signs. There should be no numeric fields which are not represented as character data (no binary, packed, or floating-point). If a field is signed, the sign should be presented separately. If a field has decimal places, an actual . or , (depending on the home country of the system) should be provided, or as an alternative a separate field with a scaling-factor.
Is this difficult for your Mainframe people to do? Not remotely. Insist on it. If they will not do it, document it such that problems arising are not yours, but theirs.
If all numeric data presented to you is plain character data (plus, minus, comma, digits 0 to 9) then you will have absolutely no problem in understanding the data, whether it is any variant of EBCDIC or any variant of ASCII.
Be aware that any fields with decimal places coming from COBOL are exact decimal amounts. Do not store or use them in anything other than fields in your language which can process exact decimal amounts.
You don't provide any sample data. So here's a sample:
123456{
This should be represented to you as:
+1234560
If it has two decimal places:
+12345.60
or
+12345602 (where the trailing 2 is a scaling-factor, which you validate)
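Reading data in that shape needs very little code. A sketch in Java, assuming a one-digit trailing scaling-factor as in the sample above; the class and method names are invented for illustration, and BigDecimal is used so the amounts stay exact:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PlainNumeric {
    private PlainNumeric() {}

    /** Parses a value with a separate sign and an explicit point, e.g. "+12345.60". */
    static BigDecimal parseExplicit(String text) {
        return new BigDecimal(text);
    }

    /**
     * Parses a value with a separate sign and a trailing one-digit
     * scaling-factor, e.g. "+12345602", and validates that factor
     * against the scale the file specification promises.
     */
    static BigDecimal parseWithScale(String text, int expectedScale) {
        if (text == null || text.length() < 3) {
            throw new IllegalArgumentException("field too short");
        }
        int scale = Character.digit(text.charAt(text.length() - 1), 10);
        if (scale != expectedScale) {
            throw new IllegalArgumentException(
                    "scaling-factor " + scale + " does not match expected " + expectedScale);
        }
        BigInteger unscaled = new BigInteger(text.substring(0, text.length() - 1));
        return new BigDecimal(unscaled, scale);
    }
}
```

Both PlainNumeric.parseExplicit("+12345.60") and PlainNumeric.parseWithScale("+12345602", 2) give the exact decimal 12345.60.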
If numeric data is to be transferred from external systems, it should always be done in character format. It will make everything so much easier to code, understand, maintain, and audit.
Zoned decimal is easy and requires no char manipulation.
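private int ConvertOverpunch(byte[] number)
{
    // Digit extraction works on the raw bytes, EBCDIC or ASCII
    // (not on EBCDIC text that has been translated to ASCII)
    int rtnVal = 0;
    for (int i = 0; i < number.Length; i++)
    {
        // The low nibble of each byte is the digit
        int digit = 0x0f & number[i];
        rtnVal = (rtnVal * 10) + digit;
    }
    // Extract sign from the high nibble of the last byte
    // This works in EBCDIC, where X'D' means negative
    // Need to find out what your sign is in ASCII
    if ((number[number.Length - 1] & 0xF0) == 0xD0)
    {
        rtnVal *= -1;
    }
    return rtnVal;
}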
Here are two other approaches, so you have more alternatives to choose from:
public static int Overpunch2Int_v1(string number)
{
    // Requires: using System.Linq; (for Last())
    number = number.ToLower();
    char last = number.Last();
    number = number.Substring(0, number.Length - 1);
    if (last == '}' || (last >= 'j' && last <= 'r'))
    {
        // Negative: '}' is -0, 'j'..'r' are -1..-9
        number = "-" + number;
        if (last == '}')
            number += "0";
        else
            number += (char)(last - 'j' + '1');
    }
    else if (last == '{' || (last >= 'a' && last <= 'i'))
    {
        // Positive: '{' is +0, 'a'..'i' are +1..+9
        if (last == '{')
            number += "0";
        else
            number += (char)(last - 'a' + '1');
    }
    else
    {
        // Plain trailing digit (unsigned value): put it back unchanged
        number += last;
    }
    return Int32.Parse(number);
}
public static int Overpunch2Int_v2(string number)
{
    // Requires: using System.Linq; (for Last())
    number = number.ToLower();
    char last = number.Last();
    number = number.Substring(0, number.Length - 1);
    if (last == '{' || last == '}')
        number = (last == '}' ? "-" : "") + number + "0";
    else if (last >= 'a' && last <= 'r')
    {
        // 'a'..'i' are +1..+9, 'j'..'r' are -1..-9
        bool isNegative = last >= 'j';
        char baseChar = isNegative ? 'j' : 'a';
        number = (isNegative ? "-" : "") + number + (char)(last - baseChar + '1');
    }
    else
        number += last; // plain trailing digit (unsigned value)
    return Int32.Parse(number);
}
Please note that both methods don't validate the string and expect a valid number.
If you don't have enough options already, here is another one using an extension method. You could make it better by using some of the ideas in the other posts.
/// <summary>
/// Extension method to get overpunch value
/// </summary>
/// <param name="number">the text to convert</param>
/// <returns>int</returns>
public static int OverpunchValue(this String number)
{
int returnValue;
var ovpValue = OverPunchValues.Instance.OverPunchValueCollection.First(o => o.OverpunchCharacter ==
Convert.ToChar(number.Substring(number.Length - 1)));
returnValue = Convert.ToInt32(number.Substring(0, number.Length - 1) + ovpValue.NumericalValue.ToString());
return ovpValue.IsNegative ? returnValue * -1 : returnValue;
}
/*singleton to store values */
public class OverPunchValues
{
public List<OverPunchValue> OverPunchValueCollection { get; set; }
private OverPunchValues()
{
OverPunchValueCollection = new List<OverPunchValue>();
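OverPunchValueCollection.Add(new OverPunchValue { OverpunchCharacter = '{', IsNegative = false, NumericalValue = 0 });
OverPunchValueCollection.Add(new OverPunchValue { OverpunchCharacter = '}', IsNegative = true, NumericalValue = 0 });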
OverPunchValueCollection.Add(new OverPunchValue { OverpunchCharacter = 'J', IsNegative = true, NumericalValue = 1 });
//add the rest of the values here...
}
static readonly OverPunchValues _instance = new OverPunchValues();
public static OverPunchValues Instance
{
get { return _instance; }
}
}
public class OverPunchValue
{
public char OverpunchCharacter { get; set; }
public bool IsNegative { get; set; }
public int NumericalValue { get; set; }
public OverPunchValue()
{
}
}
And then you can call it like:
string str = "00345{";
int temp = str.OverpunchValue();
private int ConvertOverpunch(string number)
{
number = number.ToLower();
Regex r = new Regex("}|j|k|l|m|n|o|p|q|r");
if(r.IsMatch(number))
{
number = "-" + number;
}
number = number.Replace('}', '0');
number = number.Replace('j', '1');
number = number.Replace('k', '2');
number = number.Replace('l', '3');
number = number.Replace('m', '4');
number = number.Replace('n', '5');
number = number.Replace('o', '6');
number = number.Replace('p', '7');
number = number.Replace('q', '8');
number = number.Replace('r', '9');
number = number.Replace('{', '0');
number = number.Replace('a', '1');
number = number.Replace('b', '2');
number = number.Replace('c', '3');
number = number.Replace('d', '4');
number = number.Replace('e', '5');
number = number.Replace('f', '6');
number = number.Replace('g', '7');
number = number.Replace('h', '8');
number = number.Replace('i', '9');
try
{
int intNumber = Convert.ToInt32(number);
return intNumber;
}
catch
{
return 0;
}
}
I wrote this off the top of my head; no testing has been done.
I just wanted to chime in here, as I have written a class to handle these. I wrote it before I knew the name "Signed Overpunch", so I called it "packed-sign". The advantage of my approach is that it is actually a Java NumberFormatter, so it is easy to use with any framework that uses java.lang.Number or java.text.NumberFormat.
Anyone with more experience with dealing with these signed overpunch numbers, please feel free to open a pull request to make my implementation more compatible with different encodings/variations etc.
GitHub Repo