Parser for signed overpunch values?

Published 2020-03-03 09:15

放荡不羁爱自由

Question:

I am working with some old data imports and came across a bunch of data from an external source that reports financial numbers with a signed overpunch. I've seen a lot, but this is before my time. Before I go about creating a function to parse these strangers, I wanted to check whether there is a standard way to handle them.

I guess my question is: does the .NET Framework provide a standard facility for converting signed overpunch strings? If not .NET, are there any third-party tools I can use so I don't reinvent the wheel?

Answer 1:

Over-punched numeric (zoned decimal in COBOL) comes from the old punched cards, where the sign was over-punched on the last digit of a number. The format is commonly used in COBOL.

As there are both ASCII and EBCDIC COBOL compilers, there are both ASCII and EBCDIC versions of zoned numeric. To make it even more complicated, the +0 and -0 values ({ and } in US EBCDIC, IBM037) are different in, say, German EBCDIC (IBM273, where they are ä and ü), and different again in other EBCDIC language versions.

To process the data successfully, you need to know:

Did the data originate on an EBCDIC or an ASCII system?
If EBCDIC, which language version: US, German, etc.?
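
To see why the originating code page matters, here is a minimal Java sketch that decodes the same raw sign bytes with two code pages. It assumes a JDK that ships the IBM037 and IBM273 charsets (full JDKs normally do, stripped-down runtimes may not); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

final class CodePageDemo {
    // Decodes one raw EBCDIC byte using the named code page.
    static String decode(int rawByte, String codePage) {
        return new String(new byte[] { (byte) rawByte }, Charset.forName(codePage));
    }
}
```

With this, CodePageDemo.decode(0xC0, "IBM037") gives "{" while CodePageDemo.decode(0xC0, "IBM273") gives "ä", so a parser that only looks for { and } would miss the sign in data translated with the German code page.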

If the data is in the original character set, you can calculate the sign from the hex value of the last digit.

For EBCDIC the numeric hex codes are:

Digit         0     1     2    ..    9

Unsigned:   x'F0' x'F1' x'F2'  ..  x'F9'     012 .. 9
Negative:   x'D0' x'D1' x'D2'  ..  x'D9'     }JK .. R
Positive:   x'C0' x'C1' x'C2'  ..  x'C9'     {AB .. I

For US EBCDIC zoned, this is the Java code to convert a string:

    // ret holds the zoned-decimal text, e.g. "0012C"
    int positiveDiff = 'A' - '1';   // distance from 'A'..'I' back to '1'..'9'
    int negativeDiff = 'J' - '1';   // distance from 'J'..'R' back to '1'..'9'
    String sign = "";

    char lastChar = Character.toUpperCase(ret.charAt(ret.length() - 1));

    switch (lastChar) {
        case '}':
            sign = "-";
            // deliberate fall-through: '}' is -0, '{' is +0
        case '{':
            lastChar = '0';
            break;
        case 'A':
        case 'B':
        case 'C':
        case 'D':
        case 'E':
        case 'F':
        case 'G':
        case 'H':
        case 'I':
            lastChar = (char) (lastChar - positiveDiff);
            break;
        case 'J':
        case 'K':
        case 'L':
        case 'M':
        case 'N':
        case 'O':
        case 'P':
        case 'Q':
        case 'R':
            sign = "-";
            lastChar = (char) (lastChar - negativeDiff);
            break;
        default:
            // plain digit: unsigned, leave it as it is
            break;
    }
    ret = sign + ret.substring(0, ret.length() - 1) + lastChar;

For German EBCDIC, { and } become ä and ü; for other EBCDIC language versions you would need to look up the appropriate code page.

For ASCII zoned, this is the Java code:

    // ret holds the zoned-decimal text
    int positiveFjDiff = '@' - '0';   // distance from '@'..'I' back to '0'..'9'
    int negativeFjDiff = 'P' - '0';   // distance from 'P'..'Y' back to '0'..'9'
    String sign = "";

    char lastChar = Character.toUpperCase(ret.charAt(ret.length() - 1));

    switch (lastChar) {
        case '@':
        case 'A':
        case 'B':
        case 'C':
        case 'D':
        case 'E':
        case 'F':
        case 'G':
        case 'H':
        case 'I':
            lastChar = (char) (lastChar - positiveFjDiff);
            break;
        case 'P':
        case 'Q':
        case 'R':
        case 'S':
        case 'T':
        case 'U':
        case 'V':
        case 'W':
        case 'X':
        case 'Y':
            sign = "-";
            lastChar = (char) (lastChar - negativeFjDiff);
            break;
        default:
            // plain digit: unsigned, leave it as it is
            break;
    }
    ret = sign + ret.substring(0, ret.length() - 1) + lastChar;

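Finally, if you are working with the raw EBCDIC bytes, you can calculate it like this:

    // lastDigit: the last byte of the field as an int (0..255), still in EBCDIC
    char sign = '+';
    if ((lastDigit & 0xF0) == 0xD0) {
        sign = '-';
    }
    lastDigit = lastDigit | 0xF0;   // set the zone back to X'F' to get a plain digit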
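
The same nibble test can be extended to a whole field. The following is a minimal sketch, assuming the input is the raw, untranslated EBCDIC bytes and that only zone X'D' marks a negative value, as in the table above; the class and method names are illustrative only.

```java
final class EbcdicZoned {
    // Decodes a zoned-decimal field given as raw EBCDIC bytes.
    // Assumption: zone X'D' on the last byte means negative, anything else positive.
    static long decode(byte[] field) {
        if (field.length == 0) {
            throw new IllegalArgumentException("empty field");
        }
        long value = 0;
        for (byte b : field) {
            int digit = b & 0x0F;               // the low nibble carries the digit
            if (digit > 9) {
                throw new IllegalArgumentException("not a digit: " + (b & 0xFF));
            }
            value = value * 10 + digit;
        }
        int zone = field[field.length - 1] & 0xF0;   // high nibble of the last byte
        return zone == 0xD0 ? -value : value;
    }
}
```

For example, the bytes F0 F0 F1 F2 C3 decode to 123 and F1 F2 D3 decode to -123.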

One last problem: decimal points are not stored in a zoned decimal; they are assumed. You need to look at the COBOL copybook.

For example, if the COBOL copybook is

    03 fld                 pic s99999.

then 123 is stored as 0012C (EBCDIC source),

but if the copybook is (v stands for the assumed decimal point)

    03 fld                 pic s999v99.

then 123 is stored as 1230{
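
Once the sign has been decoded, the assumed decimal point can be applied with an exact decimal type. A minimal Java sketch, assuming the digits have already been converted to a plain signed string and that the scale (the number of digits after the v) is read from the copybook by hand; the names are hypothetical.

```java
import java.math.BigDecimal;

final class ImpliedDecimal {
    // digits: decoded signed digit string, e.g. "12300" or "-12300"
    // scale:  number of digits after the V in the PICTURE clause
    static BigDecimal apply(String digits, int scale) {
        // movePointLeft keeps the value exact and sets the scale accordingly
        return new BigDecimal(digits).movePointLeft(scale);
    }
}
```

So the stored value 1230{ decodes to 12300, and ImpliedDecimal.apply("12300", 2) yields 123.00, while the same call with a scale of 0 leaves 00123 as 123.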

It would be best to do the translation in COBOL, or to use a COBOL translation package.

There are several commercial packages for handling COBOL data; they tend to be expensive. There are also some open source Java packages that can deal with mainframe COBOL data.

Answer 2:

Presumably in the specification for the file or your program you are told how to deal with this? No?

As Bruce Martin has said, a true Overpunch goes back to the days of punched-cards. You punched the final digit of a number, then re-punched (overpunched) the same position on the card.

The link to the Wiki that you included in your question is fine for that. But I'm pretty sure the source of your data is not punched-cards.

Although part of this answer presumes you are using a Mainframe, the solution proposed is machine-independent.

The source of your data is a Mainframe? We don't know, although it is important information. For the moment, let's assume it is so.

Unless it is very old data which is unchanging, it has been processed on the Mainframe in the last 20 years. Unless the compiler used (assuming it has come from a COBOL program) is very, very old, then you need to know the setting of compiler option NUMPROC. Here's why: http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/igy3pg50/2.4.36?DT=20090820210412

Default is: NUMPROC(NOPFD)

Abbreviations are: None

The compiler accepts any valid sign configuration: X'A', X'B', X'C', X'D', X'E', or X'F'. NUMPROC(NOPFD) is the recommended option in most cases.

NUMPROC(PFD) improves the performance of processing numeric internal decimal and zoned decimal data. Use this option only if your program data agrees exactly with the following IBM system standards:

Zoned decimal, unsigned: High-order 4 bits of the sign byte contain X'F'.

Zoned decimal, signed overpunch: High-order 4 bits of the sign byte contain X'C' if the number is positive or 0, and X'D' if it is not.

Zoned decimal, separate sign: Separate sign contains the character '+' if the number is positive or 0, and '-' if it is not.

Internal decimal, unsigned: Low-order 4 bits of the low-order byte contain X'F'.

Internal decimal, signed: Low-order 4 bits of the low-order byte contain X'C' if the number is positive or 0, and X'D' if it is not.

Data produced by COBOL arithmetic statements conforms to the above IBM system standards. However, using REDEFINES and group moves could change data so that it no longer conforms. If you use NUMPROC(PFD), use the INITIALIZE statement to initialize data fields, rather than using group moves.

Using NUMPROC(PFD) can affect class tests for numeric data. You should use NUMPROC(NOPFD) or NUMPROC(MIG) if a COBOL program calls programs written in PL/I or FORTRAN.

Sign representation is affected not only by the NUMPROC option, but also by the installation-time option NUMCLS.

Use NUMPROC(MIG) to aid in migrating OS/VS COBOL programs to Enterprise COBOL. When NUMPROC(MIG) is in effect, the following processing occurs:
Preferred signs are created only on the output of MOVE statements and arithmetic operations.

No explicit sign repair is done on input.

Some implicit sign repair might occur during conversion.

Numeric comparisons are performed by a decimal comparison, not a logical comparison.

What does that mean to you? If NUMPROC(NOPFD) is being used, you may see X'A' through X'F' in the high-order nybble of the final byte of the field. If NUMPROC(PFD) is being used, you shouldn't see anything other than X'C' or X'D' in that position.
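
If you do have to accept such data, a tolerant sign check could look like the sketch below. It assumes the conventional IBM decimal sign codes (X'B' and X'D' negative; X'A', X'C', X'E' and X'F' positive), which the quoted text does not spell out; the class and method names are made up for illustration.

```java
final class SignNibble {
    // Returns -1 or +1 for the zone (high) nibble of a zoned-decimal sign byte.
    // Assumption: X'B' and X'D' are negative; X'A', X'C', X'E', X'F' are positive.
    static int signOf(int signByte) {
        int zone = (signByte >> 4) & 0x0F;
        switch (zone) {
            case 0xB:
            case 0xD:
                return -1;
            case 0xA:
            case 0xC:
            case 0xE:
            case 0xF:
                return 1;
            default:
                throw new IllegalArgumentException("invalid sign nibble: " + zone);
        }
    }

    // True only for the preferred signs X'C' and X'D' of a signed field.
    static boolean isPreferred(int signByte) {
        int zone = (signByte >> 4) & 0x0F;
        return zone == 0xC || zone == 0xD;
    }
}
```

Logging every byte for which isPreferred returns false is one way to find out which of the two situations your file is actually in.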

Note that if the file you are receiving has been generated by the installed Mainframe SORT product, you have the same potential issue.

"May" and "shouldn't" are not good things to see in a specification.

Is your data remotely business-critical in a financial environment? Then you almost certainly have issues of audit and compliance. It runs something like this:

Auditor, "What do you do with the data when you receive it?"
You, "The first thing I do is change it"
Auditor, "Really? How do you verify the data once you have changed it?"
You, "Errr..."

You might get lucky and never have an auditor look at it.

All those non-deterministic words aren't very good for programming.

So how do you get around it?

There should be no fields on the data that you receive which have embedded signs. There should be no numeric fields which are not represented as character data (no binary, packed, or floating-point). If a field is signed, the sign should be presented separately. If a field has decimal places, an actual . or , (depending on home-country of the system) should be provided, or as an alternative a separate field with a scaling-factor.

Is this difficult for your Mainframe people to do? Not remotely. Insist on it. If they will not do it, document it such that problems arising are not yours, but theirs.

If all numeric data presented to you is plain character data (plus, minus, comma, digits 0 to 9) then you will have absolutely no problem in understanding the data, whether it is any variant of EBCDIC or any variant of ASCII.

Be aware that any fields with decimal places coming from COBOL are exact decimal amounts. Do not store or use them in anything other than fields in your language which can process exact decimal amounts.

You don't provide any sample data. So here's a sample:

123456{

This should be presented to you as:

+1234560

If it has two decimal places:

+12345.60
or
+12345602 (where the trailing 2 is a scaling-factor, which you validate)
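
Reading the scaling-factor variant is then straightforward. A minimal Java sketch, assuming the layout is exactly a sign, the digits, and a single trailing scale digit as in the sample above; a real interface specification would define the field widths, and the names here are hypothetical.

```java
import java.math.BigDecimal;

final class ScaledField {
    // Parses a character field laid out as: sign, digits, one trailing scale digit.
    // The layout is an assumption taken from the "+12345602" sample.
    static BigDecimal parse(String field) {
        if (!field.matches("[+-]\\d{2,}")) {
            throw new IllegalArgumentException("unexpected field: " + field);
        }
        int scale = field.charAt(field.length() - 1) - '0';          // validated as a digit above
        String unscaled = field.substring(0, field.length() - 1);   // sign plus digits
        return new BigDecimal(unscaled).movePointLeft(scale);
    }
}
```

ScaledField.parse("+12345602") gives the exact decimal 12345.60, with no overpunch decoding and no dependency on the code page.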

If numeric data is to be transferred from external systems, it should always be done in character format. It will make everything so much easier to code, understand, maintain, and audit.

Answer 3:

Zoned decimal is easy and requires no char manipulation.

private int ConvertOverpunch(byte[] number)
{
    // Works for EBCDIC or ASCII, all charsets, as long as these are the raw,
    // untranslated bytes: the digit is the low nibble of each byte
    int rtnVal = 0;
    for (int i = 0; i < number.Length; i++)
    {
        int digit = 0x0f & number[i];
        rtnVal = (rtnVal * 10) + digit;
    }

    // Extract sign
    // This works in EBCDIC: zone X'D' on the last byte means negative
    // Need to find out what your sign is in ASCII
    if ((number[number.Length - 1] & 0xF0) == 0xD0)
    {
        rtnVal *= -1;
    }

    return rtnVal;
}

Answer 4:

Here are two other approaches, so you have more alternatives to choose from:

public static int Overpunch2Int_v1(string number)
{
    number = number.ToLower();
    char last = number.Last();
    number = number.Substring(0, number.Length - 1);
    if (last == '}' || (last >= 'j' && last <= 'r'))
    {
        number = "-" + number;
        if (last == '}')
            number += "0";
        else
            number += (char)(last - 'j' + '1');
    }
    else if (last == '{' || (last >= 'a' && last <= 'i'))
    {
        if (last == '{')
            number += "0";
        else
            number += (char)(last - 'a' + '1');
    }

    return Int32.Parse(number);
}

public static int Overpunch2Int_v2(string number)
{
    number = number.ToLower();
    char last = number.Last();
    number = number.Substring(0, number.Length - 1);

    if (last >= '{')
        number = (last == '}'? "-" : "") + number + "0";
    else if (last >= 'a' && last <= 'r')
    {
        bool isNegative = last >= 'j';
        char baseChar = isNegative ? 'j' : 'a';
        number = (isNegative ? "-" : "") + number + (char)(last - baseChar + '1');
    }

    return Int32.Parse(number);
}

Please note that both methods don't validate the string and expect a valid number.
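
For comparison, here is a sketch of a variant that does validate its input. It is written in Java rather than C#, and it assumes US EBCDIC overpunch characters as they appear after translation to ASCII or Unicode; the class and method names are illustrative only.

```java
final class Overpunch {
    private static final String POSITIVE = "{ABCDEFGHI";   // index = digit value
    private static final String NEGATIVE = "}JKLMNOPQR";   // index = digit value

    // Parses text such as "1230{" or "0012L"; throws NumberFormatException otherwise.
    static long parse(String text) {
        if (text == null || !text.matches("\\d*[0-9\\{\\}A-Ra-r]")) {
            throw new NumberFormatException("not an overpunch number: " + text);
        }
        String head = text.substring(0, text.length() - 1);
        char last = Character.toUpperCase(text.charAt(text.length() - 1));
        int digit;
        boolean negative = false;
        if (last >= '0' && last <= '9') {
            digit = last - '0';                  // plain digit, no sign punched
        } else if (POSITIVE.indexOf(last) >= 0) {
            digit = POSITIVE.indexOf(last);
        } else {
            digit = NEGATIVE.indexOf(last);
            negative = true;
        }
        long value = Long.parseLong(head + digit);   // also rejects values too large for a long
        return negative ? -value : value;
    }
}
```

Overpunch.parse("0012L") returns -123 and Overpunch.parse("1230{") returns 12300, while something like "12Z" is rejected instead of being passed on to the number parser.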

Answer 5:

If you don't have enough options already, here is another one using an extension method. You could improve it by using some of the ideas in the other posts.

/// <summary>
/// Extension method to get overpunch value
/// </summary>
/// <param name="number">the text to convert</param>
/// <returns>int</returns>
public static int OverpunchValue(this String number)
{
    int returnValue;

    var ovpValue = OverPunchValues.Instance.OverPunchValueCollection.First(o => o.OverpunchCharacter ==
        Convert.ToChar(number.Substring(number.Length - 1)));

    returnValue = Convert.ToInt32(number.Substring(0, number.Length - 1) + ovpValue.NumericalValue.ToString());

    return ovpValue.IsNegative ? returnValue * -1 : returnValue;
}

/*singleton to store values */
public class OverPunchValues
{
    public List<OverPunchValue> OverPunchValueCollection { get; set; }

    private OverPunchValues()
    {
        OverPunchValueCollection = new List<OverPunchValue>();
        OverPunchValueCollection.Add(new OverPunchValue { OverpunchCharacter = '{', IsNegative = false, NumericalValue = 0 });
        OverPunchValueCollection.Add(new OverPunchValue { OverpunchCharacter = 'J', IsNegative = true, NumericalValue = 1 });
        //add the rest of the values here...
    }

    static readonly OverPunchValues _instance = new OverPunchValues();

    public static OverPunchValues Instance
    {
        get { return _instance; }
    }
}

public class OverPunchValue
{
    public char OverpunchCharacter { get; set; }
    public bool IsNegative { get; set; }
    public int NumericalValue { get; set; }

    public OverPunchValue()
    {

    }            
}

And then you can call it like:

string str = "00345{";

int temp = str.OverpunchValue();

Answer 6:

private int ConvertOverpunch(string number)
    {
        number = number.ToLower();
        Regex r = new Regex("}|j|k|l|m|n|o|p|q|r");
        if(r.IsMatch(number))
        {
            number = "-" + number;
        }
        number = number.Replace('}', '0');
        number = number.Replace('j', '1');
        number = number.Replace('k', '2');
        number = number.Replace('l', '3');
        number = number.Replace('m', '4');
        number = number.Replace('n', '5');
        number = number.Replace('o', '6');
        number = number.Replace('p', '7');
        number = number.Replace('q', '8');
        number = number.Replace('r', '9');

        number = number.Replace('{', '0');
        number = number.Replace('a', '1');
        number = number.Replace('b', '2');
        number = number.Replace('c', '3');
        number = number.Replace('d', '4');
        number = number.Replace('e', '5');
        number = number.Replace('f', '6');
        number = number.Replace('g', '7');
        number = number.Replace('h', '8');
        number = number.Replace('i', '9');

        try
        {
            int intNumber = Convert.ToInt32(number);
            return intNumber;
        }
        catch 
        {
            return 0;
        }
    }

I wrote this off the top of my head; no testing has been done.

Answer 7:

I just wanted to chime in here, as I have written a class to handle these. I wrote it before I knew the name "Signed Overpunch", so I called it "packed-sign". The advantage of my approach is that it is actually a Java NumberFormatter, so it is easy to use with any framework that uses java.lang.Number or java.text.NumberFormat. Anyone with more experience of these signed overpunch numbers, please feel free to open a pull request to make my implementation more compatible with different encodings, variations, etc. GitHub Repo