Performance: float to int cast and clipping result

Posted 2020-07-24 01:35

I'm doing some audio processing with floats. The result needs to be converted back to PCM samples, and I noticed that the cast from float to int is surprisingly expensive. What's even more frustrating is that I need to clip the result to the range of a short (-32768 to 32767). While I would normally instinctively assume that this could be ensured by simply casting float to short, this fails miserably in Java, since at the bytecode level it results in F2I followed by I2S. So instead of a simple:

int sample = (short) floatVal;

I needed to resort to this ugly sequence:

int sample = (int) floatVal;
if (sample > 32767) {
    sample = 32767;
} else if (sample < -32768) {
    sample = -32768;
}

Is there a faster way to do this?

(About 6% of the total runtime seems to be spent on casting. While 6% may not seem like much at first glance, it's astounding when I consider that the processing part involves a good chunk of matrix multiplications and IDCT.)

  • EDIT: The cast/clipping code above is (not surprisingly) in the body of a loop that reads float values from a float[] and puts them into a byte[]. I have a test suite that measures total runtime on several test cases (processing about 200MB of raw audio data). The 6% figure was derived from the runtime difference when the cast assignment "int sample = (int) floatVal" was replaced by assigning the loop index to sample.

  • EDIT @leopoldkot: I'm aware of the truncation in Java, as stated in the original question (F2I, I2S bytecode sequence). I only tried the cast to short because I assumed that Java had an F2S bytecode, which it unfortunately does not (coming originally from a 68K assembly background, where a simple "fmove.w FP0, D0" would have done exactly what I wanted).
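For reference, a minimal sketch of the kind of loop described in the edits above. The class name, the method name, and the little-endian byte order are assumptions for illustration, not details taken from the original code.

```java
final class PcmClipSketch {
    // Hypothetical helper: converts float samples to 16-bit PCM bytes.
    // Little-endian output is an assumption; the question does not say.
    static byte[] toPcm16(float[] in) {
        byte[] out = new byte[in.length * 2];
        for (int i = 0; i < in.length; i++) {
            int sample = (int) in[i];   // F2I: truncates toward zero
            if (sample > 32767) {
                sample = 32767;         // clip to Short.MAX_VALUE
            } else if (sample < -32768) {
                sample = -32768;        // clip to Short.MIN_VALUE
            }
            out[2 * i] = (byte) sample;            // low byte
            out[2 * i + 1] = (byte) (sample >> 8); // high byte
        }
        return out;
    }
}
```

Calling `PcmClipSketch.toPcm16(new float[] {40000f})` yields the two bytes of 32767, because the value is clipped before it is split into bytes.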

5 answers
SAY GOODBYE
#2 · 2020-07-24 02:13

When you cast int to short you never get clipping; the upper bits are truncated and the remaining 16 bits are read as a short. E.g. (short)-40000 becomes 25536, and not -32768 as you expected.
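A small Java sketch of this wrap-around, assuming nothing beyond the standard narrowing rules. The class and method names are made up for the example.

```java
final class NarrowingDemo {
    // A float-to-short cast compiles to F2I followed by I2S,
    // so the int result is truncated to 16 bits, not clipped.
    static short castViaInt(float f) {
        return (short) f;
    }

    public static void main(String[] args) {
        System.out.println((short) -40000);        // prints 25536
        System.out.println(castViaInt(40000.0f));  // prints -25536
    }
}
```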

You should probably edit your question; I am sure you know this if you disassembled the bytecode. Also, there is a JIT compiler, which might optimize this code (because it is called often) into platform-dependent instructions.

Please convert this answer to a comment.

啃猪蹄的小仙女
#3 · 2020-07-24 02:17

This is Python, but should be easy to convert. I don't know how costly the floating point operations are, but if you can keep it in integer registers you might have some boost; this assumes that you can reinterpret the IEEE754 bits as an int. (That's what my poorly named float2hex is doing.)

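import struct

def float2hex(v):
    # Reinterpret the IEEE 754 single-precision bits of v as an unsigned int.
    s = struct.pack('f', v)
    h = struct.unpack('I', s)[0]
    return h

def ToInt(f):
    h = float2hex(f)
    s = h >> 31                  # sign bit
    exp = h >> 23 & 0xFF         # biased exponent
    mantissa = h & 0x7FFFFF      # 23-bit fraction
    exp = exp - 126              # for normal values, |f| is in [2**(exp-1), 2**exp)
    if exp >= 16:
        # |f| >= 32768: clip to the range of a short.
        if s:
            v = -32768
        else:
            v = 32767
    elif exp < 0:
        # |f| < 0.5: truncates to zero.
        v = 0
    else:
        v = mantissa | (1 << 23)  # restore the implicit leading 1
        exp -= 24
        if exp > 0:
            v = v << exp
        elif exp < 0:
            v = v >> -exp

        if s:
            v = -v

    return v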

This branching may kill you, but maybe this provides something useful anyway? This rounds toward zero.
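A rough Java port of the Python above might look like the following sketch. It relies on `Float.floatToRawIntBits` for the bit reinterpretation. The class and method names are hypothetical, and the two branches that return zero are merged into one. Unlike a plain `(int)` cast, NaN ends up clipped rather than mapped to 0, as in the Python version.

```java
final class BitClip {
    // Hypothetical port of ToInt: truncates toward zero and clips to
    // the short range using only integer operations on the float bits.
    static int toClippedInt(float f) {
        int bits = Float.floatToRawIntBits(f);
        boolean negative = bits < 0;             // IEEE 754 sign bit
        int exp = ((bits >>> 23) & 0xFF) - 126;  // normal |f| is in [2^(exp-1), 2^exp)
        if (exp >= 16) {
            // |f| >= 32768, or infinity/NaN: clip.
            return negative ? -32768 : 32767;
        }
        if (exp <= 0) {
            // |f| < 1 (including zero and denormals) truncates to 0.
            return 0;
        }
        int mantissa = (bits & 0x7FFFFF) | (1 << 23); // restore implicit leading 1
        int v = mantissa >> (24 - exp);               // exp is 1..15, so the shift is 9..23
        return negative ? -v : v;
    }
}
```

Whether this beats the plain cast would have to be measured; the branches are still there, as noted above.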

Summer. ? 凉城
#4 · 2020-07-24 02:18

Float-to-int conversion is one of the slowest operations you can do on an x86 processor when it is done with x87 instructions, as it requires modifying the x87 rounding mode (twice), which serializes and flushes the processor. You can get a sizeable speedup if you can use SSE instructions instead of x87 instructions, but I have no idea if there's a way to do that in Java. Perhaps try using an x86_64 JVM?

乱世女痞
#5 · 2020-07-24 02:22

You could turn two comparisons into one for values which are in range. This could halve the cost. Currently you perform only one comparison if the value is too large (which might not be your typical case).

if (sample + 0x7fff8000 < 0x7fff0000)
    sample = sample < 0 ? -32768 : 32767;
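To see why a single comparison is enough, here is a small sketch that puts the trick next to the two-comparison version from the question. Class and method names are made up for the example. Adding 0x7fff8000 maps in-range values to 0x7fff0000..0x7fffffff, values that are too large overflow to a negative number, and values that are too small land below 0x7fff0000.

```java
final class SingleCompareClip {
    // Reference: the two-comparison clip from the question.
    static int clipTwoCompares(int sample) {
        if (sample > 32767) {
            sample = 32767;
        } else if (sample < -32768) {
            sample = -32768;
        }
        return sample;
    }

    // One comparison on the common (in-range) path. Relies on Java's
    // well-defined wrap-around for int overflow.
    static int clipOneCompare(int sample) {
        if (sample + 0x7fff8000 < 0x7fff0000) {
            sample = sample < 0 ? -32768 : 32767;
        }
        return sample;
    }
}
```

Out-of-range values still cost a second comparison to pick the bound, so the gain depends on how often clipping actually happens.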
虎瘦雄心在
#6 · 2020-07-24 02:24

int sample = ((int)floatval) & 0xffff;
