Question:

I'm doing some audio processing with floats. The result needs to be converted back to PCM samples, and I noticed that the cast from float to int is surprisingly expensive. What's even more frustrating is that I need to clip the result to the range of a short (-32768 to 32767). While I would normally instinctively assume that this could be ensured by simply casting float to short, this fails miserably in Java, since at the bytecode level it results in F2I followed by I2S. So instead of a simple:

int sample = (short) floatVal;

I needed to resort to this ugly sequence:

int sample = (int) floatVal;
if (sample > 32767) {
    sample = 32767;
} else if (sample < -32768) {
    sample = -32768;
}

Is there a faster way to do this?

(About 6% of the total runtime seems to be spent on casting. While 6% may not seem like much at first glance, it's astounding when I consider that the processing part involves a good chunk of matrix multiplications and IDCT.)

EDIT: The cast/clipping code above is (not surprisingly) in the body of a loop that reads float values from a float[] and puts them into a byte[]. I have a test suite that measures total runtime on several test cases (processing about 200MB of raw audio data). The 6% figure was derived from the runtime difference when the cast assignment "int sample = (int) floatVal" was replaced by assigning the loop index to sample.
EDIT: @leopoldkot: I'm aware of the truncation in Java, as stated in the original question (F2I, I2S bytecode sequence). I only tried the cast to short because I assumed that Java had an F2S bytecode, which it unfortunately does not (coming originally from a 68K assembly background, where a simple "fmove.w FP0, D0" would have done exactly what I wanted).
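
For readers who want to reproduce the measurement, the loop described above might look roughly like the following sketch. The class name PcmWriter, the method name toPcm16, and the little-endian byte order are assumptions for illustration; the question does not show the actual loop or state the byte order.

```java
final class PcmWriter {
    // Converts float samples to 16-bit PCM, clipping to the range of a short.
    // Assumptions (not stated in the question): the samples are already
    // scaled to the short range, and the output is little-endian.
    static byte[] toPcm16(float[] in) {
        byte[] out = new byte[in.length * 2];
        for (int i = 0, o = 0; i < in.length; i++) {
            int sample = (int) in[i];          // F2I: truncates toward zero
            if (sample > 32767) {
                sample = 32767;
            } else if (sample < -32768) {
                sample = -32768;
            }
            out[o++] = (byte) sample;          // low byte
            out[o++] = (byte) (sample >> 8);   // high byte
        }
        return out;
    }
}
```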

Answer 1:

You could turn the two comparisons into one for values that are in range, which could halve the cost for them. Currently, you get away with a single comparison only when the value is too large (which might not be your typical case).

if (sample + 0x7fff8000 < 0x7fff0000)
    sample = sample < 0 ? -32768 : 32767;
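
As a sanity check, the one-comparison form can be wrapped in a method and compared against the two-comparison version from the question. This is only a sketch; the class and method names are made up for illustration. The trick relies on Java's wrapping int arithmetic: adding 0x7fff8000 maps the valid range [-32768, 32767] onto [0x7fff0000, 0x7fffffff], and every other input lands below 0x7fff0000, either directly or by overflowing to a negative number.

```java
final class Clamp16 {
    // Reference version from the question: two comparisons for in-range values.
    static int clampTwoCompares(int sample) {
        if (sample > 32767) {
            sample = 32767;
        } else if (sample < -32768) {
            sample = -32768;
        }
        return sample;
    }

    // One comparison for in-range values; the second comparison is
    // only paid when the value actually needs clipping.
    static int clampOneCompare(int sample) {
        if (sample + 0x7fff8000 < 0x7fff0000) {
            sample = sample < 0 ? -32768 : 32767;
        }
        return sample;
    }
}
```

Whether this is actually faster depends on what the JIT makes of both versions, so it is worth measuring in the real loop.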

Answer 2:

When you cast an int to a short, you never get clipping: the upper bits are truncated and the remaining 16 bits are read as a short. E.g. (short)-40000 becomes 25536, not -32768 as you expected.
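
The wrap-around is easy to see in a small sketch (class and method names are hypothetical). A float-to-short cast in Java goes through int first, so it saturates at the int range and then wraps at 16 bits:

```java
final class NarrowingDemo {
    // int -> short keeps only the low 16 bits (I2S).
    static short wrap(int v) {
        return (short) v;
    }

    // float -> short is F2I followed by I2S, so it wraps as well.
    static short castFloat(float f) {
        return (short) f;
    }

    public static void main(String[] args) {
        System.out.println(wrap(-40000));        // 25536
        System.out.println(castFloat(40000f));   // -25536
        System.out.println(castFloat(1e10f));    // -1 (F2I saturates to 0x7fffffff first)
    }
}
```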

You should probably edit your question; I am sure you know this if you disassembled the bytecode. Also, there is a JIT compiler, which might optimize this code (because it is called often) into platform-dependent instructions.

Please convert this answer to comment.

Answer 3:

Float-to-int conversion is one of the slowest operations you can do on an x86 processor, as it requires modifying the x87 rounding mode (twice), which serializes and flushes the processor. You can get a sizeable speedup if you can use SSE instructions instead of x87 instructions, but I have no idea if there's a way to do that in Java. Perhaps try using an x86_64 JVM?

Answer 4:

This is Python, but should be easy to convert. I don't know how costly the floating point operations are, but if you can keep it in integer registers you might have some boost; this assumes that you can reinterpret the IEEE754 bits as an int. (That's what my poorly named float2hex is doing.)

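import struct

def float2hex(v):
    # Reinterpret the IEEE 754 single-precision bits of v as an unsigned 32-bit int.
    s = struct.pack('<f', v)
    return struct.unpack('<I', s)[0]

def ToInt(f):
    h = float2hex(f)
    s = h >> 31                    # sign bit
    exp = (h >> 23) & 0xFF         # biased exponent
    mantissa = h & 0x7FFFFF        # 23 fraction bits
    exp = exp - 126                # number of integer bits in |f|
    if exp >= 16:
        # |f| >= 32768: clip to the range of a short
        if s:
            v = -32768
        else:
            v = 32767
    elif exp < 0:
        # |f| < 0.5
        v = 0
    else:
        v = mantissa | (1 << 23)   # restore the implicit leading 1
        exp -= 24
        if exp > 0:
            v = v << exp
        elif exp < 0:
            v = v >> -exp          # drop the fractional bits (truncate)

        if s:
            v = -v

    return v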

This branching may kill you, but maybe this provides something useful anyway? This rounds toward zero.
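
A possible Java port of the same idea, as a sketch only: Float.floatToRawIntBits provides the bit reinterpretation, while the class and method names are made up for illustration. One difference from the plain cast is worth noting: a NaN input is clipped to one of the range limits here, whereas (int) NaN is 0 in Java.

```java
final class FloatBitsClamp {
    // Truncates f toward zero and clips to [-32768, 32767] using only
    // integer operations on the IEEE 754 bit pattern.
    static int toClippedInt(float f) {
        int bits = Float.floatToRawIntBits(f);
        boolean negative = bits < 0;              // sign bit is the int's sign
        int exp = ((bits >>> 23) & 0xFF) - 126;   // number of integer bits in |f|
        if (exp >= 16) {
            // |f| >= 32768, infinity, or NaN: clip
            return negative ? -32768 : 32767;
        }
        if (exp <= 0) {
            // |f| < 1: truncates to zero (covers zeros and denormals)
            return 0;
        }
        int v = (bits & 0x7FFFFF) | (1 << 23);    // restore the implicit leading 1
        v >>= 24 - exp;                           // drop the fractional bits
        return negative ? -v : v;
    }
}
```

Whether this beats the cast plus clamp on a given JVM is not something the sketch can answer; it would need to be measured in the actual loop.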