Java NIO scan through ByteBuffer for certain bytes

Posted 2019-03-02 07:57

Question:

Okay, so I'm trying to do something that seemed like it should be fairly simple, but these new NIO interfaces are confusing the hell out of me! Here's what I'm trying to do: I need to scan through a file as bytes until I encounter certain bytes. When I encounter them, I need to grab that segment of the data, do something with it, and then move on and do the same again. I would have thought that with all the marks, positions and limits in ByteBuffer I'd be able to do this, but I can't seem to make it work! Here's what I have so far:

test.txt:

this is a line of text a
this is line 2b
line 3
line 4
line etc.etc.etc.

Test.java:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = Charset.forName("UTF-8");
    public static final byte[] NEWLINE_BYTE = {0x0A, 0x0D};

    public Test() {

        String pathString = "test.txt";

        //the path to the file
        Path path = Paths.get(pathString);

        try (FileChannel fc = FileChannel.open(path, 
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {            
            if (fc.size() > 0) {
                int n;
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                do {                    
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();
                int pos = 0;
                System.out.println("FILE LOADED: |" + new String(buffer.array(), ENCODING) + "|");
                do {
                    byte b = buffer.get();
                    if (b == NEWLINE_BYTE[0] || b == NEWLINE_BYTE[1]) {
                        System.out.println("POS: " + pos);
                        System.out.println("POSITION: " + buffer.position());
                        System.out.println("LENGTH: " + Integer.toString(buffer.position() - pos));
                        ByteBuffer lineBuffer = ByteBuffer.wrap(buffer.array(), pos + 1, buffer.position() - pos);
                        System.out.println("LINE: |" + new String(lineBuffer.array(), ENCODING) + "|");
                        pos = buffer.position();
                    }
                } while (buffer.hasRemaining());
            } 
        } catch (IOException ioe) {
           ioe.printStackTrace();
        }
    }
    public static void main(String args[]) {
        Test t = new Test();
    }
}

So the first part is working: the fc.read(buffer) call only ever runs once and pulls the entire file into the ByteBuffer. Then, in the second do loop, I'm able to loop through byte by byte just fine, and it does hit the if statement when it reaches a \n (or \r), but I can't figure out how to get that PORTION of the bytes I've just looked through into a separate byte array to work with! I've tried slice and various flips, and I've tried wrap as shown in the code above, but I can't make it work: both buffers always contain the complete file, and so does anything I slice or wrap off them!

I just need to loop through the file byte by byte, looking at a certain section at a time. My end goal is that, once I've looked through and found the right spot, I want to insert some data there! I need that lineBuffer, as output at "LINE: ", to contain ONLY the portion of the bytes I've looped through so far! Help, and thank you!

Answer 1:

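Leaving the I/O aside, once you have content in the ByteBuffer it would be a lot simpler to convert it to a CharBuffer. For UTF-8 data that means decoding it, for example with ENCODING.decode(buffer); asCharBuffer() only reinterprets each pair of bytes as one 16-bit char. CharBuffer implements CharSequence, which gives you a lot of String and regex methods to use.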
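
A minimal sketch of that approach, assuming the buffer holds UTF-8 text. The class name CharBufferLines and the helper lines() are made up for illustration. It goes through Charset.decode() and hands the resulting CharBuffer straight to a regex:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class CharBufferLines {
    // Matches \r\n, a lone \n, or a lone \r.
    private static final Pattern LINE_BREAK = Pattern.compile("\\r\\n|\\n|\\r");

    /** Decodes the readable bytes as UTF-8 and splits the text on line breaks. */
    static List<String> lines(ByteBuffer bytes) {
        // decode() consumes from position to limit; duplicate() leaves the caller's position alone.
        CharBuffer chars = StandardCharsets.UTF_8.decode(bytes.duplicate());
        // CharBuffer implements CharSequence, so Pattern.split accepts it directly.
        return Arrays.asList(LINE_BREAK.split(chars));
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(
                "line 1\r\nline 2\nline 3".getBytes(StandardCharsets.UTF_8));
        for (String line : lines(buffer)) {
            System.out.println("LINE: " + line);
        }
    }
}
```

This trades byte offsets for character offsets, so it suits reading and matching better than the "insert data at this byte position" goal from the question.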



Answer 2:

Here is the solution I ended up with, using the bulk relative get function of ByteBuffer to get the chunk each time. I think I'm using the mark() functionality as it's intended, though I use an additional variable (pos) to keep track of where I stopped, since I can't find a function in ByteBuffer that returns the position of the mark itself. I've also got explicit handling for \r, \n, or both in sequence. Keep in mind that this byte-level scan relies on an encoding such as UTF-8, where the bytes 0x0A and 0x0D never occur inside a multi-byte character. I hope this helps someone else.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = Charset.forName("UTF-8");
    //{LF, CR}
    public static final byte[] NEWLINE_BYTES = {0x0A, 0x0D};

    public Test() {
        //test text file: a sequence of any strings, each followed by a newline
        String pathString = "test.txt";
        Path path = Paths.get(pathString);

        try (FileChannel fc = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {

            if (fc.size() > 0) {
                int n;
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                do {
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();
                //mark the start of the first line
                buffer.mark();
                while (buffer.hasRemaining()) {
                    //get one byte at a time
                    byte b = buffer.get();

                    if (b == NEWLINE_BYTES[0] || b == NEWLINE_BYTES[1]) {
                        int newlineByteCount = 1;

                        //treat \r\n as one two-byte terminator; the absolute get
                        //peeks at the next byte without moving the position
                        if (b == NEWLINE_BYTES[1] && buffer.hasRemaining()
                                && buffer.get(buffer.position()) == NEWLINE_BYTES[0]) {
                            buffer.get();
                            newlineByteCount++;
                        }

                        int pos = buffer.position();
                        //reset the buffer back to the mark() position
                        buffer.reset();
                        //create an array just the right length and get the bytes we just measured out
                        int length = pos - buffer.position() - newlineByteCount;
                        byte[] lineBytes = new byte[length];
                        buffer.get(lineBytes, 0, length);

                        String lineString = new String(lineBytes, ENCODING);
                        System.out.println("LINE: " + lineString);

                        //skip the terminator and mark the start of the next line
                        buffer.position(pos);
                        buffer.mark();
                    }
                }
                //emit a final line that has no trailing newline
                buffer.reset();
                if (buffer.hasRemaining()) {
                    byte[] lastBytes = new byte[buffer.remaining()];
                    buffer.get(lastBytes);
                    System.out.println("LINE: " + new String(lastBytes, ENCODING));
                }
            }
        } catch (IOException ioe) { ioe.printStackTrace(); }
    }
    public static void main(String args[]) { new Test(); }
}
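
As an alternative to mark()/reset(), here is a rough sketch that tracks the line start in a plain int and hands out each segment as a view created with duplicate() and slice(). The names SegmentScanner, segments() and view() are hypothetical. It also touches on why the wrap() attempt in the question printed the whole file: array() always returns the entire backing array, whatever the buffer's position and limit are, so a segment has to be read through the buffer itself (decode() or get(byte[])) rather than through array().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating duplicate()/slice(); not the answerer's code.
final class SegmentScanner {
    /** Returns one view per line, without copying and without the line terminators. */
    static List<ByteBuffer> segments(ByteBuffer buffer) {
        List<ByteBuffer> out = new ArrayList<>();
        int start = buffer.position();
        for (int i = start; i < buffer.limit(); i++) {
            byte b = buffer.get(i); // absolute get: the position is not touched
            if (b == '\n' || b == '\r') {
                out.add(view(buffer, start, i));
                // consume \r\n as a single terminator
                if (b == '\r' && i + 1 < buffer.limit() && buffer.get(i + 1) == '\n') {
                    i++;
                }
                start = i + 1;
            }
        }
        // final line without a trailing terminator
        if (start < buffer.limit()) {
            out.add(view(buffer, start, buffer.limit()));
        }
        return out;
    }

    private static ByteBuffer view(ByteBuffer buffer, int from, int to) {
        ByteBuffer dup = buffer.duplicate(); // own position/limit, shared content
        dup.position(from);
        dup.limit(to);
        return dup.slice(); // covers exactly the bytes in [from, to)
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(
                "this is a line\r\nline 2\nline 3".getBytes(StandardCharsets.UTF_8));
        for (ByteBuffer segment : segments(buffer)) {
            System.out.println("LINE: |" + StandardCharsets.UTF_8.decode(segment) + "|");
        }
    }
}
```

Because the views share content with the original buffer, a write through a writable view is visible in the original as well; copy with get(byte[]) if an independent array is needed.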


Answer 3:

I needed something similar, but more general than splitting a single buffer. In my case I have multiple buffers; in fact, my code is a modification of Spring's StringDecoder that converts a Flux<DataBuffer> to a Flux<String>.

https://stackoverflow.com/a/48111196/839733
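
To give a flavor of the multi-buffer case using only plain ByteBuffers, here is a small sketch. It is not the code from the linked answer and not Spring's StringDecoder; the class ChunkedLineSplitter and its methods feed() and finish() are invented for illustration. The idea it assumes is simply to carry the unfinished tail of one chunk over into the next, and to decode only complete lines so that a multi-byte UTF-8 character split across two chunks still decodes correctly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a stream of chunks into lines terminated by \n or \r\n.
final class ChunkedLineSplitter {
    // Bytes of a line that began in an earlier chunk and is not terminated yet.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Feeds one chunk and returns every line that this chunk completed. */
    List<String> feed(ByteBuffer chunk) {
        List<String> lines = new ArrayList<>();
        while (chunk.hasRemaining()) {
            byte b = chunk.get();
            if (b == '\n') {
                lines.add(drain());
            } else {
                pending.write(b);
            }
        }
        return lines;
    }

    /** Call after the last chunk to collect a final line that had no terminator. */
    List<String> finish() {
        List<String> lines = new ArrayList<>();
        if (pending.size() > 0) {
            lines.add(drain());
        }
        return lines;
    }

    private String drain() {
        byte[] bytes = pending.toByteArray();
        pending.reset();
        int length = bytes.length;
        if (length > 0 && bytes[length - 1] == '\r') {
            length--; // drop the \r of a \r\n pair
        }
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ChunkedLineSplitter splitter = new ChunkedLineSplitter();
        String[] chunks = {"this is a li", "ne\r\nline 2\nli", "ne 3"};
        for (String chunk : chunks) {
            ByteBuffer buffer = ByteBuffer.wrap(chunk.getBytes(StandardCharsets.UTF_8));
            for (String line : splitter.feed(buffer)) {
                System.out.println("LINE: " + line);
            }
        }
        for (String line : splitter.finish()) {
            System.out.println("LINE: " + line);
        }
    }
}
```

A lone \r is not treated as a terminator here, which keeps the carry-over logic short; handling it would require remembering whether the previous chunk ended in \r.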