Trying to read binary file as text but scanner stops

Posted 2019-03-03 06:00

Question:

I'm trying to read a binary file, but my program just stops at the first line. I think it's because of the strange characters the file contains. I just want to extract some file paths from it. Is there a way to do this?

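import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;

// Class name is hypothetical; the question only shows main()
public class ExtractSongs
{
    public static void main(String[] args) throws IOException
    {
        // Scanner decodes the file using the platform's default charset
        Scanner readF = new Scanner(new File("D:\\CurrentDatabase_372.txt"));
        String line = null;
        String newLine = System.getProperty("line.separator");
        FileWriter writeF = new FileWriter("D:\\Songs.txt");

        while (readF.hasNext())
        {
            line = readF.nextLine();

            // Keep the part of the line from "D:\" through ".mp3"
            if (line.contains("D:\\") && line.contains(".mp3"))
            {
                writeF.write(line.substring(line.indexOf("D:\\"), line.indexOf(".mp3") + 4) + newLine);
            }
        }

        readF.close();
        writeF.close();
    }
}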

The file starts like this:

pppppamepD:\Music\Korn\Untouchables\03     Blame.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables003pMetalKornUntouchables003pBlameKornUntouchables003pKornKornUntouchables003pMP3pppppCpppÀppp@ppøp·pppŸú#pdppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒppp’ÍpET?ppppppôpp¼}`Ñ#ãâK†¡H¤*(DppppppppppppppppuÞѤéú:M®$@]jkÝW0ÛœFµú½XVNp`w—wâÊp:ºŽwâÊpppp8Npdpp¡pp{)pppppppppppppppppyY:¸[ªA¥Bi   `Û¯pppppppppppp2pppppppppppppppppppppppppppppppppppp¿ÞpAppppppp€ppp€;€?€CpCpC€H€N€S€`€e€y€~p~p~€’€«€Ê€â€Hollow LifepD:\Musica\Korn\Untouchables\04 Hollow Life.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables004pMetalKornUntouchables004pHollow LifeKornUntouchables004pKornKornUntouchables004pMP3pppppCpppÀHppppppøp¸pppǺxp‰ppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒpppŠºppppppppppôpp¼}`Ñ#ãâK†¡H¤*(DpppppppppppppppppãG#™R‚CA—®þ^bN °mbŽ‚^¨pG¦sp;5p5ÓÐùšwâÊp
)ŽwâÊpppp8Npdpp!cpp{pppppppppppppppppyY:¸[ªA¥Bi `ۯǺxp‰pppppp2pppppppppppppppppppppppppppppppppppp¿

I want to extract file paths like "D:\Music\Korn\Untouchables\03 Blame.mp3".

Answer 1:

You cannot use a line-oriented scanner to read binary files. You have no guarantee that the binary file even has "lines" delimited by newline characters. For example, what would your code do if there were TWO file names matching the pattern "D:\.*.mp3" with no newline between them? With your indexOf calls you would extract only the first one and silently skip the second; with a greedy pattern such as "D:\.*.mp3" you would get everything between the first "D:\" and the last ".mp3", with all the garbage in between. Extracting file names from a non-delimited stream such as this requires a different strategy.
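
What actually comes out depends on how the match is done. The sketch below is hypothetical: the class name, method names and sample string are made up for illustration. It applies the question's indexOf logic and a greedy regex for "D:\.*.mp3" to one line that contains two paths with junk between them and no newline.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPathsOnOneLine {
    // The question's extraction logic: first "D:\" up to the first ".mp3".
    static String questionLogic(String line) {
        return line.substring(line.indexOf("D:\\"), line.indexOf(".mp3") + 4);
    }

    // A greedy regex for the pattern "D:\.*.mp3".
    static String greedyRegex(String line) {
        Matcher m = Pattern.compile("D:\\\\.*\\.mp3").matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two paths separated by binary-looking junk, no newline.
        String line = "ppD:\\a\\1.mp3p\u00C0\u00F8pD:\\b\\2.mp3pp";
        System.out.println(questionLogic(line)); // only the first path
        System.out.println(greedyRegex(line));   // first "D:\" through last ".mp3"
    }
}
```

With this sample, the indexOf version returns only "D:\a\1.mp3", while the greedy regex returns everything from the first "D:\" through the second ".mp3", junk included.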

If I were writing this, I'd use a relatively simple finite-state recognizer that processes characters one at a time. When it encounters a "D", it starts saving characters, checking each one to make sure it fits the required pattern, and stops when it sees the "3" in ".mp3". If at any point it detects a character that doesn't fit, it resets and continues looking.
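
A minimal Java sketch of such a recognizer, assuming the paths are stored as single-byte characters (ASCII/Latin-1) and that a path character is any printable character not forbidden in Windows file names. The class name, the 260-character cap and the character rules are illustrative assumptions, not something the answer specifies.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class Mp3PathRecognizer {
    private static final String PREFIX = "D:\\";       // start of a candidate path
    private static final String SUFFIX = ".mp3";       // end of a candidate path
    private static final int MAX_LEN = 260;            // assumed upper bound on path length
    private static final String ILLEGAL = "<>:\"|?*";  // not allowed inside a Windows path

    // Assumption: a path character is printable and not Windows-illegal.
    static boolean isPathChar(char c) {
        return c >= 0x20 && c != 0x7F && !(c >= 0x80 && c < 0xA0)
                && ILLEGAL.indexOf(c) < 0;
    }

    static boolean endsWith(StringBuilder sb, String suffix) {
        int start = sb.length() - suffix.length();
        return start >= 0 && sb.substring(start).equals(suffix);
    }

    static List<String> extractPaths(InputStream in) throws IOException {
        List<String> paths = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            char c = (char) b; // one byte -> one char (Latin-1 style)
            if (buf.length() < PREFIX.length()) {
                // State 1: still matching "D:\" one character at a time.
                if (c == PREFIX.charAt(buf.length())) {
                    buf.append(c);
                } else {
                    buf.setLength(0);
                    if (c == PREFIX.charAt(0)) buf.append(c);
                }
            } else if (isPathChar(c) && buf.length() < MAX_LEN) {
                // State 2: inside a path; finish when it ends in ".mp3".
                buf.append(c);
                if (endsWith(buf, SUFFIX)) {
                    paths.add(buf.toString());
                    buf.setLength(0);
                }
            } else {
                // Character does not fit: reset and continue looking.
                buf.setLength(0);
                if (c == PREFIX.charAt(0)) buf.append(c);
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(
                new FileInputStream("D:\\CurrentDatabase_372.txt"))) {
            for (String path : extractPaths(in)) {
                System.out.println(path);
            }
        }
    }
}
```

Reading through a BufferedInputStream keeps the one-byte-at-a-time read() calls cheap. One known gap: a "D:\" that starts while an unfinished candidate is still being collected is not picked up, because the recognizer only restarts on the character that broke the match.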

EDIT: If the files to be processed are small (less than 50 MB or so), you could load the entire file into memory, which would make scanning simpler.
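
For the in-memory route, a sketch could read the whole file with Files.readAllBytes, decode it as ISO-8859-1 (which maps every byte to exactly one char, so decoding cannot fail), and use a reluctant regex so that each match stops at the first ".mp3". The class name and the exact set of excluded characters are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InMemoryMp3Paths {
    // "D:\" followed by as few allowed characters as possible, then ".mp3".
    // Excluded: Windows-illegal characters and control characters.
    private static final Pattern MP3_PATH =
            Pattern.compile("D:\\\\[^<>:\"|?*\\x00-\\x1F\\x7F-\\x9F]*?\\.mp3");

    static List<String> findPaths(String text) {
        List<String> paths = new ArrayList<>();
        Matcher m = MP3_PATH.matcher(text);
        while (m.find()) {
            paths.add(m.group());
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("D:\\CurrentDatabase_372.txt"));
        // ISO-8859-1 turns each byte into one char, so no input is rejected.
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        for (String path : findPaths(text)) {
            System.out.println(path);
        }
    }
}
```

The reluctant *? quantifier is what keeps two adjacent paths apart; a greedy .* would run on to the last ".mp3" in the file.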



Answer 2:

As was said, since it is a binary file, you can't use a Scanner or other character-based readers. You could use a plain FileInputStream to read the raw bytes of the file. Java's String class has a constructor that takes an array of bytes and turns it into a string. You can then search that string for the file name(s). This may work if you just use the default character set.

String(byte[]): http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
FileInputStream for reading bytes: http://download.oracle.com/javase/tutorial/essential/io/bytestreams.html
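
A sketch of this approach, assuming the paths are plain ASCII so that the default charset decodes them unchanged. It uses String(byte[], Charset) with Charset.defaultCharset(); per the Javadoc, that constructor always replaces malformed or unmappable byte sequences instead of failing, which is the behavior the Scanner lacks. The search is a hypothetical fix of the question's indexOf logic: it looks for ".mp3" only after each "D:\" and keeps going, so several paths in one run of bytes are all found. The class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class RawBytesSearch {
    // Copies every raw byte from the stream; no character decoding happens here.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Finds every "D:\" ... ".mp3" span, searching for ".mp3" only after each "D:\".
    static List<String> findPaths(String text) {
        List<String> paths = new ArrayList<>();
        int start = text.indexOf("D:\\");
        while (start >= 0) {
            int end = text.indexOf(".mp3", start);
            if (end < 0) {
                break; // no ".mp3" left in the text
            }
            int next = text.indexOf("D:\\", start + 3);
            if (next >= 0 && next < end) {
                start = next; // a later "D:\" comes first: restart from it
                continue;
            }
            paths.add(text.substring(start, end + 4));
            start = text.indexOf("D:\\", end + 4);
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("D:\\CurrentDatabase_372.txt")) {
            // Default charset, as the answer suggests; bad sequences become replacement chars.
            String text = new String(readAll(in), Charset.defaultCharset());
            for (String path : findPaths(text)) {
                System.out.println(path);
            }
        }
    }
}
```

It does not check the characters between "D:\" and ".mp3", so a stray "D:\" followed by junk can still produce a bogus entry; the character-by-character recognizer described in Answer 1 is stricter in that respect.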



Answer 3:

Use hasNextLine() instead of hasNext() in the while loop check, so that the condition matches the nextLine() call inside the loop.

while (readF.hasNextLine()) {
    line = readF.nextLine(); // 'line' is already declared earlier in main
    // Your code
}
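
Whichever loop condition is used, it may help to confirm why the Scanner stops early. Per its Javadoc, when the underlying Readable throws an IOException, Scanner assumes the end of input has been reached and keeps that exception, which Scanner.ioException() returns. A minimal check follows; the class and method names are hypothetical, and decoding as "UTF-8" is an assumption (the question's code uses the platform default charset, so the result there may differ).

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ScannerStopCheck {
    // Consumes every line the Scanner is willing to return and counts them.
    static int readAllLines(Scanner sc) {
        int lines = 0;
        while (sc.hasNextLine()) {
            sc.nextLine();
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: decode as UTF-8 so the outcome does not depend on the platform.
        try (Scanner sc = new Scanner(new File("D:\\CurrentDatabase_372.txt"), "UTF-8")) {
            int lines = readAllLines(sc);
            IOException cause = sc.ioException(); // null if input really ended normally
            System.out.println("Lines read: " + lines);
            if (cause != null) {
                System.out.println("Scanner stopped early because of: " + cause);
            }
        }
    }
}
```

If the cause is a decoding error (for example a MalformedInputException), changing hasNext() to hasNextLine() alone will not get past it; reading the file as bytes, as the other answers suggest, avoids decoding errors altogether.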