Encoding errors when compressing files with Apache

Posted 2019-04-12 10:55

Question:

I am compressing files using the Apache Commons Compress API. On Windows 7 it works fine, but on Linux (Ubuntu 10.10, UTF-8), characters in file and folder names, such as "º", are replaced by "?".

Is there any parameter I should pass to the API when compressing, or when uncompressing the tar?

I'm using the tar.gz format, following the API examples.

The files I'm trying to compress are created on Windows... could that be a problem?

The code:

    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
    import org.apache.commons.compress.utils.IOUtils;

    public class TarGzTest
    {
        public static void createTarGzOfDirectory(String directoryPath, String tarGzPath) throws IOException
        {
            System.out.println("Creating tar.gz of folder " + directoryPath + " at " + tarGzPath);
            FileOutputStream fOut = null;
            BufferedOutputStream bOut = null;
            GzipCompressorOutputStream gzOut = null;
            TarArchiveOutputStream tOut = null;

            try
            {
                // Stream chain: file -> buffer -> gzip -> tar
                fOut = new FileOutputStream(new File(tarGzPath));
                bOut = new BufferedOutputStream(fOut);
                gzOut = new GzipCompressorOutputStream(bOut);
                tOut = new TarArchiveOutputStream(gzOut);

                addFileToTarGz(tOut, directoryPath, "");
            }
            finally
            {
                tOut.finish();
                tOut.close();
                gzOut.close();
                bOut.close();
                fOut.close();
            }
            System.out.println("Done.");
        }

        private static void addFileToTarGz(TarArchiveOutputStream tOut, String path, String base) throws IOException
        {
            System.out.println("addFileToTarGz()::" + path);
            File f = new File(path);
            String entryName = base + f.getName();
            TarArchiveEntry tarEntry = new TarArchiveEntry(f, entryName);

            tOut.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);

            if (f.isFile())
            {
                tOut.putArchiveEntry(tarEntry);

                IOUtils.copy(new FileInputStream(f), tOut);

                tOut.closeArchiveEntry();
            }
            else
            {
                // Directory: recurse into its children, building entry names relative to the base.
                File[] children = f.listFiles();

                if (children != null)
                {
                    for (File child : children)
                    {
                        addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
                    }
                }
            }
        }
    }

(I've omitted the main method.)
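For completeness, a hypothetical main method (the paths below are made up for illustration) might look like this:

    public static void main(String[] args) throws IOException
    {
        // Hypothetical example paths; substitute your own source folder and target archive.
        createTarGzOfDirectory("/home/user/data", "/home/user/data.tar.gz");
    }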

EDIT (monkeyjluffy): The changes I made are meant to always produce the same archive on different platforms, so that the hash calculated over it is the same.
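The edit doesn't show the actual changes, but since File.listFiles() returns entries in a filesystem-dependent order, one plausible ingredient of a platform-independent archive is sorting the children before adding them. A sketch only (my assumption, not the poster's actual change; needs java.util.Arrays and java.util.Comparator):

    // Sketch: a stable, platform-independent entry order is one
    // prerequisite for getting byte-identical archives (and hashes).
    File[] children = f.listFiles();
    if (children != null)
    {
        Arrays.sort(children, Comparator.comparing(File::getName));
        for (File child : children)
        {
            addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
        }
    }

Note that timestamps also end up in the archive (the gzip header and each tar entry carry a modification time), so those would have to be pinned as well before the hashes match exactly.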

Answer 1:

I found a workaround for my problem.

For some reason, Java doesn't respect my environment's encoding and falls back to cp1252.

After uncompressing the file, I just entered its folder and ran this command:

    convmv --notest -f cp1252 -t utf8 * -r

And it converts everything recursively to UTF-8.

Problem solved, guys.
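As an aside, instead of converting the names after extraction, Commons Compress can be told which entry-name encoding to use when writing the tar. A minimal sketch, assuming UTF-8 is the encoding you want (the archive name is made up):

    // Sketch: name the entry encoding explicitly instead of relying on the
    // platform default (which fell back to cp1252 here).
    try (FileOutputStream fOut = new FileOutputStream("archive.tar.gz");
         GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(fOut);
         TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut, "UTF-8"))
    {
        tOut.setAddPaxHeadersForNonAsciiNames(true); // add PAX headers for non-ASCII names
        // ... add entries as in the question's addFileToTarGz() ...
        tOut.finish();
    }

The same encoding can be passed to TarArchiveInputStream when extracting.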

More info about encoding problems in Linux here.

Thanks everyone for the help.



Answer 2:

FYI, there's a bug in the above code explained here: Tar problem with apache commons compress

Basically, you need to close the FileInputStream. IOUtils.copy() won't do it for you.
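In practice that means wrapping the copy in addFileToTarGz() so the input stream is always closed; a minimal sketch:

    // Instead of: IOUtils.copy(new FileInputStream(f), tOut);
    try (FileInputStream in = new FileInputStream(f))
    {
        IOUtils.copy(in, tOut); // copy() does not close the streams it is given
    }
    tOut.closeArchiveEntry();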