Why are results of path.toString() failing to show

Posted 2019-07-14 05:55

Question:

In my Java code I use a FileVisitor to traverse a filesystem and build a structure of Paths, which is later converted to a JSON object for rendering in HTML.

Running on Windows, it works fine even against a Linux filesystem. Running on Linux against the same (now local) filesystem, it fails to render special characters properly when I call toString() on a path.

For example, the Windows debug output is

CreateFolderTree:createJsonData:SEVERE: AddingNode(1):Duarte Lôbo- Requiem

and the HTML displays correctly as

Duarte Lôbo- Requiem

but the Linux debug output gives

CreateFolderTree:createJsonData:SEVERE: AddingNode(1):Duarte L??bo- Requiem

and the HTML displays two black diamonds with question marks in them instead of the ô character.

Why is this happening? The Paths are provided by the FileVisitor class, so they must be constructed properly (i.e. I am not building them myself), and then I just call toString() on the path.

Is it a fonts problem? I have had some issues with fonts on the Linux system, but here I am just returning Strings to the HTML, so I cannot see a connection.

It is probably an encoding issue, but I can't see a place where I am explicitly setting an encoding.

The bulk of the code is below; the debugging that shows invalid output on Linux is in the createJsonData() method.

Edit: I have fixed the logging issue so that the output is written as UTF-8

  FileHandler fe = new FileHandler(logFileName, LOG_SIZE_IN_BYTES, 10, true);
  fe.setEncoding(StandardCharsets.UTF_8.name());

So we now see Windows is outputting correctly

CreateFolderTree:createJsonData:SEVERE: AddingNode(1):Duarte Lôbo- Requiem

but Linux is outputting

CreateFolderTree:createJsonData:SEVERE: AddingNode(1):Duarte L��bo- Requiem

and if I view this in a hex editor it gives this output for L��bo

4C EF BF BD EF BF BD 62 6F
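
For reference, EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character, so the log contains two replacement characters and the original bytes of ô are already gone by the time the line is written. Below is a minimal sketch of one way such bytes can arise. It assumes (the output above does not confirm this) that the UTF-8 file name was decoded as US-ASCII somewhere before being logged as UTF-8. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the code in the question.
class ReplacementBytes {

    // Format bytes as space-separated upper-case hex, like a hex editor row.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode as UTF-8 (as the name is stored on disk), decode those bytes as
    // US-ASCII (the assumed wrong step), then encode again as UTF-8 (as the
    // log file does) and return the resulting bytes as hex.
    static String roundTrip(String name) {
        byte[] onDisk = name.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(onDisk, StandardCharsets.US_ASCII);
        return hex(decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // \u00F4 is ô; written as an escape so the source file encoding does not matter.
        System.out.println(roundTrip("L\u00F4bo")); // 4C EF BF BD EF BF BD 62 6F
    }
}
```

Each of the two UTF-8 bytes of ô (C3 B4) is invalid in US-ASCII, so the decoder substitutes one U+FFFD per byte, which matches the two diamonds seen in the browser.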

Edit: Partial Solution

I came across the question "What exactly is sun.jnu.encoding?"

and found it was recommended to add this

 -Dsun.jnu.encoding=UTF-8

and it worked; files were now displayed correctly.

Unfortunately, if the user then clicked on such a file and it was sent back to the server, I now get this error

java.lang.NullPointerException
    at java.base/sun.nio.fs.UnixPath.normalizeAndCheck(Unknown Source)
    at java.base/sun.nio.fs.UnixPath.<init>(Unknown Source)
    at java.base/sun.nio.fs.UnixFileSystem.getPath(Unknown Source)
    at java.base/java.nio.file.Paths.get(Unknown Source)
    at com.jthink.songkong.server.callback.ServerFixSongs.configureFileMapping(ServerFixSongs.java:59)
    at com.jthink.songkong.server.callback.ServerFixSongs.startTask(ServerFixSongs.java:88)
    at com.jthink.songkong.server.CmdRemote.lambda$null$36(CmdRemote.java:107) 

I tried adding -Dfile.encoding=UTF-8, both in addition to and instead of the jnu option, and that didn't help; the jnu option was the one I needed.

I shouldn't have to add this undocumented sun.jnu.encoding option, so it seems that the server is broken in some way?
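
To compare what the JVM has picked up on each machine, it can help to print the relevant settings at startup. This is only a diagnostic sketch: the class name is hypothetical, and sun.jnu.encoding is an internal property of Oracle/OpenJDK builds, so it may be absent (null) on other JVMs.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of the code in the question.
class EncodingReport {

    // Collect the JVM properties and environment variables that commonly
    // influence how file names and text are encoded. Values may be null.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("file.encoding", System.getProperty("file.encoding"));
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            report.put("env " + name, System.getenv(name));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running this on both the Windows and the Linux machine and comparing the two outputs shows whether the difference lies in the JVM options or in the environment the JVM was started from.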

Code

    import com.google.common.base.Strings;
    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import com.jthink.songkong.analyse.analyser.Counters;
    import com.jthink.songkong.analyse.general.Errors;
    import com.jthink.songkong.cmdline.SongKong;
    import com.jthink.songkong.fileloader.RecycleBinFolderNames;
    import com.jthink.songkong.server.fs.Data;
    import com.jthink.songkong.server.fs.PathWalker2;
    import com.jthink.songkong.server.fs.State;
    import com.jthink.songkong.ui.MainWindow;
    import com.jthink.songkong.ui.progressdialog.FixSongsCounters;
    import spark.Request;
    import spark.Response;

    import java.io.IOException;
    import java.net.UnknownHostException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.logging.Level;


    /**
     * Count the number of files that can be loaded, for information purposes only
     */
    public class CreateFolderTree
    {
        private Path treeRoot;

        Set<Path> keys = new HashSet<Path>();


        public static class VisitFolder
                extends SimpleFileVisitor<Path>
        {

            private Set<Path> keys;
            private Integer maxDepth;
            private int depth;

            public VisitFolder(Set<Path> keys, Integer maxDepth)
            {
                this.keys=keys;
                this.maxDepth = maxDepth;
            }

            /**
             * Ignore some dirs
             *
             * @param dir
             * @param attrs
             * @return
             * @throws IOException
             */
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException
            {
                try
                {
                    if (dir.toFile().getName().equals(".AppleDouble"))
                    {
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    else if (dir.toString().equals("/proc"))
                    {
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    else if (dir.toString().equals("/dev"))
                    {
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    else if (RecycleBinFolderNames.isMatch(dir.toFile().getName()))
                    {
                        MainWindow.logger.log(Level.SEVERE, "Ignoring " + dir.toString());
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    else if (dir.toString().toLowerCase().endsWith(".tar"))
                    {
                        return FileVisitResult.SKIP_SUBTREE;
                    }

                    depth++;

                    if(depth > maxDepth)
                    {
                        depth--;
                        return FileVisitResult.SKIP_SUBTREE;
                    }
                    keys.add(dir);
                    return super.preVisitDirectory(dir, attrs);
                }
                catch(IOException e)
                {
                    MainWindow.logger.warning("Unable to visit dir:"+dir + ":"+e.getMessage());
                    return FileVisitResult.SKIP_SUBTREE;
                }
            }


            /**
             *
             * Tar check due to http://stackoverflow.com/questions/14436032/why-is-java-7-files-walkfiletree-throwing-exception-on-encountering-a-tar-file-o/14446993#14446993
             * SONGKONG-294:Ignore exceptions if file is not readable
             *
             * @param file
             * @param exc
             * @return
             * @throws IOException
             */
            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException
            {

                if (file.toString().endsWith(".tar")) {
                    //We dont log to reports as this is a bug in Java that we are handling not a problem in SongKong
                    MainWindow.logger.log(Level.SEVERE, exc.getMessage());
                    return FileVisitResult.CONTINUE;
                }

                try
                {
                    FileVisitResult result = super.visitFileFailed(file, exc);
                    return result;
                }
                catch(IOException e)
                {
                    MainWindow.logger.warning("Unable to visit file:"+file + ":"+e.getMessage());
                    return FileVisitResult.CONTINUE;
                }
            }

            /**
             * SONGKONG-294:Ignore exception if folder is not readable
             *
             * @param dir
             * @param exc
             * @return
             * @throws IOException
             */
            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException
            {
                depth--;
                try
                {
                    FileVisitResult result = super.postVisitDirectory(dir, exc);
                    return result;
                }
                catch(IOException e)
                {
                    MainWindow.logger.warning("Unable to count files in dir(2):"+dir);
                    return FileVisitResult.CONTINUE;
                }
            }
        }

        public CreateFolderTree(Path treeRoot)
        {
            this.treeRoot = treeRoot;
        }

        public String start(int depth)
        {
            VisitFolder visitFolder;
            try
            {

                if(treeRoot==null)
                {
                    for (Path path : FileSystems.getDefault().getRootDirectories())
                    {
                        visitFolder = new VisitFolder(keys, depth);
                        Files.walkFileTree(path, visitFolder);
                    }
                }
                else
                {
                    visitFolder = new VisitFolder(keys, depth);
                    Files.walkFileTree(treeRoot, visitFolder);
                }

                PathWalker2 pw = new PathWalker2();
                for (Path key : keys)
                {
                    //SONGKONG-505: Illegal character in Filepath problem prevented reportFile creation
                    try
                    {
                        pw.addPath(key);
                    }
                    catch (InvalidPathException ipe)
                    {
                        MainWindow.logger.log(Level.SEVERE, ipe.getMessage(), ipe);
                    }
                }
                Gson gson = new GsonBuilder().create();
                return gson.toJson(createJsonData(pw.getRoot()));
            }
            catch (Exception e)
            {
                handleException(e);
            }
            return "";
        }

        public void handleException(Exception e)
        {
            MainWindow.logger.log(Level.SEVERE, "Unable to count files:"+e.getMessage(), e);
            Errors.addError("Unable to count files:"+e.getMessage());
            MainWindow.logger.log(Level.SEVERE, e.getMessage());
            Counters.getErrors().getCounter().incrementAndGet();
            SongKong.refreshProgress(FixSongsCounters.SONGS_ERRORS);
        }

        /**
         * Add this node and recursively its children,  returning json data representing the tree
         *
         * @param node
         * @return
         */
        private Data createJsonData(PathWalker2.Node node)
        {
            Data data = new Data();
            if(node.getFullPath()!=null)
            {
                data.setId(node.getFullPath().toString());
                if(node.getFullPath().getFileName()!=null)
                {
                    MainWindow.logger.severe("AddingNode(1):"+node.getFullPath().getFileName().toString());
                    data.setText(node.getFullPath().getFileName().toString());
                }
                else
                {
                    MainWindow.logger.severe("AddingNode(2):"+node.getFullPath().toString());
                    data.setText(node.getFullPath().toString());
                }
            }
            else
            {
                try
                {
                    data.setText(java.net.InetAddress.getLocalHost().getHostName());
                    data.setId("#");
                    State state = new State();
                    state.setOpened(true);
                    data.setState(state);
                }
                catch(UnknownHostException uhe)
                {
                    data.setText("Server");
                }
            }

            //Recursively add each child folder of this node
            Map<String, PathWalker2.Node> children = node.getChildren();
            if(children.size()>0)
            {
                data.setChildren(new ArrayList<>());
                for (Map.Entry<String, PathWalker2.Node> next : children.entrySet())
                {
                    data.getChildren().add(createJsonData(next.getValue()));
                }
            }
            else
            {
                data.setBooleanchildren(true);
            }
            return data;
        }

        public static String createFolderJsonData(Request request, Response response)
        {
            if(Strings.nullToEmpty(request.queryParams("id")).equals("#"))
            {
                CreateFolderTree cft = new CreateFolderTree(null);
                String treeData = cft.start(1).replace("booleanchildren", "children");
                return treeData;
            }
            else
            {
                CreateFolderTree cft = new CreateFolderTree(Paths.get(request.queryParams("id")));
                String treeData = cft.start(2).replace("booleanchildren", "children");
                return treeData;
            }
        }

    }


    import java.nio.file.Path;
    import java.util.Collections;
    import java.util.Map;
    import java.util.TreeMap;

    /** Constructs a tree of folders based on a list of filepaths
     *
     * i.e. give it a list of all folders that contain files that have been modified and it creates a hierarchy
     * that can then be used to generate a data structure for use by jstree
     *
     */
    public class PathWalker2
    {
        private final Node root;


        public PathWalker2()
        {
            root = new Node();
        }

        public Node getRoot()
        {
            return root;
        }

        /**
         * Represent a node on the tree (may/not have children)
         */
        public static class Node
        {
            //Keyed on name and node
            private final Map<String, Node> children = new TreeMap<>();

            private Path fullPath;

            public Node addChild(String name)
            {

                if (children.containsKey(name))
                    return children.get(name);

                Node result = new Node();
                children.put(name, result);
                return result;
            }

            public Map<String, Node> getChildren()
            {
                return Collections.unmodifiableMap(children);
            }

            public void setFullPath(Path fullPath)
            {
                this.fullPath = fullPath;
            }

            public Path getFullPath()
            {
                return fullPath;
            }
        }

        /**
         * @param path
         */
        public void addPath(Path path)
        {
            Node node = root.addChild((path.getRoot().toString().substring(0, path.getRoot().toString().length() - 1)));

            //For each segment of the path add as child if not already added
            for (int i = 0; i < path.getNameCount(); i++)
            {
                node = node.addChild(path.getName(i).toString());
            }

            //Set full path of this node
            node.setFullPath(path);
        }


    }

Answer 1:

So, as always with encoding problems, this has been a lot of work to debug. Not only are there a lot of different things that affect it, they also affect it at different times, so the first task is always to check where it first goes wrong.

As the deal with the � showed, once it goes wrong, it can then go more wrong and if you try to debug starting from the end result, it's like peeling layers from a rotten onion.


In this case the root of the problem was in the OS locale, which was set to POSIX. This old standard makes your OS act like it's from the 70's, with ASCII encoding and other outdated details. The ASCII encoding will prevent the OS from understanding filenames, text or anything containing more exotic characters. This causes weird issues because the JVM is doing just fine by itself, but any time it communicates with the OS (printing to a text file, asking to open a file with a certain name) there's a chance of corruption because the OS doesn't understand what the JVM is saying.

It's like someone is talking to you and every once in a while he puts a word of Chinese in there. You're writing down what he says in English, but every Chinese word you replace with "Didn't understand???".

The locale (in /etc/default/locale) usually contains sane defaults, but as we saw here, you can't always trust that. For any modern system you'll want locale values like en_US.UTF-8. You never want to see POSIX there in this day and age.
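
As a rough sketch of how a server could detect this at startup instead of finding out through corrupted file names: POSIX gives LC_ALL precedence over LC_CTYPE, which in turn takes precedence over LANG. The class and method names below are hypothetical, and treating a fully unset environment as POSIX is an assumption (the real default is implementation-defined).

```java
import java.util.Locale;

// Hypothetical startup check, not part of the code in the question.
class LocaleCheck {

    // Pick the locale that governs character handling, using the POSIX
    // precedence LC_ALL > LC_CTYPE > LANG. Empty values count as unset.
    static String effectiveCtype(String lcAll, String lcCtype, String lang) {
        if (lcAll != null && !lcAll.isEmpty()) {
            return lcAll;
        }
        if (lcCtype != null && !lcCtype.isEmpty()) {
            return lcCtype;
        }
        if (lang != null && !lang.isEmpty()) {
            return lang;
        }
        return "POSIX"; // assumption: nothing set behaves like the POSIX/C locale
    }

    // True if the locale name carries a UTF-8 codeset, e.g. en_US.UTF-8 or C.utf8.
    static boolean isUtf8(String locale) {
        int at = locale.indexOf('@'); // drop a modifier such as @euro
        String base = at >= 0 ? locale.substring(0, at) : locale;
        int dot = base.indexOf('.');
        if (dot < 0) {
            return false; // no codeset part, e.g. "POSIX" or "C"
        }
        String codeset = base.substring(dot + 1).toLowerCase(Locale.ROOT);
        return codeset.equals("utf-8") || codeset.equals("utf8");
    }

    public static void main(String[] args) {
        String locale = effectiveCtype(
                System.getenv("LC_ALL"), System.getenv("LC_CTYPE"), System.getenv("LANG"));
        if (!isUtf8(locale)) {
            System.err.println("Warning: locale '" + locale
                    + "' is not UTF-8; non-ASCII file names may be corrupted");
        }
    }
}
```

This only inspects the environment the JVM was started with; it does not change the locale, which still has to be fixed in the OS configuration or in the script that launches the server.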



Answer 2:

For HTML you either need to set a proper charset matching your needs or, better, stick with ASCII and use HTML encoding for all non-ASCII characters. This works even if no specific charset is defined for your HTML display.
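
A minimal sketch of the second option, replacing every non-ASCII code point with a decimal numeric character reference. The class and method names are hypothetical, and the sketch deliberately leaves ASCII characters alone, so <, > and & would still need normal HTML escaping separately.

```java
// Hypothetical helper, not part of the code in the question.
class HtmlAscii {

    // Replace each code point above 0x7F with &#NNN; and keep ASCII as-is.
    // Working on code points (not chars) keeps surrogate pairs intact.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00F4 is ô (decimal 244).
        System.out.println(escapeNonAscii("Duarte L\u00F4bo- Requiem"));
        // prints: Duarte L&#244;bo- Requiem
    }
}
```

Note that this only helps if the String is still correct when it reaches this step; if the file name was already decoded wrongly, the replacement characters would be escaped just as faithfully.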

https://en.wikipedia.org/wiki/Unicode_and_HTML



Answer 3:

It seems that your debug output goes through multiple conversions between charsets. The text you send to the console seems to be converted to bytes using UTF-8 as the encoding, resulting in the conversion from ô to Ã´ (the bytes C3 B4 shown as two cp1252 characters). Then there seems to be another conversion from the byte data back into characters using the system's charset. The Windows console uses cp1252 as its charset, while Linux has different settings on a per-installation basis. In your case it seems to be ASCII, which leads to the two ? characters for the UTF-8 encoded data, because these bytes have values that aren't defined in ASCII.

I don't know the logging framework you're using or how your Logger is set up, so I can't tell you how to fix that, but for the Linux variant you might check the console's charset and change it to UTF-8 to see if that has the desired effect.
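
The double conversion can be reproduced in a few lines. This is only an illustration of the mechanism, not a reconstruction of the actual logging setup: the class and method names are made up, and it assumes the JDK in use ships the windows-1252 charset (standard OpenJDK builds do, but only US-ASCII, ISO-8859-1 and the UTF encodings are guaranteed).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the code in the question.
class Mojibake {

    // Encode text with one charset and decode the same bytes with another,
    // which is what happens when writer and reader disagree on the encoding.
    static String reinterpret(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }

    public static void main(String[] args) {
        String name = "L\u00F4bo"; // Lôbo, escaped so the source encoding does not matter

        // UTF-8 bytes read as cp1252: ô (C3 B4) becomes the two characters
        // U+00C3 and U+00B4.
        String asCp1252 = reinterpret(name, StandardCharsets.UTF_8, Charset.forName("windows-1252"));
        System.out.println(asCp1252.equals("L\u00C3\u00B4bo")); // true

        // UTF-8 bytes read as ASCII: both bytes are invalid, so Java substitutes
        // U+FFFD for each; consoles and encoders often show that as "?".
        String asAscii = reinterpret(name, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        System.out.println(asAscii.equals("L\uFFFD\uFFFDbo")); // true
    }
}
```

In both cases one character turns into two, which is the telltale sign that UTF-8 bytes were decoded with a single-byte charset.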