Question:
I'm making a cross-platform application that renames files based on data retrieved online. I'd like to sanitize the Strings I took from a web API for the current platform.
I know that different platforms have different file-name requirements, so I was wondering if there's a cross-platform way to do this?
Edit: On Windows platforms you cannot have a question mark '?' in a file name, whereas in Linux, you can. The file names may contain such characters and I would like for the platforms that support those characters to keep them, but otherwise, strip them out.
Also, I would prefer a standard Java solution that doesn't require third-party libraries.
Answer 1:
As suggested elsewhere, this is not usually what you want to do. It is usually best to create a temporary file using a secure method such as File.createTempFile().
You should not do this with a whitelist that keeps only 'good' characters. If the file name is made up of only Chinese characters, you will strip everything out of it. For this reason we can't use a whitelist; we have to use a blacklist.
Linux pretty much allows anything which can be a real pain. I would just limit Linux to the same list that you limit Windows to so you save yourself headaches in the future.
Using this C# snippet on Windows I produced a list of characters that are not valid on Windows. There are quite a few more characters in this list than you may think (41) so I wouldn't recommend trying to create your own list.
foreach (char c in new string(Path.GetInvalidFileNameChars()))
{
Console.Write((int)c);
Console.Write(",");
}
Here is a simple Java class which 'cleans' a file name.
import java.util.Arrays;

public class FileNameCleaner {
final static int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};
static {
Arrays.sort(illegalChars);
}
public static String cleanFileName(String badFileName) {
StringBuilder cleanName = new StringBuilder();
for (int i = 0; i < badFileName.length(); i++) {
int c = (int)badFileName.charAt(i);
if (Arrays.binarySearch(illegalChars, c) < 0) {
cleanName.append((char)c);
}
}
return cleanName.toString();
}
}
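As a rough usage sketch of the blacklist approach: the 41 values in the list above appear to be the control characters 0 through 31 plus nine punctuation characters (" < > | : * ? \ /). Assuming that reading is right, the check can be written without a lookup table. The class name WindowsNameDemo and the constant ILLEGAL_PUNCTUATION are made up for this illustration.

```java
class WindowsNameDemo {
    // Assumption: the 41-entry list is exactly code units 0-31 plus these nine characters.
    private static final String ILLEGAL_PUNCTUATION = "\"<>|:*?\\/";

    static String cleanFileName(String badFileName) {
        StringBuilder cleanName = new StringBuilder(badFileName.length());
        for (int i = 0; i < badFileName.length(); i++) {
            char c = badFileName.charAt(i);
            boolean illegal = c < 32 || ILLEGAL_PUNCTUATION.indexOf(c) >= 0;
            if (!illegal) {
                cleanName.append(c);
            }
        }
        return cleanName.toString();
    }

    public static void main(String[] args) {
        // Only the blacklisted characters are dropped.
        System.out.println(cleanFileName("What is 2*2? <draft>.txt")); // What is 22 draft.txt
        // Non-Latin names survive; only the ASCII colon is removed here.
        System.out.println(cleanFileName("\u62a5\u544a:\u7ec8\u7a3f.txt"));
    }
}
```

Because nothing outside the blacklist is touched, a name written entirely in Chinese characters comes through unchanged, which is the point of preferring a blacklist here.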
EDIT:
As Stephen suggested you probably also should verify that these file accesses only occur within the directory you allow.
The answer to "How do you create a secure JEXL (scripting) sandbox?" has sample code for establishing a custom security context in Java and then executing code in that 'sandbox'.
Answer 2:
or just do this:
String filename = "A20/B22b#öA\\BC#Ä$%ld_ma.la.xps";
String sane = filename.replaceAll("[^a-zA-Z0-9\\._]+", "_");
Result: A20_B22b_A_BC_ld_ma.la.xps
Explanation:
[a-zA-Z0-9\\._] matches a letter from a-z in lower or upper case, a digit, a dot or an underscore.
[^a-zA-Z0-9\\._] is the inverse, i.e. any character which does not match the first expression.
[^a-zA-Z0-9\\._]+ is a sequence of characters which do not match the first expression.
So every sequence of characters which does not consist of characters from a-z, 0-9 or . _ will be replaced.
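If stripping accented or non-Latin letters is too aggressive, one possible variation (not part of the original answer) is to widen the character class with the Unicode categories \p{L} (letters) and \p{N} (numbers), which java.util.regex supports. The class name UnicodeWhitelist is hypothetical.

```java
import java.util.regex.Pattern;

class UnicodeWhitelist {
    // Anything that is not a Unicode letter, a Unicode number, a dot or an underscore.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\p{L}\\p{N}._]+");

    static String sanitize(String filename) {
        // Each run of disallowed characters collapses into a single underscore.
        return NOT_ALLOWED.matcher(filename).replaceAll("_");
    }

    public static void main(String[] args) {
        // \u00f6 and \u00c4 are the o-umlaut and A-umlaut from the example above.
        String filename = "A20/B22b#\u00f6A\\BC#\u00c4$%ld_ma.la.xps";
        System.out.println(sanitize(filename)); // A20_B22b_öA_BC_Ä_ld_ma.la.xps
    }
}
```

The pattern is compiled once because String.replaceAll recompiles the expression on every call.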
Answer 3:
This is based on the accepted answer by Sarel Botha which works fine as long as you don't encounter any characters outside of the Basic Multilingual Plane. If you need full Unicode support (and who doesn't?) use this code instead which is Unicode safe:
import java.util.Arrays;

public class FileNameCleaner {
    final static int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};
    static {
        Arrays.sort(illegalChars);
    }
    public static String cleanFileName(String badFileName) {
        StringBuilder cleanName = new StringBuilder();
        // Walk the string by code point; i is a char index, so it advances by 1 or 2.
        for (int i = 0; i < badFileName.length(); ) {
            int c = badFileName.codePointAt(i);
            if (Arrays.binarySearch(illegalChars, c) < 0) {
                cleanName.appendCodePoint(c);
            }
            i += Character.charCount(c);
        }
        return cleanName.toString();
    }
}
Key changes here:
- Use codePointAt instead of charAt
- Advance the index by Character.charCount(c) instead of by one, because codePointAt takes a char index and a supplementary character occupies two chars
- Use appendCodePoint instead of append
- No need to cast chars to ints. A single char cannot represent anything outside the BMP, so avoid handling chars individually.
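Another way to walk a string by code point, offered here only as a sketch, is String.codePoints() (available since Java 8), which hides the index arithmetic. The class name CodePointCleaner and the helper isIllegal are illustrative, and the illegal set is assumed to be the control characters 0 through 31 plus " < > | : * ? \ /.

```java
class CodePointCleaner {
    // Assumption: same illegal set as the int[] list, expressed as a range plus punctuation.
    static boolean isIllegal(int codePoint) {
        return codePoint < 32 || "\"<>|:*?\\/".indexOf(codePoint) >= 0;
    }

    static String clean(String name) {
        StringBuilder cleanName = new StringBuilder(name.length());
        // codePoints() yields one int per code point, pairing surrogates automatically.
        name.codePoints()
            .filter(cp -> !isIllegal(cp))
            .forEach(cleanName::appendCodePoint);
        return cleanName.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDE00" is a single code point (U+1F600) stored as two chars.
        String cleaned = clean("a\uD83D\uDE00?b");
        System.out.println(cleaned.codePointCount(0, cleaned.length())); // 3
    }
}
```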
Answer 4:
There's a pretty good built-in Java solution - Character.isXxx().
Try Character.isJavaIdentifierPart(c):
String name = "name.é+!@#$%^&*(){}][/=?+-_\\|;:`~!'\",<>";
StringBuilder filename = new StringBuilder();
for (char c : name.toCharArray()) {
if (c=='.' || Character.isJavaIdentifierPart(c)) {
filename.append(c);
}
}
Result is "name.é$_".
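One caveat worth knowing: Character.isJavaIdentifierPart also returns true for "identifier-ignorable" characters, which include non-whitespace ISO control characters such as U+0000 to U+0008 and U+007F. A possible refinement, sketched below under the assumption that such characters are unwanted in file names, is to exclude them with Character.isIdentifierIgnorable. The class name IdentifierFilter is hypothetical.

```java
class IdentifierFilter {
    static String sanitize(String name) {
        StringBuilder filename = new StringBuilder();
        for (char c : name.toCharArray()) {
            // Keep dots and identifier characters, but drop ignorable control/format characters.
            boolean keep = c == '.'
                    || (Character.isJavaIdentifierPart(c) && !Character.isIdentifierIgnorable(c));
            if (keep) {
                filename.append(c);
            }
        }
        return filename.toString();
    }

    public static void main(String[] args) {
        // The U+0001 control character would pass isJavaIdentifierPart on its own.
        System.out.println(sanitize("na\u0001me.txt")); // name.txt
    }
}
```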
Answer 5:
Here is the code I use:
public static String sanitizeName( String name ) {
if( null == name ) {
return "";
}
if( SystemUtils.IS_OS_LINUX ) {
return name.replaceAll( "/+", "" ).trim();
}
return name.replaceAll( "[\u0001-\u001f<>:\"/\\\\|?*\u007f]+", "" ).trim();
}
SystemUtils is from Apache commons-lang3.
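Since the question asks for a solution without third-party libraries, here is a sketch of the same idea using only the os.name system property. The class PlatformSanitizer and the boolean parameter are assumptions made so the rule can be exercised on any machine; unlike the code above, this version also strips the NUL character (\x00).

```java
import java.util.Locale;

class PlatformSanitizer {
    // Standard-library stand-in for SystemUtils.IS_OS_LINUX (assumption: os.name starts with "Linux").
    static boolean isLinux() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("linux");
    }

    static String sanitizeName(String name) {
        return sanitizeName(name, isLinux());
    }

    // The flag is explicit so both rule sets can be tested regardless of the host OS.
    static String sanitizeName(String name, boolean linuxRules) {
        if (name == null) {
            return "";
        }
        if (linuxRules) {
            return name.replaceAll("/+", "").trim();
        }
        // Control characters, DEL, and the punctuation Windows rejects in file names.
        return name.replaceAll("[\\x00-\\x1f<>:\"/\\\\|?*\\x7f]+", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(sanitizeName("a/b?.txt", true));  // ab?.txt
        System.out.println(sanitizeName("a/b?.txt", false)); // ab.txt
    }
}
```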
Answer 6:
It is not clear from your question, but since you are planning to accept pathnames from a web form (?), you probably ought to block attempts to rename certain things, e.g. "C:\Program Files". This implies that you need to canonicalize the pathnames to eliminate "." and ".." before you make your access checks.
Given that, I wouldn't attempt to remove illegal characters. Instead, I'd use "new File(str).getCanonicalFile()" to produce the canonical paths, next check that they satisfy your sandboxing restrictions, and finally use "File.exists()", "File.isFile()", etc to check that the source and destination are kosher, and are not the same file system object. I'd deal with illegal characters by attempting to do the operations and catching the exceptions.
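The canonicalize-then-check step might look roughly like the sketch below. SandboxCheck and isInside are hypothetical names, and treating an I/O failure as "outside" is a design assumption (fail closed), not something the answer specifies.

```java
import java.io.File;
import java.io.IOException;

class SandboxCheck {
    // Returns true only if userPath, resolved against baseDir, stays inside baseDir.
    static boolean isInside(File baseDir, String userPath) {
        try {
            File base = baseDir.getCanonicalFile();
            // Canonicalizing resolves "." and ".." (and symlinks where they exist).
            File target = new File(base, userPath).getCanonicalFile();
            // Path.startsWith compares whole name elements, so "/base2" is not inside "/base".
            return target.toPath().startsWith(base.toPath());
        } catch (IOException e) {
            return false; // fail closed if the path cannot be canonicalized
        }
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "sandbox-demo");
        base.mkdirs();
        System.out.println(isInside(base, "a/b.txt"));       // true
        System.out.println(isInside(base, "../escape.txt")); // false
    }
}
```

Only after this check passes would the existence, file-type and same-object checks described above be worth running.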
Answer 7:
If you want to allow more than [A-Za-z0-9], check Microsoft's file naming conventions, and don't forget to filter out "...Characters whose integer representations are in the range from 1 through 31,...", as Aaron Digulla's example does. Code such as David Carboni's would not be sufficient for these characters.
Answer 8:
Paths.get(...) throws a detailed exception with the position of the illegal character.
public static String removeInvalidChars(final String fileName)
{
try
{
Paths.get(fileName);
return fileName;
}
catch (final InvalidPathException e)
{
if (e.getInput() != null && e.getInput().length() > 0 && e.getIndex() >= 0)
{
final StringBuilder stringBuilder = new StringBuilder(e.getInput());
stringBuilder.deleteCharAt(e.getIndex());
return removeInvalidChars(stringBuilder.toString());
}
throw e;
}
}
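The same idea can be written as a loop instead of recursion, which avoids deep call stacks for names with many invalid characters. This is only a sketch: the class name PathProbe is made up, and which characters get rejected (and whether the exception reports an index at all) depends on the platform's file system provider, so the result differs between operating systems.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

class PathProbe {
    static String removeInvalidChars(String fileName) {
        StringBuilder candidate = new StringBuilder(fileName);
        while (true) {
            try {
                Paths.get(candidate.toString());
                return candidate.toString();
            } catch (InvalidPathException e) {
                int index = e.getIndex();
                // getIndex() is -1 when the position is unknown; nothing sensible to delete then.
                if (index < 0 || index >= candidate.length()) {
                    throw e;
                }
                candidate.deleteCharAt(index);
            }
        }
    }

    public static void main(String[] args) {
        // A name that is valid on common platforms is returned unchanged.
        System.out.println(removeInvalidChars("report.txt"));
    }
}
```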