Is Base64 encoding safe to use for filenames on Windows and Linux systems? From my research I have found that replacing all / characters of the result with - or _ should resolve any issues.
Can anyone provide more details on this?
Currently in Java I am using the following piece of code:
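import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

MessageDigest md5Digest = MessageDigest.getInstance("MD5");
// Use an explicit charset; getBytes() with no argument depends on the platform default
byte[] digest = md5Digest.digest(plainText.getBytes(StandardCharsets.UTF_8));
// java.util.Base64 replaces the internal sun.misc.BASE64Encoder, which was removed in Java 9
String hash = Base64.getEncoder().encodeToString(digest);
// Strings are immutable: replace() returns a new String, so the result must be assigned
hash = hash.replace('/', '_');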
A filename created from Base64 is only safe if you replace / with a different character, which you do: / is the path separator on Linux and is not allowed in file names on Windows (NTFS). As long as you do that, pretty much all file systems in common use will be OK.
However, if the file system is case-insensitive, as is the case on Windows, you can get collisions because the Base64 alphabet contains both upper- and lower-case letters.
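To make the collision concrete, here is a small Java check (the class name is hypothetical). Three zero bytes encode to AAAA, while the bytes 0x69 0xA6 0x9A encode to aaaa, so on a case-insensitive file system these two different hashes would map to the same file name.

```java
import java.util.Base64;

// Shows two different byte arrays whose Base64 encodings differ only in letter case,
// so they would name the same file on a case-insensitive file system.
final class Base64CaseCollision {
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] first = {0x00, 0x00, 0x00};                       // encodes to "AAAA"
        byte[] second = {(byte) 0x69, (byte) 0xA6, (byte) 0x9A}; // encodes to "aaaa"
        String a = encode(first);
        String b = encode(second);
        System.out.println(a + " vs " + b);
        System.out.println("equal: " + a.equals(b));                       // false
        System.out.println("equal ignoring case: " + a.equalsIgnoreCase(b)); // true
    }
}
```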
You might want to consider using the hexadecimal representation of your MD5 hash instead, since this is a fairly standard way of representing those as a string.
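A minimal sketch of the hexadecimal approach, assuming the text should be hashed as UTF-8 (the class and method names are hypothetical). The result contains only [0-9a-f] and is always 32 characters long for MD5.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: MD5 digest as a lower-case hex string.
final class Md5Hex {
    static String md5Hex(String plainText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(plainText.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x treats a negative byte as unsigned (0..255), two digits each
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support MD5
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

On Java 17 and later, java.util.HexFormat.of().formatHex(digest) performs the same byte-to-hex conversion.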
I'm not sure what you are using the encoding for, but consider percent-encoding file names.
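A minimal percent-encoding sketch in Java, assuming you keep only ASCII letters, digits, '-' and '_' and encode every other UTF-8 byte as %XX with upper-case hex digits (percentEncode is a hypothetical helper, not a library method). Encoding '.' as well sidesteps special names such as "." and "..". Note that java.net.URLEncoder is not a drop-in replacement here: it turns spaces into '+' and leaves '*' unencoded, and '*' is not allowed in Windows file names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: percent-encodes every UTF-8 byte except ASCII letters,
// digits, '-' and '_', always using upper-case hex digits.
final class PercentEncodeFileName {
    static String percentEncode(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // unsigned byte value
            boolean keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '_';
            if (keep) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("a/b:c")); // a%2Fb%3Ac
    }
}
```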
One-liner for C#:
It needs the following at the beginning of the file:
Modified Base64 (where /, = and + are replaced) is safe for creating names, but it does not guarantee a reverse transformation because many file systems and URLs are case-insensitive. Base64 is case-sensitive, so it will not guarantee a 1-to-1 mapping on case-insensitive file systems (all Windows file systems, ignoring POSIX subsystem cases). URLs can also be handled case-insensitively (the scheme and host always are, and paths often are on servers backed by case-insensitive file systems), which likewise prevents a 1-to-1 mapping.
I would use Base32 in this case. You'll get slightly longer names, but Base32-encoded values are 100% safe for file and URI usage without replacing any characters, and they guarantee a 1-to-1 mapping even in case-insensitive environments (FAT/Win32 NTFS access).
Unfortunately, frameworks usually have no built-in support for this encoding. On the other hand, the code is relatively simple to write yourself or find online.
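As a rough illustration, here is a minimal Base32 encoder in Java using the RFC 4648 alphabet (A-Z, 2-7) and omitting the '=' padding. The class name Base32Sketch is hypothetical, and a decoder (which should upper-case its input first) is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical minimal Base32 encoder (RFC 4648 alphabet, no '=' padding).
// The output uses only upper-case letters and the digits 2-7, so two different
// inputs can never produce names that differ only in letter case.
final class Base32Sketch {
    private static final char[] ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder((data.length * 8 + 4) / 5);
        int buffer = 0;   // pending bits; only the low bitsLeft bits are meaningful
        int bitsLeft = 0; // number of bits waiting to be emitted
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitsLeft += 8;
            while (bitsLeft >= 5) {
                // emit the next 5 bits, most significant first
                sb.append(ALPHABET[(buffer >> (bitsLeft - 5)) & 0x1F]);
                bitsLeft -= 5;
            }
        }
        if (bitsLeft > 0) {
            // pad the final partial group with zero bits on the right
            sb.append(ALPHABET[(buffer << (5 - bitsLeft)) & 0x1F]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // RFC 4648 test vector: "foobar" -> "MZXW6YTBOI======" (padding omitted here)
        System.out.println(encode("foobar".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

For a 16-byte MD5 digest this yields 26 characters, compared with 22 for unpadded Base64 and 32 for hex.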
http://en.wikipedia.org/wiki/Base32.
RFC 3548 suggests replacing more than just the / character. Its "URL and Filename safe" alphabet replaces the / character with the underscore _ and the + character with the minus -.
But you might be better off using a hex string. It has been a while since I stored a hash value in a file name. I started out using a Base64 string but switched to a hex string. I don't remember why I switched; maybe because Windows makes no difference between 'a' and 'A', as AndiDog said.
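In Java 8 and later, that alphabet is available directly: java.util.Base64.getUrlEncoder() implements the "URL and Filename safe" encoding as specified in RFC 4648, which obsoletes RFC 3548. A minimal sketch (the wrapper class is hypothetical); note that the output still mixes upper- and lower-case letters, so this does not address the case-insensitivity issue.

```java
import java.util.Base64;

// Hypothetical wrapper around the JDK's URL- and filename-safe Base64 encoder:
// '-' is used instead of '+', and '_' instead of '/'.
final class UrlSafeBase64 {
    static String encode(byte[] data) {
        // withoutPadding() also drops the trailing '=' characters
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(Base64.getEncoder().encodeToString(data)); // +/8=
        System.out.println(encode(data));                              // -_8
    }
}
```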
MD5 hashes (and hashes in general) are usually represented as hexadecimal strings rather than Base64; these contain only [a-f0-9]. Such names are supported by all file systems.
If you really want to use Base64, your solution (replacing slashes) will not work correctly, because Windows file systems make no difference between 'A' and 'a'. Maybe you want to use Base32 instead? But mind that Base32 carries only 5 bits per character (8 characters for every 5 bytes), barely more than the 4 bits per character of hex, so it will be easier to just take the hexadecimal representation.
In general, the following characters are not allowed in Windows file names: \ / : * ? " < > | (as well as control characters). On Linux, only / and the NUL character are forbidden.
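If you want to check a generated name programmatically, here is a minimal sketch. isPortableName is a hypothetical helper, and it deliberately ignores other Windows rules such as reserved device names (for example CON or NUL) and trailing dots or spaces.

```java
// Hypothetical helper: true if the name avoids the characters Windows forbids
// (\ / : * ? " < > |) and all control characters including NUL. Such a name is
// also free of '/' and NUL, the only characters Linux forbids.
final class FileNameCheck {
    private static final String WINDOWS_FORBIDDEN = "\\/:*?\"<>|";

    static boolean isPortableName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x20 || WINDOWS_FORBIDDEN.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPortableName("rL0Y20zC-Fzt72VPzMSk2A")); // true
        System.out.println(isPortableName("rL0Y20zC+Fzt72VPzMSk2A/")); // false
    }
}
```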