On my site I let users upload files.
If the file is valid and uploaded it is moved to a folder (using PHP).
All users upload to the same folder.
I think I need to rename the uploaded files.
Is there something like a default naming convention to let users upload files with the same filename?
There are no standard conventions, but there are a couple of best practices:
Organizing your files into (User and/or Date) Aware Folders
Something like:
/uploads/USER/ or
/uploads/[USER/]YEAR/[MONTH/[DAY/[HOUR/[MINUTE/]]]]
This will have some benefits:
- organize files per user and/or date
- make it harder to reach the maximum number of files per directory
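The layout above can be sketched roughly as follows. The helper name uploadDirFor() and the choice to stop at the month level are assumptions made for illustration, not part of any standard:

```php
<?php
// Hypothetical helper: builds a /uploads/USER/YEAR/MONTH/ style path.
function uploadDirFor(string $baseDir, string $user, int $timestamp): string
{
    // Keep only safe characters in the user segment so it cannot climb out of $baseDir.
    $user = preg_replace('~[^0-9a-z_-]+~i', '_', $user);

    // gmdate() keeps the folder name independent of the server time zone.
    return rtrim($baseDir, '/') . '/' . $user . '/' . gmdate('Y/m', $timestamp) . '/';
}

$dir = uploadDirFor('/uploads', 'kate', 1304357611); // /uploads/kate/2011/05/

// Create the directory tree on demand before moving the upload into it:
// if (!is_dir($dir)) { mkdir($dir, 0755, true); }
```

Going deeper (day, hour, minute) only pays off if you expect a very large number of uploads per user.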
(Not) Renaming / Sanitizing Filenames
Renaming or not is a choice you will have to make, depending on your website, user base, how obscure you would like to be and, obviously, your architecture. Would you prefer to have a file named kate_at_the_beach.jpg or 1304357611.jpg? This is really up to you to decide, but search engines (obviously) like the first one better.
One thing you should always do is sanitize and normalize the filenames. Personally, I would only allow the following chars: 0-9, a-z, A-Z, _, - and . (a period). If you choose this sanitization alphabet, normalization basically means just converting the filename to either lower or upper case (to avoid losing files if, for instance, you switch from a case-sensitive file system to a case-insensitive one, like Windows).
Here is some sample code I use in phunction (shameless plug, I know :P):
$filename = '/etc/hosts/@Álix Ãxel likes - beer?!.jpg';
$filename = Slug($filename, '_', '.'); // etc_hosts_alix_axel_likes_beer_.jpg

// Replaces every run of chars outside 0-9, a-z and $extra with $slug, then lowercases.
function Slug($string, $slug = '-', $extra = '') // '' instead of null: preg_quote(null) is deprecated since PHP 8.1
{
    return strtolower(trim(preg_replace('~[^0-9a-z' . preg_quote($extra, '~') . ']+~i', $slug, Unaccent($string)), $slug));
}

function Unaccent($string) // normalizes (romanization) accented chars
{
    if (strpos($string = htmlentities($string, ENT_QUOTES, 'UTF-8'), '&') !== false)
    {
        $string = html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1', $string), ENT_QUOTES, 'UTF-8');
    }

    return $string;
}
Handling Duplicate Filenames
As the documentation entry on move_uploaded_file() states:
If the destination file already exists, it will be overwritten.
So, before you call move_uploaded_file(), you should check whether the file already exists. If it does, and you don't want to lose the older file, rename the new one, usually by inserting a time / random / unique token before the file extension, like this:
if (file_exists($output . $filename) === true)
{
    $token = '_' . time(); // see below
    $filename = substr_replace($filename, $token, strrpos($filename, '.'), 0);
}

move_uploaded_file($_FILES[$input]['tmp_name'], $output . $filename);
This inserts the $token before the file extension, as stated above. As for the choice of the $token value, you have several options:
- time() - unique only down to the second, and it handles duplicate files poorly (an identical upload is stored again under a new name)
- random - not a very good idea, since it doesn't ensure uniqueness and doesn't handle duplicates
- unique - using a hash of the file contents is my favorite approach, since it ties the name to the content and saves you disk space: you'll have at most 2 identical files (one with the original filename and another one with the hash appended). Sample code:
$token = '_' . md5_file($_FILES[$input]['tmp_name']);
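Putting the existence check and the content hash together might look like the sketch below. The function resolveUploadName() and its return shape are made up for illustration. It also appends the token when the filename has no extension, a case the snippet above does not cover:

```php
<?php
// Hypothetical helper: decides the final filename for an upload.
// Returns [name, alreadyStored]; when alreadyStored is true the move can be skipped.
function resolveUploadName(string $dir, string $filename, string $tmpPath): array
{
    if (!file_exists($dir . $filename)) {
        return [$filename, false]; // name is free, keep it
    }

    $token = '_' . md5_file($tmpPath);
    $dot = strrpos($filename, '.');

    // Insert the token before the extension, or append it if there is none.
    $hashed = ($dot === false)
        ? $filename . $token
        : substr_replace($filename, $token, $dot, 0);

    // If the hashed name exists too, the same content was uploaded before.
    return [$hashed, file_exists($dir . $hashed)];
}

// Usage inside an upload handler ($input and $output as in the code above):
// [$name, $stored] = resolveUploadName($output, $filename, $_FILES[$input]['tmp_name']);
// if (!$stored) { move_uploaded_file($_FILES[$input]['tmp_name'], $output . $name); }
```

Note that there is still a small window between the check and the move in which another request could create the same file, so treat this as a sketch rather than a locking scheme.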
Hope it helps! ;)
There is no such convention, but usually the name is randomly generated to make it harder to guess. Accepting the filename without sanitizing it is strongly discouraged; at the very least, take a whitelist approach and remove every character that is not on the whitelist. The key is security: uploading is a risky feature and can be dangerous if not properly handled.
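As a rough sketch of the random-name idea, assuming you also want to whitelist file extensions (the helper randomUploadName() and the extension list are hypothetical):

```php
<?php
// Hypothetical helper: random, hard-to-guess stored name with a whitelisted extension.
// Returns null when the extension is not allowed.
function randomUploadName(string $originalName, array $allowedExtensions): ?string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));

    if (!in_array($ext, $allowedExtensions, true)) {
        return null; // reject anything outside the whitelist
    }

    // 16 random bytes become 32 hex characters.
    return bin2hex(random_bytes(16)) . '.' . $ext;
}

$name = randomUploadName('Kate at the beach.JPG', ['jpg', 'png', 'gif']);
```

Checking the extension alone does not prove the file really is an image, so this only covers the naming side of the problem.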
Just make up a convention of your own. You could, for example, store the files as userId_timestamp in the folder and keep the original filename in a database. Or use userId_originalFilename, or some other combination of things that makes it unique.
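A minimal sketch of the userId_timestamp variant. The helper storedNameFor() and the array keys are assumptions, and the returned record stands in for whatever row you would insert into your own database:

```php
<?php
// Hypothetical helper: returns the on-disk name plus the data to persist.
function storedNameFor(int $userId, string $originalName, int $timestamp): array
{
    // Keeping the extension is optional; it is convenient when serving files directly.
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $stored = $userId . '_' . $timestamp . ($ext !== '' ? '.' . $ext : '');

    return [
        'user_id'       => $userId,
        'stored_name'   => $stored,       // used on disk
        'original_name' => $originalName, // shown to the user, never used as a path
    ];
}

$record = storedNameFor(42, 'kate_at_the_beach.jpg', 1304357611);
```

Two uploads by the same user within the same second would collide, so add a counter or a random suffix if that can happen on your site.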
In a similar case, I save the info in a table (with the user ID as a foreign key), format the auto-increment ID with leading zeroes for the filename (e.g. 000345.jpg) and store the original name in the table.
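The zero-padding part can be sketched like this, assuming the ID has already been obtained from the insert (the helper filenameForId() and the default width of 6 are illustrative choices):

```php
<?php
// Hypothetical helper: turns a numeric row ID into a zero-padded filename.
function filenameForId(int $id, string $originalName, int $width = 6): string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));

    // %0{width}d pads the ID with leading zeroes up to the given width.
    return sprintf('%0' . $width . 'd', $id) . ($ext !== '' ? '.' . $ext : '');
}

echo filenameForId(345, 'kate_at_the_beach.jpg'); // 000345.jpg
```

Since the ID is unique per row, the filename is unique as well, and no existence check is needed before moving the file.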
Could you use some combination of the user's name and the upload date?