I need to identify which files in a directory are binary and which are text.
I tried using mimetypes, but it isn't a good fit in my case because it can't identify the MIME type of every file, and I have some strange ones here... I just need to know: binary or text. Simple? But I couldn't find a solution...
Thanks
It might be possible to use libmagic to guess the MIME type of the file using python-magic. If you get back something in the "text/*" namespace, it is likely a text file, while anything else is likely a binary file.
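A minimal sketch of that idea, assuming the third-party python-magic package (and the libmagic library it wraps) is installed. The helper names `is_text_mime` and `is_text_file` are made up for illustration; `magic.from_file(path, mime=True)` is the python-magic call that returns a MIME type string.

```python
def is_text_mime(mime_type):
    """Return True if a MIME type string is in the text/ namespace."""
    return mime_type.startswith("text/")


def is_text_file(path):
    """Guess whether the file at `path` is text, based on libmagic's MIME type."""
    # Imported lazily so is_text_mime() stays usable without python-magic.
    import magic  # third-party: pip install python-magic

    return is_text_mime(magic.from_file(path, mime=True))
```

Note that some formats most people would call text are reported outside "text/*" (for example "application/json"), so you may want to extend `is_text_mime` with a few extra types for your own files.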
Thanks everybody, I found a solution that suits my problem. I found this code at http://code.activestate.com/recipes/173220/ and changed it slightly to fit my needs.
It works fine.
It's inherently not simple. There's no way of knowing for sure, although you can take a reasonably good guess in most cases.
Things you might like to do:
But it's all heuristic - it's quite possible to have a file which is a valid text file and a valid image file, for example. It would probably be nonsense as a text file, but legitimate in some encoding or other...
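To make the "reasonably good guess" concrete, here is one possible heuristic, offered as a sketch rather than a definitive test. It reads the first few kilobytes, treats any NUL byte as a sign of binary data, and otherwise checks how many unusual control bytes appear. The function names, the 8192-byte sample size, and the 30% threshold are all arbitrary choices made for this example.

```python
# Control bytes commonly found in text: BEL, BS, TAB, LF, FF, CR, ESC.
_TEXT_CONTROL = {7, 8, 9, 10, 12, 13, 27}
# Every other byte below 0x20, plus DEL, counts as suspicious.
_SUSPICIOUS = bytes(b for b in range(32) if b not in _TEXT_CONTROL) + b"\x7f"


def sample_looks_like_text(sample, max_suspicious_ratio=0.30):
    """Guess whether a bytes sample is text. This is a heuristic only."""
    if not sample:
        return True  # assumption: treat an empty file as text
    if b"\x00" in sample:
        return False  # NUL bytes are rare in text (but UTF-16 text has them)
    # Count suspicious bytes by deleting them and comparing lengths.
    suspicious = len(sample) - len(sample.translate(None, _SUSPICIOUS))
    return suspicious / len(sample) <= max_suspicious_ratio


def looks_like_text(path, sample_size=8192):
    """Apply the heuristic to the first `sample_size` bytes of a file."""
    with open(path, "rb") as f:
        return sample_looks_like_text(f.read(sample_size))
```

As the comments note, this will misclassify UTF-16 text as binary, which is exactly the kind of edge case that makes the problem heuristic.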
If your script is running on *nix, you could use something like this:
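As a rough sketch, assuming the `file` utility is installed and you are on Python 3.7 or later, you could ask `file` for the encoding and treat anything it reports as "binary" as a binary file. The function names here are hypothetical; `--brief` and `--mime-encoding` are options of the standard `file` command.

```python
import subprocess


def encoding_means_text(encoding):
    """Interpret the output of `file --brief --mime-encoding`."""
    # `file` prints e.g. "us-ascii" or "utf-8" for text, "binary" otherwise.
    return encoding.strip().lower() != "binary"


def is_text_via_file(path):
    """Guess whether `path` is text by shelling out to the `file` utility."""
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,  # raise CalledProcessError if `file` fails
    )
    return encoding_means_text(result.stdout)
```

Since `file` is built on libmagic, this should give much the same answers as python-magic, without the extra Python dependency.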