say I have a blob of text 5000 characters. I run it through a hashing program and generates a 40 char long hash. now i run another blob of text, 10000 characters. it still generates a hash 40 chars long. that's true for text of any length.
my question is if the hashes are all unique, wouldn't i be able to compress anything into a 40 char string?
They are not unique, but you're much more likely to drop dead of a heart attack before you find two different documents with the same hash for a high quality algorithm, e.g. SHA-1
Don't get confused by the .Net GetHashCode(). It's not very good as it's only 32 bits compared to 640 bits in the original question (if each character is 8 bits).
Not all hashes are guaranteed to be unique. The wikipedia entry on the topic is pretty good: http://en.wikipedia.org/wiki/Hash_function
I think this is a great explanation: http://www.cprogramming.com/tutorial/computersciencetheory/hash-table.html
You can compress the signature of any text into a hash, but you cannot reverse calculate what the text was to give you that hash. Simply speaking the only way to find out what the text was that gave you the hash would be to brute-force text through the hash to try and find a match.
See Wikipedia
Consider looking at this from the point of view of the Pigeonhole Principle. If you're stuffing n items into a smaller number of buckets k, there will necessarily be some buckets with multiple items. So to answer your question, no hashes are not unique.