I have a website that will eventually display multiple languages. I notice that the common fonts used in web CSS (e.g. Arial, Verdana, Times New Roman, Tahoma) and even the newer Vista/Office 2007/VS2008 fonts (Calibri, Cambria, Candara, Corbel, etc.) are significantly larger (~350 KB) than your average (US only?) TTF font (~50 KB), so I assume these fonts contain most or all of the major character sets that common languages (Spanish, French, German, etc.) use.
My question is: can somebody confirm that the fonts listed above are acceptable for international use with the major (let's say top 8) spoken languages? If so, then I'm guessing the only purpose of Unicode fonts such as "Arial Unicode" (a massive 22 MB) is dealing with extremely niche dialects, eastern glyphs (Chinese, Japanese) and dead languages?
I'm just looking for confirmation from developers whose desktop or web apps render multiple languages and who have confirmed the result visually. I'm already in the 99% sure bin, but you know what they say about assumptions.
I just checked the character set of Calibri and Cambria and confirm that they cover all the major languages of Europe (hence also of America). I can check for the other ClearType fonts if that makes you feel more comfortable, but I doubt the coverage is any different.
As has already been stressed, though, your choice of fonts depends a lot on the exact set of languages you're targeting: usually, you can be confident that any commercial font covers at least all the languages of Western Europe (English, French, German, Italian, Spanish, Portuguese, etc.). If you want to add Polish, Czech, Slovak, etc. to that, you need fonts with “Central European” coverage (usually tagged as CE in the Adobe font library). For Greek, you of course need a font with Greek characters, and for Russian a font with Cyrillic characters (which usually covers Ukrainian, Bulgarian, Serbian, etc. as well). The ClearType fonts support all of the above, btw.
In the end, nothing beats checking it for yourself: if you know which languages you want to support, you can examine the font you intend to use and check for the characters you're looking for. Michael Everson's Alphabets of Europe is a great resource here. It might seem tedious to have to go through all the individual languages, but it's something you really have to do in order to make an informed decision.
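As a rough illustration of that check, here is a minimal Python sketch. It assumes you have already obtained the set of code points the font implements (for example from a font inspection tool); the function name `missing_characters`, the made-up `latin1_font` coverage set and the Polish sample are all hypothetical and only there for illustration.

```python
import unicodedata


def missing_characters(required_text, font_codepoints):
    """Return the sorted characters in required_text that the font lacks.

    font_codepoints is assumed to be a set of integer code points that the
    font implements; how you obtain it is outside the scope of this sketch.
    """
    # Normalize to NFC so that accented letters are checked as single
    # precomposed code points where such a code point exists.
    normalized = unicodedata.normalize("NFC", required_text)
    needed = {ch for ch in normalized if not ch.isspace()}
    return sorted(ch for ch in needed if ord(ch) not in font_codepoints)


# Hypothetical font that only implements printable ASCII plus Latin-1.
latin1_font = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

# A few letters used in Polish; most of them lie outside Latin-1.
polish_sample = "ą ć ę ł ń ó ś ź ż"

for ch in missing_characters(polish_sample, latin1_font):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Run against a real font's code point set and the alphabet of each target language, an empty result would suggest the font covers that language, although it says nothing about how good the glyphs look.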
If you need to support Arabic or Hebrew, Indic scripts or scripts from South-East Asia, or ideographic scripts, then the picture is wholly different, as has already been said, and I doubt any single font would fit here, except for the very rare fonts that attempt to be comprehensive (which, by the way, is an illusion).
You're right in your assumption that not all fonts have full Unicode support. When you are limited to web-safe fonts, the choice narrows even further.
Unicode Fonts
Only two fonts available by default on the Windows platform, Arial and Lucida Sans, provide a wide Unicode character repertoire. A bug in Verdana (and the different handling of it by various user agents) hinders its usability where combining characters are desired.
Wikipedia: web safe fonts,
Comparison of web safe fonts availability
I admit to not knowing which of those 2 is actually the safest to apply to the most common languages.
Specifying the lang attribute as well as UTF-8 encoding helps browsers successfully substitute characters missing from the chosen font with glyphs from their default OS font.
Great article about this here
Non-English operating systems will often map standard fonts like Arial or Verdana to the appropriate local-language font. For example, a Japanese OS might map Helvetica to Osaka, or Arial to MS Gothic, while a Chinese OS will map Arial to SimSun. So as long as you use the common fonts (Arial, etc.), you can be reasonably sure that the client OS will choose a suitable font.
I personally use this: http://www.google.com/get/noto/
It covers most languages.
If you limit your requirements to languages that are based on Latin scripts with some extensions, then you'll have a much larger choice than if you were to require "wide Unicode support". Most professional, high-quality fonts should support those easily.
A good indication of this is that Microsoft uses the same basic fonts in their western language software products.
On the other side of the software world there are the DejaVu fonts which provide great coverage for those scripts and are widely used on many Linux distributions.
The real pain with fonts starts when you want to display Arabic scripts and Far Eastern characters (and, to a lesser extent, Greek and Cyrillic characters).
And as a side note: a good browser should choose a different font if the selected font isn't capable of rendering a given glyph. Be sure to provide a generic font-family in your CSS (possibly in addition to a more specific font choice) to help the browser with this fall-back.
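To make the fall-back idea concrete, here is a deliberately simplified Python model of it. This is not how any particular browser is implemented; the function `pick_fonts`, the font names and the coverage sets are all made up for illustration, and real browsers add further system-level fall-back on top of the CSS font list.

```python
def pick_fonts(text, font_stack):
    """Map each character of text to the first font in the stack covering it.

    font_stack is a list of (name, codepoints) pairs in priority order,
    mirroring the order of a CSS font-family list. Characters that no font
    in the stack covers map to None.
    """
    chosen = []
    for ch in text:
        font_name = None
        for name, codepoints in font_stack:
            if ord(ch) in codepoints:
                font_name = name
                break
        chosen.append((ch, font_name))
    return chosen


# Hypothetical coverage data, invented for this example only.
stack = [
    ("SpecificFont", set(range(0x20, 0x7F))),         # Basic Latin only
    ("generic sans-serif", set(range(0x20, 0x500))),  # Latin, Greek, Cyrillic
]

for ch, font in pick_fonts("aя", stack):
    print(ch, "->", font)
```

In this model the Latin letter is served by the specific font and the Cyrillic letter falls through to the generic family, which is the behaviour a trailing generic font-family is meant to make possible.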
If you're wondering which fonts include which Unicode characters, the Windows Character Map accessory will show all the implemented characters for a font.
Of course you will need to know the languages you are targeting and the characters they use before this will be helpful.
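If you have sample text in your target languages, a small script can give a first idea of which scripts it uses. The sketch below is a heuristic only: Python's standard library does not expose the Unicode script property, so it uses the first word of each character's Unicode name as a rough stand-in. The function name `scripts_used` and the sample string are hypothetical.

```python
import unicodedata
from collections import Counter


def scripts_used(text):
    """Count letters per script, approximated by the first word of the
    character's Unicode name (e.g. LATIN, GREEK, CYRILLIC, ARABIC)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            counts[name.split()[0]] += 1
    return counts


print(scripts_used("Hello Привет Γειά"))
```

The resulting script names tell you which parts of a font's repertoire to inspect in Character Map.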
Our site has dozens of translations, and for Arabic text at least, we get lots of requests for Tahoma, as it is the only font that looks acceptable and is also installed on older versions of Windows.