What could go wrong if I convert ANSI-encoded files to UTF-8?

Posted 2019-08-19 08:35

I have an existing ASP.NET 2.0 website, stored in Team Foundation Server 2005. Some of the pages/controls are encoded as ANSI (according to Notepad++) and the Content-Type header is set to:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252"/>

I would like to change all pages to UTF-8, and therefore the Content-Type header to:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Other than changing the meta element, I assume I also need to change the encoding of all the files. I can do this in Notepad++ though if anyone has any quicker methods, please mention them.

What sort of problems might I face when it comes to merging/comparing in TFS?

5 answers
劳资没心,怎么记你
#2 · 2019-08-19 09:05

Something useful I just discovered: you can right-click a file in Source Control Explorer and choose Properties. There you can see and modify the encoding that TFS has recorded for the file.

贪生不怕死
#3 · 2019-08-19 09:10

Pick a file that has a character above the 0-127 ASCII range. Open it with Notepad, choose Save As and pick UTF-8 for the encoding. Then check that the character was converted correctly.

To automate the procedure, you could write an application that converts all the files from ANSI to UTF-8, using 1252 as the source code page. If you don't have any characters above 127, you do not need to worry about any of this.
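To illustrate why only characters above 127 matter, here is a minimal Python sketch. The sample strings are made up for the demonstration; the point is that an accented character occupies one byte in Windows-1252 and two in UTF-8, while pure ASCII text is identical in both.

```python
# "\u00e9" is e-acute: one byte (0xE9) in Windows-1252, two bytes in UTF-8.
ansi_bytes = "caf\u00e9".encode("windows-1252")
utf8_bytes = ansi_bytes.decode("windows-1252").encode("utf-8")

print(ansi_bytes)  # b'caf\xe9'
print(utf8_bytes)  # b'caf\xc3\xa9'

# Pure-ASCII text is byte-for-byte identical in both encodings,
# so such files need no conversion at all.
ascii_text = "<p>Hello</p>"
same = ascii_text.encode("windows-1252") == ascii_text.encode("utf-8")
print(same)  # True
```

A file saved as Windows-1252 but served as UTF-8 would present the lone 0xE9 byte to the browser, which is not a valid UTF-8 sequence, so the character would typically render as a replacement glyph.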

叛逆
#4 · 2019-08-19 09:19

This is not necessarily true. I don't know about ASP.NET, but we do all our PHP coding here in ANSI and serve the pages as UTF-8. All our database information is stored in UTF-8 as well.

Viruses.
#5 · 2019-08-19 09:21

It depends on how much of the text in your codebase is using characters outside the ASCII range of 0..127.

You might want to scan for those first, to see how much impact it will have. If your codebase is primarily in English, then you probably don't have much to worry about.
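One way to do that scan is sketched below in Python. The function name `find_non_ascii` and the default extension list are assumptions chosen for illustration; adjust them to whatever file types your site actually contains.

```python
import os

def find_non_ascii(root, extensions=(".aspx", ".ascx", ".master")):
    """Return (path, count) pairs for files containing bytes above 127."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Only look at the file types we care about (hypothetical list).
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                raw = f.read()
            # Iterating over bytes yields integers in Python 3.
            count = sum(1 for b in raw if b > 127)
            if count:
                hits.append((path, count))
    return hits
```

Calling `find_non_ascii(".")` from the site root returns only the files that would actually change when re-encoded. Files that do not appear in the result are pure ASCII and are already valid UTF-8.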

Summer. ? 凉城
#6 · 2019-08-19 09:23

I would write a Python script:

import os

srcdir = "."  # placeholder: directory holding the files to convert
for fn in os.listdir(srcdir):
    path = os.path.join(srcdir, fn)
    data = open(path, "rb").read().decode("windows-1252")
    data = data.replace("charset=windows-1252", "charset=utf-8")
    open(path, "wb").write(data.encode("utf-8"))

The update of the charset assumes that this specific string won't occur elsewhere; you can make it more robust by checking for a longer string, checking whether the old text actually exists in the file, doing proper XML parsing, etc.
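As a rough sketch of the "longer string" idea, the replacement can be limited to the Content-Type meta element and can report whether anything was changed. The names `META_CHARSET` and `update_meta_charset` are hypothetical, and the pattern assumes the attribute order and quoting shown in the question.

```python
import re

# Matches only the meta element from the question, not arbitrary
# occurrences of "charset=windows-1252" elsewhere in the file.
META_CHARSET = re.compile(
    r'(<meta\s+http-equiv="Content-Type"\s+'
    r'content="text/html;\s*charset=)windows-1252(")',
    re.IGNORECASE,
)

def update_meta_charset(html):
    """Return (new_html, number_of_replacements)."""
    return META_CHARSET.subn(r"\g<1>utf-8\g<2>", html)
```

A replacement count of zero signals that the file has no matching meta element, so the script can log it for manual review instead of silently skipping it.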

You might need to put a UTF-8 signature (byte order mark) in front of the UTF-8-encoded data; one is available as codecs.BOM_UTF8.
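A minimal sketch of adding that signature follows. The helper name `encode_utf8_with_bom` is an assumption for illustration; `codecs.BOM_UTF8` is the standard-library constant for the three bytes EF BB BF.

```python
import codecs

def encode_utf8_with_bom(text):
    """Encode text as UTF-8, prepending the signature if it is missing."""
    data = text.encode("utf-8")
    # Avoid a double signature if the text already starts with U+FEFF.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    return data

print(encode_utf8_with_bom("abc"))  # b'\xef\xbb\xbfabc'
```

Python's built-in "utf-8-sig" codec is an alternative: encoding with it always writes the signature, and decoding with it strips one if present.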

I don't know what consequences this change has for TFS.
