Google Analytics: character encoding of __utm cookies

Posted 2019-08-02 18:34

I'm trying to figure out what encoding GA uses when it saves cookies. For example, I can use non-Western characters when setting the utm_source parameter and they show up fine in the GA reports. However, if I look at the __utmz cookie, its value does not match the utm_source parameter; instead it seems to be encoded somehow. I know about URL encoding, but this is something different.

Example:

1) Visit www.example.com?utm_source=ХЦЧШЩЬЫЪЭЮЯ

2) View cookies. The __utmz cookie stores whatever value was given to the utm_source param. It contains the value Ð¥Ð¦Ð§Ð¨Ð©Ð¬Ð«ÐªÐ­Ð®Ð¯, which seems to be encoded.

3) Click around on the website, then view the GA reports. You see ХЦЧШЩЬЫЪЭЮЯ as the visit source, which is correct.

I'm trying to write some JavaScript that will read the __utmz cookie, save it in a Google App Engine Datastore, and then display it correctly in an HTML page. I've tried all sorts of encode(utf-8) / decode(utf-8) solutions, but nothing seems to work. I assume this is because I don't know the original encoding used when the cookie was set.

1 answer
聊天终结者
Reply #2 · 2019-08-02 18:49

The encoding used is UTF-8. When ХЦЧШЩЬЫЪЭЮЯ is UTF-8 encoded and the resulting bytes are then displayed as if they were windows-1252 encoded, you get Ð¥Ð¦Ð§Ð¨Ð©Ð¬Ð«ÐªÐ­Ð®Ð¯. For example, the first character Х, CYRILLIC CAPITAL LETTER HA, is U+0425, which is the bytes 0xD0 0xA5 when UTF-8 encoded. When these bytes are interpreted as windows-1252 (or ISO-8859-1) encoded character data, they mean U+00D0 U+00A5, i.e. Ð¥.
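
The byte-level explanation can be checked with a short JavaScript sketch. The function names utf8BytesAsLatin1 and latin1CharsAsUtf8 are made up for illustration and are not part of any GA API. The sketch assumes the simpler ISO-8859-1 reading, where every byte maps to the code point with the same number. Windows-1252 differs for bytes 0x80 to 0x9F, so text whose UTF-8 bytes fall in that range (for example Cyrillic А, which is 0xD0 0x90) would need a proper windows-1252 table instead.

```javascript
// Encode text as UTF-8, then misread every byte as one ISO-8859-1 character.
// This reproduces the garbled form described above.
function utf8BytesAsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Undo the misreading: turn each character back into a byte, then decode as UTF-8.
// Assumption: the garbled text was produced with ISO-8859-1 semantics, so every
// character is at most U+00FF. Anything larger is rejected rather than guessed at.
function latin1CharsAsUtf8(garbled) {
  const bytes = Uint8Array.from(garbled, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0xff) {
      throw new RangeError("character outside ISO-8859-1: " + ch);
    }
    return code;
  });
  // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

const source = "ХЦЧШЩЬЫЪЭЮЯ";
const garbled = utf8BytesAsLatin1(source);

console.log(garbled.slice(0, 2));                   // "Ð¥" (U+00D0 U+00A5)
console.log(latin1CharsAsUtf8(garbled) === source); // true
```

If the value read from the cookie looks like the garbled form, something along the way has most likely decoded UTF-8 bytes with the wrong charset. Passing it through a function like latin1CharsAsUtf8 is a workaround; decoding the bytes as UTF-8 in the first place is the cleaner fix.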
