I'm trying to figure out what encoding GA uses when it saves cookies. For example, I can use non-Western characters when setting the utm_source parameter, and they show up fine in the GA reports. However, the __utmz cookie does not match the value of the utm_source parameter. Instead it seems to be encoded somehow. I know about URL encoding, but this is something different.
Example:
1) Visit www.example.com?utm_source=ХЦЧШЩЬЫЪЭЮЯ
2) View cookies. The __utmz cookie saves whatever value was given to the utm_source parameter. Here it contains the value Ð¥Ð¦Ð§Ð¨Ð©Ð¬Ð«ÐªÐ­Ð®Ð¯, which seems to be encoded.
3) Click around on the website, then view the GA reports. The visit source shows as ХЦЧШЩЬЫЪЭЮЯ, which is correct.
I'm trying to write some JavaScript that reads the __utmz cookie and saves it in a Google App Engine Datastore, then displays it correctly in an HTML page. I've tried all kinds of encode('utf-8')/decode('utf-8') combinations, but nothing seems to work. I assume this is because I don't know the original encoding used when the cookie was set.
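If the cookie value is parsed on the App Engine side in Python (an assumption; the question reads it in JavaScript), a sketch like the following pulls out the campaign fields. It assumes the classic __utmz layout of four dot-separated numbers followed by `utmcsr=...|utmccn=...|utmcmd=...` pairs; `parse_utmz` is a hypothetical helper, not part of any GA library.

```python
def parse_utmz(cookie_value):
    """Split a __utmz value into its campaign fields (hypothetical helper).

    Assumed layout:
    <domain hash>.<timestamp>.<session count>.<campaign count>.utmcsr=...|utmccn=...|utmcmd=...
    """
    # The first four parts are numeric. The campaign data after them may
    # itself contain dots (e.g. a referring domain), so split at most 4 times.
    parts = cookie_value.split(".", 4)
    if len(parts) < 5:
        raise ValueError("unexpected __utmz format")
    fields = {}
    # Campaign data is a "|"-separated list of key=value pairs.
    for pair in parts[4].split("|"):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return fields


# Illustrative value only; the numeric prefix is made up.
sample = "173272373.1222345617.1.1.utmcsr=ХЦЧШЩЬЫЪЭЮЯ|utmccn=(not set)|utmcmd=referral"
print(parse_utmz(sample)["utmcsr"])
```

This only works on a value that was decoded as UTF-8 when it was read; it does nothing to undo a wrong decoding that already happened.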
The encoding used is UTF-8. When ХЦЧШЩЬЫЪЭЮЯ is UTF-8 encoded and the bytes of the UTF-8 encoded value are then displayed as if they were windows-1252 encoded, you get Ð¥Ð¦Ð§Ð¨Ð©Ð¬Ð«ÐªÐ­Ð®Ð¯ (Э turns into Ð followed by an invisible soft hyphen, U+00AD). For example, the first character, Х (Cyrillic capital letter ha), is U+0425, which is the bytes 0xD0 0xA5 when UTF-8 encoded. When these bytes are interpreted as windows-1252 (or ISO-8859-1) encoded character data, they mean U+00D0 U+00A5, i.e. Ð¥.
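A minimal Python sketch of the mechanism described above: it encodes the string as UTF-8, misreads the bytes with Python's `cp1252` codec (standing in for windows-1252), and then reverses the damage. The reversal is only needed if the text has already been misdecoded somewhere; where possible, the simpler fix is to decode the raw cookie bytes as UTF-8 in the first place.

```python
original = "ХЦЧШЩЬЫЪЭЮЯ"

# UTF-8 bytes of the first character, Х (U+0425).
first_bytes = original[0].encode("utf-8")
print(first_bytes)  # b'\xd0\xa5'

# Misread the UTF-8 bytes as windows-1252: every byte becomes one character,
# so each Cyrillic letter turns into two Latin-1-range characters.
mojibake = original.encode("utf-8").decode("cp1252")
print(mojibake[:2])  # Ð¥

# Undo it: map the characters back to the same bytes, then decode as UTF-8.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

The round trip works here because every byte involved (0xD0 and 0xA5 to 0xAF) is defined in windows-1252; a few bytes, such as 0x81 and 0x90, are undefined in Python's `cp1252` codec and would raise an error.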