This question is related to "Parsing a CS:GO script file in Python", but the problem is different. I'm working with content from CS:GO, and I'm trying to write a Python tool that imports all the data from the /scripts/ folder into Python dictionaries.
The next step after parsing that data is to parse the language resource file from /resources and link the dictionaries to the language strings.
Here is the original file for the English localization: https://github.com/spec45as/PySteamBot/blob/master/csgo_english.txt
The file format is similar to the one in the previous task, but I have run into other problems. All language files are encoded in UTF-16-LE, and I don't understand how to work with encoded files and strings in Python (I mostly work with Java).
I have tried some solutions based on open(fileName, encoding='utf-16-le').read(), but I don't know how to work with such encoded strings in pyparsing. Parsing fails with:
pyparsing.ParseException: Expected quoted string, starting with " ending with " (at char 0), (line:1, col:1)
Another problem is lines containing escape sequences such as \", for example:
"musickit_midnightriders_01_desc" "\"HAPPY HOLIDAYS, ****ERS!\"\n -Midnight Riders"
How can I parse these symbols if I want to keep these lines as they are?
There are a few new wrinkles to this input file that were not in the original CS:GO example:
- escaped quotes (\") in some of the value strings
- qualifying conditions after some of the values (such as [$WIN32], [$OSX])
- comments
The escaped quotes are addressed by modifying the definition of value_qs.
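A minimal sketch of what separate key and value definitions might look like. The escChar argument takes care of the \" sequences; multiline=True is an assumption here, added so that a value may continue across line breaks, and the sample line is made up rather than taken from csgo_english.txt:

```python
import pyparsing as pp

# Keys are plain one-line quoted strings.
key_qs = pp.QuotedString('"').setName("key_qs")

# Values may contain backslash-escaped quotes (escChar).
# multiline=True is an assumption: it lets a value span several lines.
value_qs = pp.QuotedString('"', escChar="\\", multiline=True).setName("value_qs")

# Made-up sample line, not taken from csgo_english.txt.
sample = r'"greeting" "say \"hi\" to everyone"'
key, value = (key_qs + value_qs).parseString(sample)
print(key)
print(value)
```

With the default unquoteResults=True, the surrounding quotes and the escaping backslashes are removed from the returned value.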
Since values are now more fully featured than keys, I decided to use separate QuotedString definitions for them.
The qualifying conditions require a bit more refactoring. Their use is similar to #ifdef directives in C: they enable the definition only if the environment matches the condition. Some of these conditions are even boolean expressions:
[!$PS3]
[$WIN32||$X360||$OSX]
[!$X360&&!$PS3]
This could lead to duplicate keys in the definition file, such as in these lines:
which contain 3 definitions for the key "Menu_Dlg_Leaderboards_Lost_Connection", depending on what environment values were set.
In order to not lose these values when parsing the file, I chose to modify the key at parse time by appending the condition if one was present. This code implements the change:
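A minimal sketch of such a key-renaming parse action, not necessarily the exact code used. The "/" separator, the add_condition_to_key name, and the placeholder value texts are assumptions for illustration; only the key name and the [!$X360&&!$PS3] condition come from the text above:

```python
import pyparsing as pp

LBRACK, RBRACK = map(pp.Suppress, "[]")
key_qs = pp.QuotedString('"')
value_qs = pp.QuotedString('"', escChar="\\", multiline=True)

# A condition such as [$WIN32] or [!$X360&&!$PS3]; the brackets are dropped.
condition = LBRACK + pp.Word(pp.alphanums + "$!&|") + RBRACK

def add_condition_to_key(tokens):
    # tokens is [key, value] or [key, value, condition].
    items = list(tokens)
    if len(items) == 3:
        key, value, cond = items
        # The "/" separator is an arbitrary choice for this sketch.
        return [key + "/" + cond, value]
    # Returning None leaves entries without a condition unchanged.

key_value = key_qs + value_qs + pp.Optional(condition)
key_value.setParseAction(add_condition_to_key)
parser = pp.OneOrMore(pp.Group(key_value))

# Placeholder values; the [$X360] and [$PS3] conditions are hypothetical.
sample = '''
"Menu_Dlg_Leaderboards_Lost_Connection" "text for Xbox 360" [$X360]
"Menu_Dlg_Leaderboards_Lost_Connection" "text for PS3" [$PS3]
"Menu_Dlg_Leaderboards_Lost_Connection" "text for everything else" [!$X360&&!$PS3]
'''
strings = {key: value for key, value in parser.parseString(sample, parseAll=True)}
for key in strings:
    print(key)
```

Because the condition becomes part of the key, all three definitions survive when the entries are loaded into a dictionary.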
So that in the sample above, you would get 3 keys:
Lastly, the handling of comments is probably the easiest part. Pyparsing has built-in support for skipping over comments, just like whitespace. You just need to define the expression for the comment, and have the top-level parser ignore it. To support this feature, several common comment forms are pre-defined in pyparsing. In this case, the solution is just to change the final parser definition to:
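A minimal sketch of ignoring comments, assuming the comments in the file are of the '//' form. dblSlashComment is one of pyparsing's predefined comment expressions; the surrounding grammar and sample text are simplified stand-ins:

```python
import pyparsing as pp

key_qs = pp.QuotedString('"')
value_qs = pp.QuotedString('"', escChar="\\", multiline=True)
parser = pp.OneOrMore(pp.Group(key_qs + value_qs))

# Skip '//' comments wherever whitespace would be skipped.
parser.ignore(pp.dblSlashComment)

# Made-up sample text.
text = '''
// a full-line comment
"key_one" "value one"   // a trailing comment
"key_two" "see http://example.com"
'''
entries = parser.parseString(text, parseAll=True).asList()
print(entries)
```

Ignored expressions are only skipped between tokens, so a '//' inside a quoted value is left alone.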
And LASTLY lastly, there is a minor bug in the implementation of QuotedString, in which standard whitespace string literals like \t and \n are not handled, and are just treated as an unnecessarily-escaped 't' or 'n'. So for now, when a value containing \n is parsed, the value string you get has a plain 'n' where the newline should be.
I will have to fix this behavior in the next release of pyparsing.
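Until then, one possible stopgap is to ask QuotedString for the raw text (unquoteResults=False) and decode the escapes in a parse action. This is only a sketch under that assumption, not a built-in pyparsing feature; the unescape helper and the ESCAPES table are hypothetical names, and the sample is a shortened stand-in:

```python
import re
import pyparsing as pp

# Escape sequences this sketch understands; anything else keeps the
# character that follows the backslash.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

def unescape(tokens):
    raw = tokens[0][1:-1]  # raw match, minus the surrounding quotes
    return re.sub(
        r"\\(.)",
        lambda m: ESCAPES.get(m.group(1), m.group(1)),
        raw,
        flags=re.DOTALL,
    )

# unquoteResults=False returns the text exactly as it appears in the file.
value_qs = pp.QuotedString('"', escChar="\\", multiline=True, unquoteResults=False)
value_qs.setParseAction(unescape)

result = value_qs.parseString(r'"\"HI!\"\n -Riders"')[0]
print(repr(result))
```

Dropping the parse action and keeping only unquoteResults=False would leave the value exactly as written in the file, quotes and backslashes included.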
Here is the final parser code:
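A minimal sketch that assembles the pieces described above, under several assumptions: entries may be nested in { } sections as in the earlier CS:GO script files, comments use '//', conditions are appended to keys with a "/" separator, and the file starts with a byte-order mark. The to_dict and load_language_file helpers are hypothetical names:

```python
import pyparsing as pp

LBRACE, RBRACE, LBRACK, RBRACK = map(pp.Suppress, "{}[]")

key_qs = pp.QuotedString('"').setName("key_qs")
value_qs = pp.QuotedString('"', escChar="\\", multiline=True).setName("value_qs")
condition = LBRACK + pp.Word(pp.alphanums + "$!&|") + RBRACK

def add_condition_to_key(tokens):
    # Fold a trailing [condition] into the key so duplicate keys stay distinct.
    items = list(tokens)
    if len(items) == 3:
        key, value, cond = items
        return [key + "/" + cond, value]

string_entry = key_qs + value_qs + pp.Optional(condition)
string_entry.setParseAction(add_condition_to_key)

# Assumption: a key may also be followed by a nested { ... } section.
section = pp.Forward()
entry = pp.Group(string_entry | key_qs + section)
section <<= pp.Group(LBRACE + pp.ZeroOrMore(entry) + RBRACE)

parser = pp.OneOrMore(entry)
parser.ignore(pp.dblSlashComment)

def to_dict(entries):
    # Turn parsed [key, value] groups into nested dictionaries.
    result = {}
    for key, value in entries:
        if isinstance(value, pp.ParseResults):
            value = to_dict(value)
        result[key] = value
    return result

def load_language_file(path):
    # "utf-16" consumes a leading byte-order mark, whereas "utf-16-le"
    # keeps it as the first character of the decoded text.
    with open(path, encoding="utf-16") as f:
        return to_dict(parser.parseString(f.read(), parseAll=True))
```

A leftover byte-order mark at position 0 would be one possible explanation for the "Expected quoted string ... (at char 0)" error in the question, which is why the sketch opens the file with encoding="utf-16". Usage would be along the lines of data = load_language_file("csgo_english.txt").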