Automatic conversion of the advanced string format

Posted 2019-07-05 05:29

Is there any automatic way to convert a piece of code from Python's old-style string formatting (using %) to the new style (using .format)? For example, consider the formatting of a PDB atom specification:

spec = "%-6s%5d %4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s%2s"

I've been converting some of these specifications by hand as needed, but this is both error-prone and time-consuming, as I have many such specifications.

2 answers
贼婆χ
#2 · 2019-07-05 06:14

The docs explain some of the differences. As far as I can tell (although I'm not very familiar with old-style format strings), the functionality of the new style is a superset of the functionality of the old style.

You'd have to do more tweaking to handle edge cases, but I think something simple like

re.sub(r'%(\w+)([sbcdoXnf...])', r'{:\1\2}', your_string)

would get you 90% of the way there. The remaining translation -- going from things like %x to {0:x} -- will be too complex for a regular expression to handle (without writing some ridiculously complex conditionals inside of your regex).
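One way to handle that remaining step without cramming conditionals into the pattern is to pass a function as the replacement to re.sub. The following is only a minimal sketch: the helper name number_fields and the restricted set of conversion characters are assumptions made for illustration, and it does not handle flags, %%, or literal braces in the input.

```python
import re
from itertools import count

# Hypothetical helper, not part of any library. It only understands an
# optional width/precision followed by one of a few numeric conversions.
SPEC = re.compile(r'%(\d*(?:\.\d+)?)([dxXof])')

def number_fields(old):
    """Rewrite simple %-specifiers as explicitly numbered {N:...} fields."""
    counter = count()

    def repl(match):
        width_prec, conv = match.groups()
        # Each match takes the next positional index: 0, 1, 2, ...
        return '{%d:%s%s}' % (next(counter), width_prec, conv)

    return SPEC.sub(repl, old)

old = '%x and %5d and %8.3f'
new = number_fields(old)
print(new)
# The converted string should render the same values identically.
assert new.format(255, 42, 3.14159) == old % (255, 42, 3.14159)
```

The counter lives in the replacement function rather than in the regex, which is what keeps the pattern itself simple.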

Juvenile、少年°
#3 · 2019-07-05 06:25

The functionality of the two forms does not match up exactly, so there is no way you could automatically translate every % string into an equivalent {} string or (especially) vice-versa.

Of course there is a lot of overlap, and many of the sub-parts of the two formatting languages are the same or very similar, so someone could write a partial converter (which could, e.g., raise an exception for non-convertible code).

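For a small subset of the language like what you seem to be using, you could do it pretty trivially with a simple regex: every pattern starts with % and ends with one of [sdf], so capturing the width/precision part and the conversion character as two groups and using something like {:\1\2} as a replacement pattern ought to be nearly all you need. The one catch is alignment: the - flag has to become <, and a %s field with a width needs an explicit >, because format left-aligns strings by default while % right-aligns them.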
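As a rough sketch of such a partial converter for the [sdf] subset, applied to the spec from the question: the function name convert and the sample values are made up for illustration, and anything outside this subset (other flags, %%, *, mapping keys, literal braces) is deliberately not handled.

```python
import re

# Hypothetical converter for the [sdf] subset only.
PATTERN = re.compile(r'%(-?)(\d*)((?:\.\d+)?)([sdf])')

def convert(old):
    """Translate %-style fields ending in s, d or f into {}-style fields."""
    def repl(match):
        minus, width, precision, conv = match.groups()
        if minus:
            align = '<'   # the "-" flag means left-justify
        elif conv == 's' and width:
            align = '>'   # %5s right-aligns, but {:5s} would left-align
        else:
            align = ''    # numbers right-align by default in both styles
        return '{:' + align + width + precision + conv + '}'

    return PATTERN.sub(repl, old)

spec = "%-6s%5d %4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s%2s"
new_spec = convert(spec)
print(new_spec)

# Invented sample values, one per field, just to compare the two styles.
values = ("ATOM", 1, "N", "", "MET", "A", 1, "",
          27.340, 24.430, 2.614, 1.00, 9.67, "N", "")
assert new_spec.format(*values) == spec % values
```

Note that %s accepts any object while :s only accepts strings, so a converted spec can still fail on data that the original handled.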

But why bother? Except as an exercise in writing parsers, what would be the benefit? The % operator is not deprecated, and using % with an existing % format string will obviously do at least as well as using format with a % format string converted to {}.

If you are looking at this as an exercise in writing parsers, I believe there's an incomplete example buried inside pyparsing.


Some differences that are hard to translate, off the top of my head:

  • * for dynamic field width or precision; format has a similar feature, but does it differently.
  • %(10)s, because format tries to interpret the key name as a number first, then falls back to a dict key.
  • %(a[b])s, because format doesn't quote or otherwise separate the key from the rest of the field, so a variety of characters simply can't be used.
  • %c takes integers or single-char strings; :c only integers.
  • %r/%s/%a analogues are not part of the format string, but a separate part of the field (which also comes on the opposite side).
  • %g and :g have slightly different cutoff rules.
  • %a and !a don't do the exact same thing.
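A few of these differences can be checked directly. The snippet below is only an illustration of four items from the list (dynamic width, numeric-looking keys, %c, and the placement of the r conversion), using made-up values; it does not attempt to show the %g or %a cases.

```python
# Dynamic width: "*" takes the width from the argument list, ahead of the
# value, while format uses a nested replacement field.
assert '%*d' % (6, 42) == '{0:{1}d}'.format(42, 6)

# %r is a conversion type at the end of the specifier; !r is a separate part
# of the field and comes before the format spec. The explicit ">" is needed
# because format left-aligns strings by default.
assert '%10r' % 'x' == '{!r:>10}'.format('x')

# %c accepts an integer or a one-character string; :c accepts only integers.
assert '%c' % 65 == '%c' % 'A' == '{:c}'.format(65)
try:
    '{:c}'.format('A')
except ValueError as exc:
    print('format rejects a str for :c ->', exc)

# Numeric-looking keys: % looks up the string '10' in the mapping, while
# format treats 10 as a positional index.
assert '%(10)s' % {'10': 'a'} == 'a'
try:
    '{10}'.format(**{'10': 'a'})
except IndexError as exc:
    print('format treats 10 as a position ->', exc)
```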

The actual differences aren't listed anywhere; you will have to dig them out by a thorough reading of the Format Specification Mini-Language vs. the printf-style String Formatting language.
