Is there any automatic way to convert a piece of code from Python's old-style string formatting (using `%`) to the new style (using `.format`)? For example, consider the formatting of a PDB atom specification:
spec = "%-6s%5d %4s%1s%3s %1s%4d%1s %8.3f%8.3f%8.3f%6.2f%6.2f %2s%2s"
I've been converting some of these specifications by hand as needed, but this is both error-prone and time-consuming, as I have many such specifications.
The docs explain some of the differences. As far as I can tell (although I'm not very familiar with old-style format strings), the functionality of the new style is a superset of the functionality of the old style.
You'd have to do more tweaking to handle edge cases, but I think something simple like a regex substitution would get you 90% of the way there. The remaining translation, going from things like `%x` to `{0:x}`, will be too complex for a regular expression to handle (without writing some ridiculously complex conditionals inside of your regex).
The functionality of the two forms does not match up exactly, so there is no way you could automatically translate every `%` string into an equivalent `{}` string or (especially) vice versa.
Of course there is a lot of overlap, and many of the sub-parts of the two formatting languages are the same or very similar, so someone could write a partial converter (which could, e.g., raise an exception for non-convertible code).
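To make the "partial converter" idea concrete, here is a minimal sketch of one. The names `percent_to_format` and `NotConvertible` are made up for illustration, and the set of conversions it accepts is an assumption about what is safe, not an exhaustive analysis. It deliberately refuses anything it is unsure about, and it does not guarantee identical behaviour for every argument type (for example, `%d` accepts a float, while `{:d}` does not).

```python
import re

# One printf-style conversion, or a stray '%', or a literal brace.
_TOKEN = re.compile(
    r"%(?:\((?P<key>[^)]*)\))?(?P<flags>[-+ #0]*)(?P<width>\*|\d+)?"
    r"(?:\.(?P<prec>\*|\d+))?(?P<type>[a-zA-Z%])"
    r"|(?P<stray>%)|(?P<brace>[{}])"
)


class NotConvertible(ValueError):
    """Raised for conversions this sketch does not know how to translate."""


def percent_to_format(spec):
    """Translate a %-style format string into a str.format one, or raise."""

    def fail(m, why):
        raise NotConvertible(f"{m.group(0)!r}: {why}")

    def repl(m):
        text = m.group(0)
        if m.group("brace"):
            return text * 2  # literal braces must be doubled for str.format
        if m.group("stray"):
            fail(m, "not a complete conversion")
        if text == "%%":
            return "%"
        key, flags, width, prec, kind = m.group(
            "key", "flags", "width", "prec", "type")
        if "*" in (width, prec):
            fail(m, "dynamic width/precision")
        if key is not None and not key.isidentifier():
            fail(m, "mapping key is not a plain identifier")
        if "#" in flags:
            fail(m, "'#' flag not handled in this sketch")
        name = key or ""
        width = width or ""
        precision = "." + prec if prec else ""
        align = "<" if "-" in flags else ""
        if kind in ("s", "r"):
            if flags.replace("-", ""):
                fail(m, "numeric flag on a string conversion")
            if width and not align:
                align = ">"  # % right-aligns strings; str.format left-aligns
            body = align + width + precision
            return "{" + name + "!" + kind + (":" + body if body else "") + "}"
        if kind in ("d", "i", "u", "x", "X", "o"):
            if prec:
                fail(m, "precision on an integer conversion")
            kind = "d" if kind in "iu" else kind
        elif kind not in ("f", "F", "e", "E"):
            fail(m, "conversion type not handled")
        sign = "+" if "+" in flags else " " if " " in flags else ""
        zero = "0" if "0" in flags and not align else ""
        return "{" + name + ":" + align + sign + zero + width + precision + kind + "}"

    return _TOKEN.sub(repl, spec)
```

With this sketch, `percent_to_format("%-6s%5d")` yields `"{!s:<6}{:5d}"`, while something like `percent_to_format("%*d")` raises `NotConvertible` instead of guessing.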
For a small subset of the language like what you seem to be using, you could do it pretty trivially with a simple regex: every pattern starts with `%` and ends with one of `[sdf]`, and something like `{:\1\2}` as a replacement pattern ought to be all you need.
But why bother? Except as an exercise in writing parsers, what would be the benefit? The `%` operator is not deprecated, and using `%` with an existing `%` format string will obviously do at least as well as using `format` with a `%` format string converted to `{}`.
If you are looking at this as an exercise in writing parsers, I believe there's an incomplete example buried inside pyparsing.
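The simple-regex approach for the `[sdf]` subset might look like the sketch below. The function name `to_new_style` and the sample atom values are made up for illustration. One wrinkle: `%4s` right-aligns a string while `{:4s}` left-aligns it, and a `-` is not accepted in a string format spec, so this sketch uses a small replacement function rather than a plain `{:\1\2}` template.

```python
import re

# Only the subset used in the PDB spec: optional '-', optional width and
# .precision, then one of s, d, f. Literal braces and '%%' are not handled.
SIMPLE = re.compile(r"%(-?)(\d*(?:\.\d+)?)([sdf])")


def to_new_style(spec):
    def repl(m):
        minus, size, kind = m.groups()
        if kind == "s":
            # Make string alignment explicit, since the two styles
            # have different defaults.
            return "{:" + ("<" if minus else ">") + size + "s}"
        return "{:" + ("<" if minus else "") + size + kind + "}"

    return SIMPLE.sub(repl, spec)


spec = "%-6s%5d %4s%1s%3s %1s%4d%1s %8.3f%8.3f%8.3f%6.2f%6.2f %2s%2s"
new_spec = to_new_style(spec)

# Made-up sample values, one per conversion in the spec.
atom = ("ATOM", 1, "N", "", "MET", "A", 1, "",
        27.340, 24.430, 2.614, 1.00, 9.67, "N", "")

# Formatting the same values both ways is a cheap check on a conversion.
same = new_spec.format(*atom) == spec % atom
print(new_spec)
print(same)
```

Comparing the two outputs on representative data, as in the last lines, is probably the most practical safeguard when converting many specifications.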
Some differences that are hard to translate, off the top of my head:
- `*` for dynamic field width or precision; `format` has a similar feature, but does it differently.
- `%(10)s`, because `format` tries to interpret the key name as a number first, then falls back to a dict key.
- `%(a[b])s`, because `format` doesn't quote or otherwise separate the key from the rest of the field, so a variety of characters simply can't be used.
- `%c` takes integers or single-char strings; `:c` only integers.
- `%r`/`%s`/`%a` analogues are not part of the format string, but a separate part of the field (which also comes on the opposite side).
- `%g` and `:g` have slightly different cutoff rules.
- `%a` and `!a` don't do the exact same thing.
The actual differences aren't listed anywhere; you will have to dig them out by a thorough reading of the Format Specification Mini-Language vs. the `printf`-style String Formatting language.
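A few of the differences listed above can be seen directly in a short session. This is only an illustrative sketch; the variable names are arbitrary, and it covers just the cases that are easy to demonstrate.

```python
# Dynamic width: '*' takes the width from the argument list, before the value;
# str.format uses a nested field, and here the value comes first.
old_star = "%*d" % (5, 42)
new_star = "{:{}d}".format(42, 5)

# %c accepts an int or a one-character string; the 'c' format type
# accepts only an int.
old_c = ("%c" % 97, "%c" % "a")
try:
    "{:c}".format("a")
    c_accepts_str = True
except ValueError:
    c_accepts_str = False

# %r is a conversion type at the end of the spec; !r is a conversion flag
# that sits before the format spec.
old_r = "%10r" % "ab"
new_r = "{!r:>10}".format("ab")

# A numeric mapping key is a dict key for %, but str.format treats it
# as a positional index.
old_key = "%(10)s" % {"10": "x"}
try:
    "{10}".format(**{"10": "x"})
    numeric_key_ok = True
except IndexError:
    numeric_key_ok = False

print(old_star == new_star, old_c, c_accepts_str, old_r == new_r,
      old_key, numeric_key_ok)
```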