When using ruamel.yaml version 0.15.92 with Python 3.6.6 on CentOS 7, I cannot seem to update the value of an anchored scalar in a sequence without destroying the anchor itself or creating invalid YAML from the next dump.
I have attempted to recreate the original node type with the new value (old PlainScalarString
-> new PlainScalarString
, old FoldedScalarString
-> new FoldedScalarString
, etc), copying the anchor
to it. While this restores the anchor to the updated scalar value, it also creates invalid YAML because the first alias later in the YAML file duplicates the same anchor name and assigns to it the old value of the scalar I'm trying to update.
I then attempted to replace all of the affected aliases with actual alias text -- like *anchor_name
-- but that causes the value to become quoted like '*anchor_name'
, rendering the alias useless.
I reverted that and then attempted to suppress the duplicate anchor name (by setting always_dump=False
on every affected alias). While that does suppress the duplicate anchor name, it unfortunately just dumps the old value of the anchored scalar.
My entire test data is as follows; assume this is named test.yaml:
# Header comment
---
# Post-header comment
# Reusable aliases
aliases:
- &plain_value This is unencrypted
- &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
- &block_password >
ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]
top_key: unencrypted value
top_alias: *plain_value
top::hash:
ignore: more
# This pulls its string-form value from above
stringified_alias: *string_password
sub:
ignore: value
key: unencrypted subbed-value
# This pulls its block-form value from above
blocked_alias: *block_password
sub_more:
# This is a stringified EYAML value, NOT an alias
inline_string: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqkv6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6HtkolM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoKB4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
# Also NOT an alias, in block form
block_string: >
ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12
hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5
TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqk
v6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6Htko
lM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4
osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoK
B4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064
EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
# Signature line
There are two forms of this issue, so here are two code examples for reproducing the conditions:
First, "How can we most simply update the value of an anchored scalar in a sequence without destroying the anchor or its aliases?" This looks like:
with open('test.yaml', 'r') as f:
yaml_data = yaml.load(f)
yaml_data['aliases'][1] = "New string password"
yaml.dump(yaml_data, sys.stdout)
Note that this destroys the anchor. I would very much prefer the solution look as similar to this first snippet as possible; perhaps something like yaml_data['aliases'][1].set_value("New string password") # Changes only the scalar value while preserving the original anchor, comments, position, et al.
.
Second, "If we must instead wrap the new value in some object to preserve the anchor (and other attributes of the entry being replaced), what is the simplest approach which also preserves all aliases that refer to it (such that they adopt the updated value) when dumped?" My attempt to solve this requires quite a lot more code including recursive functions. Since SO guidelines advise against dumping large code, I will offer the relevant bits. Please assume the unlisted code is working perfectly well.
### <snip def FindEYAMLPaths(...) returns lists of paths through the YAML to every value starting with 'ENC['>
### <snip def GetYAMLValue(...) returns the node -- as a PlainScalarString, FoldedScalarString, et al. -- identified by a path from FindEYAMLPaths>
### <snip def DisableAnchorDump(...) sets `anchor.always_dump=False` if the node has an anchor attribute>
def ReplaceYAMLValue(value, data, path=None):
if path is None:
return
ref = data
last_ref = path.pop()
for p in path:
ref = ref[p]
# All I'm trying to do here is change the scalar value without disrupting its comments, anchor, positioning, or any of its aliases.
# This succeeds in changing the scalar value and preserving its original anchor, but disrupts its aliases which insist on preserving the old value.
if isinstance(ref[last_ref], PlainScalarString):
ref[last_ref] = PlainScalarString(value, anchor=ref[last_ref].anchor.value)
elif isinstance(ref[last_ref], FoldedScalarString):
ref[last_ref] = FoldedScalarString(value, anchor=ref[last_ref].anchor.value)
else:
ref[last_ref] = value
with open('test.yaml', 'r') as f:
yaml_data = yaml.load(f)
seen_anchors = []
for path in FindEYAMLPaths(yaml_data):
if path is None:
continue
node = GetYAMLValue(yaml_data, deque(path))
if hasattr(node, 'anchor'):
test_anchor = node.anchor.value
if test_anchor is not None:
if test_anchor in seen_anchors:
# This is expected to just be an alias, pointing at the newly updated anchor
DisableAnchorDump(node)
continue
seen_anchors.append(test_anchor)
ReplaceYAMLValue("New string password", yaml_data, path)
yaml.dump(yaml_data, sys.stdout)
Note that this produces valid YAML except that all of the affected aliases are gone, replaced instead by the old value of the anchored scalar.
I expect to be able to change the value of an aliased scalar in a sequence without disrupting any other part of the YAML content. Based on other posts I've seen about ruamel.yaml, I fully accept that I may need to dump the updated YAML to file and reload it for the in-memory aliases to update to the new value. I simply expect to change:
Input File
aliases:
- &some_anchor Old value
usage: *some_anchor
to:
Output File
aliases:
- &some_anchor NEW VALUE
usage: *some_anchor
Instead, here's the output from the above two examples:
First, notice that the original anchor was destroyed and the value for top::hash:stringified_alias:
now carries the original anchor and old value instead of the alias to the newly updated scalar value at ['aliases'][1]:
---
# Post-header comment
# Reusable aliases
aliases:
- &plain_value This is unencrypted
- New string password
- &block_password >
ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]
# ... snip ...
top::hash:
ignore: more
# This pulls its string-form value from above
stringified_alias: &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
# ... snip ...
Second, notice that ['aliases'][1] now looks correct -- it is the new value with the original anchor -- but where I expect to see aliases to it, I instead see the old value. I expect to see *string_password
instead of ENC[...]
.
---
# Post-header comment
# Reusable aliases
aliases:
- &plain_value This is unencrypted
- &string_password New string password
- &block_password >-
New string password
# ... snip ...
top::hash:
ignore: more
# This pulls its string-form value from above
stringified_alias: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
# ... snip ...
If you read in an anchored scalar, like your
This is unencrypted
, usingruamel.yaml
, you get aPlainScalarString
object (or one of the otherScalarString
subclasses), which is an extremely thin layer around the basic string type. That layer has an attribute to store an anchor if applicable (other uses are primarily to maintain quoting/literal/folding style information). And any aliases using that anchor refer to the sameScalarString
instance.When dumping the anchor attribute is not used to create aliases, that is is done in the normal way by having multiple references to the same object. The attribute is only used to write the anchor id and also does so if there is an attribute but no further references (i.e. an anchor without aliases).
So it is not surprising that if you replace such an object with multiple references (either at the anchor spot or any of the alias spots) that the reference disappears. If you then also force the same anchor name on some other object, you get duplicate anchors, contrary to the normal anchor/alias generation there is no check done on "forced" anchors.
Since the
ScalarString
is such a thin wrapper, they are essentially immutable objects, just like the string itself. Unlike with aliased dicts and lists which are collection objects that can be emptied and then filled (instead of replaced by a new instance), you cannot do that withstring
.The implementation of ScalarString can of course be changed, so you can have your
set_values()
method, but involves creating alternative classes for all the objects (PlainScalarString
,FoldedScalarString
). You would have to make sure these get used for constructing and for representing and then preferable also behave like normal strings as far as you need it, so at least you can print. That is relatively easy to do but requires copying and slightly modifyging several tens of lines of codeI think it is easier to leave the
ScalarStrings
in place as is (i.e being immutable) and do what you need to do if you want to change all occurences (i.e. references): update all the references to the original. If your datastructure would contain millions of nodes that might be prohibitively time consuming, but still would be afraction of what loading and dumping the YAML itself would take:which gives:
As you can see the anchors are preserved and it doesn't matter for
update_aliased_scalar
if you provide the anchored "place" or one of the aliased places as a reference.The above
recurse
also handles keys that are aliased, as it is perfectly fine for a key in a YAML mapping to have an anchor or to be an alias. You can even have an anchored key with a value that is an alias to the corresponding key.It would be very nice to have support for in-place modification of existing anchored fields with types ScalarFloat/ScalarInt etc. YAML is often used for config files. One common use case I encountered is to create multiple config files from a very large template config file with only small changes made to the new files. I would load the template file into CommentedMap, modify a small set of keys in place and dump it back into a new yaml config file. This flow works very nicely if the keys to be changed are not anchored. When they are anchored, the anchors are duplicated in the new files as reported by OP and render them invalid. Manually addressing each anchored key in post-processing can be daunting when there are a large number of them.