How to change an anchored scalar in a sequence wit

Posted 2020-04-21 04:41

Question:

When using ruamel.yaml version 0.15.92 with Python 3.6.6 on CentOS 7, I cannot seem to update the value of an anchored scalar in a sequence without destroying the anchor itself or creating invalid YAML from the next dump.

I have attempted to recreate the original node type with the new value (old PlainScalarString -> new PlainScalarString, old FoldedScalarString -> new FoldedScalarString, etc), copying the anchor to it. While this restores the anchor to the updated scalar value, it also creates invalid YAML because the first alias later in the YAML file duplicates the same anchor name and assigns to it the old value of the scalar I'm trying to update.

I then attempted to replace all of the affected aliases with actual alias text -- like *anchor_name -- but that causes the value to become quoted like '*anchor_name', rendering the alias useless.

I reverted that and then attempted to suppress the duplicate anchor name (by setting always_dump=False on every affected alias). While that does suppress the duplicate anchor name, it unfortunately just dumps the old value of the anchored scalar.

My entire test data is as follows; assume this is named test.yaml:

# Header comment
---
# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
  - &block_password >
    ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
    DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
    asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
    tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
    TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
    er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
    u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
    mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
    bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]

top_key: unencrypted value
top_alias: *plain_value

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: *string_password
  sub:
    ignore: value
    key: unencrypted subbed-value
    # This pulls its block-form value from above
    blocked_alias: *block_password
  sub_more:
    # This is a stringified EYAML value, NOT an alias
    inline_string: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqkv6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6HtkolM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoKB4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
    # Also NOT an alias, in block form
    block_string: >
      ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
      DQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12
      hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5
      TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqk
      v6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6Htko
      lM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4
      osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoK
      B4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064
      EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]

# Signature line

There are two forms of this issue, so here are two code examples for reproducing the conditions:

First, "How can we most simply update the value of an anchored scalar in a sequence without destroying the anchor or its aliases?" This looks like:

import sys
from ruamel.yaml import YAML

yaml = YAML()
with open('test.yaml', 'r') as f:
  yaml_data = yaml.load(f)

yaml_data['aliases'][1] = "New string password"
yaml.dump(yaml_data, sys.stdout)

Note that this destroys the anchor. I would very much prefer the solution look as similar to this first snippet as possible; perhaps something like yaml_data['aliases'][1].set_value("New string password") # Changes only the scalar value while preserving the original anchor, comments, position, et al..

Second, "If we must instead wrap the new value in some object to preserve the anchor (and other attributes of the entry being replaced), what is the simplest approach which also preserves all aliases that refer to it (such that they adopt the updated value) when dumped?" My attempt to solve this requires quite a lot more code including recursive functions. Since SO guidelines advise against dumping large code, I will offer the relevant bits. Please assume the unlisted code is working perfectly well.

### <snip def FindEYAMLPaths(...) returns lists of paths through the YAML to every value starting with 'ENC['>
### <snip def GetYAMLValue(...) returns the node -- as a PlainScalarString, FoldedScalarString, et al. -- identified by a path from FindEYAMLPaths>
### <snip def DisableAnchorDump(...) sets `anchor.always_dump=False` if the node has an anchor attribute>

def ReplaceYAMLValue(value, data, path=None):
  if path is None:
    return

  ref = data
  last_ref = path.pop()
  for p in path:
    ref = ref[p]

  # All I'm trying to do here is change the scalar value without disrupting its comments, anchor, positioning, or any of its aliases.
  # This succeeds in changing the scalar value and preserving its original anchor, but disrupts its aliases which insist on preserving the old value.
  if isinstance(ref[last_ref], PlainScalarString):
    ref[last_ref] = PlainScalarString(value, anchor=ref[last_ref].anchor.value)
  elif isinstance(ref[last_ref], FoldedScalarString):
    ref[last_ref] = FoldedScalarString(value, anchor=ref[last_ref].anchor.value)
  else:
    ref[last_ref] = value


with open('test.yaml', 'r') as f:
  yaml_data = yaml.load(f)

seen_anchors = []
for path in FindEYAMLPaths(yaml_data):
  if path is None:
    continue

  node = GetYAMLValue(yaml_data, deque(path))
  if hasattr(node, 'anchor'):
    test_anchor = node.anchor.value
    if test_anchor is not None:
      if test_anchor in seen_anchors:
        # This is expected to just be an alias, pointing at the newly updated anchor
        DisableAnchorDump(node)
        continue
      seen_anchors.append(test_anchor)

  ReplaceYAMLValue("New string password", yaml_data, path)

yaml.dump(yaml_data, sys.stdout)

Note that this produces valid YAML except that all of the affected aliases are gone, replaced instead by the old value of the anchored scalar.

I expect to be able to change the value of an aliased scalar in a sequence without disrupting any other part of the YAML content. Based on other posts I've seen about ruamel.yaml, I fully accept that I may need to dump the updated YAML to file and reload it for the in-memory aliases to update to the new value. I simply expect to change:

Input File

aliases:
  - &some_anchor Old value

usage: *some_anchor

to:

Output File

aliases:
  - &some_anchor NEW VALUE

usage: *some_anchor

Instead, here's the output from the above two examples:

First, notice that the original anchor was destroyed and the value for top::hash:stringified_alias: now carries the original anchor and old value instead of the alias to the newly updated scalar value at ['aliases'][1]:

---
# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - New string password
  - &block_password >
    ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
    DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
    asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
    tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
    TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
    er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
    u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
    mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
    bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]

# ... snip ...

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]

# ... snip ...

Second, notice that ['aliases'][1] now looks correct -- it is the new value with the original anchor -- but where I expect to see aliases to it, I instead see the old value. I expect to see *string_password instead of ENC[...].

---
# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - &string_password New string password
  - &block_password >-
    New string password

# ... snip ...

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]

# ... snip ...

Answer 1:

If you read in an anchored scalar, like your This is unencrypted, using ruamel.yaml, you get a PlainScalarString object (or one of the other ScalarString subclasses), which is an extremely thin layer around the basic string type. That layer has an attribute to store an anchor if applicable (other uses are primarily to maintain quoting/literal/folding style information). And any aliases using that anchor refer to the same ScalarString instance.

When dumping, the anchor attribute is not used to create aliases; that is done in the normal way, by having multiple references to the same object. The attribute is only used to write the anchor id, and it is also written if there is an attribute but no further references (i.e. an anchor without aliases).

So it is not surprising that if you replace such an object that has multiple references (either at the anchor spot or at any of the alias spots), the shared reference disappears. If you then also force the same anchor name on some other object, you get duplicate anchors: contrary to the normal anchor/alias generation, there is no check done on "forced" anchors.

Since ScalarString is such a thin wrapper, its instances are essentially immutable objects, just like the string itself. Aliased dicts and lists are collection objects that can be emptied and then refilled (instead of being replaced by a new instance); you cannot do that with a string.

The implementation of ScalarString can of course be changed so that you have your set_value() method, but that involves creating alternative classes for all the objects (PlainScalarString, FoldedScalarString, etc.). You would have to make sure these get used for constructing and for representing, and preferably they should also behave like normal strings as far as you need, so that at least you can print them. That is relatively easy to do, but requires copying and slightly modifying several tens of lines of code.

I think it is easier to leave the ScalarStrings as they are (i.e. immutable) and do what you need to do if you want to change all occurrences (i.e. references): update all the references to the original. If your data structure contained millions of nodes that might be prohibitively time-consuming, but it would still be a fraction of the time that loading and dumping the YAML itself takes:
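The difference between a shared collection and a shared string can be seen without ruamel.yaml at all. The sketch below uses a hypothetical AnchoredStr class as a stand-in for a ScalarString (a str subclass with an extra anchor attribute); it is not the library's class, only an illustration of the same constraint.

```python
class AnchoredStr(str):
    """Hypothetical stand-in for a ScalarString: a str plus an anchor name."""

    def __new__(cls, value, anchor=None):
        obj = super().__new__(cls, value)
        obj.anchor = anchor
        return obj


# A shared list can be emptied and refilled in place: every reference sees it.
shared_list = ["old"]
list_doc = {"anchor_spot": shared_list, "alias_spot": shared_list}
shared_list[:] = ["new"]

# A shared string cannot be changed in place; it can only be replaced, and
# replacing it at one spot leaves the other spot on the old object.
shared_str = AnchoredStr("old", anchor="some_anchor")
str_doc = {"anchor_spot": shared_str, "alias_spot": shared_str}
str_doc["anchor_spot"] = AnchoredStr("new", anchor="some_anchor")

print(list_doc["alias_spot"])  # the list change is visible through both keys
print(str_doc["alias_spot"])   # still the old string
```

After the replacement, two distinct objects carry the same anchor name, which corresponds to the duplicate-anchor situation described in the question.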
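As a rough idea of what the mutable half of such an alternative class might look like, here is a sketch built on the standard library's collections.UserString. MutableScalar and its set_value() method are hypothetical names; the part that matters for ruamel.yaml, registering the class with the constructor and the representer, is deliberately not shown.

```python
from collections import UserString


class MutableScalar(UserString):
    """Hypothetical string-like scalar whose value can be swapped in place."""

    def __init__(self, value, anchor=None):
        super().__init__(value)
        self.anchor = anchor  # anchor name, kept across value changes

    def set_value(self, value):
        # UserString keeps its text in .data, so this changes the object
        # itself rather than creating a new one.
        self.data = str(value)


shared = MutableScalar("old", anchor="string_password")
doc = {"aliases": [shared], "usage": shared}

doc["aliases"][0].set_value("new")

print(doc["usage"])         # prints the updated text
print(doc["usage"].anchor)  # the anchor name is untouched
```

Note that a UserString is not a str subclass, so code that checks isinstance(x, str) would not accept it; that is part of the extra work hinted at above.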

import sys
from pathlib import Path
import ruamel.yaml

in_file = Path('test.yaml')

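def update_aliased_scalar(data, obj, val):
    """Replace every reference to obj in data with one new scalar holding val."""
    def recurse(d, ref, nv):
        if isinstance(d, dict):
            # Keys can be anchored or aliased too: swap matching keys,
            # keeping their position in the mapping.
            for i, k in [(idx, key) for idx, key in enumerate(d.keys()) if key is ref]:
                d.insert(i, nv, d.pop(k))
            # Only visit the mapping's own items, not ones merged in via <<.
            for k, v in d.non_merged_items():
                if v is ref:
                    d[k] = nv
                else:
                    recurse(v, ref, nv)
        elif isinstance(d, list):
            for idx, item in enumerate(d):
                if item is ref:
                    d[idx] = nv
                else:
                    recurse(item, ref, nv)

    # Build the replacement once, so all spots keep sharing a single object;
    # reuse the original class (i.e. scalar style) and the anchor name.
    if hasattr(obj, 'anchor'):
        recurse(data, obj, type(obj)(val, anchor=obj.anchor.value))
    else:
        recurse(data, obj, type(obj)(val))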

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.preserve_quotes = True
data = yaml.load(in_file)

update_aliased_scalar(data, data['aliases'][1], "New string password")
update_aliased_scalar(data, data['top::hash']['sub']['blocked_alias'], "New block password\n")

yaml.dump(data, sys.stdout)

which gives:

# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - &string_password New string password
  - &block_password >
    New block password

top_key: unencrypted value
top_alias: *plain_value

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: *string_password
  sub:
    ignore: value
    key: unencrypted subbed-value
    # This pulls its block-form value from above
    blocked_alias: *block_password
  sub_more:
    # This is a stringified EYAML value, NOT an alias
    inline_string: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqkv6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6HtkolM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoKB4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
    # Also NOT an alias, in block form
    block_string: >
      ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
      DQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12
      hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5
      TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqk
      v6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6Htko
      lM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4
      osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoK
      B4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064
      EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]

# Signature line

As you can see the anchors are preserved and it doesn't matter for update_aliased_scalar if you provide the anchored "place" or one of the aliased places as a reference.

The above recurse also handles keys that are aliased, as it is perfectly fine for a key in a YAML mapping to have an anchor or to be an alias. You can even have an anchored key with a value that is an alias to the corresponding key.



Answer 2:

It would be very nice to have support for in-place modification of existing anchored fields with types ScalarFloat/ScalarInt etc. YAML is often used for config files. One common use case I encountered is to create multiple config files from a very large template config file with only small changes made to the new files. I would load the template file into CommentedMap, modify a small set of keys in place and dump it back into a new yaml config file. This flow works very nicely if the keys to be changed are not anchored. When they are anchored, the anchors are duplicated in the new files as reported by OP and render them invalid. Manually addressing each anchored key in post-processing can be daunting when there are a large number of them.