
I have a Python project where I'd like to use YAML (pyYaml 3.11), particularly because it is "pretty" and easy for users to edit in a text editor if and when necessary. My problem, though, is that if I load the YAML into a Python application (as I will need to) and edit the contents (as I will need to), the newly written document is typically not quite as pretty as what I started with.

The pyyaml documentation is pretty poor; it does not even document the parameters to the dump function. I found http://dpinte.wordpress.com/2008/10/31/pyaml-dump-option/. However, I'm still missing the information I need. (I started to look at the source, but it doesn't seem the most inviting. If I don't get a solution here, then that's my only recourse.)

I start with a document that looks like this:

- color green :
     inputs :
        - port thing :
            widget-hint : filename
            widget-help : Select a filename
        - port target_path : 
            widget-hint : path
            value : 'thing' 
     outputs:
        - port value:
             widget-hint : string
     text : |
            I'm lost and I'm found
            and I'm hungry like the wolf.

After loading it into Python (yaml.safe_load( s )), I try a couple of ways of dumping it out:

>>> print yaml.dump( d3, default_flow_style=False, default_style='' )
- color green:
    inputs:
    - port thing:
        widget-help: Select a filename
        widget-hint: filename
    - port target_path:
        value: thing
        widget-hint: path
    outputs:
    - port value:
        widget-hint: string
    text: 'I''m lost and I''m found

      and I''m hungry like the wolf.

      '
>>> print yaml.dump( d3, default_flow_style=False, default_style='|' )
- "color green":
    "inputs":
    - "port thing":
        "widget-help": |-
          Select a filename
        "widget-hint": |-
          filename
    - "port target_path":
        "value": |-
          thing
        "widget-hint": |-
          path
    "outputs":
    - "port value":
        "widget-hint": |-
          string
    "text": |
      I'm lost and I'm found
      and I'm hungry like the wolf.

Ideally, I would like "short strings" to not use quotes, as in the first result. But I would like multi-line strings to be written as blocks, as with the second result. I guess fundamentally, I'm trying to minimize an explosion of unnecessary quotes in the file which I perceive would make it much more annoying to edit in a text editor.

Does anyone have any experience with this?
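For comparison, the rule described in the question (literal block style only for strings that contain a newline, plain scalars otherwise) can be approximated in PyYAML itself with a custom string representer. This is a sketch assuming Python 3 and PyYAML 5.1 or later (for `sort_keys`); `BlockStyleDumper` and `str_presenter` are hypothetical names chosen for illustration. PyYAML may still fall back to a quoted style when a string cannot be written in block style, for example when a line has trailing spaces.

```python
import yaml


class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass, so the global dumpers stay untouched."""


def str_presenter(dumper, data):
    # Request literal block style ("|") only for multi-line strings;
    # single-line strings keep PyYAML's own style choice (plain when possible).
    if '\n' in data:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


# Register the presenter on the subclass only.
BlockStyleDumper.add_representer(str, str_presenter)

doc = [{
    'color green': {
        'inputs': [{'port target_path': {'widget-hint': 'path', 'value': 'thing'}}],
        'text': "I'm lost and I'm found\nand I'm hungry like the wolf.\n",
    }
}]

out = yaml.dump(doc, Dumper=BlockStyleDumper,
                default_flow_style=False, sort_keys=False)
print(out)
```

Unlike a round-trip loader, this does not remember how the original file was formatted; it only applies one fixed style rule when dumping.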

Answers

If you can use ruamel.yaml (disclaimer: I am the author of this enhanced version of PyYAML) you can round-trip the original format (YAML document stored in a file org.yaml):

import sys
import ruamel.yaml
from pathlib import Path

file_org = Path('org.yaml')
    
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(file_org)
yaml.dump(data, sys.stdout)

which gives:

- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: 'thing'
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.

Your input is inconsistently indented and formatted, and although ruamel.yaml gives far more control over the output than PyYAML, you cannot get your exact original back:

  • you sometimes (color green :) have a space before the value indicator (:) and sometimes you don't (outputs:). Apart from special control over root level keys, ruamel.yaml always puts the value indicator directly adjacent to the key.
  • your root level sequence is indented two columns with an offset of zero for the block sequence indicator (-) (this is the default ruamel.yaml uses). Others are indented five with an offset of three. ruamel.yaml cannot format sequences individually/inconsistently; I recommend using the default since your root collection is a sequence.
  • your mappings are sometimes indented three columns (value for key color green) and sometimes two (e.g. value for key port target_path). Again, ruamel.yaml cannot format these individually/inconsistently.
  • your block style literal scalar is indented more than the standard two spaces. Unless you append a block indentation indicator to the | indicator (e.g. |4), this extra indentation will be lost.

As you can see, setting yaml.preserve_quotes keeps the superfluous quotes around 'thing'. As that is not what you want, it is not set in the rest of these examples.

The following "normalises" all three examples:

import sys
import ruamel.yaml
from pathlib import Path
LT = ruamel.yaml.scalarstring.LiteralScalarString

file_org = Path('org.yaml')
file_plain = Path('plain.yaml')
file_block = Path('block.yaml')

def normalise(d):
    if isinstance(d, dict):
        for k, v in d.items():
            d[k] = normalise(v)
        return d
    if isinstance(d, list):
        for idx, elem in enumerate(d):
            d[idx] = normalise(elem)
        return d
    if not isinstance(d, str):
        return d
    if '\n' in d:
        if isinstance(d, LT):
            return d     # already a block style literal scalar
        return LT(d)
    return str(d)

yaml = ruamel.yaml.YAML()
for fn in [file_org, file_plain, file_block]:
    data = normalise(yaml.load(fn))  # load each file and write it back normalised
    yaml.dump(data, fn)

assert file_org.read_bytes() == file_plain.read_bytes()
assert file_org.read_bytes() == file_block.read_bytes()
print(file_block.read_text())

which gives:

- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: thing
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.

So, as you indicated, you get block style literal scalars if a scalar has newlines, and neither block style nor quotes if a scalar doesn't have a newline.
