I have a Python project where I'd like to use YAML (PyYAML 3.11), particularly because it is "pretty" and easy for users to edit in a text editor if and when necessary. My problem, though, is that if I bring the YAML into a Python application (as I will need to) and edit the contents (as I will need to), then the newly written document is typically not quite as pretty as what I started with.
The PyYAML documentation is pretty poor; it does not even document the parameters to the dump function. I found http://dpinte.wordpress.com/2008/10/31/pyaml-dump-option/. However, I'm still missing the information I need. (I started to look at the source, but it doesn't seem the most inviting. If I don't get the solution here, then that's my only recourse.)
I start with a document that looks like this:
- color green :
     inputs :
        - port thing :
            widget-hint : filename
            widget-help : Select a filename
        - port target_path :
            widget-hint : path
            value : 'thing'
     outputs:
        - port value:
             widget-hint : string
     text : |
            I'm lost and I'm found
            and I'm hungry like the wolf.
After loading into python (yaml.safe_load( s )), I try a couple ways of dumping it out:
>>> print yaml.dump( d3, default_flow_style=False, default_style='' )
- color green:
    inputs:
    - port thing:
        widget-help: Select a filename
        widget-hint: filename
    - port target_path:
        value: thing
        widget-hint: path
    outputs:
    - port value:
        widget-hint: string
    text: 'I''m lost and I''m found
      and I''m hungry like the wolf.
      '
>>> print yaml.dump( d3, default_flow_style=False, default_style='|' )
- "color green":
    "inputs":
    - "port thing":
        "widget-help": |-
          Select a filename
        "widget-hint": |-
          filename
    - "port target_path":
        "value": |-
          thing
        "widget-hint": |-
          path
    "outputs":
    - "port value":
        "widget-hint": |-
          string
    "text": |
      I'm lost and I'm found
      and I'm hungry like the wolf.
Ideally, I would like "short strings" to not use quotes, as in the first result. But I would like multi-line strings to be written as blocks, as with the second result. I guess fundamentally, I'm trying to minimize an explosion of unnecessary quotes in the file which I perceive would make it much more annoying to edit in a text editor.
Does anyone have any experience with this?
If you can use ruamel.yaml (disclaimer: I am the author of this enhanced version of PyYAML) you can round-trip the original format (YAML document stored in a file org.yaml):
import sys
import ruamel.yaml
from pathlib import Path
file_org = Path('org.yaml')
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(file_org)
yaml.dump(data, sys.stdout)
which gives:
- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: 'thing'
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.
Your input is inconsistently indented/formatted, and although there is far more control in ruamel.yaml over the output than in PyYAML, you cannot get your exact original back:
- You sometimes have a space before the value indicator (:), as in color green :, and sometimes you don't, as in outputs:. Apart from special control over root-level keys, ruamel.yaml always puts the value indicator directly adjacent to the key.
- Your root-level sequence is indented two columns, with an offset of zero for the block sequence indicator (-); this is the default ruamel.yaml uses. The other sequences are indented five with an offset of three. ruamel.yaml cannot format sequences individually/inconsistently; I recommend using the default, since your root collection is a sequence.
- Your mappings are sometimes indented three columns (the value for key color green) and sometimes two (e.g. the value for key port target_path). Again, ruamel.yaml cannot format these individually/inconsistently.
- Your block style literal scalar is indented more than the two spaces that are standard when no block indentation indicator is appended to the | indicator (e.g. using |4), so this extra indentation will be lost.
As you can see, setting yaml.preserve_quotes keeps the superfluous quotes around 'thing'. As that is not what you want, it is not set in the rest of the examples.
The following "normalises" all three examples:
import sys
import ruamel.yaml
from pathlib import Path
LT = ruamel.yaml.scalarstring.LiteralScalarString
file_org = Path('org.yaml')
file_plain = Path('plain.yaml')
file_block = Path('block.yaml')
def normalise(d):
    if isinstance(d, dict):
        for k, v in d.items():
            d[k] = normalise(v)
        return d
    if isinstance(d, list):
        for idx, elem in enumerate(d):
            d[idx] = normalise(elem)
        return d
    if not isinstance(d, str):
        return d
    if '\n' in d:
        if isinstance(d, LT):
            return d  # already a block style literal scalar
        return LT(d)
    return str(d)
yaml = ruamel.yaml.YAML()
for fn in [file_org, file_plain, file_block]:
    # each pass reads org.yaml, normalises it, and writes the result to fn
    data = normalise(yaml.load(file_org))
    yaml.dump(data, fn)
assert file_org.read_bytes() == file_plain.read_bytes()
assert file_org.read_bytes() == file_block.read_bytes()
print(file_block.read_text())
which gives:
- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: thing
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.
So, as you indicated, you get block style literal scalars if a scalar has newlines, and no block style and no quotes if it doesn't.
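If switching libraries is not an option, the same rule (block style only for strings containing a newline) can be approximated in plain PyYAML with a custom string representer. This is not the ruamel.yaml approach described above, only a sketch of an alternative; it assumes Python 3 and a reasonably recent PyYAML, and the names BlockStyleDumper and represent_str are invented for the example:

```python
import yaml


class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical subclass, so the stock SafeDumper is left untouched."""


def represent_str(dumper, data):
    # Ask for literal block style only when the string spans several lines;
    # style=None lets PyYAML pick plain (unquoted) style where it is safe.
    style = "|" if "\n" in data else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)


BlockStyleDumper.add_representer(str, represent_str)

doc = {
    "value": "thing",
    "text": "I'm lost and I'm found\nand I'm hungry like the wolf.\n",
}
dumped = yaml.dump(doc, Dumper=BlockStyleDumper, default_flow_style=False)
print(dumped, end="")
```

Unlike the round-trip loader, this does not preserve the original key order or comments, and PyYAML may still fall back to a quoted style for strings it considers unsafe to write as a block (for instance lines with trailing spaces).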