Validating a yaml document in python

Question

One of the benefits of XML is being able to validate a document against an XSD. YAML doesn't have this feature, so how can I validate that the YAML document I open is in the format expected by my application?

See also: stackoverflow.com/questions/45812387/…
– dreftymac
Commented Sep 15, 2017 at 6:02

Jack Kelly · Accepted Answer · 2014-03-06 17:14:02Z

66

Given that JSON and YAML are pretty similar beasts, you could make use of JSON-Schema to validate a sizable subset of YAML. Here's a code snippet (you'll need PyYAML and jsonschema installed):

from jsonschema import validate
import yaml

schema = """
type: object
properties:
  testing:
    type: array
    items:
      enum:
        - this
        - is
        - a
        - test
"""

good_instance = """
testing: ['this', 'is', 'a', 'test']
"""

validate(yaml.safe_load(good_instance), yaml.safe_load(schema))  # passes

# Now let's try a bad instance...

bad_instance = """
testing: ['this', 'is', 'a', 'bad', 'test']
"""

validate(yaml.safe_load(bad_instance), yaml.safe_load(schema))

# Fails with:
# ValidationError: 'bad' is not one of ['this', 'is', 'a', 'test']
#
# Failed validating 'enum' in schema['properties']['testing']['items']:
#     {'enum': ['this', 'is', 'a', 'test']}
#
# On instance['testing'][3]:
#     'bad'

One problem with this is that if your schema spans multiple files and you use "$ref" to reference the other files then those other files will need to be JSON, I think. But there are probably ways around that. In my own project, I'm playing with specifying the schema using JSON files whilst the instances are YAML.
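The mixed setup mentioned above (schema kept as JSON, instances written in YAML) could look roughly like the sketch below. It is only an illustration, not the project code referred to in the answer: the schema text, the find_errors helper, and the sample documents are made up for the example. It uses Draft7Validator.iter_errors from jsonschema so that every problem is collected instead of stopping at the first one.

```python
import json

import yaml
from jsonschema import Draft7Validator

# Hypothetical schema kept as JSON text (it could equally be read from a .json file).
SCHEMA_JSON = """
{
  "type": "object",
  "properties": {
    "testing": {
      "type": "array",
      "items": {"enum": ["this", "is", "a", "test"]}
    }
  },
  "required": ["testing"]
}
"""


def find_errors(yaml_text, schema):
    """Return a list of 'path: message' strings; an empty list means valid."""
    instance = yaml.safe_load(yaml_text)
    validator = Draft7Validator(schema)
    # Sort by the path to the offending value so the output order is stable.
    errors = sorted(
        validator.iter_errors(instance),
        key=lambda e: [str(p) for p in e.absolute_path],
    )
    return [
        "/".join(str(p) for p in e.absolute_path) + ": " + e.message
        for e in errors
    ]


schema = json.loads(SCHEMA_JSON)
good = find_errors("testing: ['this', 'is', 'a', 'test']", schema)
bad = find_errors("testing: ['this', 'is', 'a', 'bad', 'test']", schema)
print(good)
print(bad)
```

Because jsonschema only ever sees plain Python dicts and lists, it does not matter which of the two documents started life as JSON and which as YAML.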

answered Mar 6, 2014 at 17:14

Jack Kelly


2

plus one (because SE doesn't like "+1"): jsonschema is storage-format-agnostic so it will work with input and schemas of any type as long as they deserialize to a Python object.
– Jason S
Commented Jan 5, 2015 at 18:02
9

...but I suggest you use safe_load rather than load.
– Jason S
Commented Jan 5, 2015 at 18:03
5

I built a website to track tooling support for using JSON Schema with YAML. Currently, there is editor support in Visual Studio Code (via extension) and a command-line validation tool.
– vossad01
Commented Jun 30, 2017 at 1:53
What's the point of using a schema anyway? I mean if the yaml has to be the exact text, it's kind of useless.
– Thomas
Commented Dec 1, 2020 at 17:04
1

@Thomas an example: a user enters an integer to a config yaml file where instead a float is expected, then the application may start to behave unexpectedly. Just an example.
– bonobo
Commented Oct 28, 2022 at 9:19


Stabledog · Accepted Answer · 2020-12-03 13:38:36Z

I find Cerberus to be very reliable with great documentation and straightforward to use.

Here is a basic implementation example:

my_yaml.yaml:

name: 'my_name'
date: 2017-10-01
metrics:
    percentage:
        value: 87
        trend: stable

Defining the validation schema in schema.py:

{
    'name': {
        'required': True,
        'type': 'string'
    },
    'date': {
        'required': True,
        'type': 'date'
    },
    'metrics': {
        'required': True,
        'type': 'dict',
        'schema': {
            'percentage': {
                'required': True,
                'type': 'dict',
                'schema': {
                    'value': {
                        'required': True,
                        'type': 'number',
                        'min': 0,
                        'max': 100
                    },
                    'trend': {
                        'type': 'string',
                        'nullable': True,
                        'regex': '(?i)^(down|equal|up)$'
                    }
                }
            }
        }
    }
}

Using PyYAML to load the YAML document:

import yaml

def load_doc():
    # safe_load avoids constructing arbitrary Python objects from the file
    with open('./my_yaml.yaml', 'r') as stream:
        return yaml.safe_load(stream)

# Now, validating the YAML file is straightforward:
import ast
from cerberus import Validator

# schema.py holds a single dict literal, so literal_eval is enough (no eval needed)
with open('./schema.py', 'r') as f:
    schema = ast.literal_eval(f.read())

v = Validator(schema)
doc = load_doc()
print(v.validate(doc))  # the schema was already passed to Validator
print(v.errors)

Keep in mind that Cerberus is an agnostic data validation tool, which means that it can support formats other than YAML, such as JSON, XML and so on.

You should call print(v.validate(doc)) directly, because you have already instantiated the Validator class with the schema object.
– nixmind
Commented Nov 23, 2021 at 11:06
At the time of writing, the package is no longer actively maintained: github.com/pyeve/cerberus/issues/577#issuecomment-1282216209 — Asky McAskface, Commented Jan 6, 2023 at 13:44

Tom Pohl · Accepted Answer · 2021-12-08 14:15:53Z

You can load the YAML document as a dict and use the schema library to check it:

import datetime

import yaml
from schema import Schema, And, Optional, SchemaError

schema = Schema(
    {
        'created': And(datetime.datetime),
        'author': And(str),
        'email': And(str),
        'description': And(str),
        Optional('tags'): And(str, lambda s: len(s) >= 0),
        'setup': And(list),
        # every step must contain "=>" between the action and the expectation
        'steps': And(list, lambda steps: all('=>' in s for s in steps),
                     error='Steps should be an array of strings '
                           'and contain "=>" to separate '
                           'actions and expectations'),
        'teardown': And(list)
    }
)

# filepath is the path to the YAML document to check
with open(filepath) as f:
    data = yaml.safe_load(f)

try:
    schema.validate(data)
except SchemaError as e:
    print(e)

Interesting how this very natural Python validator got only 8 upvotes against the clunky JSON Schema and Cerberus. Many people probably instinctively see JSON notation as "proper" for a YAML validator. However, natural data representation usually comes with optional simplifications, and that is where Schema shines and JSON is terse.
– Barney Szabolcs
Commented Jul 17, 2022 at 9:32

Unapiedra · Accepted Answer · 2021-09-05 22:24:45Z

16

Pydantic has not been mentioned.

From their example:

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


# Parse your YAML into a dictionary, then validate against your model.
external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
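To connect this to YAML, the "parse, then validate" step might be sketched like this. The trimmed-down User model, the load_user helper, and the sample documents are assumptions made for illustration; only yaml.safe_load, BaseModel, and ValidationError come from the libraries themselves.

```python
from typing import List

import yaml
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    # Trimmed-down, illustrative model; field names are hypothetical.
    id: int
    name: str = 'John Doe'
    friends: List[int] = []


def load_user(yaml_text):
    """Parse YAML text and validate it; returns (user, errors)."""
    # Assumes the document's top level is a mapping.
    data = yaml.safe_load(yaml_text)
    try:
        return User(**data), []
    except ValidationError as exc:
        # Each entry in exc.errors() carries a 'loc' tuple and a 'msg' string.
        return None, [(err['loc'], err['msg']) for err in exc.errors()]


user, errors = load_user("id: 123\nfriends: [1, 2, 3]\n")
print(user, errors)

user2, errors2 = load_user("id: not-a-number\n")
print(user2, errors2)
```

Returning the error list instead of letting the exception propagate makes it easy to show a user every field that is wrong in one pass.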

answered Sep 5, 2021 at 22:24

Unapiedra


nealmcb · Accepted Answer · 2012-03-23 07:34:14Z

14

Try Rx, it has a Python implementation. It works on JSON and YAML.

From the Rx site:

"When adding an API to your web service, you have to choose how to encode the data you send across the line. XML is one common choice for this, but it can grow arcane and cumbersome pretty quickly. Lots of webservice authors want to avoid thinking about XML, and instead choose formats that provide a few simple data types that correspond to common data structures in modern programming languages. In other words, JSON and YAML.

Unfortunately, while these formats make it easy to pass around complex data structures, they lack a system for validation. XML has XML Schemas and RELAX NG, but these are complicated and sometimes confusing standards. They're not very portable to the kind of data structure provided by JSON, and if you wanted to avoid XML as a data encoding, writing more XML to validate the first XML is probably even less appealing.

Rx is meant to provide a system for data validation that matches up with JSON-style data structures and is as easy to work with as JSON itself."

edited Mar 23, 2012 at 7:34

nealmcb


answered Jul 16, 2010 at 7:13

Lior


This looks interesting. It's not clear how well it will handle python objects that are encoded in the yaml, but it's worth a try.
– Jon
Commented Jul 16, 2010 at 7:35
4

Can you share some usage examples? looked in the documentation but could not understand
– NI6
Commented Jun 14, 2017 at 10:22
7

There hasn't been a release of Rx since 2014, and no commits since 2015.
– Graham Lea
Commented May 24, 2021 at 13:21


hc_dev · Accepted Answer · 2022-02-10 09:52:14Z

12

Yes - having support for validation is vital for lots of important use cases. See e.g. YAML and the importance of Schema Validation « Stuart Gunter

As already mentioned, there is Rx, available for various languages, and Kwalify for Ruby and Java.

See also the PyYAML discussion: YAMLSchemaDiscussion.

A related effort is JSON Schema, which even had some IETF standardization activity: draft-zyp-json-schema-03 - A JSON Media Type for Describing the Structure and Meaning of JSON Documents

edited Feb 10, 2022 at 9:52

hc_dev


answered Mar 22, 2012 at 21:50

nealmcb


Hari Priya Thangavel · Accepted Answer · 2022-03-19 04:13:27Z

I worked on a similar project where I needed to validate the elements of a YAML file.

At first, I thought 'PyYAML tags' were the best and simplest way, but I later decided to go with 'PyKwalify', which actually defines a schema for YAML.

PyYAML tags:

YAML has tag support, which lets us enforce basic checks by prefixing a value with its data type, e.g. for an integer: !!int "123"
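A small sketch of what the tag does when the document is loaded with PyYAML. The key names are made up for the example.

```python
import yaml

# Without a tag, a quoted scalar loads as a string.
untagged = yaml.safe_load('port: "123"')

# With the standard !!int tag, the same scalar is constructed as an integer.
tagged = yaml.safe_load('port: !!int "123"')

# !!str goes the other way: a bare number is kept as a string.
forced_str = yaml.safe_load('zip_code: !!str 12345')

print(untagged, tagged, forced_str)
```

Note that the tag lives in the data file itself, so whoever writes the YAML has to remember to add it.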

More on PyYAML: http://pyyaml.org/wiki/PyYAMLDocumentation#Tags

This is good, but if you are going to expose this to the end user, it might cause confusion. So I did some research into defining a schema for YAML, with these goals:

Validate the YAML with its corresponding schema for basic data type check.
Custom validations like IP address, random strings can be added in schema.
Have YAML schema separately leaving YAML data simple and readable.
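To make the "custom validation" goal concrete, here is a library-independent sketch of an IP address check on data that has already been loaded from YAML. It is not PyKwalify's API; the check_host_entry helper and the field names are hypothetical, and only the standard library ipaddress module is used.

```python
import ipaddress


def check_host_entry(entry):
    """Return a list of problems for one mapping such as {'name': ..., 'ip': ...}."""
    problems = []
    if not isinstance(entry.get('name'), str):
        problems.append("name must be a string")
    try:
        # ip_address raises ValueError for anything that is not IPv4 or IPv6.
        ipaddress.ip_address(str(entry.get('ip')))
    except ValueError:
        problems.append("ip must be a valid IPv4 or IPv6 address")
    return problems


# Example data shaped like the result of yaml.safe_load on a small document.
ok = check_host_entry({'name': 'gateway', 'ip': '192.168.0.1'})
broken = check_host_entry({'name': 'gateway', 'ip': '300.1.1.1'})
print(ok)
print(broken)
```

A schema library lets you attach a check like this to a field declaratively, which keeps the YAML data itself free of type annotations.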

PyKwalify:

There is a package called PyKwalify which serves this purpose: https://pypi.python.org/pypi/pykwalify

This package best fits my requirements. I tried it with a small example in my local setup, and it works. Here's the sample schema file.

#sample schema

type: map
mapping:
    Emp:
        type:    map
        mapping:
            name:
                type:      str
                required:  yes
            email:
                type:      str
            age:
                type:      int
            birth:
                type:     str

Valid YAML file for this schema

---
Emp:
    name:   "abc"
    email:  "[email protected]"
    age:    yy
    birth:  "xx/xx/xxxx"

Thanks

PyKwalify seems great. Unfortunately it does not mention supporting Python 3.10 and 3.11 and a version hasn't been released since 2020. Great answer though. — tommy.carstensen, Commented May 22, 2023 at 6:43
@tommy.carstensen FWIW whilst there hasn't been much activity, there are 4 commits to the codebase since the last release, one very recently. See github.com/Grokzen/pykwalify/commits/master — Jeremy Davis, Commented Dec 5, 2023 at 2:14

Gringo Suave · Accepted Answer · 2013-02-06 07:17:08Z

4

These look good. The YAML parser can handle the syntax errors, and one of these libraries can validate the data structures.

http://pypi.python.org/pypi/voluptuous/ (I've tried this one, it is decent, if a bit sparse.)
http://discorporate.us/projects/flatland/ (not clear how to validate files at first glance)

edited Feb 6, 2013 at 7:17

answered Aug 29, 2012 at 22:35

Gringo Suave


2

The upvote is for Voluptuous, flatland looks like something totally different to a yaml validation library.
– Chris Withers
Commented Jun 17, 2015 at 17:49


jaiks · Accepted Answer · 2017-05-17 13:38:50Z

1

You can use PyYAML to report the message, line, and column of a syntax error in the file you load.

#!/usr/bin/env python

import yaml

with open("example.yaml", 'r') as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

The error message can be accessed via exc.problem

Access exc.problem_mark to get a <yaml.error.Mark> object.

This object allows you to access attributes

name
column
line

Hence you can create your own pointer to the issue:

pm = exc.problem_mark
print("Your file {} has an issue on line {} at position {}".format(pm.name, pm.line, pm.column))
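Putting the pieces above together, a sketch of a helper that turns a syntax error into a one-line message. The describe_syntax_error name and the sample text are assumptions for the example. It catches yaml.MarkedYAMLError, since that is the exception class that carries problem and problem_mark, and it adds 1 to the line and column because the Mark values are zero-based.

```python
import yaml


def describe_syntax_error(yaml_text, name="<input>"):
    """Return None if the text parses, otherwise a one-line description."""
    try:
        yaml.safe_load(yaml_text)
        return None
    except yaml.MarkedYAMLError as exc:
        mark = exc.problem_mark
        if mark is None:
            return "{}: {}".format(name, exc.problem)
        # Mark.line and Mark.column are zero-based, so add 1 for display.
        return "{}: {} (line {}, column {})".format(
            name, exc.problem, mark.line + 1, mark.column + 1
        )
    except yaml.YAMLError as exc:
        # Other YAML errors carry no position information.
        return "{}: {}".format(name, exc)


print(describe_syntax_error("name: ok\n"))
print(describe_syntax_error("name: ok\nitems: a: b\n", name="example.yaml"))
```

As the comment below points out, this only covers YAML syntax; it says nothing about whether the parsed data has the shape your application expects.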

answered May 17, 2017 at 13:38

jaiks


6

This only validates basic YAML formatting - not any kind of schema.
– Saustrup
Commented Sep 12, 2018 at 6:55


yaccob · Accepted Answer · 2018-05-25 23:48:14Z

I wrapped some existing JSON-related Python libraries so that they can be used with YAML as well.

The resulting python library mainly wraps ...

jsonschema - a validator for json files against json-schema files, being wrapped to support validating yaml files against json-schema files in yaml-format as well.
jsonpath-ng - an implementation of JSONPath for python, being wrapped to support JSONPath selection directly on yaml files.

... and is available on github:

https://github.com/yaccob/ytools

It can be installed using pip:

pip install ytools

Validation example (from https://github.com/yaccob/ytools#validation):

import ytools
ytools.validate("test/sampleschema.yaml", ["test/sampledata.yaml"])

What you don't get out of the box yet is validation against external schemas that are in YAML format as well.

ytools is not providing anything that hasn't existed before - it just makes the application of some existing solutions more flexible and more convenient.

ars · Accepted Answer · 2010-07-16 07:12:00Z

-7

I'm not aware of a python solution. But there is a ruby schema validator for YAML called kwalify. You should be able to access it using subprocess if you don't come across a python library.

answered Jul 16, 2010 at 7:12

ars


3

I'm really looking for a pythonic solution. This is always a last resort.
– Jon
Commented Jul 16, 2010 at 7:34
