JSON schema for data description vs data validation

Posted 2019-04-24 13:22

In what I can find about using JSON schema, there seems to be a confusing conflation of (or at least a lack of distinction among) the tasks of describing valid data, validating stored data, and validating input data.

A typical example looks like:

var schema = {
    type: 'object',
    properties: {
        id: { type: 'integer', required: true },
        name: { type: 'string', required: true },
        description: { type: 'string', required: false }
    }
};

This works well for describing what valid data in a data store should look like, and therefore for validating it (the latter isn't terribly useful: if it's in a store, it should be valid already):

var storedData = {
    id: 123,
    name: 'orange',
    description: 'delicious'
};

It doesn't work that well for validating input. id is most likely left for the application to generate and not for the user to provide as part of the input. The following input fails validation because it lacks the id which the schema declares to be required:

var inputData = {
    name: 'orange',
    description: 'delicious'
};
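
To make the failure concrete, here is a minimal sketch of the check a validator performs for the draft-03 style boolean required used above. checkRequired is a hypothetical helper written for this illustration, not part of any JSON Schema library, and it ignores every keyword except required:

```javascript
// Hypothetical helper: returns the names of properties marked
// `required: true` (draft-03 style) that are absent from the data.
function checkRequired(schema, data) {
    var missing = [];
    Object.keys(schema.properties).forEach(function (key) {
        if (schema.properties[key].required === true && !(key in data)) {
            missing.push(key);
        }
    });
    return missing;
}

var schema = {
    type: 'object',
    properties: {
        id: { type: 'integer', required: true },
        name: { type: 'string', required: true },
        description: { type: 'string', required: false }
    }
};

var storedData = { id: 123, name: 'orange', description: 'delicious' };
var inputData = { name: 'orange', description: 'delicious' };

var storedMissing = checkRequired(schema, storedData); // []
var inputMissing = checkRequired(schema, inputData);   // ['id']
```

The same schema accepts the stored object and rejects the input object purely because of id.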

Fine, one might say: the schema isn't meant to validate direct input; validation should only occur after the application has added an id and the data is in the form that is meant to be stored.

If the schema isn't meant to validate direct input, however, what is 1) the point of JavaScript validators running in the browser, presumably being fed direct input and 2) the point of the obviously input-oriented readonly schema feature in the spec?

Ground gets shakier when thinking of properties that can be set once but not updated (like a username), as well as different access levels (e.g. the admin and the owner of the orange should be able to change the description, while for other users it should stay readonly).

What is the best (or at least working) practice to deal with this? A different schema for each use case, like below?

var baseSchema = {
    type: 'object',
    properties: {
        id: { type: 'integer', required: true },
        name: { type: 'string', required: true },
        description: { type: 'string', required: false }
    }
};

var ownerUpdateSchema = {
    type: 'object',
    properties: {
        id: { type: 'integer', required: false, readonly: true },
        name: { type: 'string', required: true },
        description: { type: 'string', required: false }
    }
};

var userUpdateSchema = {
    type: 'object',
    properties: {
        id: { type: 'integer', required: false, readonly: true },
        name: { type: 'string', required: false, readonly: true },
        description: { type: 'string', required: false, readonly: true }
    }
};

Or something else?

2 answers
爷的心禁止访问
Reply #2 · 2019-04-24 14:03

Side-note: in v4, "required" is now an array on the parent object rather than a boolean on each property, and "readOnly" is capitalised differently. I'll be using the v4 form for my examples.

I agree that validating the stored data is pretty rare. And if you're just describing the data, then you don't need to specify that "id" is required.

Another thing to say is that these schemas should all have URIs at which they can be referenced (e.g. /schemas/baseSchema). At that point, you can extend the schemas to make "id" required in some of them:

var ownerInputSchema = {
    type: 'object',
    properties: {
        id: {type: 'integer', readOnly: true},
        name: {type: 'string'},
        description: {type: 'string'}
    },
    required: ['name']
};

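// Assumes ownerInputSchema above is published at /schemas/ownerInputSchema.
// Regular users get the owner schema plus a read-only name.
var userInputSchema = {
    allOf: [{"$ref": "/schemas/ownerInputSchema"}],
    properties: {
        name: {readOnly: true}
    }
};

// Stored data is the owner schema plus a mandatory id.
var storedSchema = {
    allOf: [{"$ref": "/schemas/ownerInputSchema"}],
    required: ["id"]
};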

Although, as I said above, I'm not sure storedSchema should be necessary. What you end up with is one "owner" schema that describes the data format (as served, and as editable by the data owner), and you have a secondary schema that extends that to declare readOnly on an additional property.
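
As a rough illustration of how the allOf extension behaves, the sketch below implements only the handful of draft-04 keywords used in these schemas (type, properties, required, allOf and $ref). The registry object, the validate and typeMatches functions, and the /schemas/ownerInputSchema URI are assumptions made for this example; a real project would use a full validator library rather than this.

```javascript
// Hypothetical registry mapping schema URIs to schema objects.
var registry = {
    '/schemas/ownerInputSchema': {
        type: 'object',
        properties: {
            id: {type: 'integer', readOnly: true},
            name: {type: 'string'},
            description: {type: 'string'}
        },
        required: ['name']
    }
};

var storedSchema = {
    allOf: [{'$ref': '/schemas/ownerInputSchema'}],
    required: ['id']
};

// Only the types needed here; 'array' and 'null' are not handled.
function typeMatches(type, value) {
    if (type === 'integer') return Number.isInteger(value);
    if (type === 'object') {
        return value !== null && typeof value === 'object' && !Array.isArray(value);
    }
    return typeof value === type; // covers 'string', 'number', 'boolean'
}

// Returns a list of error messages; an empty list means the data passed.
function validate(schema, data, path) {
    path = path || '';
    // A $ref is replaced by the schema it points to (no error handling
    // for unknown URIs in this sketch).
    if (schema.$ref) return validate(registry[schema.$ref], data, path);
    var errors = [];
    if (schema.type && !typeMatches(schema.type, data)) {
        errors.push((path || '(root)') + ' should be ' + schema.type);
    }
    // allOf: the data must satisfy every listed subschema as well.
    (schema.allOf || []).forEach(function (sub) {
        errors = errors.concat(validate(sub, data, path));
    });
    var isObject = typeMatches('object', data);
    (schema.required || []).forEach(function (key) {
        if (isObject && !(key in data)) errors.push(path + '/' + key + ' is required');
    });
    Object.keys(schema.properties || {}).forEach(function (key) {
        if (isObject && key in data) {
            errors = errors.concat(validate(schema.properties[key], data[key], path + '/' + key));
        }
    });
    return errors;
}

var input = {name: 'orange', description: 'delicious'};
var inputErrors = validate(registry['/schemas/ownerInputSchema'], input); // []
var storedErrors = validate(storedSchema, input); // ['/id is required']
```

The input passes the owner schema as-is, while the extending schema rejects it only because of the extra required entry.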

等我变得足够好
Reply #3 · 2019-04-24 14:26

Well, I think the purpose of JSON Schema is more clearly defined in v4. The goal is to assist you in validating a data structure (whether it is going to be stored, has been sent to you across the wire, or is being created interactively).

readOnly is not a JSON Schema validation property because it imposes no validation constraints. In JSON Schema v4, readOnly is part of the hyper-schema definition. It can be used to express that you cannot change this property in a POST request.
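
Because readOnly carries no validation constraint, a validator will accept an update that changes a read-only property; enforcing it is left to the application. One possible way to do that is sketched below. readOnlyViolations is a hypothetical helper, and it only inspects top-level properties declared directly on the schema (it does not follow allOf or $ref).

```javascript
// Hypothetical helper: lists the readOnly properties whose value in
// `update` differs from the value in `stored`. Uses strict equality,
// so it is only meaningful for primitive values.
function readOnlyViolations(schema, stored, update) {
    return Object.keys(schema.properties || {}).filter(function (key) {
        var isReadOnly = schema.properties[key].readOnly === true;
        return isReadOnly && key in update && update[key] !== stored[key];
    });
}

var ownerInputSchema = {
    type: 'object',
    properties: {
        id: {type: 'integer', readOnly: true},
        name: {type: 'string'},
        description: {type: 'string'}
    },
    required: ['name']
};

var stored = {id: 123, name: 'orange', description: 'delicious'};

// Echoing the unchanged id back is fine; changing it is not.
var okUpdate = readOnlyViolations(ownerInputSchema, stored,
    {id: 123, name: 'orange', description: 'very delicious'}); // []
var badUpdate = readOnlyViolations(ownerInputSchema, stored,
    {id: 999, name: 'apple'}); // ['id']
```

Whether a violation should be rejected outright or silently ignored is an application decision, not something the schema dictates.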

JSON Schema does not define how you should implement the interaction with the user: whether you allow transitory "bad" data, or whether every error has to be corrected before more data can be added to the system. This is up to you.
