Is graphql schema circular reference an anti-patte

2019-05-26 01:42发布

问题:

graphql schema like this:

type User {
  id: ID!
  location: Location
}

type Location {
  id: ID!
  user: User
}

Now, the client sends a graphql query. Theoretically, the User and Location can circular reference each other infinitely.

I think it's an anti-pattern. For my known, there is no middleware or way to limit the nesting depth of query both in graphql and apollo community.

This infinite nesting depth query will cost a lot of resources for my system, like bandwidth, hardware, performance. Not only server-side, but also client-side.

So, if graphql schema allow circular reference, there should be some middlewares or ways to limit the nesting depth of query. Or, add some constraints for the query.

Maybe do not allow circular reference is a better idea?

I prefer to sending another query and doing multiple operations in one query. It's much more simple.

回答1:

It depends.

It's useful to remember that the same solution can be a good pattern in some contexts and an antipattern in others. The value of a solution depends on the context that you use it. — Martin Fowler

It's a valid point that circular references can introduce additional challenges. As you point out, they are a potential security risk in that they enable a malicious user to craft potentially very expensive queries. In my experience, they also make it easier for client teams to inadvertently overfetch data.

On the other hand, circular references allow an added level of flexibility. Running with your example, if we assume the following schema:

type Query {
  user(id: ID): User
  location(id: ID): Location
}

type User {
  id: ID!
  location: Location
}

type Location {
  id: ID!
  user: User
}

it's clear we could potentially make two different queries to fetch effectively the same data:

{
  # query 1
  user(id: ID) {
    id
    location {
      id
    }
  }

  # query 2
  location(id: ID) {
    id
    user {
      id
    }
  }
}

If the primary consumers of your API are one or more client teams working on the same project, this might not matter much. Your front end needs the data it fetches to be of a particular shape and you can design your schema around those needs. If the client always fetches the user, can get the location that way and doesn't need location information outside that context, it might make sense to only have a user query and omit the user field from the Location type. Even if you need a location query, it might still not make sense to expose a user field on it, depending on your client's needs.

On the flip side, imagine your API is consumed by a larger number of clients. Maybe you support multiple platforms, or multiple apps that do different things but share the same API for accessing your data layer. Or maybe you're exposing a public API designed to let third-party apps integrate with your service or product. In these scenarios, your idea of what a client needs is much blurrier. Suddenly, it's more important to expose a wide variety of ways to query the underlying data to satisfy the needs of both current clients and future ones. The same could be said for an API for a single client whose needs are likely to evolve over time.

It's always possible to "flatten" your schema as you suggest and provide additional queries as opposed to implementing relational fields. However, whether doing so is "simpler" for the client depends on the client. The best approach may be to enable each client to choose the data structure that fits their needs.

As with most architectural decisions, there's a trade-off and the right solution for you may not be the same as for another team.

If you do have circular references, all hope is not lost. Some implementations have built-in controls for limiting query depth. GraphQL.js does not, but there's libraries out there like graphql-depth-limit that do just that. It'd be worthwhile to point out that breadth can be just as large a problem as depth -- regardless of whether you have circular references, you should look into implementing pagination with a max limit when resolving Lists as well to prevent clients from potentially requesting thousands of records at a time.

As @DavidMaze points out, in addition to limiting the depth of client queries, you can also use dataloader to mitigate the cost of repeatedly fetching the same record from your data layer. While dataloader is typically used to batch requests to get around the "n+1 problem" that arises from lazily loading associations, it can also help here. In addition to batching, dataloader also caches the loaded records. That means subsequent loads for the same record (inside the same request) don't hit the db but are fetched from memory instead.



回答2:

The pattern you show is fairly natural for a "graph" and I don't think it's especially discouraged in GraphQL. The GitHub GraphQL API is the thing I often look at when I wonder "how do people build larger GraphQL APIs", and there are routinely object cycles there: a Repository has a RepositoryOwner, which can be a User, which has a list of repositories.

At least graphql-ruby has a control to limit nesting depth. Apollo doesn't obviously have this control, but you might be able to build a custom data source or use the DataLoader library to avoid repeatedly fetching objects you already have.