RavenDB: How can I properly index a cartesian prod

Posted 2019-09-14 20:26

Question:

This question is a spin-off of RavenDB: Why do I get null-values for fields in this multi-map/reduce index?, but I realized the actual problem was a different one.

Consider my extremely simplified domain, rewritten to a movie rental store scenario for abstraction:

public class User
{
    public string Id { get; set; }
}

public class Movie
{
    public string Id { get; set; }
}

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

It's a textbook many-to-many example.

The index I want to create is this:

For a given user, give me a list of every movie in the database (filtering/search left out for the moment) along with an integer describing how many times (or zero) the user has rented this movie.

Basically like this:

Users:

| Id     |
|--------|
| John   |
| Lizzie |
| Albert |

Movies:

| Id           |
|--------------|
| Robocop      |
| Notting Hill |
| Inception    |

MovieRentals:

| Id        | UserId | MovieId      |
|-----------|--------|--------------|
| rental-00 | John   | Robocop      |
| rental-01 | John   | Notting Hill |
| rental-02 | John   | Notting Hill |
| rental-03 | Lizzie | Robocop      |
| rental-04 | Lizzie | Robocop      |
| rental-05 | Lizzie | Inception    |

Ideally, I want an index to query that would look like this:

| UserId | MovieId      | RentalCount |
|--------|--------------|-------------|
| John   | Robocop      | 1           |
| John   | Notting Hill | 2           |
| John   | Inception    | 0           |
| Lizzie | Robocop      | 2           |
| Lizzie | Notting Hill | 0           |
| Lizzie | Inception    | 1           |
| Albert | Robocop      | 0           |
| Albert | Notting Hill | 0           |
| Albert | Inception    | 0           |

Or declaratively:

  • I always want a full list of all the movies (eventually I will add filtering/searching) - even when providing a user that has never rented a single movie yet
  • I want a count of the rentals for each user, just the integer
  • I want to be able to sort by the rental-count - i.e. show the most-rented movies for a given user at the top of the list

However, I can't find a way to build the "cross-join" above and store it in the index. I initially thought I had it right with the approach shown further down, but it does not allow me to sort. The failing test throws:

{"Not supported computation: x.UserRentalCounts.SingleOrDefault(rentalCount => (rentalCount.UserId == value(UnitTestProject2.MovieRentalTests+<>c__DisplayClass0_0).user_john.Id)).Count. You cannot use computation in RavenDB queries (only simple member expressions are allowed)."}

My question is basically: how can I (or can I at all) build an index that fulfills my requirements?


Below is the example I mentioned. It does not fulfill my requirements, but that's where I am right now. It uses the following packages (VS2015):

packages.config

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Microsoft.Owin.Host.HttpListener" version="3.0.1" targetFramework="net461" />
  <package id="NUnit" version="3.5.0" targetFramework="net461" />
  <package id="RavenDB.Client" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Database" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Tests.Helpers" version="3.5.2" targetFramework="net461" />
</packages>

MovieRentalTests.cs

using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Client.Indexes;
using Raven.Client.Linq;
using Raven.Tests.Helpers;

namespace UnitTestProject2
{
    [TestFixture]
    public class MovieRentalTests : RavenTestBase
    {
        [Test]
        public void DoSomeTests()
        {
            using (var server = GetNewServer())
            using (var store = NewRemoteDocumentStore(ravenDbServer: server))
            {
                //Test-data
                var user_john = new User { Id = "John" };
                var user_lizzie = new User { Id = "Lizzie" };
                var user_albert = new User { Id = "Albert" };


                var movie_robocop = new Movie { Id = "Robocop" };
                var movie_nottingHill = new Movie { Id = "Notting Hill" };
                var movie_inception = new Movie { Id = "Inception" };

                var rentals = new List<MovieRental>
                {
                    new MovieRental {Id = "rental-00", UserId = user_john.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-01", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-02", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-03", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-04", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-05", UserId = user_lizzie.Id, MovieId = movie_inception.Id}
                };

                //Init index
                new Movies_WithRentalsByUsersCount().Execute(store);

                //Insert test-data in db
                using (var session = store.OpenSession())
                {
                    session.Store(user_john);
                    session.Store(user_lizzie);
                    session.Store(user_albert);

                    session.Store(movie_robocop);
                    session.Store(movie_nottingHill);
                    session.Store(movie_inception);

                    foreach (var rental in rentals)
                    {
                        session.Store(rental);
                    }

                    session.SaveChanges();

                    WaitForAllRequestsToComplete(server);
                    WaitForIndexing(store);
                }

                //Test of correct rental-counts for users
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .ToList();

                    var robocopWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_robocop.Id);
                    Assert.AreEqual(1, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(2, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);

                    var nottingHillWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_nottingHill.Id);
                    Assert.AreEqual(2, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
                }

                // Test that, for a given user, the movies can be sorted by rental count
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .OrderByDescending(x => x.UserRentalCounts.SingleOrDefault(rentalCount => rentalCount.UserId == user_john.Id).Count)
                            .ToList();

                    Assert.AreEqual(movie_nottingHill.Id, allMoviesWithRentalCounts[0].MovieId);
                    Assert.AreEqual(movie_robocop.Id, allMoviesWithRentalCounts[1].MovieId);
                    Assert.AreEqual(movie_inception.Id, allMoviesWithRentalCounts[2].MovieId);
                }
            }
        }

        public class Movies_WithRentalsByUsersCount :
            AbstractMultiMapIndexCreationTask<Movies_WithRentalsByUsersCount.ReducedResult>
        {
            public Movies_WithRentalsByUsersCount()
            {
                AddMap<MovieRental>(rentals =>
                    from r in rentals
                    select new ReducedResult
                    {
                        MovieId = r.MovieId,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = r.UserId, Count = 1 } }
                    });

                AddMap<Movie>(movies =>
                    from m in movies
                    select new ReducedResult
                    {
                        MovieId = m.Id,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = null, Count = 0 } }
                    });

                Reduce = results =>
                    from result in results
                    group result by result.MovieId
                    into g
                    select new
                    {
                        MovieId = g.Key,
                        UserRentalCounts = (
                                from userRentalCount in g.SelectMany(x => x.UserRentalCounts)
                                group userRentalCount by userRentalCount.UserId
                                into subGroup
                                select new UserRentalCount { UserId = subGroup.Key, Count = subGroup.Sum(b => b.Count) })
                            .ToArray()
                    };
            }

            public class ReducedResult
            {
                public string MovieId { get; set; }
                public UserRentalCount[] UserRentalCounts { get; set; }
            }

            public class UserRentalCount
            {
                public string UserId { get; set; }
                public int Count { get; set; }
            }
        }

        public class User
        {
            public string Id { get; set; }
        }

        public class Movie
        {
            public string Id { get; set; }
        }

        public class MovieRental
        {
            public string Id { get; set; }
            public string MovieId { get; set; }
            public string UserId { get; set; }
        }
    }
}

Answer 1:

Since your requirement says "for a given user", if you really are looking only for a single user, you can do this with a Multi-Map index. Use the Movies collection itself to produce the baseline zero-count records, and then map in the actual MovieRental documents for that user on top of them.

If you really need it for all users crossed with all movies, I don't believe there is a way to do this cleanly with RavenDB. This would be considered reporting, which is noted as one of the sour spots for RavenDB.

Here are some options if you really want to try to do this with RavenDB:

1) Create dummy records in the DB for every user and every movie and use those in your index with a 0 count. Whenever a movie or user is added/updated/deleted, update the dummy records accordingly.

2) Generate the zero-count records yourself in memory on request and merge that data with the data that RavenDB gives you back for the non-zero counts. Query for all users, query for all movies, create the baseline zero-count records, then do the actual query for non-zero counts and layer that on top. Finally, apply paging/filtering/sorting logic.

3) Use the SQL replication bundle to replicate the Users, Movies, and MovieRental collections out to SQL and use SQL for this "reporting" query.
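
As a rough illustration of option 2 for a single user, the merge step could look something like the sketch below. It is plain LINQ to Objects and does not touch RavenDB at all. The names MovieRentalCount, RentalCountMerger, and MergeForUser are made up for this example, and it assumes you have already loaded the list of movie ids and the non-zero counts (for instance from a map/reduce index grouped by UserId and MovieId) with your own queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result shape: one row per (user, movie) pair.
public class MovieRentalCount
{
    public string UserId { get; set; }
    public string MovieId { get; set; }
    public int RentalCount { get; set; }
}

// Hypothetical helper; not part of the RavenDB client.
public static class RentalCountMerger
{
    // allMovieIds:   every movie id in the database (assumed already loaded).
    // nonZeroCounts: the counts that actually exist, for any users
    //                (assumed to hold at most one entry per user/movie pair).
    public static List<MovieRentalCount> MergeForUser(
        string userId,
        IEnumerable<string> allMovieIds,
        IEnumerable<MovieRentalCount> nonZeroCounts)
    {
        // Index this user's existing counts by movie id.
        var countsByMovie = nonZeroCounts
            .Where(c => c.UserId == userId)
            .ToDictionary(c => c.MovieId, c => c.RentalCount);

        return allMovieIds
            .Select(movieId =>
            {
                // TryGetValue leaves count at 0 when the user never rented the movie.
                int count;
                countsByMovie.TryGetValue(movieId, out count);
                return new MovieRentalCount
                {
                    UserId = userId,
                    MovieId = movieId,
                    RentalCount = count
                };
            })
            // Most-rented first; OrderByDescending is stable, so ties keep input order.
            .OrderByDescending(r => r.RentalCount)
            .ToList();
    }
}
```

With the sample data from the question, calling MergeForUser for John yields Notting Hill, Robocop, Inception, and calling it for Albert yields all three movies with a count of zero. Paging and filtering would be applied to the returned list, so this only scales as far as the full movie list fits comfortably in memory.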