Multi-tenanted DB. Strategy for Document ID and au

Posted 2019-04-02 20:58

Question:

I'm weighing up having separate DBs (one per company) vs one multi-tenanted DB (with all companies). Criteria:

  • A user can belong to one company only and can't access documents of other companies.
  • An administrator of the system needs to maintain DBs for all firms.
  • The number of companies/tenants ranges from hundreds to tens of thousands.
  • There is one entry point with authentication for all companies/tenants (it resolves the tenant and routes the request to the right DB).

Question #1. Are there any "good practices" for designing a multi-tenanted database in RavenDB?

There is a similar post for MongoDB. Would it be the same for RavenDB? More records will affect the indexes, but could heavy use of an index by some tenants degrade performance for the others?


If I were to design a multi-tenanted DB for RavenDB, then I see the implementation as

  • have a Tag per Company/Tenant, so all users of one company have permission to the company tag and all top-level documents have the tag (see KB on Auth Bundle)
  • have the Tenant ID as a prefix of each Document ID (because of the official recommendation to use sequential identifiers, and I'm happy with generating IDs on the server)

Question #2.1. Is tagging the best way to utilise the Authorization Bundle for resolving users' permissions and preventing access to documents of other tenants?

Question #2.2. How important is it to have the Tenant ID in the ID prefix of top-level documents? I guess the main consideration here is performance, given that permissions get resolved via tags, or am I missing something?
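For illustration, the ID-prefix idea could be sketched as below. The "tenant/collection/number" layout and the TenantDocumentId helper are assumptions made up for this example, not a RavenDB convention, and the case-insensitive comparison assumes IDs are matched without regard to case.

```csharp
using System;

// Hypothetical helper: the "tenant/collection/number" layout is an assumption
// made for illustration only.
public static class TenantDocumentId
{
    // Builds an ID such as "acme/contacts/42".
    public static string Create(string tenantId, string collection, long number)
    {
        if (string.IsNullOrWhiteSpace(tenantId) || tenantId.Contains("/"))
            throw new ArgumentException("Tenant ID must be non-empty and must not contain '/'", nameof(tenantId));

        return $"{tenantId}/{collection}/{number}";
    }

    // True when the document ID starts with the given tenant's prefix.
    // The trailing '/' prevents "acme" from matching "acme2/contacts/1".
    public static bool BelongsTo(string documentId, string tenantId) =>
        documentId != null
        && documentId.StartsWith(tenantId + "/", StringComparison.OrdinalIgnoreCase);
}
```

A prefix check like BelongsTo would let a load by ID be rejected before the document is even fetched, which is one possible benefit of the prefix beyond index performance.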

Answer 1:

If you are going to have a few hundred companies, then a db per company is fine. If you are going to have tens of thousands, then you want to put it all in a single db.

A db can consume a non-trivial amount of resources, and having a LOT of them can be a lot more expensive than a single larger db.

I would recommend not using the Authorization Bundle, because it requires us to do O(N) filtering. It is better to add TenantId = XYZ to the query directly, maybe through a query listener.
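As a rough sketch of what "adding TenantId = XYZ to the query directly" can mean, the extension method below appends the tenant filter to any LINQ query. ForTenant, Contact and Demo are hypothetical names, and the sample runs against an in-memory list; whether a particular RavenDB LINQ provider translates the same expression is an assumption I haven't verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface ITenantedEntity
{
    string TenantId { get; set; }
}

// Illustrative document type
public class Contact : ITenantedEntity
{
    public string TenantId { get; set; }
    public string Name { get; set; }
}

public static class TenantQueryExtensions
{
    // Hypothetical helper: appends "TenantId == tenantId" to a query
    // over tenanted entities.
    public static IQueryable<T> ForTenant<T>(this IQueryable<T> query, string tenantId)
        where T : class, ITenantedEntity
    {
        return query.Where(e => e.TenantId == tenantId);
    }
}

public static class Demo
{
    // Returns the contact names visible to the given tenant (in-memory data).
    public static List<string> NamesFor(string tenantId)
    {
        var contacts = new List<Contact>
        {
            new Contact { TenantId = "acme",   Name = "Alice" },
            new Contact { TenantId = "globex", Name = "Bob" },
            new Contact { TenantId = "acme",   Name = "Carol" }
        }.AsQueryable();

        return contacts.ForTenant(tenantId).Select(c => c.Name).ToList();
    }
}
```

The explicit call is easy to forget, which is why doing it in one central place (a listener or a session wrapper) is attractive.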

Don't worry too much about sequential identifiers. They have an impact, but they aren't THAT important unless you are generating tens of thousands per second.


Here is an example of listeners that handle multi-tenancy.

A query listener to add the current Tenant ID to all queries (filter out entries from other tenants):

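// Assumes the RavenDB 3.x client and Autofac (for the IsClosedTypeOf / IsAssignableTo<T> extensions)
using System;
using System.Linq;
using Autofac;
using Raven.Client;
using Raven.Client.Listeners;

public class TenantedEntityQueryListener : IDocumentQueryListener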
{
    private readonly ICurrentTenantIdResolver _resolver;

    public TenantedEntityQueryListener(ICurrentTenantIdResolver resolver)
    {
        _resolver = resolver;
    }

    public void BeforeQueryExecuted(IDocumentQueryCustomization customization)
    {
        var type = customization.GetType();
        var entityType = type.GetInterfaces()
                             .SingleOrDefault(i => i.IsClosedTypeOf(typeof(IDocumentQuery<>))
                                                || i.IsClosedTypeOf(typeof(IAsyncDocumentQuery<>)))
                             ?.GetGenericArguments()
                             .Single();
        if (entityType != null && entityType.IsAssignableTo<ITenantedEntity>())
        {
            // Add the "AND" to the WHERE clause
            // (the method has a check under the hood to prevent adding "AND" if the "WHERE" is empty)
            type.GetMethod("AndAlso").Invoke(customization, null);
            // Add "TenantId = 'Bla'" into the WHERE clause
            type.GetMethod( "WhereEquals", 
                            new[] { typeof(string), typeof(object) }
                          )
                .Invoke(customization,
                    new object[]
                    {
                        nameof(ITenantedEntity.TenantId),
                        _resolver.GetCurrentTenantId()
                    }
                );
        }
    }
}

A store listener to set the current Tenant ID on all tenanted entities:

// Assumes the RavenDB 3.x client
using Raven.Client.Listeners;
using Raven.Json.Linq;

public class TenantedEntityStoreListener : IDocumentStoreListener
{
    private readonly ICurrentTenantIdResolver _resolver;

    public TenantedEntityStoreListener(ICurrentTenantIdResolver resolver)
    {
        _resolver = resolver;
    }

    public bool BeforeStore(string key, object entityInstance, RavenJObject metadata, RavenJObject original)
    {
        var tenantedEntity = entityInstance as ITenantedEntity;
        if (tenantedEntity != null)
        {
            tenantedEntity.TenantId = _resolver.GetCurrentTenantId();
            // true = the entity was modified and must be re-serialized
            return true;
        }

        return false;
    }

    public void AfterStore(string key, object entityInstance, RavenJObject metadata) {}
}

The interface, implemented by top-level entities supporting multi-tenancy:

public interface ITenantedEntity
{
    string TenantId { get; set; }
}
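
The listeners depend on ICurrentTenantIdResolver, which isn't shown above. Its shape below is inferred from the GetCurrentTenantId() calls; the AmbientTenantIdResolver implementation is purely hypothetical and assumes the tenant gets set once per request after authentication.

```csharp
using System;
using System.Threading;

// Shape inferred from how the listeners call it
public interface ICurrentTenantIdResolver
{
    string GetCurrentTenantId();
}

// Hypothetical implementation: keeps the tenant ID in an AsyncLocal so that
// it flows along the async call chain of the current request.
public class AmbientTenantIdResolver : ICurrentTenantIdResolver
{
    private static readonly AsyncLocal<string> Current = new AsyncLocal<string>();

    // Call once per request, after authentication has identified the tenant.
    public static void SetCurrentTenantId(string tenantId) => Current.Value = tenantId;

    // Fails loudly rather than letting a query run without a tenant filter.
    public string GetCurrentTenantId() =>
        Current.Value
        ?? throw new InvalidOperationException("No tenant has been resolved for the current request");
}
```

Note that an AsyncLocal value set inside an awaited method doesn't flow back to the caller, so the setter should be called at the outermost point of request handling.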


Answer 2:

My attempt to engage @AyendeRahien in a discussion of the technical implementation by editing his post was unsuccessful :), so below I'll address my concerns from the above:

1. Multi-tenanted DB vs multiple DBs

Here are some of Ayende's thoughts on multi-tenancy in general.

In my view, the question boils down to:

  • expected number of tenants
  • DB size for each tenant.

Simply put, in the case of a couple of tenants with a huge number of records each, adding the tenant information to the indexes will unnecessarily increase the index size, and handling the tenant ID will bring some overhead you'd rather avoid, so go for one DB per tenant then.

2. Design of multi-tenanted DB

Step #1. Add a TenantId property to all persistent documents that need to support multi-tenancy.

/// <summary>
///     Interface for top-level entities, which belong to a tenant
/// </summary>
public interface ITenantedEntity
{
    /// <summary>
    ///     ID of a tenant
    /// </summary>
    string TenantId { get; set; }
}

/// <summary>
///     Contact information [Tenanted document]
/// </summary>
public class Contact : ITenantedEntity
{
    public string Id { get; set; }

    public string TenantId { get; set; }

    public string Name { get; set; }
}

Step #2. Implement a facade for Raven's session (IDocumentSession or IAsyncDocumentSession) to take care of multi-tenanted entities.

Sample code below:

/// <summary>
///     Facade for Raven's IAsyncDocumentSession interface to take care of multi-tenanted entities
/// </summary>
public class RavenTenantedSession : IAsyncDocumentSession
{
    private readonly IAsyncDocumentSession _dbSession;
    private readonly string _currentTenantId;

    public IAsyncAdvancedSessionOperations Advanced => _dbSession.Advanced;

    public RavenTenantedSession(IAsyncDocumentSession dbSession, ICurrentTenantIdResolver tenantResolver)
    {
        _dbSession = dbSession;
        _currentTenantId = tenantResolver.GetCurrentTenantId();
    }

    public void Delete<T>(T entity)
    {
        if (entity is ITenantedEntity tenantedEntity && tenantedEntity.TenantId != _currentTenantId)
            throw new ArgumentException("Attempt to delete a record for another tenant");
        _dbSession.Delete(entity);
    }

    public void Delete(string id)
    {
        throw new NotImplementedException("Deleting by ID hasn't been implemented");
    }

    #region SaveChanges & StoreAsync---------------------------------------

    public Task SaveChangesAsync(CancellationToken token = new CancellationToken()) => _dbSession.SaveChangesAsync(token);

    public Task StoreAsync(object entity, CancellationToken token = new CancellationToken())
    {
        SetTenantIdOnEntity(entity);

        return _dbSession.StoreAsync(entity, token);
    }

    public Task StoreAsync(object entity, string changeVector, string id, CancellationToken token = new CancellationToken())
    {
        SetTenantIdOnEntity(entity);

        return _dbSession.StoreAsync(entity, changeVector, id, token);
    }

    public Task StoreAsync(object entity, string id, CancellationToken token = new CancellationToken())
    {
        SetTenantIdOnEntity(entity);

        return _dbSession.StoreAsync(entity, id, token);
    }

    private void SetTenantIdOnEntity(object entity)
    {
        var tenantedEntity = entity as ITenantedEntity;
        if (tenantedEntity != null)
            tenantedEntity.TenantId = _currentTenantId;
    }
    #endregion SaveChanges & StoreAsync------------------------------------

    public IAsyncLoaderWithInclude<object> Include(string path)
    {
        throw new NotImplementedException();
    }

    public IAsyncLoaderWithInclude<T> Include<T>(Expression<Func<T, string>> path)
    {
        throw new NotImplementedException();
    }

    public IAsyncLoaderWithInclude<T> Include<T, TInclude>(Expression<Func<T, string>> path)
    {
        throw new NotImplementedException();
    }

    public IAsyncLoaderWithInclude<T> Include<T>(Expression<Func<T, IEnumerable<string>>> path)
    {
        throw new NotImplementedException();
    }

    public IAsyncLoaderWithInclude<T> Include<T, TInclude>(Expression<Func<T, IEnumerable<string>>> path)
    {
        throw new NotImplementedException();
    }

    #region LoadAsync -----------------------------------------------------

    public async Task<T> LoadAsync<T>(string id, CancellationToken token = new CancellationToken())
    {
        T entity = await _dbSession.LoadAsync<T>(id, token);

        // Non-tenanted entities pass through; tenanted ones must belong to the current tenant
        if (entity is ITenantedEntity tenantedEntity && tenantedEntity.TenantId != _currentTenantId)
            throw new ArgumentException("Incorrect ID");

        return entity;
    }

    public async Task<Dictionary<string, T>> LoadAsync<T>(IEnumerable<string> ids, CancellationToken token = new CancellationToken())
    {
        Dictionary<string, T> entities = await _dbSession.LoadAsync<T>(ids, token);

        // Drop documents of other tenants; non-tenanted types are returned as loaded
        if (typeof(T).GetInterfaces().Contains(typeof(ITenantedEntity)))
            return entities.Where(e => (e.Value as ITenantedEntity)?.TenantId == _currentTenantId).ToDictionary(i => i.Key, i => i.Value);

        return entities;
    }
    #endregion LoadAsync --------------------------------------------------

    #region Query ---------------------------------------------------------

    public IRavenQueryable<T> Query<T>(string indexName = null, string collectionName = null, bool isMapReduce = false)
    {
        var query = _dbSession.Query<T>(indexName, collectionName, isMapReduce);

        if (typeof(T).GetInterfaces().Contains(typeof(ITenantedEntity)))
            return query.Where(r => (r as ITenantedEntity).TenantId == _currentTenantId);

        return query;
    }

    public IRavenQueryable<T> Query<T, TIndexCreator>() where TIndexCreator : AbstractIndexCreationTask, new()
    {
        var query = _dbSession.Query<T, TIndexCreator>();

        var lastArgType = typeof(TIndexCreator).BaseType?.GenericTypeArguments?.LastOrDefault();

        if (lastArgType != null && lastArgType.GetInterfaces().Contains(typeof(ITenantedEntity)))
            return query.Where(r => (r as ITenantedEntity).TenantId == _currentTenantId);

        return query;
    }
    #endregion Query ------------------------------------------------------

    public void Dispose() => _dbSession.Dispose();
}

The code above may need some love if you need Include() as well.

My final solution doesn't use listeners for RavenDB v3.x as I suggested earlier (see my comment on why) or events for RavenDB v4 (because it's hard to modify the query in there).

Of course, if you write patches with JavaScript functions, you'd have to handle multi-tenancy manually.