Should this information be calculated in real time

2019-03-03 12:50发布

I am working on a group project and we are having a discussion about whether to calculate data that we want from an existing database and store it in a new database to query later, or calculate the data from the existing database every time we need to use it. I was wondering what the pros and cons may be for either implementation. Is there any advice you could give?

Edit: Here is more elaborate explanation. We have a large database that has a lot of information being submitted to it daily. We are building a system to track certain points of data. For example, we are getting the count of how many times a user does something that is entered in the database. Using this example (are actual idea is a bit more complex), we are discussing to methods of getting the count of actions per users. The first method is to create a database that stores the users and their action count, and query this database every time we need the action count. The second method would be to query the large database and count the actions per user every time we need to use it. I hope this explanation helps explain. Thoughts?

Edit 2: Two more things that may be useful to point out is 1: I only have read access to the large database and 2: My ultimate goal is to display this information on a web page for end users.

1条回答
相关推荐>>
2楼-- · 2019-03-03 13:20

This is a generic question about optimization by caching. The following was my answer to essentially the same question. Even though that question provided a bunch of different details, none of them were specific enough to merit a non-generic answer either:

The more you want to calculate at query time, the more you want views, calculated columns and stored or user routines. The more you want to calculate at normalized base update time, the more you want cascades and triggers. The more you want to calculate at some other (scheduled or ad hoc) time, the more you use snapshots aka materialized views and updated denormalized bases. You can combine these. Any time the database is accessed it can be enabled by and restricted by stored routines or other api.

Until you can show that they are in adequate, views and calculated columns are the simplest.

The whole idea of a DBMS is to store a representation of your application state as the database (which normalization reduces the redundancy of) and then you query and let the DBMS implement and optimize calculation of the answer. You haven't presented a reason for not doing that in the most straightforward way possible.

[sic]

Always make sure an application is reading its own personal ("external") database that is a view of "the" ("conceptual") database so that when you change the implemention of the former (plus the rest of some combined interfact) by the latter (plus the rest of some compbined mechanisms) your applications do not have to change ("logical independence"). Here the applications are your users' and your trackers'.

Ultimately you must instrument and guestimate. When it is worth it you start caching. Preferably as much as possible in terms of high-level notions like views and snapshots and as little as possible in non-DBMS code. One of he benefits of the relational model is that it is easy to describe a strightforward relational interface in terms of another straightforward relational interface. You protect your applications from change by offering an interface that hides secrets of implementation or which of a family of interfaces is the current one.

查看更多
登录 后发表回答