There were quite a few questions about saving datetime & timezones info in DB but more on the overall level. Here I'd like to address a specific case.
System specs
- We have an Orders system database
- It is a multi-tenant system where tenants can use arbitrary timezone (it is arbitrary but single timezone per tenant, saved in Tenants table once and never changes)
Business rule needed to be covered in DB
- When tenant places an Order into the system, order number gets computed based on their local datetime (its not literally a number but some kind of an identifier like
ORDR-13432-Year-Month-Day
). Precise calculation is not important for the moment, it's just important that it's dependant on tenants local datetime
- We also do want to be able to select all Orders, on the system level, placed between some UTC datetimes regardless of the tenant (for general system statistics/reporting)
Our initial idea
- Our initial idea was to save UTC datetime across whole DB and, of course, keep tenants timezone offset relative to UTC and have application that consumes DB always convert datetimes to UTC so that DB itself always operate with UTC.
Approach 1
It's problematic because OrderDateTime
in this query means different moment in time, based on tenant. Of course, this query might include join to Tenants
table to get local datetime offset which would then calculate OrderDateTime
on the fly to make adjustments. It's possible, but not sure if it's a good way to do it?
Approach 2
- On the other hand, when saving UTC datetime, then when we do calculation of OrderNumber since the day/month/year in UTC might differ from the one in local datetime
Let's take extreme example; let's say tenant is 6 hours ahead of UTC and his local datetime is 2017-01-01 02:00
.
UTC would be 2016-12-31 20:00
. Order placed at that moment should get OrderNumber 'ORDR-13432-2017-1-1'
but if saving UTC it would get ORDR-13432-2016-12-31
.
In this case, at the moment of creating Order in DB, we should get UTC datetime, tenants offset and compile OrderNumber based on recalculated tenants localtime but still save DateTime column in UTC.
Questions
- What is the preferred way of handling this kind of situation?
- Is there a nice solution with saving UTC datetimes because that one would be pretty nice for us because of the system-level reporting?
- If going with saving UTC, is Approach 2) good way to handle those cases or is there some better/recommended way?
[UPDATE]
Based on comments from Gerard Ashton and Hugo:
Initial question was not clear with respect to the detail if tenant can change timezone or not and what happens if political authority changes the timezone properties or some terirory's timezone.
Of course that is of an extreme importance, but it is not in the center of this question . We might address that in a separate question.
For the sake of this question, lets assume tenant will not change location. Timezone properties or timezone itself for that location might change and those changes will be handled in the system separately from this question.
Hugo's answer is mostly correct, but I'll add a few key points:
When you're storing the customer's time zone, do NOT store a numerical offset. As others have pointed out, the offset from UTC is only for a single point in time, and can easily change for DST and for other reasons. Instead, you should store a time zone identifier, preferably an IANA time zone identifier as a string, such as "America/Los_Angeles"
. Read more in the timezone tag wiki.
Your OrderDateTime
field should absolutely represent the time in UTC. However, depending on your database platform, you have several choices for how to store this.
For example, if using Microsoft SQL Server, a good approach is to store the local time in a datetimeoffset
column, which preserves the offset from UTC. Note that any index you create on that column will be based on the UTC equivalent, so you will get good query performance when doing your range query.
If using other database platforms, you may instead wish to store the UTC value in a timestamp
field. Some databases also have timestamp with time zone
, but understand that it doesn't mean it stores the time zone or offset, it just means that it can do conversions for you implicitly as you store and retrieve values. If you intend to always represent UTC, then often timestamp
(without time zone) or just datetime
is more appropriate.
Since either of the above methods will store a UTC time, you'll also need to consider how to perform operations that need an index of local time values. For example, you might need to create a daily report, based on the day of the user's time zone. For that, you'd need to group by the local date. If you try to compute that at query time from your UTC value, you'll end up scanning the entire table.
A good approach to deal with this is to create a separate column for the local date
(or perhaps even the local datetime
depending on your needs, but not a datetimeoffset
or timestamp
). This could be a completely isolated column that you populate separately, or it could be a computed/calculated column based on your other column. Use this column in an index so you can filter or group by local date.
If you go for the computed-column approach, you'll need to know how to convert between time zones in the database. Some databases have a convert_tz
function built-in that understands IANA time zone identifiers.
If you're using Microsoft SQL Server, you can use the new AT TIME ZONE
function in SQL 2016 and Azure SQL DB, but that only works with Microsoft time zone identifiers. To use IANA time zone identifiers, you'll need a third party solution, such as my SQL Server Time Zone Support project.
At query time, avoid using the BETWEEN
statement. It is fully inclusive. It works ok for whole dates, but when you have time involved you're better off doing a half-open range query, such as:
... WHERE OrderDateTime >= @t1 AND OrderDateTime < @t2
For example, if @t1
were the start of today, @t2
would be the start of tomorrow.
Regarding the scenario discussed in comments where the user's time zone has changed:
If you choose to calculate the local date in the database, the only scenario you need to worry about is if a location or business switches time zones without a "zone split" occurring. A zone split is when a new time zone identifier is introduced which covers the area that changed, including their old and new rules.
For example, the latest zone added to the IANA tzdb at the time of writing this is America/Punta_Arenas
, which was a zone split when the southern part of Chile decided to stay at UTC-3 when the rest of Chile (America/Santiago
) went back to UTC-4 at the end of DST.
However, if a minor locality on the border of two time zones decides to change which side they follow, and a zone split wasn't warranted, then you'd potentially be using the rules of their new time zone against their old data.
If you store the local date separately (computed in the application, not the DB), then you'll have no problems. The user changes their time zone to the new one, all old data is still intact, and new data is stored with the new time zone.
I'd recommend to always use UTC internally, and convert to a timezone only when displaying the date to the user. So I tend to prefer approach 2.
If there's a business rule saying that the tenant's local date/time must be part of the identifier, so be it. But internally, you keep the order date in UTC.
Using your example: a tenant whose timezone is in UTC+06:00
, so the tenant's local time is 2017-01-01 02:00
, which is equivalent to 2016-12-31 20:00
in UTC.
The order identifier would be ORDR-13432-2017-1-1
and the order date would be UTC 2016-12-31 20:00Z
.
To get all orders between 2 dates, this query is straighforward:
SELECT * FROM ORDERS WHERE OrderDateTime BETWEEN UTCDateTime1 AND UTCDateTime2
Because OrderDateTime
is in UTC.
If looking for a specific tenant, then you can get the corresponding timezone, convert the date accordingly and search for it. Using the same example above (tenant's timezone is in UTC+06:00
), to get all orders made in 2017-01-01
(in tenant's local time):
--get tenant timezone
--startUTC=tenant's local 2017-01-01 00:00 converted to UTC (2016-12-31T18:00Z)
--endUTC=tenant's local 2017-01-01 23:59:59.999 converted to UTC (2017-01-01T17:59:59.999)
SELECT * FROM ORDERS WHERE OrderDateTime between startUTC and endUTC
This will get ORDR-13432-2017-1-1
correctly.
To make queries for multiple tenants in different timezones, both approaches require a join, so none are "better" for this case.
Unless you create an extra column with the tenant's local date/time (the UTC OrderDateTime
converted to tenant's timezone). It'll be redundant, but it can help you with queries that searches in more than one timezone. If that's a reasonable trade-off it will depend on how frequent those queries will be made.