Column with alternate serials

Posted 2019-05-27 01:49

I would like to create a table of user_widgets with a primary key of user_id and user_widget_id, where user_widget_id works like a serial, except that it starts at 1 for each user.

Is there a common or practical solution for this? I am using PostgreSQL, but an agnostic solution would be appreciated as well.

Example table: user_widgets

 |  user_id  |  user_widget_id  |  user_widget_name    |
 +-----------+------------------+----------------------+
 |  1        |  1               | Andy's first widget  |
 +-----------+------------------+----------------------+
 |  1        |  2               | Andy's second widget |
 +-----------+------------------+----------------------+
 |  1        |  3               | Andy's third widget  |
 +-----------+------------------+----------------------+
 |  2        |  1               | Jake's first widget  |
 +-----------+------------------+----------------------+
 |  2        |  2               | Jake's second widget |
 +-----------+------------------+----------------------+
 |  2        |  3               | Jake's third widget  |
 +-----------+------------------+----------------------+
 |  3        |  1               | Fred's first widget  |
 +-----------+------------------+----------------------+

Edit:

I just wanted to include some reasons for this design.

1. Less information disclosure, not just "Security through obscurity"

In a system where users should not be aware of one another, they also should not be aware of each other's widget_ids. If this were a table of inventory, trade secrets, invoices, or something more sensitive, each user should be able to have their own uninfluenced set of IDs for those widgets. In addition to the obvious routine security checks, this adds an implicit security layer: the table has to be filtered by both the widget ID and the user ID.

2. Data Imports

Users should be permitted to import their data from some other system without having to trash all of their legacy IDs (if they have integer IDs).

3. Cleanliness

Not terribly dissimilar from my first point, but I think that users who create less content than others may be baffled or annoyed by significant jumps in their widget IDs. This is of course more superficial than functional, but could still be valuable.

A possible solution

One of the answers suggests handling this in the application layer. I could store a next_id column in the users table that gets incremented. Or perhaps just count the rows per user and disallow deletion of records (using a deleted/deactivated flag instead). Could this be done with a trigger function, or even a stored procedure, rather than in the application layer?
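For illustration, the next_id idea above could be sketched as a trigger, assuming a hypothetical next_widget_id column on the users table (not part of the original schema) that records the last assigned ID, starting at 0. The UPDATE takes a row lock on the user's row, so concurrent inserts for the same user are serialized rather than handing out duplicate IDs:

```sql
-- Sketch only: assumes users has a column next_widget_id int NOT NULL DEFAULT 0.
CREATE OR REPLACE FUNCTION assign_user_widget_id() RETURNS trigger AS $$
BEGIN
   -- Increment the per-user counter and use the new value as this row's ID.
   UPDATE users
   SET    next_widget_id = next_widget_id + 1
   WHERE  id = NEW.user_id
   RETURNING next_widget_id INTO NEW.user_widget_id;
   RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_widgets_assign_id
BEFORE INSERT ON user_widgets
FOR EACH ROW EXECUTE PROCEDURE assign_user_widget_id();
```

Note the trade-off: the row lock on users blocks other inserts for that user until the transaction commits, so a long-running transaction can stall its own user's inserts.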

2 Answers
Fickle 薄情
Answered 2019-05-27 01:58

I am going to join in questioning the specific requirements. In general, ordering things of this sort might be better left to the application. If you knew me you'd realize this was really saying something. My concern is that every case I can think of may require re-ordering on the part of the application, because otherwise the numbers would be irrelevant.

So I would just:

CREATE TABLE user_widgets (
      user_id int references users(id),
      widget_id int,
      widget_name text not null,
      primary key(user_id, widget_id)
);

And I'd leave it at that.

Now, based on your justification, this addresses all of your concerns except imports. However, I have once in a long while had to do something similar. The use case I had was a local tax jurisdiction that required packing slips(!) to be sequentially numbered without gaps, separate from invoices. Counting records, by the way, won't meet your import requirements either.

What we did was create a table with one row per sequence, and tie it in with a trigger.
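A minimal sketch of that one-row-per-sequence table, with hypothetical names (the original answer doesn't show its schema). The `ON CONFLICT` upsert (PostgreSQL 9.5+) creates the counter row on first use and locks it on increment, so concurrent inserts for the same user serialize safely:

```sql
-- Sketch only: table and function names are illustrative.
CREATE TABLE user_widget_counters (
      user_id    int PRIMARY KEY REFERENCES users(id),
      last_value int NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION next_user_widget_id(p_user_id int) RETURNS int AS $$
   INSERT INTO user_widget_counters (user_id, last_value)
   VALUES (p_user_id, 1)
   ON CONFLICT (user_id)
   DO UPDATE SET last_value = user_widget_counters.last_value + 1
   RETURNING last_value;
$$ LANGUAGE sql;
```

A BEFORE INSERT trigger on user_widgets could then call next_user_widget_id(NEW.user_id) to fill in the ID, or the application could call it directly.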

beautiful°
Answered 2019-05-27 02:02

If you have a table:

CREATE TABLE user_widgets (
  user_id int
 ,user_widget_name text  -- should probably be a foreign key to a look-up table
 ,PRIMARY KEY (user_id, user_widget_name)
);

You could assign user_widget_id dynamically and query:

WITH x AS (
   SELECT *, row_number() OVER (PARTITION BY user_id
                                ORDER BY user_widget_name) AS user_widget_id 
   FROM   user_widgets
   )
SELECT *
FROM   x
WHERE  user_widget_id = 2;

user_widget_id is applied alphabetically per user in this scenario and has no gaps. Adding, changing or deleting entries can obviously result in changed numbers.

More about window functions in the manual.


Somewhat more (but not completely) stable:

CREATE TABLE user_widgets (
  user_id int
 ,user_widget_id serial
 ,user_widget_name text
 ,PRIMARY KEY (user_id, user_widget_id)
);

And:

WITH x AS (
   SELECT *, row_number() OVER (PARTITION BY user_id
                                ORDER BY user_widget_id) AS user_widget_nr 
   FROM   user_widgets
   )
SELECT *
FROM   x
WHERE  user_widget_nr = 2;

Addressing question update

You can implement a regime to count existing widgets per user. But you will have a hard time making it bulletproof for concurrent writes. You would have to lock the whole table or use SERIALIZABLE transaction mode - both of which are real downers for performance and need additional code.

But if you guarantee that no rows are deleted, you could go with my second approach: one sequence for user_widget_id across the whole table, giving you a "raw" ID. A sequence is a proven solution for concurrent load, preserves the relative order of user_widget_id, and is fast. You could provide access to the table through a view that dynamically replaces the "raw" user_widget_id with the corresponding user_widget_nr, like my query above.
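Such a view could look like this (a sketch; the view name is illustrative, and the row_number() logic is the same as in the query above):

```sql
-- Sketch only: expose gapless per-user numbers instead of the raw sequence ID.
CREATE VIEW user_widgets_numbered AS
SELECT user_id
     , row_number() OVER (PARTITION BY user_id
                          ORDER BY user_widget_id) AS user_widget_nr
     , user_widget_name
FROM   user_widgets;
```

Queries against the view filter on user_id and user_widget_nr; the underlying raw user_widget_id never leaks to users.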

You could (in addition) "materialize" a gapless user_widget_id by replacing it with user_widget_nr at off hours or triggered by events of your choosing.

To improve performance I would have the sequence for user_widget_id start with a very high number. Seems like there can only be a handful of widgets per user.

SELECT setval('user_widgets_user_widget_id_seq', 100000);

If no number is high enough to be safe, add a flag instead. Use the condition WHERE user_widget_id > 100000 to quickly identify "raw" IDs. If your table is huge you may want to add a partial index using the condition (which will be small). Use the condition in a CASE expression in the mentioned view, and in this statement to "materialize" IDs:

UPDATE user_widgets w
SET    user_widget_id = u.user_widget_nr
FROM (
   SELECT user_id, user_widget_id
         ,row_number() OVER (PARTITION BY user_id
                             ORDER BY user_widget_id) AS user_widget_nr 
   FROM   user_widgets
   WHERE  user_widget_id > 100000
   ) u
WHERE  w.user_id = u.user_id
AND    w.user_widget_id = u.user_widget_id;

Possibly follow up with a REINDEX or even VACUUM FULL ANALYZE user_widgets at off hours. Consider a FILLFACTOR below 100, as columns will be updated at least once.

I would certainly not leave this to the application. That introduces multiple additional points of failure.
