Say I have two tables, user
and comment
. They have table definitions that look like this:
CREATE TABLE `user` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`username` VARCHAR(255) NOT NULL,
`deleted` TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
UNIQUE KEY (`username`)
) ENGINE=InnoDB;
CREATE TABLE `comment` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`user_id` INTEGER NOT NULL,
`comment` TEXT,
`deleted` TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
CONSTRAINT `fk_comment_user_id` FOREIGN KEY (`user_id`)
REFERENCES `user` (`id`)
ON DELETE CASCADE
ON UPDATE CASCADE
) ENGINE=InnoDB;
This is great for enforcing data integrity and all that, but I want to be able to "delete" a user and keep all its comments (for reference's sake).
To this end, I've added deleted
so that I can SET deleted = 1
on a record. By listing everything with deleted = 0
by default, I can hide away all the deleted records until I need them.
So far so good.
The problem comes when:
- A user signs up with a username (say, "Sam"),
- I soft-delete that user (for unrelated reasons), and
- Someone else comes along to sign up as Sam, and suddenly we've violated the UNIQUE constraint on
user
.
I want users to be able to edit their own usernames, so I shouldn't make username
the primary key, and we'll still have the same problem when deleting users.
Any thoughts?
Edit for clarification: Added following RedFilter's answer and comments below.
I'm concerned with the case where the "deleted" users and comments are not visible to the public, but are visible only administrators, or are kept for the purpose of calculating statistics.
This question is a thought experiment, with the user and comment tables just being examples. Still, username
wasn't the best one to use; RedFilter makes valid points about user identity, particularly when the records are presented in a public context.
Regarding "Why isn't username the primary key?": this is just an example, but if I apply this to a real problem I'll be needing to work within the constraints of an existing system that assumes the existence of a surrogate primary key.
Add unique constraint on fields(username, deleted) Change field type for 'deleted' to INTEGER.
During delete operation (it can be done in trigger, or in part of code where you need actually delete user) copy value of id field to deleted field.
This approach allow you:
Field 'Deleted' can't have only 2 value because the following scenario will not work:
Just keep the unique index or contraint on
username
. You do not want new users to be able to use the deleted name, as not only could there be general confusion about identity, but if you are still showing the old posts from the deleted user, then they will mistakenly be understood to be posted by the new user with the same name.When a new user registers, you would normally check to see if the name is in use before allowing registration to complete, so there should be no conflict here.
.
My practical solution for soft-delete is archiving by creating a new table with following columns:
original_id
,table_name
,payload
, (and an optional primary key `id).Where
original_id
is the original id of deleted record,table_name
is the table name of the deleted record ("user"
in your case),payload
is JSON-stringified string from all columns of the deleted record.I also suggest making an index on the column
original_id
for latter data retrievement.By this way of archiving data. You will have these advantages
The is already a discussion here explaining why soft-deletion is not a good idea in practice. Soft-delete introduces some potential troubles in future such as counting records, ...