Complicated relationships vs simple tables Entity

Posted 2019-09-18 18:51

Question:

Database design question :) Is it smarter to make a lot of interconnected tables (normalize) or is it smarter to duplicate data so queries are simpler?

Here is my situation:

public class TransferRequest
{
    [Key]
    public int TransferRequestId { get; set; }

    public int By { get; set; }

    public int? For { get; set; }

    public int PersonId { get; set; }
    public virtual Person Person { get; set; }

    [ForeignKey("ExistingTransfer")] // must name the navigation property this key belongs to
    public int? ExistingTransferId { get; set; }
    public virtual Transfer ExistingTransfer { get; set; }

    [Required]
    [Range(1, 999)]
    public int Pax { get; set; }

    [Range(0, 999)]
    public int PaxChild { get; set; }

    [Range(0, 999)]
    public int PaxInfant { get; set; }

    public int StartPortId { get; set; }
    public virtual Port StartPort { get; set; }

    public int EndPortId { get; set; }
    public virtual Port EndPort { get; set; }

    [Required]
    [DataType(DataType.DateTime)]
    [UIHint("PickupTimePicker")]
    [Display(Name = "Pickup time"), DisplayFormat(DataFormatString = "{0:dd.MM.yyyy HH:mm}")]
    public DateTime PickupTime { get; set; }

    public bool Cargo { get; set; }

    public string CargoDescription { get; set; }

    public int Status { get; set; }

    [ForeignKey("Transfer")]
    public int? TransferId { get; set; }
    public virtual Transfer Transfer { get; set; }
}

after which this is created:

public class Transfer
{
    public Transfer()
    {
        Crew = new List<CrewOnTransfer>();
        TransferPoints = new List<TransferPoint>();
        TransferRequests = new List<TransferRequest>();
    }

    [Key]
    public int TransferId { get; set; }

    [ForeignKey("Ship")]
    public int ShipId { get; set; }
    public virtual Ship Ship { get; set; }

    [ForeignKey("ShipCrew")]
    public int CaptainId { get; set; }
    public virtual ShipCrew ShipCrew { get; set; }

    public virtual ICollection<CrewOnTransfer> Crew { get; set; }

    public virtual ICollection<TransferPoint> TransferPoints { get; set; }

    public virtual ICollection<TransferRequest> TransferRequests { get; set; }
}

That goes on for Crew:

public class CrewOnTransfer
{
    [Key]
    public int CrewOnTransferId { get; set; }

    [ForeignKey("ShipCrew")]
    public int ShipCrewId { get; set; }
    public virtual ShipCrew ShipCrew { get; set; }

    [ForeignKey("Transfer")]
    public int TransferId { get; set; }
    public virtual Transfer Transfer { get; set; }
}

And so on. You get the idea. Should I simplify the situation? This is getting messy to follow.

Answer 1:

Duplication is a path to corruption: if you update one copy and forget¹ to update the other, you no longer have a good way of determining which of the two copies is valid. You have effectively corrupted the data.

The whole goal of normalization is to remove such duplication and thereby reduce the number of ways data can be corrupted. And because normalization happens at the data-model level, it increases the ability of the database to "defend" itself from buggy clients on its own.
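
To make the failure mode concrete, here is a minimal sketch in plain C# (no Entity Framework involved). The `Name` property on `Ship`, the `DenormalizedTransfer` and `NormalizedTransfer` classes, and the ship names are all hypothetical and are not taken from the question's model; they only illustrate how a copied value can drift while a referenced value cannot.

```csharp
using System;

// Hypothetical, simplified stand-in for the question's Ship entity.
public class Ship
{
    public int ShipId { get; set; }
    public string Name { get; set; } = "";
}

// Denormalized shape: the ship's name is copied into the transfer row.
public class DenormalizedTransfer
{
    public int TransferId { get; set; }
    public int ShipId { get; set; }
    public string ShipName { get; set; } = ""; // duplicate of Ship.Name
}

// Normalized shape: the transfer only references the ship.
public class NormalizedTransfer
{
    public int TransferId { get; set; }
    public Ship Ship { get; set; } = new Ship();
}

public static class DuplicationDemo
{
    // Returns true when the copied name no longer matches the ship's name.
    public static bool CopyHasDrifted()
    {
        var ship = new Ship { ShipId = 1, Name = "Aurora" };
        var transfer = new DenormalizedTransfer
        {
            TransferId = 1,
            ShipId = ship.ShipId,
            ShipName = ship.Name
        };

        ship.Name = "Borealis"; // the rename touches one copy only

        return transfer.ShipName != ship.Name;
    }

    // With a single copy of the name, a rename is visible through every reference.
    public static string NameSeenThroughReference()
    {
        var ship = new Ship { ShipId = 1, Name = "Aurora" };
        var transfer = new NormalizedTransfer { TransferId = 1, Ship = ship };

        ship.Name = "Borealis";

        return transfer.Ship.Name;
    }
}
```

In the denormalized case nothing in the model tells you whether the transfer or the ship holds the correct name after the rename. In the normalized case that question cannot come up, because there is only one place where the name is stored.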

Performance² means little if the data is incorrect, so normalization is the norm (no pun intended). Denormalization is acceptable only when it is done judiciously and in a limited way, to solve a serious performance problem that cannot be solved otherwise, and when you can prove the performance benefit by measuring on representative amounts of data.

In a nutshell: first normalize, then denormalize if measurements justify it.


¹ Beware that in a concurrent environment, where two independent clients may try to update different copies (representing the same piece of information) in parallel, keeping data synchronized is non-trivial. Even in a single-client environment, bugs are a fact of life.

² Or simplicity of client code, for that matter.