I'm trying to draw an ER diagram for my project management software describing the following. It contains these entities:
- project - software projects
- tasks - software projects that can be broken into a number of tasks
- employees - employees that belong to this software
And:
A project can be divided into tasks. (Tasks can be created by the admin user, who can assign those tasks to selected projects. Here there is only assignment of tasks to projects, not assignment of employees to projects.)
Employees can be assigned to projects.
(An employee can be assigned to projects. Here there is only assignment of employees to projects, not assignment to tasks of projects.)For a selected task of a selected project we can assign employees from the pool--employees that are assigned to that project in 2. (This time we must specify project, task & employee; all 3 selections are mandatory.)
The input processes for 1, 2 & 3 above can be done in separate pages in the system. You can select any of them first.
For the above relationships I created this ERD:
Consider
- relationship 1 between project and task
- relationship 2 between project and employee
Is there a need for the two separate relationships as in the ER diagram, relationship no 1 & relationship no 2?
or
can we use only relationship 3 among project, employee and task, relationship no 3, for that also?
TL;DR You need all three relationship types/tables. Because if you drop one then in some situations you lose data--there is no way to use the remaining ones to answer all the same questions.
Normalization to higher NFs (normal forms) tells us when we can replace a relationship/table by smaller/simpler ones.
Each relationship table holds the rows that participate in the relationship. We can describe the relationship via a predicate (statement template):
1
Divides_to
holds(T, P)
rows whereproject P divides to task T
2
Has
holds(E, P)
rows whereemployee E is assigned to project P
3 holds
(E, T, P)
rows whereemployee E is assigned to task T on project P
Can we drop 1? If we ignore employees in 3 then we get rows where
some employee is assigned to task T on project P
. But (per above) that is not the rows in 1. Maybe project P divides to task T in 1 but no employee is assigned to task T on project P; then that row in 1 is not a subrow in 3. And there is no task info in 2. So we can't use 3 & 2 to replace 1.Can we drop 2? Similarly, if we ignore tasks in 3 then we don't get 2, and there is no employee info in 1. So we can't use 3 & 1 to replace 2.
Can we drop 3? Using 1 & 2 we can get rows where
employee E is assigned to project P AND project P divides to task T
. But (per above) that is not the rows in 3. They differ if an employee assigned to a project isn't assigned to all its tasks or if a task of a project doesn't have all its employees assigned to it. There's no other way to generate 3 from 1 & 2. So we can't use 1 & 2 to replace 3.So we need all three relationships.
Normalization to higher NFs guides decomposing a relationship into simpler others by which it can be expressed instead.
PS 1 That's also why we need the entity types/tables and not just the relationship types/tables. (If we didn't want them anyway for entity-specific attributes or just ER modeling conventions.) Eg the three relationships can't tell you about employees that aren't assigned to a project or to a task & project. Similarly for tasks & for projects.
PS 2 We ignore an attribute in relational algebra by not
project
ing on it. We ignore a column in SQL by notselect
ing it. The result's predicate is that FOR SOME value for the attribute/column, the old predicate holds. Relationalnatural join
gives the rows whose relationship/predicate is the AND of input relationships/predicates. In SQL for no duplicate rows & no shared nullable columns that'sselect distinct
from
natural join
.PS 3 Your design satisfies certain constraints: If a task-project pair appears in 3 then it must appear in 1 and if an employee-project pair appears in 3 then it must appear in 2. One way to reflect that in ER modeling is by reifying the task-project & employee-project relationships to associative entities then replacing 3 by a what ER calls a binary relationship on the entities. Relationally, the relationship/table is still ternary, where certain subrows happen to identify those entities. A way to get a constrained relationally binary 3 is to add an employee-project PK (primary key) or CK (candidate key) id in 2 and replace the composite FK (foreign key) in 3 by such an id. Then we have a binary on entities and on values. Some pseudo-ER methods do this.
PS 4 This style of (true Chen) ER diagram doesn't typically use SQL nulls. But as it happens you could replace all three relationships by a variant of 3 with nulls. You would
null
-extend the binary relations andunion
them with the ternary. As usual, nulls complicate predicates. Usually we add a nullable column as an alternative to adding a separate table sharing a null-free CK (candidate key). But this is different, without the savings in space or joins; it just complicates things. (Including important constraints.)(Also it's problematic in SQL because SQL
unique
,primary key
&join
are not the relational things by those names because they treatnull
specially.)PS 5 Some answers of mine re such ternary vs binary relation(ship) types/tables/predicates:
Should this ER diagram use a ternary relationship instead
Best Solution - Ternary or Binary Relationship
Why can't you just join in fan trap?
And re design & predicates:
Modeling multiple many to many relationships between the same entities in a relational database
What is the difference between an entity relationship model and a relational model?
Is there any rule of thumb to construct SQL query from a human-readable description?
PS 6
Has
is an unhelpfully generic relationship name/meaning/table. Use meaningful names likeIs_assigned_to
orAssignment
.