Calculate Family Relationship from Genealogical Da

2019-03-08 04:09发布

问题:

I would like to be able to calculate the family relationship between two individuals in a family tree, given the following data schema (simplified from my actual data schema, only showing columns that directly apply to this problem):

individual
----------
id
gender

child
----------
child_id
father_id
mother_id

With this structure, how can one calculate the relationship between two individual id's (i.e. cousin, great grand uncle, etc.).

Also, as there are actually two relationships (i.e. A-B may be nephew whereas B-A is uncle), how can one generate the complement to the other (given uncle, and assuming we know gender, how might we generate nephew?). This is more of a trivial question, the former is what I'm really interested in.

Thanks all!

回答1:

You'll first need to calculate the Lowest Common Ancestor of both A and B. Call this Lowest Common Ancestor C.

Then calculate the distance in steps from C to A (CA) and C to B (CB). These values should be indexed into another table that determines the relationship based on these two values. For example:

CA      CB      Relation
1       2       uncle
2       1       nephew
2       2       cousin
0       1       father
0       2       grandfather

You might keep the basic relations in this table, and add "great-" for additional distances on certain relations like grandfather, ex.: (0, 3) = great-grandfather.

Hopefully this will point you in the right direction. Best of luck!

UPDATE: (I can't comment below your code, since I don't have the reputation yet.)

Your function aggrandize_relationships is a little off, I think. You can simplify it by prefixing "grand" if the offset is 1 or greater, then prefix "great-" (offset - 1) times. Your version might include the prefix "great grand great grand" for very distant relatives.(Not sure if I have the correct parameter in this explanation, but hopefully you get the gist of it. Also, no idea if your family tree is going that far back, but the point remains valid.)

UPDATE TOO: Sorry, the above is incorrect. I misread the default case, and thought it recursively called the function again. In my defense, I wasn't familiar with the "2nd great grandfather" notation, and always used "great great grandfather" myself. Code onward!!



回答2:

Below is my PHP implementation of my algorithm to calculate relationship. This is based upon the data schema I outlined in the original question. This only finds the "closest" i.e. shortest-path relationship between the two individuals, it does not resolve compound relationships such as half-siblings or double cousins.

Note that data access functions such as get_father and get_gender are written in the style of a database abstraction layer I always use. It should be fairly straightforward to understand what is going on, basically all dbms-specific functions such as mysql_query are replaced with generalized functions such as db_query; it is not very complicated at all, especially in the examples in this code, but feel free to post questions in comments if it isn't clear.

<?php
/* Calculate relationship "a is the ___ of b" */

define("GENDER_MALE", 1);
define("GENDER_FEMALE", 2);

function calculate_relationship($a_id, $b_id)
{
  if ($a_id == $b_id) {
    return 'self';
  }

  $lca = lowest_common_ancestor($a_id, $b_id);
  if (!$lca) {
    return false;
  }
  $a_dist = $lca[1];
  $b_dist = $lca[2];

  $a_gen = get_gender($a_id);

  // DIRECT DESCENDANT - PARENT
  if ($a_dist == 0) {
    $rel = $a_gen == GENDER_MALE ? 'father' : 'mother';
    return aggrandize_relationship($rel, $b_dist);
  }
  // DIRECT DESCENDANT - CHILD
  if ($b_dist == 0) {
    $rel = $a_gen == GENDER_MALE ? 'son' : 'daughter';
    return aggrandize_relationship($rel, $a_dist);
  }

  // EQUAL DISTANCE - SIBLINGS / PERFECT COUSINS
  if ($a_dist == $b_dist) {
    switch ($a_dist) {
      case 1:
        return $a_gen == GENDER_MALE ? 'brother' : 'sister';
        break;
      case 2:
        return 'cousin';
        break;
      default:
        return ordinal_suffix($a_dist - 2).' cousin';
    }
  }

  // AUNT / UNCLE
  if ($a_dist == 1) {
    $rel = $a_gen == GENDER_MALE ? 'uncle' : 'aunt';
    return aggrandize_relationship($rel, $b_dist, 1);
  }
  // NEPHEW / NIECE
  if ($b_dist == 1) {
    $rel = $a_gen == GENDER_MALE ? 'nephew' : 'niece';
    return aggrandize_relationship($rel, $a_dist, 1);
  }

  // COUSINS, GENERATIONALLY REMOVED
  $cous_ord = min($a_dist, $b_dist) - 1;
  $cous_gen = abs($a_dist - $b_dist);
  return ordinal_suffix($cous_ord).' cousin '.format_plural($cous_gen, 'time', 'times').' removed';
} //END function calculate_relationship

function aggrandize_relationship($rel, $dist, $offset = 0) {
  $dist -= $offset;
  switch ($dist) {
    case 1:
      return $rel;
      break;
    case 2:
      return 'grand'.$rel;
      break;
    case 3:
      return 'great grand'.$rel;
      break;
    default:
      return ordinal_suffix($dist - 2).' great grand'.$rel;
  }
} //END function aggrandize_relationship

function lowest_common_ancestor($a_id, $b_id)
{
  $common_ancestors = common_ancestors($a_id, $b_id);

  $least_distance = -1;
  $ld_index = -1;

  foreach ($common_ancestors as $i => $c_anc) {
    $distance = $c_anc[1] + $c_anc[2];
    if ($least_distance < 0 || $least_distance > $distance) {
      $least_distance = $distance;
      $ld_index = $i;
    }
  }

  return $ld_index >= 0 ? $common_ancestors[$ld_index] : false;
} //END function lowest_common_ancestor

function common_ancestors($a_id, $b_id) {
  $common_ancestors = array();

  $a_ancestors = get_ancestors($a_id);
  $b_ancestors = get_ancestors($b_id);

  foreach ($a_ancestors as $a_anc) {
    foreach ($b_ancestors as $b_anc) {
      if ($a_anc[0] == $b_anc[0]) {
        $common_ancestors[] = array($a_anc[0], $a_anc[1], $b_anc[1]);
        break 1;
      }
    }
  }

  return $common_ancestors;
} //END function common_ancestors

function get_ancestors($id, $dist = 0)
{
  $ancestors = array();

  // SELF
  $ancestors[] = array($id, $dist);

  // PARENTS
  $parents = get_parents($id);
  foreach ($parents as $par) {
    if ($par != 0) {
      $par_ancestors = get_ancestors($par, $dist + 1);
      foreach ($par_ancestors as $par_anc) {
        $ancestors[] = $par_anc;
      }
    }
  }

  return $ancestors;
} //END function get_ancestors

function get_parents($id)
{
  return array(get_father($id), get_mother($id));
} //END function get_parents

function get_father($id)
{
  $res = db_result(db_query("SELECT father_id FROM child WHERE child_id = %s", $id));
  return $res ? $res : 0;
} //END function get_father

function get_mother($id)
{
  $res = db_result(db_query("SELECT mother_id FROM child WHERE child_id = %s", $id));
  return $res ? $res : 0;
} //END function get_mother

function get_gender($id)
{
  return intval(db_result(db_query("SELECT gender FROM individual WHERE id = %s", $id)));
}

function ordinal_suffix($number, $super = false)
{
  if ($number % 100 > 10 && $number %100 < 14) {
    $os = 'th';
  } else if ($number == 0) {
    $os = '';
  } else {
    $last = substr($number, -1, 1);

    switch($last) {
      case "1":
        $os = 'st';
        break;
      case "2":
        $os = 'nd';
        break;
      case "3":
        $os = 'rd';
        break;
      default:
        $os = 'th';
    }
  }

  $os = $super ? '<sup>'.$os.'</sup>' : $os;

  return $number.$os;
} //END function ordinal_suffix

function format_plural($count, $singular, $plural)
{
  return $count.' '.($count == 1 || $count == -1 ? $singular : $plural);
} //END function plural_format

?>

As I had mentioned previously, the algorithm to determine LCA is far less than optimal. I plan to post a separate question to optimize that, and another to address the problem of calculating compound relationships such as double cousins.

Many thanks to everyone who helped prod me in the right direction! With your tips, this turned out to be much easier than I originally thought.



回答3:

This might help The Tree Relationship Calculator is an object that accepts an XML representation of a tree and will calculate the relationship of any two members within it. This article describes how relationships are calculated, and what terms like second cousin, or first cousin once removed, mean. This code includes an object for calculating relationships, written in JavaScript, as well as a web UI for rendering and interacting with the tree. The example project is setup as a classic ASP page.

http://www.codeproject.com/Articles/30315/Tree-Relationship-Calculator



回答4:

I solved this problem using adjacency list concept in java. One can have a node for every person and have its child relations associated to it on its node itself. Below is the code to find only Siblings and Cousins. However, you can enhance it according to your requirement. I wrote this code only for demonstration.

public class Person {
    String name;
    String gender;
    int age;
    int salary;
    String fatherName;
    String motherName;

    public Person(String name, String gender, int age, int salary, String fatherName,
            String motherName) {
        super();
        this.name = name;
        this.gender = gender;
        this.age = age;
        this.salary = salary;
        this.fatherName = fatherName;
        this.motherName = motherName;
    }

}

Below is the main code to add family people and to find relation among themselves.

import java.util.LinkedList;

public class PeopleAndRelationAdjacencyList {
    private static String MALE = "male";
    private static String FEMALE = "female";

public static void main(String[] args) {
    int size = 25;
    LinkedList<Person> adjListArray[] = new LinkedList[size];
    for (int i = 0; i < size; i++) {
        adjListArray[i] = new LinkedList<>();
    }

    addPerson( adjListArray, "GGM1", MALE, null, null );
    addPerson( adjListArray, "GGF1", FEMALE, null, null );

    addPerson( adjListArray, "GM1", MALE, "GGM1", "GGF1" );
    addPerson( adjListArray, "GM2", MALE, "GGM1", "GGF1" );

    addPerson( adjListArray, "GM1W", FEMALE, null, null );
    addPerson( adjListArray, "GM2W", FEMALE, null, null );

    addPerson( adjListArray, "PM1", MALE, "GM1", "GM1W" );
    addPerson( adjListArray, "PM2", MALE, "GM1", "GM1W" );
    addPerson( adjListArray, "PM3", MALE, "GM2", "GM2W" );

    addPerson( adjListArray, "PM1W", FEMALE, null, null );
    addPerson( adjListArray, "PM2W", FEMALE, null, null );
    addPerson( adjListArray, "PM3W", FEMALE, null, null );

    addPerson( adjListArray, "S1", MALE, "PM1", "PM1W" );
    addPerson( adjListArray, "S2", MALE, "PM2", "PM2W" );
    addPerson( adjListArray, "S3", MALE, "PM3", "PM3W" );
    addPerson( adjListArray, "S4", MALE, "PM3", "PM3W" );

    printGraph(adjListArray);
    System.out.println("Done !");


    getRelationBetweenPeopleForGivenNames(adjListArray, "S3", "S4");
    getRelationBetweenPeopleForGivenNames(adjListArray, "S1", "S2");

}


private static void getRelationBetweenPeopleForGivenNames(LinkedList<Person>[] adjListArray, String name1, String name2) {

    if ( adjListArray[getIndexOfGivenNameInHeadPositionOfList(adjListArray, name1)].peekFirst().fatherName
            .equalsIgnoreCase(
                    adjListArray[getIndexOfGivenNameInHeadPositionOfList(adjListArray, name2)].peekFirst().fatherName) ) {
        System.out.println("SIBLIGS");
        return;
    }

    String name1FatherName = adjListArray[getIndexOfGivenNameInHeadPositionOfList(adjListArray, name1)].peekFirst().fatherName;
    String name2FatherName = adjListArray[getIndexOfGivenNameInHeadPositionOfList(adjListArray, name2)].peekFirst().fatherName;

    if ( adjListArray[getIndexOfGivenNameInHeadPositionOfList(adjListArray, name1FatherName)].peekFirst().fatherName
            .equalsIgnoreCase(
                    adjListArray[getIndexOfGivenNameInHeadPositionOfList(adjListArray, name2FatherName)].peekFirst().fatherName) ) {
        System.out.println("COUSINS");
    }
}



private static void addPerson(LinkedList<Person>[] adjListArray, String name, String gender, String fatherName, String motherName) {
    Person person = new Person(name, gender, 0, 0, fatherName, motherName);
    int indexToPutperson = getEmptyIndexInAdjListToInserterson(adjListArray);
    adjListArray[indexToPutperson].addLast(person);
    if( fatherName!=null ){
        int indexOffatherName = getIndexOfGivenNameInHeadPositionOfList( adjListArray, fatherName);
        adjListArray[indexOffatherName].addLast(person);
    }
    if( motherName!=null ){
        int indexOfMotherName = getIndexOfGivenNameInHeadPositionOfList( adjListArray, motherName);
        adjListArray[indexOfMotherName].addLast(person);
    }
}

private static int getIndexOfGivenNameInHeadPositionOfList( LinkedList<Person>[] adjListArray, String nameToBeSearched ) {
    for (int i = 0; i < adjListArray.length; i++) {
        if( adjListArray[i] != null ){
            if(adjListArray[i].peekFirst() != null){
                if(adjListArray[i].peekFirst().name.equalsIgnoreCase(nameToBeSearched)){
                    return i;
                }
            }
        }
    }
    // handle if father name is not found
    return 0;
}


private static void printGraph(LinkedList<Person>[] adjListArray) {
    for (int v = 0; v < 15; v++) {
        System.out.print("head");

        LinkedList<Person> innerLinkedList = adjListArray[v];
        for (int i = 0; i < innerLinkedList.size(); i++) {
            Person person = innerLinkedList.get(i);
            System.out.print(" -> " + person.name);
        }

        System.out.println("\n");
    }
}

private static int getEmptyIndexInAdjListToInserterson( LinkedList<Person>[] adjListArray) {
    for (int i = 0; i < adjListArray.length; i++) {
        if(adjListArray[i].isEmpty()){
            return i;
        }
    }
    throw new IndexOutOfBoundsException("List of relation is full.");
}

}



回答5:

This might help you, it's a lot of theory and implementation of SQL queries to generate and query tree structures

http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html

In particular, look at the adjacency list model which uses a family tree as example.



回答6:

As strange as it might sound PROLOG seems to be the thing you're looking for. Given following ad-hoc program (http://www.pastey.net/117134 better colouring)

female(alice).
female(eve).
female(kate).

male(bob).
male(carlos).
male(dave).

% mother(_mother, _child).
mother(alice, bob).
mother(kate, alice).

% father(_father, _child)
father(carlos, bob).

child(C, P) :- father(P, C).
child(C, P) :- mother(P, C).

parent(X, Y) :- mother(X, Y).
parent(X, Y) :- father(X, Y).

sister(alice, eve).
sister(eve, alice).
sister(alice, dave).

brother(dave, alice).

% brother(sibling, sibling)
sibling(X, Y) :- brother(X, Y).
sibling(X, Y) :- sister(X, Y).


uncle(U, C) :- sibling(U, PARENT),
    child(C, PARENT),
    male(U).


relationship(U, C, uncle) :- uncle(U, C).
relationship(P, C, parent) :- parent(P, C).
relationship(B, S, brother) :- brother(B, S).
relationship(G, C, grandparent) :- parent(P, C), parent(G, P).

You could ask Prolog interpreter something like that:

relationship(P1, P2, R).

with the answers:


P1 = dave, P2 = bob, R = uncle ;
P1 = alice, P2 = bob, R = parent ;
P1 = kate, P2 = alice, R = parent ;
P1 = carlos, P2 = bob, R = parent ;
P1 = dave, P2 = alice, R = brother ;
P1 = kate, P2 = bob, R = grandparent ;
false.

It's a powerful tool, if you know how and when to use it. This seems exactly like a place for Prolog. I know it's not terribly popular, or easy to embed, but the impressive feature of wolphram alpha shown in one of the comments can be coded using nothing more than constructs used above, and this is Prolog 101.