Extract probabilities and most likely parse tree from CYK

Posted 2019-06-14 04:36

Question:

In order to understand the CYK algorithm, I've worked through the example at https://www.youtube.com/watch?v=VTH1k-xiswM&feature=youtu.be.

The result is the completed recognition chart from the video.

How do I extract the probabilities associated with each parse and recover the most likely parse tree?

Answer 1:

These are two distinct problems for PCFG:

  • recognition: does the sentence belong to the language generated by the CFG? (output: yes or no)
  • parsing: what is the highest scoring tree for this sentence? (output: parse tree)

The CKY algorithm video linked in the question only deals with the recognition problem. To solve the parsing problem at the same time, we need to (i) maintain the score of each parsing item and (ii) keep track of the hierarchical relationships (e.g. if we use the rule S -> NP VP, we must keep track of which NP and which VP were used to predict S).

Notations:

  • A parsing item [X, i, j]: s means that there is a node labelled X spanning tokens i (included) to j (excluded) with score s. The score is the log probability of the subtree rooted in X.
  • The sentence is a sequence of words w_0 w_1 ... w_{n-1} (0-indexed, so word w_i spans positions i to i+1).
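
For example, the (hypothetical) item [NP, 0, 2]: -1.2 states that the words w_0 w_1 can be parsed as an NP, and that the best such subtree has log probability -1.2 (i.e. probability e^-1.2 ≈ 0.30).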

At an abstract level, PCFG parsing can be formulated as a set of deduction rules:

  1. Scan Rule (read tokens)

    ____________________________ {precondition: X -> w_i is a grammar rule}
    [X, i, i+1]: log p(X -> w_i)
    

    Gloss: if there is a rule X -> w_i in the grammar, then we can add the item [X, i, i+1] to the chart.
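
    For instance, with a hypothetical lexical rule N -> fish of probability 0.25, reading the word w_2 = fish adds the item [N, 2, 3]: log 0.25 ≈ -1.39 to the chart.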

  2. Complete Rule (create a new constituent bottom up)

    [X, i, k]: s1     [Y, k, j]: s2
    _____________________________________ {precondition: Z -> X Y is a grammar rule}
    [Z, i, j]: s1 + s2 + log p(Z -> X Y)
    

    Gloss: if there are two parsing items [X, i, k] and [Y, k, j] in the chart, and a rule Z -> X Y in the grammar, then we can add [Z, i, j] to the chart, with score s1 + s2 + log p(Z -> X Y).
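
    For instance, combining hypothetical items [NP, 0, 2]: -1.2 and [VP, 2, 5]: -2.3 with a rule S -> NP VP of probability 0.9 deduces the item [S, 0, 5]: -1.2 + (-2.3) + log 0.9 ≈ -3.61.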

The goal of weighted parsing is to deduce the parsing item [S, 0, n]: s (S: axiom, n: length of the sentence) with the highest score s.

Pseudocode for the whole algorithm

# The chart stores parsing items and their scores:
#   chart[beginning: int][end: int][NonTerminal] -> score: float

# The backtrack table is used to recover the parse tree at the end:
#   backtrack[beginning: int][end: int][NonTerminal] -> (item_left, item_right)

# insert a new item into the chart.
# For a given span (i, j) and nonterminal X, we only need to
# keep the single best-scoring item.
def insert(X, i, j, score, backpointer):
    if X not in chart[i][j] or chart[i][j][X] < score:
        chart[i][j][X] <- score
        backtrack[i][j][X] <- backpointer

n <- length of sentence

for i in range(0, n):
    # apply the scan rule; a leaf's backpointer is the word itself
    for each grammar rule X -> w_i:
        insert(X, i, i+1, log p(X -> w_i), w_i)

for span_length in range(2, n+1):
    for beginning in range(0, n - span_length + 1):
        end <- beginning + span_length

        # try every split point k strictly between beginning and end
        for k in range(beginning+1, end):
            # apply the complete rule
            for each grammar rule X -> Y Z such that
                * Y is in chart[beginning][k]
                * Z is in chart[k][end]

                score_left  <- chart[beginning][k][Y]
                score_right <- chart[k][end][Z]

                insert(X, beginning, end,
                       log p(X -> Y Z) + score_left + score_right,
                       ((Y, beginning, k), (Z, k, end)))

if S (the axiom) is in chart[0][n], parsing is successful:
    the score (log probability) of the best tree is chart[0][n][S]
    the best tree can be recovered recursively by following the pointers in backtrack[0][n][S]
else:
    parsing failure: the sentence does not belong to the language
    generated by the grammar
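
To make this concrete, here is a minimal runnable Python sketch of the pseudocode above. The grammar encoding (the BINARY and LEXICAL dictionaries), the toy grammar, and the cky_parse function name are assumptions chosen for illustration, not part of the original example:

import math
from collections import defaultdict

# Toy grammar in Chomsky Normal Form (probabilities purely illustrative).
# Binary rules X -> Y Z are stored as {(Y, Z): {X: prob}};
# lexical rules X -> word are stored as {word: {X: prob}}.
BINARY = {
    ("NP", "VP"): {"S": 1.0},
    ("V", "NP"):  {"VP": 1.0},
    ("Det", "N"): {"NP": 0.6},
}
LEXICAL = {
    "the": {"Det": 1.0},
    "dog": {"N": 0.5, "NP": 0.2},
    "cat": {"N": 0.5, "NP": 0.2},
    "saw": {"V": 1.0},
}

def cky_parse(words, axiom="S"):
    n = len(words)
    # chart[i][j][X] = best log probability of an X spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(dict))
    # backtrack[i][j][X] = backpointer used to rebuild the best tree
    backtrack = defaultdict(lambda: defaultdict(dict))

    def insert(X, i, j, score, backpointer):
        # keep only the single best-scoring item per (span, nonterminal)
        if X not in chart[i][j] or chart[i][j][X] < score:
            chart[i][j][X] = score
            backtrack[i][j][X] = backpointer

    # scan rule: one leaf item per lexical rule X -> w_i
    for i, w in enumerate(words):
        for X, p in LEXICAL.get(w, {}).items():
            insert(X, i, i + 1, math.log(p), w)

    # complete rule, bottom up over increasing span lengths
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for Y, s1 in chart[i][k].items():
                    for Z, s2 in chart[k][j].items():
                        for X, p in BINARY.get((Y, Z), {}).items():
                            insert(X, i, j, s1 + s2 + math.log(p),
                                   ((Y, i, k), (Z, k, j)))

    if axiom not in chart[0][n]:
        return None, None  # recognition failure

    def build_tree(X, i, j):
        # follow backpointers recursively to recover the best tree
        bp = backtrack[i][j][X]
        if isinstance(bp, str):          # leaf: X -> w_i
            return (X, bp)
        (Y, li, k), (Z, k2, rj) = bp
        return (X, build_tree(Y, li, k), build_tree(Z, k2, rj))

    return chart[0][n][axiom], build_tree(axiom, 0, n)

score, tree = cky_parse("the dog saw the cat".split())
print(score)  # log probability of the best parse
print(tree)   # ('S', ('NP', ...), ('VP', ...))

On this toy grammar the script prints a log probability of about -2.41 and the tree ('S', ('NP', ('Det', 'the'), ('N', 'dog')), ('VP', ('V', 'saw'), ('NP', ('Det', 'the'), ('N', 'cat')))). Note that insert keeps only the maximum-scoring item (the Viterbi choice); summing probabilities instead of maximizing would compute the total probability of the sentence (the inside algorithm) rather than the best parse.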

Some notes:

  • Time complexity is O(n^3 ⋅ |G|), where |G| is the size of the grammar.
  • The algorithm assumes that the grammar is in Chomsky Normal Form (see the note below).
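
For grammars not already in CNF, rules must first be binarized: e.g. a hypothetical ternary rule X -> A B C can be split into X -> A @X and @X -> B C, where @X is a fresh intermediate symbol and the original rule's probability is carried by one of the two new rules.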