Java Codility training: Genomic-range-query

Posted 2020-05-20 05:39

The task is:

A non-empty zero-indexed string S is given. String S consists of N characters from the set of upper-case English letters A, C, G, T.

This string actually represents a DNA sequence, and the upper-case letters represent single nucleotides.

You are also given non-empty zero-indexed arrays P and Q consisting of M integers. These arrays represent queries about minimal nucleotides. We represent the letters of string S as integers 1, 2, 3, 4 in arrays P and Q, where A = 1, C = 2, G = 3, T = 4, and we assume that A < C < G < T.

Query K requires you to find the minimal nucleotide from the range (P[K], Q[K]), 0 ≤ P[i] ≤ Q[i] < N.

For example, consider string S = GACACCATA and arrays P, Q such that:

P[0] = 0    Q[0] = 8
P[1] = 0    Q[1] = 2
P[2] = 4    Q[2] = 5
P[3] = 7    Q[3] = 7

The minimal nucleotides from these ranges are as follows:

    (0, 8) is A identified by 1,
    (0, 2) is A identified by 1,
    (4, 5) is C identified by 2,
    (7, 7) is T identified by 4.

Write a function:

class Solution { public int[] solution(String S, int[] P, int[] Q); } 

that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

The sequence should be returned as:

    a Results structure (in C), or
    a vector of integers (in C++), or
    a Results record (in Pascal), or
    an array of integers (in any other programming language).

For example, given the string S = GACACCATA and arrays P, Q such that:

P[0] = 0    Q[0] = 8
P[1] = 0    Q[1] = 2
P[2] = 4    Q[2] = 5
P[3] = 7    Q[3] = 7

the function should return the values [1, 1, 2, 4], as explained above.

Assume that:

    N is an integer within the range [1..100,000];
    M is an integer within the range [1..50,000];
    each element of array P, Q is an integer within the range [0..N − 1];
    P[i] ≤ Q[i];
    string S consists only of upper-case English letters A, C, G, T.

Complexity:

    expected worst-case time complexity is O(N+M);
    expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).

Elements of input arrays can be modified.

My solution is:

class Solution {
    public int[] solution(String S, int[] P, int[] Q) {
        final  char c[] = S.toCharArray();
        final int answer[] = new int[P.length];
        int tempAnswer;
        char tempC;

        for (int iii = 0; iii < P.length; iii++) {
            tempAnswer = 4;
            for (int zzz = P[iii]; zzz <= Q[iii]; zzz++) {
                tempC = c[zzz];
                if (tempC == 'A') {
                    tempAnswer = 1;
                    break;
                } else if (tempC == 'C') {
                    if (tempAnswer > 2) {
                        tempAnswer = 2;
                    }
                } else if (tempC == 'G') {
                    if (tempAnswer > 3) {
                        tempAnswer = 3;
                    }

                }
            }
            answer[iii] = tempAnswer;
        }

        return answer;
    }
}

It is not optimal. I believe it is supposed to be done in a single loop; any hints on how I can achieve that?

You can check the quality of your solution at https://codility.com/train/ (the test is named Genomic-range-query).

30 answers
我只想做你的唯一
#2 · 2020-05-20 05:56

If anyone is still interested in this exercise, here is my Python solution (100/100 on Codility):

def solution(S, P, Q):

    count = []
    for i in range(3):
        count.append([0]*(len(S)+1))

    for index, i in enumerate(S):
        count[0][index+1] = count[0][index] + ( i =='A')
        count[1][index+1] = count[1][index] + ( i =='C')
        count[2][index+1] = count[2][index] + ( i =='G')

    result = []

    for i in range(len(P)):
      start = P[i]
      end = Q[i]+1

      if count[0][end] - count[0][start]:
          result.append(1)
      elif count[1][end] - count[1][start]:
          result.append(2)
      elif count[2][end] - count[2][start]:
          result.append(3)
      else:
          result.append(4)

    return result
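
To make the subtraction in the query loop concrete, here is a small Java sketch (Java being the language of the question) that builds the prefix counts for a single letter and counts its occurrences in an inclusive range. The class name PrefixCountDemo and the helper names are made up for illustration; they do not come from the answer above.

```java
import java.util.Arrays;

// Hypothetical demo class; all names here are illustrative assumptions.
class PrefixCountDemo {

    // prefix[i] = number of occurrences of `letter` in s[0..i-1], so prefix[0] == 0.
    static int[] prefixCounts(String s, char letter) {
        int[] prefix = new int[s.length() + 1];
        for (int i = 0; i < s.length(); i++) {
            prefix[i + 1] = prefix[i] + (s.charAt(i) == letter ? 1 : 0);
        }
        return prefix;
    }

    // Occurrences of the letter in the inclusive range [from, to].
    // The +1 on `to` compensates for the extra leading zero in the prefix array.
    static int countInRange(int[] prefix, int from, int to) {
        return prefix[to + 1] - prefix[from];
    }

    public static void main(String[] args) {
        int[] a = prefixCounts("GACACCATA", 'A');
        System.out.println(Arrays.toString(a));    // [0, 0, 1, 1, 2, 2, 2, 3, 3, 4]
        System.out.println(countInRange(a, 4, 5)); // 0: no 'A' at indices 4..5
        System.out.println(countInRange(a, 0, 2)); // 1: one 'A' at index 1
    }
}
```

A non-zero difference means the letter occurs somewhere in the range, which is why checking A, then C, then G in that order yields the minimal nucleotide, and T is the fallback.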
SAY GOODBYE
#3 · 2020-05-20 05:58

Here is a C# solution. The basic idea is much the same as in the other answers, but it may be cleaner:

using System;

class Solution
{
    public int[] solution(string S, int[] P, int[] Q)
    {
        int N = S.Length;
        int M = P.Length;
        char[] chars = {'A','C','G','T'};

        //Calculate accumulates
        int[,] accum = new int[3, N+1];
        for (int i = 0; i <= 2; i++)
        {
            for (int j = 0; j < N; j++)
            {
                if(S[j] == chars[i]) accum[i, j+1] = accum[i, j] + 1;
                else accum[i, j+1] = accum[i, j];
            }
        }

        //Get minimal nucleotides for the given ranges
        int diff;
        int[] minimums = new int[M];
        for (int i = 0; i < M; i++)
        {
            minimums[i] = 4;
            for (int j = 0; j <= 2; j++)
            {
                diff = accum[j, Q[i]+1] - accum[j, P[i]];
                if (diff > 0)
                {
                    minimums[i] = j+1;
                    break;
                }
            }
        }

        return minimums;
    }
}
▲ chillily
#4 · 2020-05-20 05:58

This is a Swift 4 solution to the same problem. It is based on @codebusta's solution above:

public func solution(_ S : inout String, _ P : inout [Int], _ Q : inout [Int]) -> [Int] {
    var impacts = [Int]()
    var prefixSum = [[Int]]()
    for _ in 0..<3 {
        let array = Array(repeating: 0, count: S.count + 1)
        prefixSum.append(array)
    }

    for (index, character) in S.enumerated() {
        var a = 0
        var c = 0
        var g = 0

        switch character {
        case "A":
            a = 1
        case "C":
            c = 1
        case "G":
            g = 1
        default:
            break
        }

        prefixSum[0][index + 1] = prefixSum[0][index] + a
        prefixSum[1][index + 1] = prefixSum[1][index] + c
        prefixSum[2][index + 1] = prefixSum[2][index] + g
    }

    for tuple in zip(P, Q) {
        if prefixSum[0][tuple.1 + 1] - prefixSum[0][tuple.0] > 0 {
            impacts.append(1)
        } else if prefixSum[1][tuple.1 + 1] - prefixSum[1][tuple.0] > 0 {
            impacts.append(2)
        } else if prefixSum[2][tuple.1 + 1] - prefixSum[2][tuple.0] > 0 {
            impacts.append(3)
        } else {
            impacts.append(4)
        }
    }

    return impacts
}
劫难
#5 · 2020-05-20 06:00

Here is a solution that scored 100 out of 100 on codility.com. Please read about prefix sums to understand it:

public static int[] solveGenomicRange(String S, int[] P, int[] Q) {
        //used jagged array to hold the prefix sums of each A, C and G genoms
        //we don't need to get prefix sums of T, you will see why.
        int[][] genoms = new int[3][S.length()+1];
        //if the char is found in the index i, then we set it to be 1 else they are 0
        //3 short values are needed for this reason
        short a, c, g;
        for (int i=0; i<S.length(); i++) {
            a = 0; c = 0; g = 0;
            if ('A' == (S.charAt(i))) {
                a=1;
            }
            if ('C' == (S.charAt(i))) {
                c=1;
            }
            if ('G' == (S.charAt(i))) {
                g=1;
            }
            //here we calculate prefix sums. To learn what's prefix sums look at here https://codility.com/media/train/3-PrefixSums.pdf
            genoms[0][i+1] = genoms[0][i] + a;
            genoms[1][i+1] = genoms[1][i] + c;
            genoms[2][i+1] = genoms[2][i] + g;
        }

        int[] result = new int[P.length];
        //here we go through the provided P[] and Q[] arrays as intervals
        for (int i=0; i<P.length; i++) {
            int fromIndex = P[i];
            //we need to add 1 to Q[i], 
            //because our genoms[0][0], genoms[1][0] and genoms[2][0]
            //have 0 values by default, look above genoms[0][i+1] = genoms[0][i] + a; 
            int toIndex = Q[i]+1;
            if (genoms[0][toIndex] - genoms[0][fromIndex] > 0) {
                result[i] = 1;
            } else if (genoms[1][toIndex] - genoms[1][fromIndex] > 0) {
                result[i] = 2;
            } else if (genoms[2][toIndex] - genoms[2][fromIndex] > 0) {
                result[i] = 3;
            } else {
                result[i] = 4;
            }
        }

        return result;
    }
冷血范
#6 · 2020-05-20 06:01

/* 100/100 solution in C++, using prefix sums. First, each char is converted to an integer in the nuc variable. Then, in a two-dimensional vector, we accumulate the occurrences in S of each nucleotide x in its respective prefix_sum[s][x]. After that, we only have to find the lowest nucleotide that occurred in each interval K.
*/
#include <string>
#include <vector>
using namespace std;

vector<int> solution(string &S, vector<int> &P, vector<int> &Q) {

int n=S.size();
int m=P.size();
vector<vector<int> > prefix_sum(n+1,vector<int>(4,0));
int nuc;

//prefix occurrence sum
for (int s=0;s<n; s++) {
    nuc = S.at(s) == 'A' ? 1 : (S.at(s) == 'C' ? 2 : (S.at(s) == 'G' ? 3 : 4) );        
    for (int u=0;u<4;u++) {
        prefix_sum[s+1][u] = prefix_sum[s][u] + ((u+1)==nuc?1:0);
    }
}

//find minimal impact factor in each interval K
int lower_impact_factor;

for (int k=0;k<m;k++) {

    lower_impact_factor=4;
    for (int u=2;u>=0;u--) {
        if (prefix_sum[Q[k]+1][u] - prefix_sum[P[k]][u] != 0)
            lower_impact_factor = u+1;
    }
    P[k]=lower_impact_factor;
}

return P;

}

放荡不羁爱自由
#7 · 2020-05-20 06:02

A simple PHP solution (100/100):

function solution($S, $P, $Q) {
    $result = array();
    for ($i = 0; $i < count($P); $i++) {
        $from = $P[$i];
        $to = $Q[$i];
        $length = $from >= $to ? $from - $to + 1 : $to - $from + 1;
        $new = substr($S, $from, $length);

        if (strpos($new, 'A') !== false) {
            $result[$i] = 1;
        } else {
            if (strpos($new, 'C') !== false) {
                $result[$i] = 2;
            } else {
                if (strpos($new, 'G') !== false) {
                    $result[$i] = 3;
                } else {
                   $result[$i] = 4;
                }
            }
        }
    }
    return $result;
}
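
Note that the substr/strpos approach above rescans the range for every query, and a range can be up to N characters long. As a contrast, here is a minimal Java sketch of one alternative that preprocesses S once: for every index it records the position of the most recent 'A', 'C' and 'G' at or before that index, so a letter occurs in [P, Q] exactly when its most recent position at or before Q is at least P. The class name LastSeenSketch and the variable names are hypothetical; this is a sketch under those assumptions, not a solution taken from any answer in this thread.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative assumptions.
class LastSeenSketch {

    static int[] solution(String s, int[] p, int[] q) {
        int n = s.length();
        String letters = "ACG"; // 'T' needs no table: it is the fallback answer

        // last[k][i] = index of the most recent letters.charAt(k) at or before i, or -1.
        int[][] last = new int[3][n];
        int[] seen = {-1, -1, -1};
        for (int i = 0; i < n; i++) {
            int k = letters.indexOf(s.charAt(i)); // -1 when the character is 'T'
            if (k >= 0) {
                seen[k] = i;
            }
            for (int j = 0; j < 3; j++) {
                last[j][i] = seen[j];
            }
        }

        int[] result = new int[p.length];
        for (int i = 0; i < p.length; i++) {
            result[i] = 4;
            // Check letters in increasing order so the first hit is the minimum.
            for (int j = 0; j < 3; j++) {
                if (last[j][q[i]] >= p[i]) {
                    result[i] = j + 1;
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] p = {0, 0, 4, 7};
        int[] q = {8, 2, 5, 7};
        System.out.println(Arrays.toString(solution("GACACCATA", p, q))); // [1, 1, 2, 4]
    }
}
```

Preprocessing touches each character a constant number of times and each query does at most three lookups, so the total work is O(N+M), the same bound as the prefix-sum answers.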