
Merge arrays by common column values in Julia

Question:

Suppose we have the following three arrays in Julia:

5.0  3.5
6.0  3.6
7.0  3.0

5.0  4.5
6.0  4.7
8.0  3.0

5.0  4.0
6.0  3.2
8.0  4.0

I want to merge the three arrays into one, matching rows on the values in the first column and summing the corresponding values in the second column. The result must be the following array:

5.0  12.0
6.0  11.5
7.0   3.0
8.0   7.0

I tried vcat and reduce, but I don't get the intended result. Is there a relatively simple way to code this that avoids slow code? Thank you!

Answer 1:

Given the following two assumptions:

1. the first column of each input array is sorted,
2. the first column of each input array is unique,

the following algorithm should significantly outperform the other answers for most input combinations (i.e. number of input arrays and sizes of the arrays), because it exploits both assumptions:

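function f_ag(x::Matrix{T}...)::Matrix{T} where {T<:Number}
    isempty(x) && error("Empty input")
    any([ size(y,2) != 2 for y in x ]) && error("Input matrices must have two columns")
    length(x) == 1 && return copy(x[1]) #simple case shortcut
    nxmax = [ size(y,1) for y in x ]          # number of rows in each input matrix
    # indices of the matrices that still have unread rows
    # (findall replaces find, which was removed in Julia 1.0)
    nxarrinds = findall(nxmax .> 0)
    nxrowinds = ones(Int, length(nxarrinds))  # current row pointer for each active matrix
    z = Tuple{T,T}[]
    while !isempty(nxarrinds)
        # smallest key among the current rows of all active matrices
        xmin = minimum(T[ x[nxarrinds[j]][nxrowinds[j], 1] for j = 1:length(nxarrinds) ])
        # positions (within nxarrinds) of the matrices whose current row has that key
        minarrinds = Int[ j for j = 1:length(nxarrinds) if x[nxarrinds[j]][nxrowinds[j], 1] == xmin ]
        rowsum = sum(T[ x[nxarrinds[k]][nxrowinds[k], 2] for k in minarrinds ])
        push!(z, (xmin, rowsum))
        # advance the row pointers that contributed to this key
        for k in minarrinds
            nxrowinds[k] += 1
        end
        # drop exhausted matrices, iterating backwards so deleteat! does not shift pending indices
        for j = length(nxarrinds):-1:1
            if nxrowinds[j] > nxmax[nxarrinds[j]]
                deleteat!(nxrowinds, j)
                deleteat!(nxarrinds, j)
            end
        end
    end
    # convert the vector of (key, sum) tuples into an n×2 matrix
    return [ z[n][j] for n = 1:length(z), j = 1:2 ]
end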
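Since the algorithm above is only valid under the two stated assumptions, it may be worth checking them before calling it. A minimal sketch, where satisfies_assumptions is a hypothetical helper name that does not appear in the original answer:

```julia
# Hypothetical helper (not part of the original answer): returns true when the
# first column of m is sorted in ascending order and contains no repeated values.
satisfies_assumptions(m::AbstractMatrix) = issorted(m[:, 1]) && allunique(m[:, 1])

x   = [5.0 3.5; 6.0 3.6; 7.0 3.0]
bad = [6.0 1.0; 5.0 2.0; 5.0 3.0]

satisfies_assumptions(x)    # true
satisfies_assumptions(bad)  # false: unsorted, and the key 5.0 repeats
```

Something like all(satisfies_assumptions, (x, y, z)) could then guard a call to f_ag, at the cost of one extra pass over each first column.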

If assumption 2 is violated, that is, the first column is not guaranteed to be unique, you can still take advantage of the sort order, but the algorithm becomes more complicated, since you additionally need to look ahead from each minimum index to check for duplicates. I'm not going to put myself through that pain at this point.
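One way to sidestep the look-ahead, rather than building it into the merge, is to collapse duplicate keys inside each sorted matrix first, so that assumption 2 holds again before merging. The following is only a sketch of that preprocessing idea; collapse_sorted is a hypothetical name, and the approach is a substitute for the look-ahead variant described above, not an implementation of it:

```julia
# Hypothetical helper (not part of the original answer). Assumes a two-column
# matrix whose first column is sorted, so rows with equal keys are adjacent.
function collapse_sorted(m::Matrix{T}) where {T<:Number}
    size(m, 2) == 2 || error("Input matrix must have two columns")
    ks = T[]   # unique keys, in order
    vs = T[]   # summed values, one per key
    for i in 1:size(m, 1)
        if !isempty(ks) && ks[end] == m[i, 1]
            vs[end] += m[i, 2]     # same key as the previous row: accumulate
        else
            push!(ks, m[i, 1])     # new key: start a new output row
            push!(vs, m[i, 2])
        end
    end
    return hcat(ks, vs)
end

a = [5.0 1.0; 5.0 2.0; 6.0 3.0]
collapse_sorted(a)   # rows (5.0, 3.0) and (6.0, 3.0)
```

Each input would be passed through this helper once, which costs one linear pass per matrix, and the results then satisfy both assumptions.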

Also note, you could adjust the following line:

rowsum = sum(T[ x[nxarrinds[k]][nxrowinds[k], 2] for k in minarrinds ])

to this:

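rowsum = input_func([ x[nxarrinds[k]][nxrowinds[k], 2:end] for k in minarrinds ])

and now you can pass in whatever function you like, and also have any number of additional columns in your input matrices. Note that each element of this comprehension is a vector (one row slice per matching matrix), so it cannot keep the T[...] element type. The two-column check, the element type of z, and the final comprehension that builds the result would have to be adjusted accordingly.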
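To make that generalisation concrete, here is a sketch of how the whole function might look once the reducer is a parameter. The name f_ag_general, the choice to pass the reducer as the first argument, and the convention that input_func receives a vector of row slices and returns one vector are all assumptions made for this illustration. The same two assumptions (sorted, unique first columns) still apply:

```julia
# Sketch only: a generalised variant of f_ag. `input_func` is a hypothetical
# reducer that receives the value columns (2:end) of every row sharing the
# current key, as a vector of vectors, and returns one vector of reduced values.
function f_ag_general(input_func, x::Matrix{T}...) where {T<:Number}
    isempty(x) && error("Empty input")
    nxmax = [ size(y, 1) for y in x ]
    nxarrinds = findall(nxmax .> 0)            # matrices that still have unread rows
    nxrowinds = ones(Int, length(nxarrinds))   # current row pointer per active matrix
    keys_out = T[]
    vals_out = Vector{T}[]
    while !isempty(nxarrinds)
        xmin = minimum(x[nxarrinds[j]][nxrowinds[j], 1] for j in 1:length(nxarrinds))
        minarrinds = [ j for j in 1:length(nxarrinds) if x[nxarrinds[j]][nxrowinds[j], 1] == xmin ]
        rows = [ x[nxarrinds[k]][nxrowinds[k], 2:end] for k in minarrinds ]
        push!(keys_out, xmin)
        push!(vals_out, input_func(rows))
        for k in minarrinds
            nxrowinds[k] += 1
        end
        for j in length(nxarrinds):-1:1        # backwards, so deletions are safe
            if nxrowinds[j] > nxmax[nxarrinds[j]]
                deleteat!(nxrowinds, j)
                deleteat!(nxarrinds, j)
            end
        end
    end
    # assemble the output: key in column 1, reduced values in the remaining columns
    ncols = isempty(vals_out) ? 0 : length(first(vals_out))
    out = Matrix{T}(undef, length(keys_out), 1 + ncols)
    for n in 1:length(keys_out)
        out[n, 1] = keys_out[n]
        out[n, 2:end] = vals_out[n]
    end
    return out
end

# Three-column inputs; `sum` over a vector of vectors adds them element-wise.
x = [5.0 3.5 1.0; 6.0 3.0 1.0; 7.0 3.0 1.0]
y = [5.0 4.5 1.0; 6.0 4.0 1.0; 8.0 3.0 1.0]
f_ag_general(sum, x, y)
```

Putting the reducer first also allows Julia's do-block syntax for longer reducers. The reducer must return the same number of values for every key, otherwise the final assembly fails.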

There are probably some additional optimizations that could be added here, e.g. pre-allocating z or a specialized routine for the case of only two input matrices, but I'm not going to bother with them.
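As an illustration of what those two optimizations could look like together, the sketch below merges exactly two matrices into a pre-allocated output. It is not from the original answer: merge_two is a hypothetical name, and the code assumes two-column inputs whose first columns are sorted and unique.

```julia
# Hypothetical two-input special case (not part of the original answer).
# Assumes both inputs have two columns and sorted, unique first columns.
function merge_two(a::Matrix{T}, b::Matrix{T}) where {T<:Number}
    (size(a, 2) == 2 && size(b, 2) == 2) || error("Input matrices must have two columns")
    na, nb = size(a, 1), size(b, 1)
    out = Matrix{T}(undef, na + nb, 2)   # upper bound: reached when no keys are shared
    i, j, n = 1, 1, 0
    while i <= na && j <= nb
        n += 1
        if a[i, 1] == b[j, 1]            # shared key: sum the values
            out[n, 1] = a[i, 1]
            out[n, 2] = a[i, 2] + b[j, 2]
            i += 1
            j += 1
        elseif a[i, 1] < b[j, 1]         # key only in a (so far)
            out[n, :] = a[i, :]
            i += 1
        else                             # key only in b (so far)
            out[n, :] = b[j, :]
            j += 1
        end
    end
    while i <= na                        # copy whatever is left of a
        n += 1
        out[n, :] = a[i, :]
        i += 1
    end
    while j <= nb                        # copy whatever is left of b
        n += 1
        out[n, :] = b[j, :]
        j += 1
    end
    return out[1:n, :]                   # trim the unused pre-allocated rows
end

x = [5.0 3.5; 6.0 3.6; 7.0 3.0]
y = [5.0 4.5; 6.0 4.7; 8.0 3.0]
merge_two(x, y)   # 4×2 matrix with keys 5.0, 6.0, 7.0, 8.0
```

Because the output again has a sorted, unique first column, more than two matrices could be handled with reduce(merge_two, (x, y, z)), although for many inputs the k-way merge above avoids the repeated copying.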

Answer 2:

There are probably many ways to do it. If you want to avoid writing the logic yourself, you can use the DataFrames package. This is not the fastest solution, but it is short.

Assume you have arrays defined as variables:

x = [5.0  3.5
     6.0  3.6
     7.0  3.0]

y = [5.0  4.5
     6.0  4.7
     8.0  3.0]

z = [5.0  4.0
     6.0  3.2
     8.0  4.0]

Then you can do:

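using DataFrames
# Original one-liner, valid on the older DataFrames releases this answer was written for:
#   Matrix(aggregate(DataFrame(vcat(x,y,z)), :x1, sum))
# DataFrames 1.0 and later no longer provide aggregate; groupby/combine is the equivalent:
Matrix(combine(groupby(DataFrame(vcat(x,y,z), :auto), :x1, sort=true), :x2 => sum))

The :x1 part is there because the columns of a DataFrame built from a matrix get the automatic names :x1, :x2, and so on when you do not name them explicitly (current releases request this with the :auto argument). In this recipe we convert the matrices to a DataFrame, aggregate them, and convert the result back to a matrix.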

Answer 3:

Without an extra package, a possible solution is something like:

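function aggregate(m::Array{<:Number,2}...)

    # stack all inputs and sort the rows, so that equal keys become adjacent
    # (sortslices replaces sortrows, which was removed in Julia 1.0)
    result = sortslices(vcat(m...), dims=1)

    n = size(result,1)
    if n <= 1
        return result
    end

    key_idx = 1                # output row currently being accumulated
    key = result[key_idx,1]

    for i in 2:n
        if key == result[i,1]
            # same key: add this row's values to the accumulated row
            result[key_idx,2:end] += result[i,2:end]
        else
            # new key: start the next output row (reusing result in place)
            key = result[i,1]
            key_idx += 1
            result[key_idx,1]     = key
            result[key_idx,2:end] = result[i,2:end]
        end
    end

    # only the first key_idx rows hold aggregated results
    return result[1:key_idx,:]
end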

Demo:

x = [5.0  3.5
     6.0  3.6
     7.0  3.0]

y = [5.0  4.5
     6.0  4.7
     8.0  3.0]

z = [5.0  4.0
     6.0  3.2
     8.0  4.0]

aggregate(x,y,z)

Prints:

4×2 Array{Float64,2}:
 5.0  12.0
 6.0  11.5
 7.0   3.0
 8.0   7.0

Note: this solution also works with any number of columns.
