How to Mutate a DataFrame?

Posted 2019-06-21 13:32

I am trying to remove some columns from my data frame and would prefer not to return the modified data frame and reassign it to the old variable. Instead, I would like the function to modify the data frame itself. This is what I tried, but it does not seem to do what I expect. I was under the impression that arguments are passed by reference and not by value?

function remove_cols!(df::DataFrame, cols)
  df = df[setdiff(names(df), cols)];
end

df = DataFrame(x = 1:10, y = 11:20);
remove_cols!(df, [:y]); # this does not modify the original data frame

Of course, the following works, but I would prefer that remove_cols! change the df in place:

df = remove_cols!(df, [:y]);

How can I change the df in place inside my function?

Thanks!

2 Answers
虎瘦雄心在
#2 · 2019-06-21 13:59

As I understand it, Julia uses what is called pass by sharing, meaning that the reference is passed by value. When you pass the DataFrame to the function, a new reference to the same DataFrame is created that is local to the function. Reassigning the local variable df only points that local reference at a new object; it has no effect on the global variable, which still refers to the original DataFrame.
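
A minimal sketch of the difference between rebinding and mutating, using a plain vector instead of a DataFrame so that it runs without any packages. The function names rebind and mutate! are made up for illustration:

```julia
# Rebinding: `v = ...` only changes what the local name `v` refers to.
function rebind(v)
    v = [0]          # the caller's vector is not touched
    return nothing
end

# Mutating: these calls change the object that both names refer to.
function mutate!(v)
    empty!(v)        # remove all elements in place
    push!(v, 0)      # add one element in place
    return nothing
end

a = [1, 2, 3]

rebind(a)
after_rebind = copy(a)   # snapshot: `a` still holds 1, 2, 3 here

mutate!(a)               # now `a` itself has changed to a single 0
```

The same reasoning applies to a DataFrame: assigning to df inside the function is a rebinding, so the caller never sees it.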

There is a function in DataFrames.jl for deleting columns in DataFrames.
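
The answer does not say which function it means. As one possibility, and assuming a recent DataFrames.jl release (1.x), the in-place select! combined with Not can drop columns without reassigning df. This is a sketch under that assumption, not necessarily the function the answer had in mind:

```julia
using DataFrames

# Assumes DataFrames.jl 1.x, where `select!` and `Not` are available.
function remove_cols!(df::DataFrame, cols)
    select!(df, Not(cols))   # keep every column except `cols`, in place
    return df
end

df = DataFrame(x = 1:10, y = 11:20)
remove_cols!(df, [:y])       # no reassignment needed
remaining = names(df)        # column names left in `df`
```

Because select! mutates the object that df refers to, the change is visible to the caller even though the function's df is only a local reference.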

手持菜刀,她持情操
#3 · 2019-06-21 14:13

To answer the general question of how to mutate a data frame inside your own function: the key is to use functions and operations that themselves mutate the data frame. For example, the function below builds on the standard append! function and adds a few benefits. It can append rows from any number of data frames, the order of columns does not matter, and missing columns are added to the data frames:

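# Note: written against the DataFrames.jl 1.x API (an assumption); the original
# used colwise and df[n] indexing, which newer releases no longer provide.
function append_with_missing!(df1::DataFrame, dfs::AbstractDataFrame...)

    # Collect every column name and element type across all data frames
    columns = Dict{Symbol, Type}(zip(propertynames(df1), eltype.(eachcol(df1))))
    for df in dfs
        columns_temp = Dict(zip(propertynames(df), eltype.(eachcol(df))))
        merge!(columns, columns_temp)
    end
    # Add any column a data frame lacks, filled with missing (mutates every data frame)
    for (n, t) in columns, df in (df1, dfs...)
        n in propertynames(df) || (df[!, n] = Vector{Union{Missing, t}}(missing, nrow(df)))
    end
    # Append the rows in df1's column order; promote = true lets df1's columns widen to hold missing
    for df in dfs
        append!(df1, df[:, propertynames(df1)]; promote = true)
    end
    return df1

end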

Here the first data frame passed in is itself mutated: the rows of the other data frames are appended to it.

(The functionality for adding missing columns is based on the answer given by @Bogumił Kamiński in "Breaking change on vcat when columns are missing".)
