Julia: create an empty DataFrame and append rows to it

Posted 2019-03-22 16:46

I am trying out the Julia DataFrames package, which I want to use to plot simple simulations in Gadfly. I want to initialize the DataFrame as empty and then add rows to it iteratively.

The tutorials and documentation on how to do this are sparse (most of the documentation describes how to analyse imported data).

Appending to a nonempty DataFrame is straightforward:

df = DataFrame(A = [1, 2], B = [4, 5])
push!(df, [3, 6])

This returns:

3x2 DataFrame
| Row | A | B |
|-----|---|---|
| 1   | 1 | 4 |
| 2   | 2 | 5 |
| 3   | 3 | 6 |
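push! accepts a row in more than one form. The sketch below assumes a recent DataFrames release (1.x): vectors and tuples are matched to columns by position, a NamedTuple is matched by column name, and append! adds all rows of another DataFrame at once.

```julia
using DataFrames

df = DataFrame(A = [1, 2], B = [4, 5])

push!(df, [3, 6])                            # vector: values matched by position
push!(df, (7, 8))                            # tuple: values matched by position
push!(df, (A = 9, B = 10))                   # NamedTuple: values matched by column name
append!(df, DataFrame(A = [11], B = [12]))   # append every row of another DataFrame
```

The NamedTuple form is the most robust inside a loop, because it keeps working if the column order changes.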

But when I initialize it as empty, I get an error:

df = DataFrame(A = [], B = [])
push!(df, [3, 6])

Error message:

ArgumentError("Error adding 3 to column :A. Possible type mis-match.")
while loading In[220], in expression starting on line 2

What is the best way to initialize an empty Julia DataFrame such that you can iteratively add items to it later in a for loop?

2 answers
对你真心纯属浪费
Reply #2 · 2019-03-22 17:04
using CSV, DataFrames

# Same sample file as in the original answer; point this at your own CSV if your DataFrames install does not ship it
iris = CSV.read(joinpath(pkgdir(DataFrames), "test/data/iris.csv"), DataFrame)

# Preallocate a DataFrame with the same column names, element types and row count
new_iris = similar(iris, nrow(iris))

first(new_iris, 2)
# 2×5 DataFrame
# │ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
# ├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
# │ 1   │ missing     │ missing    │ missing     │ missing    │ missing │
# │ 2   │ missing     │ missing    │ missing     │ missing    │ missing │
# (Output from the original answer; with newer package versions these cells may be uninitialized rather than missing)

for (i, row) in enumerate(eachrow(iris))
    new_iris[i, :] = row    # copy row i of iris into row i of new_iris
end

first(new_iris, 2)

# 2×5 DataFrame
# │ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
# ├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
# │ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa  │
# │ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa  │
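When the final number of rows is not known in advance, the same similar function can produce an empty template instead of a preallocated one. This is a minimal sketch assuming DataFrames 1.x; the small src frame stands in for iris so that no CSV file is needed.

```julia
using DataFrames

src = DataFrame(A = [1, 2, 3], B = [4.0, 5.0, 6.0])

# similar(df, 0) keeps the column names and element types but has zero rows
dst = similar(src, 0)

for row in eachrow(src)
    push!(dst, row)   # a DataFrameRow is matched to dst's columns by name
end
```

Growing with push! avoids guessing a row count up front, at the cost of occasional reallocation as the columns grow.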
地球回转人心会变
Reply #3 · 2019-03-22 17:21

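A zero-length array defined using only [] lacks sufficient type information:

julia> typeof([])
Array{None,1}

(That is the output from the Julia version of the time; current Julia returns Array{Any,1}, also written Vector{Any}. Either way, the resulting columns have no concrete element type.)

To avoid the problem, simply specify the element type: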

julia> typeof(Int64[])
Array{Int64,1}
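The effect of a typed empty array can be seen in plain Julia, without DataFrames. This sketch assumes a current Julia 1.x release: a typed vector converts values that fit its element type and rejects values that do not, which is the kind of type mismatch the DataFrames error message refers to.

```julia
untyped = []        # element type Any in current Julia
typed = Int64[]     # element type Int64

push!(typed, 3)     # already an Int64
push!(typed, 6.0)   # converted exactly to Int64(6)

# 2.5 has no exact Int64 value, so push! throws an InexactError
caught = try
    push!(typed, 2.5)
    false
catch err
    err isa InexactError
end
```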

You can apply the same idea to your DataFrame:

julia> df = DataFrame(A = Int64[], B = Int64[])
0x2 DataFrame

julia> push!(df, [3, 6])

julia> df
1x2 DataFrame
| Row | A | B |
|-----|---|---|
| 1   | 3 | 6 |
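The question asks about adding rows inside a for loop. Building on the typed-column approach above, a minimal sketch (assuming DataFrames 1.x; the function name simulate, the column names and the update rule are hypothetical stand-ins for a real simulation) could look like this:

```julia
using DataFrames

# Hypothetical simulation: column names and update rule are illustrative only
function simulate(nsteps::Integer)
    # Start empty, but give every column a concrete element type
    df = DataFrame(step = Int[], value = Float64[])
    x = 0.0
    for step in 1:nsteps
        x += 0.5 * step                       # toy update standing in for a real model
        push!(df, (step = step, value = x))   # NamedTuple row, matched by column name
    end
    return df
end

df = simulate(5)
```

Wrapping the loop in a function also sidesteps Julia's global-scope rules for variables updated inside a top-level loop in a script.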