Ruby on Rails - Import Data from a CSV file

Posted on 2019-01-01 11:53

Question:

I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.

This is my table:

create_table \"mouldings\", :force => true do |t|
  t.string   \"suppliers_code\"
  t.datetime \"created_at\"
  t.datetime \"updated_at\"
  t.string   \"name\"
  t.integer  \"supplier_id\"
  t.decimal  \"length\",         :precision => 3, :scale => 2
  t.decimal  \"cost\",           :precision => 4, :scale => 2
  t.integer  \"width\"
  t.integer  \"depth\"
end

Can you give me some code to show me the best way to do this? Thanks.

Answer 1:

require 'csv'

csv_text = File.read('...') # path to your CSV file
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end


Answer 2:

A simpler version of yfeldblum's answer that also works well with large files:

require 'csv'

CSV.foreach(filename, :headers => true) do |row|
  Moulding.create!(row.to_hash)
end

No need for with_indifferent_access or symbolize_keys, and no need to read the file into a string first.

It doesn't keep the whole file in memory at once; it reads line by line and creates a Moulding per line.



Answer 3:

The smarter_csv gem was created specifically for this use case: to read data from a CSV file and quickly create database entries.

require 'smarter_csv'
options = {}
SmarterCSV.process('input_file.csv', options) do |chunk|
  chunk.each do |data_hash|
    Moulding.create!(data_hash)
  end
end

You can use the chunk_size option to read N CSV rows at a time, and then use Resque in the inner loop to generate jobs that create the new records rather than creating them right away. This way you can spread the load of generating entries across multiple workers; see the sketch below.
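
A minimal sketch of that pattern; MouldingImportJob is a hypothetical job class I've made up for illustration, not part of smarter_csv:

require 'smarter_csv'
require 'resque'

# Hypothetical Resque job: creates one Moulding per row hash in the chunk.
class MouldingImportJob
  @queue = :imports

  def self.perform(data_hashes)
    data_hashes.each { |attrs| Moulding.create!(attrs) }
  end
end

# Read 100 rows at a time and enqueue each chunk as a background job.
SmarterCSV.process('input_file.csv', chunk_size: 100) do |chunk|
  Resque.enqueue(MouldingImportJob, chunk)
end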

See also: https://github.com/tilo/smarter_csv



Answer 4:

This can help. It has code examples too:

http://csv-mapper.rubyforge.org/

Or for a rake task for doing the same:

http://erikonrails.snowedin.net/?p=212



Answer 5:

You might try Upsert:

require 'upsert' # add this to your Gemfile
require 'csv'

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # treats "name" as the primary key and prevents duplicates by name
  setter = row.to_hash
  u.row selector, setter
end

If this is what you want, you might also consider dropping the auto-increment primary key from the table and making name the primary key. Alternatively, if some combination of attributes forms a natural key, use that combination as the selector, as in the sketch below. No index is necessary; an index will just make it faster.
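
For instance, a minimal sketch assuming suppliers_code and supplier_id together uniquely identify a row (an assumption about this data, not something the schema enforces):

# Hypothetical composite key: suppliers_code + supplier_id identify a row.
selector = { suppliers_code: row['suppliers_code'], supplier_id: row['supplier_id'] }
u.row selector, row.to_hash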



Answer 6:

It is better to wrap the database-related work inside a transaction block. The snippet below is a full rake task for seeding a set of languages into a Language model:

require 'csv'

namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end

The snippet below is part of the languages.csv file:

aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...


Answer 7:

Use this gem: https://rubygems.org/gems/active_record_importer

class Moulding < ActiveRecord::Base
  acts_as_importable
end

Then you can use:

Moulding.import!(file: File.open(PATH_TO_FILE))

Just be sure that your headers match the column names of your table.



Answer 8:

A better way is to put it in a rake task. Create an import.rake file inside lib/tasks/ and add this code to it:

desc \"Imports a CSV file into an ActiveRecord table\"
task :csv_model_import, [:filename, :model] => [:environment] do |task,args|
  lines = File.new(args[:filename], \"r:ISO-8859-1\").readlines
  header = lines.shift.strip
  keys = header.split(\',\')
  lines.each do |line|
    values = line.strip.split(\',\')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end

After that, run this command in your terminal: rake csv_model_import[file.csv,Name_of_the_Model]



Answer 9:

I know it's an old question, but it's still in the first 10 Google results.

It is not very efficient to save rows one by one, because that causes a database call inside the loop; you should avoid that, especially when you need to insert large amounts of data.

It\'s better (and significantly faster) to use batch insert.

INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
    ('s1', 'supplier1', 1.111),
    ('s2', 'supplier2', 2.222)

You can build such a query manually and then run Model.connection.execute(raw_sql_string) (not recommended), or use the activerecord-import gem (first released on 11 Aug 2010): just put the data in an array of rows and call Model.import rows, as sketched below.

Refer to the gem docs for details.
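
A minimal sketch of the activerecord-import approach; the file name mouldings.csv and the column subset are assumptions for illustration:

require 'csv'
require 'activerecord-import' # gem 'activerecord-import' in the Gemfile

columns = [:suppliers_code, :name, :cost]
rows = []

# Collect all rows first (file name is hypothetical).
CSV.foreach('mouldings.csv', headers: true) do |row|
  rows << [row['suppliers_code'], row['name'], row['cost']]
end

# Issues one multi-row INSERT instead of one INSERT per row.
Moulding.import columns, rows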



Answer 10:

It's better to use CSV::Table and String#encode(universal_newline: true), which converts CRLF and CR line endings to LF.
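
A minimal sketch of what that might look like; the file name input_file.csv is an assumption:

require 'csv'

# Normalize CRLF and CR line endings to LF before parsing (file name is hypothetical).
csv_text = File.read('input_file.csv').encode(universal_newline: true)
table = CSV.parse(csv_text, headers: true) # with headers, CSV.parse returns a CSV::Table

table.each do |row|
  Moulding.create!(row.to_hash)
end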



Answer 11:

If you want to use SmarterCSV:

all_data = SmarterCSV.process(
             params[:file].tempfile,
             {
               :col_sep => "\t",
               :row_sep => "\n"
             }
           )

This handles tab-delimited data in each row ("\t"), with rows separated by newlines ("\n").
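
Without a block, SmarterCSV.process returns the parsed rows, so creating records from all_data is then straightforward:

# all_data is an array of hashes keyed by symbolized headers.
all_data.each do |data_hash|
  Moulding.create!(data_hash)
end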