Can't store UTF-8 in RDS despite setting up ne

Posted 2019-03-21 14:00

Question:

I'm setting up a new instance of a Rails (2.3.5) app on Heroku using Amazon RDS as the database. I'd like to use UTF-8 for everything. Since RDS isn't UTF-8 by default, I set up a new Parameter Group and switched the database to use that one, basically following an online guide. It seems to have worked:

SHOW VARIABLES LIKE '%character%';

character_set_client        utf8
character_set_connection    utf8
character_set_database    utf8
character_set_filesystem    binary
character_set_results      utf8
character_set_server        utf8
character_set_system        utf8
character_sets_dir       /rdsdbbin/mysql-5.1.50.R3/share/mysql/charsets/

Furthermore, I've successfully set up Heroku to use the RDS database. After rake db:migrate, everything looks good:

CREATE TABLE `comments` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `commentable_id` int(11) DEFAULT NULL,
  `parent_id` int(11) DEFAULT NULL,
  `content` text COLLATE utf8_unicode_ci,
  `child_count` int(11) DEFAULT '0',
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `commentable_id` (`commentable_id`),
  KEY `index_comments_on_community_id` (`community_id`),
  KEY `parent_id` (`parent_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

In the markup, I've included:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Also, I've set:

production:
  encoding: utf8
  collation: utf8_general_ci

...in the database.yml, though I'm not very confident that anything is being done to honor any of those settings in this case, as Heroku seems to be doing its own config when connecting to RDS.

Now, I enter a comment through the form in the app: "Úbe® ƒåiL", but in the database I've got "Úbe® Æ’Ã¥iL"

It looks fine when Rails loads it back out of the database and it is rendered to the page, so whatever it is doing one way, it's undoing the other way. If I look at the RDS database in Sequel Pro, it looks fine if I set the encoding to "UTF-8 Unicode via Latin 1". So it seems Latin-1 is sneaking in there somewhere.
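The symptom described above is consistent with UTF-8 bytes being treated as Latin-1 on the way in and converted back the same way on the way out. Here is a minimal Ruby sketch of that round trip. It assumes Ruby 1.9 or later and that MySQL's latin1 maps these bytes the way Windows-1252 does. The helper names are hypothetical, not part of Rails or MySQL.

```ruby
# encoding: utf-8

# Hypothetical helper: reinterpret the UTF-8 bytes of `text` as
# Windows-1252 characters, then store those characters as UTF-8.
# This mimics a connection that believes the client is sending Latin-1.
def misread_as_latin1(text)
  text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Hypothetical helper: reverse the mistake by converting back to
# Windows-1252 bytes and relabelling them as UTF-8.
def undo_misread(garbled)
  garbled.encode("Windows-1252").force_encoding("UTF-8")
end

garbled = misread_as_latin1("ƒåiL")
puts garbled                # prints "Æ’Ã¥iL"
puts undo_misread(garbled)  # prints "ƒåiL"
```

Because the two steps cancel out, the app shows the right text while the stored data is wrong. A few bytes have no Windows-1252 mapping in Ruby, so this sketch may raise on other input.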

Everything works in development, when connecting to a local MySQL database.

Somebody must have done this before, right? What am I missing?

Answer 1:

There's a simpler way. You can specify the encoding in your DB connection string. Edit the RDS add-on and append ?encoding=utf8&collation=utf8_general_ci to the database URL.

Worked well for me, no changes to the project.

e.g.:

  mysql://user:pass@abc.rds.amazonaws.com/my-db?encoding=utf8&collation=utf8_general_ci

Reference: http://blog.arvidandersson.se/2011/09/27/setting-activerecord-connection-to-utf8-on-heroku
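To illustrate why the query string matters, here is a rough sketch of how a database URL like the one above could be turned into a connection hash. It is not Heroku's or ActiveRecord's actual code. The function name database_config_from_url is hypothetical, and the sketch assumes the query parameters are simply merged into the settings.

```ruby
require 'uri'

# Hypothetical helper: split a database URL into an
# ActiveRecord-style settings hash, including any query parameters.
def database_config_from_url(url)
  uri = URI.parse(url)
  # decode_www_form returns [key, value] pairs from the query string.
  params = Hash[URI.decode_www_form(uri.query || "")]
  {
    "adapter"  => uri.scheme,
    "host"     => uri.host,
    "username" => uri.user,
    "password" => uri.password,
    "database" => uri.path.sub(%r{\A/}, "")
  }.merge(params)
end

url = "mysql://user:pass@abc.rds.amazonaws.com/my-db?encoding=utf8&collation=utf8_general_ci"
config = database_config_from_url(url)
puts config["encoding"]   # prints "utf8"
puts config["collation"]  # prints "utf8_general_ci"
```

Without the query string, the resulting hash would carry no encoding or collation keys at all, which matches the behaviour described in the question.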



Answer 2:

Ultimately I solved my problem by adding the following in the Rails::Initializer.run block in environment.rb:

class Rails::Configuration
  # Heroku overwrites database.yml on deploy and sets no encoding when the
  # database lives on an outside server (such as Amazon RDS), so merge the
  # UTF-8 settings into every environment before the connection is made.
  def database_configuration
    require 'erb'
    config = YAML::load(ERB.new(IO.read(database_configuration_file)).result)
    config.each_value do |env|
      env.merge!("encoding" => "utf8", "collation" => "utf8_general_ci")
    end
    config
  end
end

Heroku overwrites the database.yml file and doesn't include any encoding or collation settings. With this hack, the correct settings are always merged in before the database connection is made.
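The merge step can be tried outside Rails. The sketch below applies the same idea to YAML held in a string. The function name with_utf8_settings and the sample configuration are made up for illustration.

```ruby
require 'yaml'
require 'erb'

# Hypothetical helper: parse database.yml-style text (ERB allowed) and
# force the UTF-8 settings into every environment it defines.
def with_utf8_settings(yaml_text)
  config = YAML.load(ERB.new(yaml_text).result)
  config.each_value do |env|
    env.merge!("encoding" => "utf8", "collation" => "utf8_general_ci")
  end
  config
end

# Sample input with no encoding keys, as in a generated database.yml.
sample = "production:\n  adapter: mysql\n  database: my-db\n"
config = with_utf8_settings(sample)
puts config["production"]["encoding"]   # prints "utf8"
puts config["production"]["adapter"]    # prints "mysql"
```

Existing keys such as adapter and database are left alone, while any encoding or collation already present would be overwritten by the merged values.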