Rails 3 - How to handle PG Error incomplete multibyte character

Published 2019-02-19 10:49

In a Rails 3.2 app (Ruby 1.9.2) I am getting the following error:

A PGError occurred in mobile_users#update:

incomplete multibyte character

These are Postgres errors, but I get similar SQLite errors when testing in the dev and test environments.

The params that cause this error are as follows (auth token deliberately omitted):

  * Parameters: {"mobile_user"=>{"quiz_id"=>"1", "auth"=>"xxx", "name"=>"Joaqu\xEDn"}, "action"=>"update", "controller"=>"mobile_users", "id"=>"1", "format"=>"mobile"}

This comes in as a JSON HTTP PUT request, and the update action handling it is as follows:

  # PUT /mobile_users/1
  # PUT /mobile_users/1.xml
  def update
    @mobile_user = current_mobile_user
    @mobile_user.attributes = params[:mobile_user]

    respond_to do |format|
      if @mobile_user.save
        format.html { redirect_to(@mobile_user, :notice => 'Mobile user was successfully updated.') }
        format.json  { head :ok }
        format.mobile  { head :ok }
        format.xml  { head :ok }
      else
        format.html { render :action => "edit" }
        format.json  { render :json => @mobile_user.errors, :status => :unprocessable_entity }
        format.mobile  { render :json => @mobile_user.errors, :status => :unprocessable_entity }
        format.xml  { render :xml => @mobile_user.errors, :status => :unprocessable_entity }
      end
    end

  end

The offending string in the above params is "Joaqu\xEDn", which is a perfectly valid name. The thing is that I need to handle all character sets from any language.

I assume I would need to use the iconv library, but in order to do that I would need to detect the character set to convert to UTF-8 from, and I haven't a clue how to do this.

I am also getting "invalid byte sequence in UTF-8" for "name"=>"p\xEDa "
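For reference, the failure can be reproduced in plain Ruby (assuming a default UTF-8 source encoding): the `\xED` byte is the single-byte ISO-8859-1 encoding of `í`, which is not a complete UTF-8 sequence, so the string is invalid when labeled UTF-8.

```ruby
# Minimal reproduction: the \xED byte from the params is not a valid
# UTF-8 sequence, which is what Postgres/SQLite are rejecting.
latin1_name = "Joaqu\xEDn"     # the bytes as received from the client
utf8_name   = "Joaquín"        # the same name properly UTF-8 encoded

latin1_name.valid_encoding?    # => false
utf8_name.valid_encoding?      # => true
```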

2 Answers
Deceive
Answered 2019-02-19 11:20

This:

"Joaqu\xEDn"

is the ISO-8859-1 encoded version of "Joaquín", so it is not valid UTF-8 and your databases are right to complain about it. If possible, fix your mobile clients to send UTF-8 in the JSON; if you can't do that, then you can fix the encoding with this:

params[:mobile_user][:name].force_encoding('iso-8859-1').encode!('utf-8')

on the server. The problem with fixing it on the server is that you have to guess the incoming encoding, and your guess might not be correct. There is no way to reliably guess the encoding of an arbitrary string. There is rchardet, but it doesn't work with recent versions of Ruby and appears to have been abandoned; you might be able to fix that gem to work with modern Ruby. There are a few other guessing libraries, but they all seem to have been abandoned as well.
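As a sketch of the re-encoding approach above (assuming the incoming bytes really are ISO-8859-1):

```ruby
# Sketch of the server-side fix, assuming the client sent ISO-8859-1.
name  = "Joaqu\xEDn".dup                   # invalid when treated as UTF-8
fixed = name.force_encoding('ISO-8859-1')  # relabel the raw bytes...
            .encode('UTF-8')               # ...then genuinely transcode

fixed                    # => "Joaquín"
fixed.valid_encoding?    # => true
```

`force_encoding` only relabels the bytes; it is the subsequent `encode` that actually converts them, which is why the order matters.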

JSON text is always, by definition, Unicode, and UTF-8 encoded by default (RFC 4627, section 3):

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

Any clients sending you JSON that isn't UTF-8 are, IMO, broken, because almost everything assumes JSON will be UTF-8. Of course, there might be an encoding header somewhere that specifies ISO-8859-1, or the headers may claim UTF-8 even though the body is ISO-8859-1.
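One defensive option (a hypothetical helper, not part of the answer above) is to detect the bad encoding up front and reject the request with a 400 instead of letting the database raise:

```ruby
# Hypothetical helper: walk a params-like structure and report whether
# every String in it is valid in its declared encoding (UTF-8 here).
def valid_utf8_params?(obj)
  case obj
  when String then obj.valid_encoding?
  when Hash   then obj.each_value.all? { |v| valid_utf8_params?(v) }
  when Array  then obj.all?            { |v| valid_utf8_params?(v) }
  else true
  end
end

good = { "mobile_user" => { "name" => "Joaquín" } }
bad  = { "mobile_user" => { "name" => "Joaqu\xEDn" } }

valid_utf8_params?(good)  # => true
valid_utf8_params?(bad)   # => false
```

In a Rails 3 controller this could run in a `before_filter`, returning `head :bad_request` when the check fails.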

爷的心禁止访问
Answered 2019-02-19 11:21

I had the same problem with user-generated data when parsing files, and solved it this way:

require 'iconv'
# ...
line = Iconv.conv('UTF-8//IGNORE', 'UTF-8', line)
# the variable `line` now holds valid UTF-8 data

You can also try overriding the 'name' setter so that it strips non-UTF-8 characters:

def name=(name)
  write_attribute(:name, Iconv.conv('UTF-8//IGNORE', 'UTF-8', name))
end