I have a table called `Message` which has a column `content` of type `JSON`. My model definition, as requested, is as follows:
```python
class Message(db.Model):
    ...
    content = db.Column(JSON)
    ...
```
Now I perform text matching for a simple search with the following query:

```python
Message.query.filter(Message.content['summary'].cast(Unicode).match(term))
```

It works nicely until `term` contains a UTF-8 character, like German umlauts or French accents. What would be the solution here? Also keep in mind that I'm using Python 3.
The problem lies in `cast(Unicode)` of a Postgresql `json` column. It simply CASTs the `json` to the text type underlying SQLAlchemy's `Unicode`, in the case of Postgresql `VARCHAR`. In other words, it produces a string representation of the JSON instead of extracting the text content, so if your input contained escaped unicode codepoints, they are output as is. Given a simple `Test` model with a `json` column `data`, it should be evident why a match with unescaped unicode characters will fail.

The correct way to extract the text content, unescaping escaped unicode, is to use `astext`, which uses the `->>` operator in Postgresql. Quoting the JSON functions and operators documentation:

> `->>`: Get JSON object field as text
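The difference is visible without a live database by compiling the two expressions against the Postgresql dialect (the `Test` model here is an illustrative stand-in):

```python
from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSON
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)
    data = Column(JSON)

dialect = postgresql.dialect()

# cast(Unicode) stringifies the JSON value as a whole:
print(Test.data['summary'].cast(Unicode).compile(dialect=dialect))
# → CAST(test.data -> %(data_1)s AS VARCHAR)

# astext extracts the text content using the ->> operator:
print(Test.data['summary'].astext.compile(dialect=dialect))
# → test.data ->> %(data_1)s
```

The first form matches against the JSON string representation (quotes, escapes and all), the second against the actual text value of the field.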
So in your case, replace the `cast(Unicode)` with `astext`.
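A sketch of the corrected query, using a minimal `Message` model as a stand-in for your `db.Model` class and compiling the filter criterion instead of executing it:

```python
from sqlalchemy import Column, Integer
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSON
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Message(Base):  # stand-in for the Flask-SQLAlchemy model
    __tablename__ = "message"
    id = Column(Integer, primary_key=True)
    content = Column(JSON)

term = "Müller"
# astext (->>) extracts the text content, so match() sees the real characters:
criterion = Message.content['summary'].astext.match(term)
print(criterion.compile(dialect=postgresql.dialect()))
```

With Flask-SQLAlchemy this becomes `Message.query.filter(Message.content['summary'].astext.match(term))`.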
Note that this only applies to the `json` type, not `jsonb`, because the `json` type does not convert unicode escapes on input. `jsonb`, on the other hand, converts all unicode escapes to equivalent ASCII or UTF-8 characters for storage. If our `Test` model contained a second column `data2` of type `jsonb` with the exact same input, then the string representation produced by the cast would contain the actual unescaped characters. Still, you should use `astext` if you want text instead of a string representation of JSON.
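The failure mode can be illustrated with plain Python, using the `json` module's `ensure_ascii` escaping as a stand-in for escaped codepoints stored verbatim in a `json` column:

```python
import json

# json (not jsonb) stores its input verbatim, so escaped codepoints survive:
escaped = json.dumps({"summary": "Müller"}, ensure_ascii=True)
print(escaped)                         # → {"summary": "M\u00fcller"}

# A CAST to text yields this representation, in which "Müller" never appears:
print("Müller" in escaped)             # → False

# ->> (astext) extracts and unescapes the actual text content:
print(json.loads(escaped)["summary"])  # → Müller
```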