I have a table called `Message` which has a column `content` of type `JSON`. My model definition, as requested, is as follows:
```python
class Message(db.Model):
    ...
    content = db.Column(JSON)
    ...
```
Now I perform text matching for a simple search with the following query:

```python
Message.query.filter(Message.content['summary'].cast(Unicode).match(term))
```

It works nicely until `term` contains a UTF-8 character, like German umlauts or French accents. What would be the solution here? Also keep in mind that I'm using Python 3.
The problem lies in `cast(Unicode)` of a Postgresql `json` column. It simply CASTs the `json` to the text type underlying SQLAlchemy's `Unicode`, in the case of Postgresql `VARCHAR`. In other words, it produces a string representation of the JSON instead of extracting the text content, so if your input contained escaped unicode codepoints, they are output as is. Given a simple `Test` model with a `json` column `data`, it should be evident why a match with unescaped unicode characters will fail.

The correct way to extract the text content, unescaping escaped unicode, is to use `astext`, which uses the `->>` operator in Postgresql. Quoting the JSON functions and operators documentation:

> `->>`: Get JSON object field as text
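The difference is visible without a live database by compiling the two expressions against the Postgresql dialect (the `Test` model here is an illustrative stand-in):

```python
from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSON
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)
    data = Column(JSON)

dialect = postgresql.dialect()

# cast(Unicode) stringifies the JSON value as a whole:
print(Test.data['summary'].cast(Unicode).compile(dialect=dialect))
# → CAST(test.data -> %(data_1)s AS VARCHAR)

# astext extracts the text content using the ->> operator:
print(Test.data['summary'].astext.compile(dialect=dialect))
# → test.data ->> %(data_1)s
```

The first form matches against the JSON string representation (quotes, escapes and all), the second against the actual text value of the field.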
So in your case, replace the `cast(Unicode)` with `astext`.
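A sketch of the corrected query, using a minimal `Message` model as a stand-in for your `db.Model` class and compiling the filter criterion instead of executing it:

```python
from sqlalchemy import Column, Integer
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSON
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Message(Base):  # stand-in for the Flask-SQLAlchemy model
    __tablename__ = "message"
    id = Column(Integer, primary_key=True)
    content = Column(JSON)

term = "Müller"
# astext (->>) extracts the text content, so match() sees the real characters:
criterion = Message.content['summary'].astext.match(term)
print(criterion.compile(dialect=postgresql.dialect()))
```

With Flask-SQLAlchemy this becomes `Message.query.filter(Message.content['summary'].astext.match(term))`.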
Note that this only applies to the `json` type, not `jsonb`, because the `json` type does not convert unicode escapes on input. `jsonb`, on the other hand, converts all unicode escapes to equivalent ASCII or UTF-8 characters for storage. If our `Test` model contained a second column `data2` of type `jsonb` with the exact same input, then the string representation produced by the cast would contain the actual unescaped characters. Still, you should use `astext` if you want text instead of a string representation of JSON.
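The failure mode can be illustrated with plain Python, using the `json` module's `ensure_ascii` escaping as a stand-in for escaped codepoints stored verbatim in a `json` column:

```python
import json

# json (not jsonb) stores its input verbatim, so escaped codepoints survive:
escaped = json.dumps({"summary": "Müller"}, ensure_ascii=True)
print(escaped)                         # → {"summary": "M\u00fcller"}

# A CAST to text yields this representation, in which "Müller" never appears:
print("Müller" in escaped)             # → False

# ->> (astext) extracts and unescapes the actual text content:
print(json.loads(escaped)["summary"])  # → Müller
```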