I am having trouble writing and reading back special characters like the Euro-sign (€) into LOB String properties in PostgreSQL 8.4 with Hibernate 3.6.10.
What I know is that PostgreSQL provides two distinct ways to store large character objects in a column of a table. They can be stored either directly into that table column or indirectly in a separate table (it's actually called pg_largeobject). In the latter case, the column holds a reference (OID) to the row in pg_largeobject.
The default behaviour in Hibernate 3.6.10 is the indirect OID approach. However, it is possible to add an extra annotation @org.hibernate.annotations.Type(type="org.hibernate.type.TextType") to the Lob property to get the direct storage behaviour.
Both approaches work fine, except for the moment that I want to work with special characters like the Euro sign (€). In that case the direct storage mechanism keeps working, but the indirect storage mechanism breaks.
I'd like to demonstrate that with an example. I created a test entity with 2 @Lob properties. One follows the direct storage principle, the other the indirect storage:
@Basic
@Lob
@Column(name = "CLOB_VALUE_INDIRECT_STORAGE", length = 2147483647)
public String getClobValueIndirectStorage()
and
@Basic
@Lob
@org.hibernate.annotations.Type(type="org.hibernate.type.TextType")
@Column(name = "CLOB_VALUE_DIRECT_STORAGE", length = 2147483647)
public String getClobValueDirectStorage()
If I create an entity, populate both properties with the Euro sign and then persist it towards the database I see the following when I do a SELECT I see
id | clob_value_direct_storage | clob_value_indirect_storage
----+---------------------------+----------------------------
6 | € | 910579
If I then query the table pg_largeobject I see:
loid | pageno | data
--------+--------+------
910579 | 0 | \254
The 'data' column of pg_largeobject is of type bytea, which means that the information is stored as raw bytes. The expression '\254' represents one single byte and in UTF-8 represents the character '¬'. This is exactly the value that I get back when I load the entity back from the database.
The Euro sign in UTF-8 consists of 3 bytes, so I would have expected the 'data' column to have 3 bytes and not 1.
This does not only occur for the Euro sign, but for many special characters. Is this a problem in Hibernate? Or the JDBC driver? Is there a way I can tweak this behaviour?
Thanks in advance,
Kind regards,
Franck de Bruijn