I am building a query using the JPA Criteria API. When I created two restriction predicates with the javax.persistence.criteria.Path#in(Collection<?>) method, the generated SQL was a little different from what I expected.
The first predicate, built over an int attribute, produced SQL with all elements of the parameter collection inlined: in (10, 20, 30).
The second predicate, built over a String attribute, produced parameterized SQL: in (?, ?, ?).
Let me show:
Entity:
@Entity
public class A {
@Id
private Integer id;
private int intAttr;
private String stringAttr;
//getter/setters
}
Query:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<A> q = cb.createQuery(A.class);
Root<A> root = q.from(A.class);
q.where(
root.get("intAttr").in(Arrays.asList(10, 20, 30)),
root.get("stringAttr").in(Arrays.asList("a", "b", "c"))
);
entityManager.createQuery(q).getResultList();
Log:
select
a0_.id as id1_0_,
a0_.intAttr as intAttr2_0_,
a0_.stringAttr as stringAt3_0_
from
A a0_
where
(
a0_.intAttr in (
10 , 20 , 30
)
)
and (
a0_.stringAttr in (
? , ? , ?
)
)
org.hibernate.type.descriptor.sql.BasicBinder - binding parameter [1] as [VARCHAR] - [a]
org.hibernate.type.descriptor.sql.BasicBinder - binding parameter [2] as [VARCHAR] - [b]
org.hibernate.type.descriptor.sql.BasicBinder - binding parameter [3] as [VARCHAR] - [c]
My questions:
- Why are the elements of the Integer list inlined directly into the SQL, while the elements of the String list are handled as prepared statement parameters?
- Is this feature Hibernate-specific, or is it guaranteed by JPA?
- From the DB perspective, which of the two should be preferred?
- Is this int-yes, String-no inlining somehow related to SQL injection?
- Is this somehow related to the limit on the number of values that an RDBMS can process in an SQL IN clause?
- How do I write a criteria query that handles an Integer parameter list the same way as a String parameter list?
Why are strings bound and numeric literals not bound?
Strings should always go through parameter binding (as opposed to being placed in the query as literals) to avoid SQL injection.
The real question, however, is why numeric literals are inserted directly into the query instead of being bound. The original reason was:
So iirc the issue that lead me to use literals here had to do with
scale and operations. Meaning (again, iirc) some databases needed to
know type information to be able to properly handle something like ...
? + ? ..., etc. So the choice was to either wrap all such params in
CAST function calls and hope/pray the db implemented a proper CAST
function or use literals. In the end I opted for the literal route
because, well, thats what the user asked for up front. Wrapping in
function calls will limit the databases ability to leverage indexes in
quite a few databases.
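To make the SQL injection point about strings concrete, here is a minimal sketch in plain Java. It is not Hibernate code: the class InClauseSketch and both methods are hypothetical, and the inlining renderer deliberately skips escaping to show what goes wrong. A numeric value cannot carry quote characters, which is why inlining numbers does not open the same hole.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Toy IN-clause renderer for illustration only; not taken from Hibernate. */
final class InClauseSketch {

    /** Inlines each value as a quoted literal WITHOUT escaping (deliberately unsafe). */
    static String renderInlined(String column, List<String> values) {
        StringJoiner in = new StringJoiner(", ", column + " in (", ")");
        for (String value : values) {
            in.add("'" + value + "'");
        }
        return in.toString();
    }

    /** Emits one placeholder per value; the values would be sent separately as bind parameters. */
    static String renderBound(String column, List<String> values) {
        StringJoiner in = new StringJoiner(", ", column + " in (", ")");
        for (int i = 0; i < values.size(); i++) {
            in.add("?");
        }
        return in.toString();
    }

    public static void main(String[] args) {
        List<String> hostile = Arrays.asList("a", "x') or ('1'='1");
        // The second value closes the IN list and appends an always-true condition:
        // stringAttr in ('a', 'x') or ('1'='1')
        System.out.println(renderInlined("stringAttr", hostile));
        // The SQL text stays the same whatever the values contain:
        // stringAttr in (?, ?)
        System.out.println(renderBound("stringAttr", hostile));
    }
}
```

With bound parameters the hostile value is only ever compared as data, so the shape of the statement cannot change.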
Which is better for the db?
It depends on the database and the query, and it likely won't make a huge difference. For example, Oracle can only do certain partition-related optimizations when the value is a literal, while other databases can only do certain optimizations when the value is a bound parameter. If it becomes an issue (e.g. profiling shows that this is what is slowing you down), just switch to the other method.
Is this in the JPA spec?
No.
Is this related to the number of values allowed in an IN clause?
No.
Can I have a numeric literal bound instead of inserted directly into the query?
Yes, but it is somewhat verbose.
CriteriaBuilder cb = getEntityManager().getCriteriaBuilder();
CriteriaQuery<Foo> query = cb.createQuery(Foo.class);
Root<Foo> root = query.from(Foo.class);
ParameterExpression<Long> paramOne = cb.parameter(Long.class);
Predicate versionPredicate = root.get("bar").in(paramOne);
query.select(root).where(versionPredicate);
TypedQuery<Foo> typedQuery = getEntityManager().createQuery(query);
typedQuery.setParameter(paramOne, 1L);
That will use parameter binding for the long. It covers only one parameter, but it extrapolates easily to multiple parameters, and helper methods could clean things up.
References:
Most of the reasoning is explained and discussed in HHH-6280.
The particular method in question that does this rendering is LiteralExpression.render.
I fully agree with Niels that string parameters should not be inlined, in order to prevent SQL injection.
However, I checked this with DataNucleus 4.1.9 and a Derby database, and to my surprise the log shows inlining for Strings too. It also shows that DataNucleus implements the "IN" criteria query as a combination of "OR" conditions. This is probably inferior to Hibernate, and probably a security risk. It is an example of the possible dangers of ever-higher-level abstractions. You cannot be too cautious :-).
The log:
Begin compiling prepared statement:
SELECT 'pack.entities.I' AS NUCLEUS_TYPE,DN_THIS.ID,DN_THIS.INTATTR,DN_THIS.STRINGATTR FROM I DN_THIS
WHERE (((DN_THIS.INTATTR = 10) OR (DN_THIS.INTATTR = 20)) OR (DN_THIS.INTATTR = 30))
AND (((DN_THIS.STRINGATTR = 'a') OR (DN_THIS.STRINGATTR = 'b')) OR (DN_THIS.STRINGATTR = 'c')) :End prepared statement
Tue Apr 26 15:46:01 CEST 2016 Thread[DRDAConnThread_3,5,derby.daemons]
End compiling prepared statement:
SELECT 'pack.entities.I' AS NUCLEUS_TYPE,DN_THIS.ID,DN_THIS.INTATTR,DN_THIS.STRINGATTR FROM I DN_THIS
WHERE (((DN_THIS.INTATTR = 10) OR (DN_THIS.INTATTR = 20)) OR (DN_THIS.INTATTR = 30))
AND (((DN_THIS.STRINGATTR = 'a') OR (DN_THIS.STRINGATTR = 'b')) OR (DN_THIS.STRINGATTR = 'c')) :End prepared statement
Tue Apr 26 15:46:01 CEST 2016 Thread[DRDAConnThread_3,5,derby.daemons] Executing prepared statement:
SELECT 'pack.entities.I' AS NUCLEUS_TYPE,DN_THIS.ID,DN_THIS.INTATTR,DN_THIS.STRINGATTR FROM I DN_THIS
WHERE (((DN_THIS.INTATTR = 10) OR (DN_THIS.INTATTR = 20)) OR (DN_THIS.INTATTR = 30))
AND (((DN_THIS.STRINGATTR = 'a') OR (DN_THIS.STRINGATTR = 'b')) OR (DN_THIS.STRINGATTR = 'c')) :End prepared statement
Issue HHH-9576 added a new configuration property to address this, available since version 5.2.12 (?):
<property name="hibernate.criteria.literal_handling_mode" value="bind"/>
With this property set, you no longer need the verbose solution proposed by Pace.
From the Hibernate documentation of literal_handling_mode:
This enum defines how literals are handled by JPA Criteria. By default (AUTO), Criteria queries uses bind parameters for any literal that is not a numeric value. However, to increase the likelihood of JDBC statement caching, you might want to use bind parameters for numeric values too. The BIND mode will use bind variables for any literal value. The INLINE mode will inline literal values as-is. To prevent SQL injection, never use INLINE with String variables. Always use constants with the INLINE mode.
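The statement-caching argument in the quoted documentation can be illustrated with a toy model in plain Java. This is only a sketch of the behavior as described in the quote, not Hibernate's implementation: the class LiteralModeSketch, its Mode enum, and its methods are hypothetical names, and the INLINE branch quotes strings without escaping purely for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy model of the three literal handling modes; not Hibernate code. */
final class LiteralModeSketch {

    enum Mode { AUTO, BIND, INLINE }

    /** Renders one value as a placeholder or as an inlined literal, depending on the mode. */
    static String render(Object value, Mode mode) {
        boolean numeric = value instanceof Number;
        // AUTO inlines only numeric values, INLINE inlines everything, BIND inlines nothing.
        boolean inline = mode == Mode.INLINE || (mode == Mode.AUTO && numeric);
        if (!inline) {
            return "?";
        }
        return numeric ? value.toString() : "'" + value + "'";
    }

    static String renderIn(String column, List<?> values, Mode mode) {
        return values.stream()
                .map(v -> render(v, mode))
                .collect(Collectors.joining(", ", column + " in (", ")"));
    }

    /** Counts the distinct SQL texts produced, i.e. how many statements a cache would have to hold. */
    static int distinctStatements(Mode mode, List<List<Integer>> calls) {
        Set<String> sqlTexts = new HashSet<>();
        for (List<Integer> values : calls) {
            sqlTexts.add(renderIn("intAttr", values, mode));
        }
        return sqlTexts.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> calls = Arrays.asList(
                Arrays.asList(10, 20, 30),
                Arrays.asList(11, 21, 31),
                Arrays.asList(12, 22, 32));
        System.out.println(distinctStatements(Mode.AUTO, calls)); // 3
        System.out.println(distinctStatements(Mode.BIND, calls)); // 1
    }
}
```

In this model, three calls with different integer lists yield three different SQL texts under AUTO but a single reusable text under BIND, which is the caching benefit the documentation refers to.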