Replacing a full ORM (JPA/Hibernate) by a lighter

2019-03-07 16:27发布

I'm developing a new Java web application and I'm exploring new ways (new for me!) to persist the data. I mostly have experience with JPA & Hibernate but, except for simple cases, I think this kind of full ORM can become quite complex. Plus, I don't like working with them that much. I'm looking for a new solution, probably closer to SQL.

The solutions I'm currently investigating :

  • MyBatis
  • JOOQ
  • Plain SQL/JDBC, potentially with DbUtils or some other basic utility libraries.

But there are two use cases I'm worrying about with those solutions, compared to Hibernate. I'd like to know what are the recommended patterns for those use cases.


Use Case 1 - Fetching an entity and accessing some of its associated children and grandchildren entities.

  • Let's say I have a Person entity.
    • This Person has an associated Address entity.
      • This Address has an associated City entity.
        • This City entity has a name property.

The full path to access the name of the city, starting from the person entity, would be :

person.address.city.name

Now, let's say I load the Person entity from a PersonService, with this method :

public Person findPersonById(long id)
{
    // ...
}

Using Hibernate, the entities associated to the Person could be lazily loaded, on demand, so it would be possible to access person.address.city.name and be sure I have access to this property (as long as all the entities in that chain are not nullable).

But using anyone of the 3 solutions I'm investigating, it's more complicated. With those solutions, what are the recommended patterns to take care of this use case? Upfront, I see 3 possible patterns:

  1. All the required associated children and grandchildren entities could be eagerly loaded by the SQL query used.

    But the issue I see with this solution is that there may be some other code that needs to access other entities/properties paths from the Person entity. For example, maybe some code will need access to person.job.salary.currency. If I want to reuse the findPersonById() method I already have, the SQL query will then need to load more information! Not only the associated address->city entity but also the associated job->salary entity.

    Now what if there are 10 other places that need to access other information starting from the person entity? Should I always eagerly load all the potentially required information? Or maybe have 12 different service methods to load a person entity? :

    findPersonById_simple(long id)
    
    findPersonById_withAdressCity(long id)
    
    findPersonById_withJob(long id)
    
    findPersonById_withAdressCityAndJob(long id)
    
    ...
    

    But then everytime I would use a Person entity, I would have to know what has been loaded with it and what hasn't... It could be quite cumbersome, right?

  2. In the getAddress() getter method of the Person entity, could there be a check to see if the address has already been loaded and, if not, lazily load it? It this a frequently used pattern in real life applications?

  3. Are there other patterns that can be used to make sure I can access the entities/properties I need from a loaded Model?


Use Case 2 - Saving an entity and making sure its associated and modified entities are also saved.

I want to be able to save a Person entity using this PersonService's method :

public void savePerson(Person person)
{
    // ...
}

If I have a Person entity and I change person.address.city.name to something else, how can I make sure the City entity modifications will be persisted when I save the Person? Using Hibernate, it can be easy to cascade the save operation to the associated entities. What about the solutions I'm investigating?

  1. Should I use some kind of dirty flag to know what associated entities also have to be saved when I save the person?

  2. Are there any other known patterns useful to deal with this use case?


Update : There is a discussion about this question on the JOOQ forum.

7条回答
等我变得足够好
2楼-- · 2019-03-07 17:03

Persistence Approaches

The spectrum of solutions from simple/basic to sophisticated/rich is:

  • SQL/JDBC - hard-code SQL within objects
  • SQL-Based Framework (e.g. jOOQ, MyBatis) - Active Record Pattern (separate general object represents row data and handles SQL)
  • ORM-Framework (e.g. Hibernate, EclipseLink, DataNucleus) - Data Mapper Pattern (Object per Entity) plus Unit Of Work Pattern (Persistence Context / Entity Manager)

You seek to implement one of the first two levels. That means shifting focus away from the object model towards SQL. But your question asks for Use Cases involving the object model being mapped to SQL (i.e. ORM behaviour). You wish to add functionality from the third level against functionality from one of the first two levels.

We could try to implement this behaviour within an Active Record. But this would need rich metadata to be attached to each Active Record instance - the actual entity involved, it's relationships to other entities, the lazy-loading settings, the cascade update settings. This would make it effectively a mapped entity object in hiding. Besides, jOOQ and MyBatis don't do this for Use Cases 1 & 2.

How To Achieve Your Requests?

Implement narrow ORM behaviour directly into your objects, as a small custom layer on top of your framework or raw SQL/JDBC.

Use Case 1: Store metadata for each entity object relationship: (i) whether relationship should be lazy-loaded (class-level) and (ii) whether lazy-load has occured (object-level). Then in the getter method, use these flags to determine whether to do lazy-load and actually do it.

Use Case 2: Similar to Use Case 1 - do it yourself. Store a dirty flag within each entity. Against each entity object relationship, store a flag describing whether the save should be cascaded. Then when an entity is saved, recursively visit each "save cascade" relationship. Write any dirty entities discovered.

Patterns

Pros

  • Calls to SQL framework are simple.

Cons

  • Your objects become more complicated. Take a look at the code for Use Cases 1 & 2 within an open source product. It's not trivial
  • Lack of support for Object Model. If you're using object model in java for your domain, it will have lesser support for data operations.
  • Risk of scope creep & anti-patterns: the above missing functionality is the tip of the iceberg. May end up doing some Reinvent the Wheel & Infrastructure Bloat in Business Logic.
  • Education and Maintenance on non-standard solution. JPA, JDBC and SQL are standards. Other frameworks or custom solutions aren't.

Worthwhile???

This solution works well if you have fairly simple data handling requirements and a data model with a smaller number of entities:

  • If so, great! Do above.
  • If not, this solution's a poor fit and represents false savings in effort - i.e. will end up taking longer and being more complicated than using an ORM. In that case, have another look at JPA - it might be simpler than you think and it supports ORM for CRUD plus raw SQL for complicated queries :-).
查看更多
男人必须洒脱
3楼-- · 2019-03-07 17:05

It sounds like the core issue in the question is with the relational model itself. From what's been described a graph database will map the problem domain very neatly. As an alternative, document stores are another way of approaching the problem because, while the impedance is still there, documents in general are simpler to reason about than sets. Of course, any approach is going to have its own quirks to attend to.

查看更多
Deceive 欺骗
4楼-- · 2019-03-07 17:06

This kind of problem is typical when not using a real ORM, and there is no silver bullet. A simple design approach that worked for me for a (not very big ) webapp with iBatis (myBatis), is to use two layers for persistence:

  • A dumb low-level layer: each table has its Java class (POJO or DTO), with fields that maps directly to the table columns. Say we have a PERSON table with a ADDRESS_ID field that points to an ADRESS table; then, we'd have a PersonDb class, with just a addressId (integer) field; we have no personDb.getAdress() method, just the plain personDb.getAdressId(). These Java classes are, then, quite dumb (they don't know about persistence or about related classes). A corresponding PersonDao class knows how to load/persist this object. This layer is easy to create and maintain with tools like iBatis + iBator (or MyBatis + MYBatisGenerator).

  • A higher level layer that contains rich domain objects: each of these is typically a graph of the above POJOs. These classes have also the intelligence for loading/saving the graph (perhaps lazily, perhaps with some dirty flags), by calling the respective DAOs. The important thing, however, is that these rich domain objects do not map one-to-one to the POJO objects (or DB tables), but rather with domain use cases. The "size" of each graph is determined (it doesn't grow indefinitely), and is used from the outside like a particular class. So, it's not that you have one rich Person class (with some indeterminate graph of related objects) that is used is several use cases or service methods; instead, you have several rich classes, PersonWithAddreses, PersonWithAllData... each one wraps a particular well-limited graph, with its own persistence logic. This might seem inefficient or clumsy, and in some context it might be, but it happens often that the use cases when you need to save a full graph of objects are actually limited.

  • Additionally, for things like tabular reports, (specific SELECTS that return a bunch of columns to be displayed) you'd not use the above, but straight and dumb POJO's (perhaps even Maps)

See my related answer here

查看更多
女痞
5楼-- · 2019-03-07 17:06

The last ten years I was using JDBC, EJB entity beans, Hibernate, GORM and finally JPA (in this order). For my current project I have returned to using plain JDBC, because the emphasis is on performance. Therefore I wanted

  • Full control on the generation of SQL statements: To be able to pass a statement to DB performance tuners, and put the optimized version back in the program
  • Full control on the number of SQL statements which are sent to the database
  • Stored procedures (triggers), stored functions (for complex calculations in SQL queries)
  • To be able to use all available SQL features without restrictions (recursive queries with CTEs, window aggregate functions, ...)

The data model is defined in a data dictionary; using a model driven approach a generator creates helper classes, DDL scripts etc. Most operations on the database are read-only; only few use cases write.

Question 1: Fetching children

The system is built on a use cases, and we have one dedicated SQL statement to get all data for a given use case/request. Some of the SQL statments are bigger than 20kb, they join, calculate using stored functions written in Java/Scala, sort, paginate etc. in a way that the result is directly mapped into a data transfer object which in turn is fed into the view (no further processing in the application layer). As a consequence the data transfer object is use case specific as well. It only contains the data for the given use case (nothing more, nothing less).

As the result set is already "fully joined" there is no need for lazy/eager fetching etc. The data transfer object is complete. A cache is not needed (the database cache is fine); the exception: If the result set is large (around 50 000 rows), the data transfer object is used as a cache value.

Question 2: Saving

After the controller has merged back the changes from the GUI, again there is a specific object which holds the data: Basically the rows with a state (new, deleted, modified, ...) in a hierarchy. It's a manual iteration to save the data down the hierarchy: New or modified data is persisted using some helper classes with generate SQL insert or update commands. As for deleted items, this is optimized into cascaded deletes (PostgreSql). If multiple rows are to be deleted, this is optimized into a single delete ... where id in ... statement as well.

Again this is use case specific, so it's not dealing with a general approch. It needs more lines of code, but these are the lines which contain the optimizations.

The experiences so far

  • One should not underestimate the effort to learn Hibernate or JPA. One should consider the time spent in configuring the caches, cache invalidation in a cluster, eager/lazy fetching, and tuning as well. Migrating to another Hibernate major version is not just a recompilation.
  • One should not overestimate the effort to build an application without ORM.
  • It's simpler to use SQL directly - being close to SQL (like HQL, JPQL) is not the same, especially if you talk to your DB performance tuner
  • SQL servers are incredibly fast when running long and complex queries, especially if combined with stored functions written in Scala
  • Even with the use case specific SQL statements, the code size is smaller: "Fully joined" result sets save lots of lines in the application layer.

Related information:

Update: Interfaces

If there is a Person entity in the logical data model, there is a class to persist a Person entity (for CRUD operations), just like a JPA entity.

But the clou is that there is no single Person entity from the use case / query / business logic perspective: Each service method defines its own notion of a Person, and it only contains exactly the values required by that use case. Nothing more, nothing less. We use Scala, so the definition and usage of many small classes is very efficient (no setter/getter boiler plate code required).

Example:

class GlobalPersonUtils {
  public X doSomeProcessingOnAPerson(
    Person person, PersonAddress personAddress, PersonJob personJob, 
    Set<Person> personFriends, ...)
}

is replaced by

class Person {
  List addresses = ...
  public X doSomeProcessingOnAPerson(...)
}

class Dto {
  List persons = ...
  public X doSomeProcessingOnAllPersons()
  public List getPersons()
}

using use case specific Persons, Adresses etc: In this case, the Person already aggregates all relevant data. Requires more classes, but there is no need to pass around JPA entities.

Note that this processing is read-only, the results are used by the view. Example: Get distinct City instances from a person's list.

If data is changed, this is another use case: If the city of a person is changed, this is processed by a different service method, and the person is fetched again from the database.

查看更多
老娘就宠你
6楼-- · 2019-03-07 17:14

The answer to your many questions is simple. You have three choices.

  1. Use one of the three SQL-centric tools you've mentioned (MyBatis, jOOQ, DbUtils). This means you should stop thinking in terms of your OO domain model and Object-Relational Mapping (i.e. entities and lazy loading). SQL is about relational data and RBDMS are pretty good at calculating execution plans for "eager fetching" the result of several joins. Usually, there isn't even a lot of need for premature caching, and if you do need to cache the occasional data element, you can still use something like EhCache

  2. Don't use any of those SQL-centric tools and stick with Hibernate / JPA. Because even if you said you don't like Hibernate, you're "thinking Hibernate". Hibernate is very good at persisting object graphs to the database. None of those tools can be forced to work like Hibernate, because their mission is something else. Their mission is to operate on SQL.

  3. Go an entirely different way and choose not to use a relational data model. Other data models (graphs for instance) may better suit you. I'm putting this as a third option, because you might not actually have that choice, and I don't have much personal experience with alternative models.

Note, your question wasn't specifically about jOOQ. Nonetheless, with jOOQ, you can externalise the mapping of flat query results (produced from joined table sources) to object graphs through external tools such as ModelMapper. There's an intersting ongoing thread about such an integration on the ModelMapper User Group.

(disclaimer: I work for the company behind jOOQ)

查看更多
成全新的幸福
7楼-- · 2019-03-07 17:24

Since you want a simple and lightweight library and to use SQL, I can suggest take a look at fjorm. It allows you to use POJOs and CRUD operations without much effort.

Disclaimer: I'm an author of the project.

查看更多
登录 后发表回答