Hibernate ORM, jOOQ, HikariCP, Transactions, and Spring: An SQL / CQRS Tutorial

When it comes to interacting with relational databases, citizens of the Java world tend to be in one of two camps.  Lately, the rift continues to widen.

  1. Use Hibernate ORM for all CRUD operations, all queries, and anything else that would otherwise touch SQL.  SQL is icky.  Abstract all the things.
  2. I got 99 problems and Hibernate ORM caused them all.  Es el Diablo.  Avoid it, no matter what.  Use nothing but pure SQL, as God himself intended.

Folks miss an important point: ORM was never meant to be used for everything!  It is a tool to be used where applicable.  Can it be used as a complete abstraction or replacement for everything involving the database, shielding developers from ever having to learn SQL?  For the most part, yes.  Should it be?  Definitely no.  It often makes sense to combine ORM with other SQL approaches, marrying the best of both worlds.

NeighborLink

Throughout this tutorial, we’ll refer to an example model that’s specific to NeighborLink, a network of nonprofit organizations that provide volunteer home improvement to neighbors in need.  Operations holistically revolve around a web platform, involving homeowners, volunteers, projects, donations, events, communities, notifications, skills, tools, and a large set of associations between them all.

Command Query Responsibility Segregation

The combination of multiple DB tools is one of the core concepts within Command Query Responsibility Segregation (CQRS).  In a nutshell, the design pattern utilizes separate models, approaches, and frameworks for CRUD vs. large-scale fetches.

More specifically, Hibernate ORM excels at managing complex CRUD operations for associated objects.  As an example, a homeowner can visit the website and request assistance.  That eventually causes a set of associated user, project, skill, funding request, and notification objects needing to be written to the database.  To do so with straight SQL is painful, especially considering the large network of foreign keys and join tables.  Hibernate ORM persisting and cascading the entities makes a lot of sense.  Obviously, it’s a similar story for updates.  Additionally, when viewing a single project’s detail page, we have an easy way of obtaining the Project and all of its associations in a few simple steps, rather than managing all the individual queries and building up the objects by-hand.

However, we also support an advanced search capability that queries every project in the system.  The results page only needs to display a small subset each Project’s data.  Even with proper laziness configured on entity associations, proper fetching strategies, and other tactics, executing the queries with HQL/JPQL/Criteria can result in a less-than-stellar quantity of over-reaching SQL hits.  There are certainly ways to further improve the ORM-driven queries.  However, I suggest that CQRS is a powerful idea and the hybrid approach is helpful.

ORM for Complex CRUD, SQL for Large-Scale Fetches

So, how exactly should we split up the approach?  Here’s what I generally do.  Again, in a nutshell, use ORM for complex CRUD — that is precisely why it was designed and developed.  For large-scale fetches, rely on SQL.

  • CREATE: Generally, my inserts use Hibernate.  Since you’re typically building up a set of objects and associations between them, it makes sense to do so with entity classes and allow the ORM to persist them all.
  • UPDATE: More often than not, I’ll use Hibernate for updates as well, since edits frequently happen on the entities themselves (retrieve an entity, modify its fields, and re-persist it).  However, batch updates are a toss-up.  If you’re only changing a handful of fields within a single table, both SQL and HQL/JPQL work well and efficiently.
  • DELETE: Again, more often than not, Hibernate excels here.  Deleting a row in a table frequently requires additional cleanup of associations (removing the associated row(s) in a join table, nulling foreign keys, etc.).  With ORM’s association and “cascading” support, this ends up being fairly trivial.  Most RDBMS vendors support some notion of cascading, but ORM’s ability to manage all of it based on the object model is helpful.  That being said, simple SQL can be hard to beat for batch deletes acting on a single or small set of tables.
  • READ (using the NeighborLink example):
    • For fetching a single Project and its associations (a “complex read”) and populating the View Project page, let ORM fetch the entity by ID and rely on your laziness and fetch strategies.  This is nearly always preferable over straight SQL, since the retrieval of the object tree is condensed to one line of code and a small set of SQL statements.
    • For querying all Projects in the system (“advanced search”), rely on SQL.  Since the association tree is deep, properly controlling the SQL generation through ORM gets complex and difficult to maintain.  Explicitly control over SQL grabbing only what you need, and nothing you don’t, is vital.

CQRS Models

One thing to note: typical CQRS discussions include different models for each separate layer.  So, complex CRUD would use the entity model layer, while SQL results would use a completely different set of simplified DTO objects.  However, the models can be approached in any number of ways:

  1. Use the explicit separation.  Entities for CRUD, simple DTOs for query results.
  2. Use the same objects for both.  Rather than maintaining multiple models, I often “cheat” and re-use the entity objects for the SQL results.
  3. Use a layered approach through class inheritance.  This can take multiple forms.  Generate a base layer of objects containing the essentials, then have both the entities and DTOs extend them.  Alternatively, use the DTOs as your base layer and have the entities extend them.  Variations come up here and there.  The main point is that attempts can be made to reduce duplication.

Statically-Typed SQL

“Ok, we get it.  Use SQL where it makes sense.  But we don’t want to maintain a bunch of SQL strings in the code.  What happens when the schema changes, or worse, we migrate to a different RDBMS vendor?”

Fantastic question.  Bonus points for worrying about that.  Of course, the other primary benefits to ORMs are portability and statically-typed interactions.

Numerous frameworks exist that can help.  One example is jOOQ, which analyzes your RDBMS schema and (at a minimum) generates 1.) a statically-typed metamodel and 2.) a useful Java DSL for building queries.  With this approach, you can explicitly define and execute SQL using nothing but Java.  If your schema changes, the generated metamodel changes with it, clearly showing issues through compilation errors.  And, like ORMs, the metamodel and DSL are aware of the underlying RDBMS vendor through “dialects”, providing automatic portability.

(One somewhat-unfortunate caveat with jOOQ: only fully open source RDBMS vendors are supported in the community version of the framework.  In order to use it with Oracle, SQL Server, HANA, Sybase, and other “enterprise” databases, you have to purchase a commercial license.)

SHOW ME THE CODE

Alright, up until now, we’ve been talking from a 30,000′ level.  Here’s a sample of how this looks, again using NeighborLink as an example.  The approach marries Hibernate ORM and jOOQ for CQRS, HikariCP as the connection pool (I can’t speak highly enough about this library), and Spring MVC wiring it all together (in addition to providing the Transactional support).

First, the relevant sections of the Maven POM.  I’m including the jOOQ Maven plugin, showing how the metamodel and DSL is generated at build time. However, I’m omitting some Hibernate specifics (bytecode enhancement, etc.) for the sake of brevity.

<build>
  <plugins>
    ...
    <plugin>
      <groupId>org.jooq</groupId>
      <artifactId>jooq-codegen-maven</artifactId>
      <version>${jooq.version}</version>
      <executions>
        <execution>
          <goals>
            <goal>generate</goal>
          </goals>
        </execution>
      </executions>
      <dependencies>
        <dependency>
          <groupId>org.jooq</groupId>
          <artifactId>jooq-codegen</artifactId>
          <version>${jooq.version}</version>
        </dependency>
      </dependencies>
      <configuration>
        <jdbc>
          <driver>com.mysql.jdbc.Driver</driver>
          <url>jdbc:mysql://[HOST]:[PORT]/[DB NAME]</url>
          <user>[USER]</user>
          <password>[PASSWORD]</password>
        </jdbc>
        <generator>
          <name>org.jooq.util.DefaultGenerator</name>
          <database>
            <name>org.jooq.util.mysql.MySQLDatabase</name>
            <includes>.*</includes>
            <excludes></excludes>
            <inputSchema>[DB NAME]</inputSchema>
          </database>
          <target>
            <packageName>org.threeriverdev.neighborlink.query</packageName>
            <directory>target/generated-sources/jooq</directory>
          </target>
        </generator>
      </configuration>
    </plugin>
    ...
</build>

<dependencies>
  <!-- Spring framework -->
  ...
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-orm</artifactId>
    <version>${spring.version}</version>
  </dependency>
  ...

  <!-- Hibernate -->
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>${hibernate.version}</version>
  </dependency>

  <!-- DataSource -->
  <dependency>
    <groupId>com.zaxxer</groupId>
    <artifactId>HikariCP</artifactId>
    <version>2.4.6</version>
  </dependency>
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.39</version>
  </dependency>

  <!-- jOOQ -->
  <dependency>
    <groupId>org.jooq</groupId>
    <artifactId>jooq</artifactId>
    <version>${jooq.version}</version>
  </dependency>
  <dependency>
    <groupId>org.jooq</groupId>
    <artifactId>jooq-meta</artifactId>
    <version>${jooq.version}</version>
  </dependency>

  ...
</dependencies>

This is how it’s all bootstrapped. NeighborLink is a typical Spring MVC app, so here we’re making use of the Java-based @Configuration to instantiate relevant beans. For anyone with past Hibernate experience, the ORM pieces should look familiar.

One important thing to note: the jOOQ setup is making use of Spring’s TransactionAwareDataSourceProxy, meaning it’s able to plug into the TransactionManager we have defined. Also note that, in this case, that will be a HibernateTransactionManager. That’s entirely safe to use outside of Hibernate! See notes on Spring’s HibernateTransactionManager JavaDoc for more info. Essentially, it’s perfectly acceptable to share the same DataSource and HibernateTransactionManager for all DB interactions, including ORM, jOOQ, and straight JDBC!

An off-topic bit I left in is the #isProduction support. Based on the existence of an environment variable, our ConfigService knows whether or not this app instance is running in the production server or locally. We decide which database to use based on that.

@Configuration
@EnableWebMvc
@EnableTransactionManagement
@ComponentScan
public class AppConfig {

    @Autowired
    private Environment env;

    @Autowired
    private ConfigService configService;

    ...

    @Bean(destroyMethod = "close")
    public DataSource dataSource() {
        String dbHost;
        String dbPort;
        String dbName;
        String dbUsername;
        String dbPassword;
        if (configService.isProduction()) {
            // We're in a production environment.
            dbHost = "localhost";
            dbPort = "3306";
            dbName = configService.getConfig().getDatabase().getName();
            dbUsername = configService.getConfig().getDatabase().getUsername();
            dbPassword = configService.getConfig().getDatabase().getPassword();
        } else {
            // Local dev or unit test.
            dbHost = "[HOST]";
            dbPort = "[PORT]";
            dbName = "[DB NAME]";
            dbUsername = "[USER]";
            dbPassword = "[PASSWORD]";
        }

        final String url = new StringBuilder()
                .append("jdbc:mysql://")
                .append(dbHost)
                .append(":")
                .append(dbPort)
                .append("/")
                .append(dbName)
                .toString();

        HikariDataSource dataSource = new HikariDataSource();

        ...[HikariCP options]

        MysqlDataSource mysqlDataSource = new MysqlDataSource();
        mysqlDataSource.setURL(url);
        mysqlDataSource.setUser(dbUsername);
        mysqlDataSource.setPassword(dbPassword);
        ...[MySQL-specific options]
        dataSource.setDataSource(mysqlDataSource);

        return dataSource;
    }

    // HIBERNATE

    @Bean
    public LocalSessionFactoryBean sessionFactory() {
        LocalSessionFactoryBean sessionFactory = new LocalSessionFactoryBean();
        sessionFactory.setDataSource(dataSource());
        sessionFactory.setHibernateProperties(hibernateProperties());
        
        List<String> packages = new ArrayList<String>();
        packages.add("org.threeriverdev.neighborlink.entity.core");
        ...
        sessionFactory.setPackagesToScan(packages.toArray(new String[packages.size()]));

        return sessionFactory;
    }

    @Bean
    public HibernateTransactionManager transactionManager() {
        HibernateTransactionManager txManager = new HibernateTransactionManager();
        txManager.setSessionFactory(sessionFactory().getObject());
        txManager.setDataSource(dataSource());
        return txManager;
    }

    private Properties hibernateProperties() {
        final Properties properties = new Properties();
        ...[Add all "hibernate.*" specific properties -- dialect, etc.]
        return properties;
    }

    // JOOQ

    @Bean
    public TransactionAwareDataSourceProxy transactionAwareDataSource() {
        return new TransactionAwareDataSourceProxy(dataSource());
    }

    @Bean
    public DataSourceConnectionProvider connectionProvider() {
        return new DataSourceConnectionProvider(transactionAwareDataSource());
    }

    @Bean()
    public DSLContext jooq() {
        // Generated source assumes the development DB.  Dynamically change it to the production DB name.
        String dbName;
        if (configService.isProduction()) {
            // We're in a production environment.
            dbName = configService.getConfig().getDatabase().getName();
        } else {
            // Local dev or unit test.
            dbName = "[SANDBOX DB NAME]";
        }
        // The DB name used in the generated DSL (see the Maven plugin use) will not be the same as
        // the production DB, unless running in a local dev env!  withSchemata allows us
        // to override that during runtime.
        Settings settings = new Settings().withRenderMapping(
                new RenderMapping().withSchemata(new MappedSchema().withInput("[SANDBOX DB NAME]").withOutput(dbName)));
        return DSL.using(connectionProvider(), SQLDialect.MYSQL, settings);
    }
}

I’ll now show snippets from a Controller, a Service, and two DAOs, providing a glimpse into how we make use of ORM and jOOQ in a CQRS pattern.

@Controller
...
public class ProjectController {
	
    @Autowired
    protected ProjectService projectService;

    ...
    
    // COMPLEX READ
    @RequestMapping("/project/{projectId}")
    public String project(@PathVariable("projectId") int projectId, ModelMap model) {
        ...
        
        Project project = projectService.getProject(projectId);
        
        ...
        
        model.addAttribute("project", project);
        
        ...
        
        return "/view/project.jsp";
    }

    ...
    
    // COMPLEX CREATE
    @RequestMapping(value = "/project}", method = RequestMethod.POST)
    public String addProject(@ModelAttribute("addProjectWizard") AddProjectWizard wizard, ModelMap model) {
        ...
        
        Project project = new Project();
        ...[Build up the object.]
        
        projectService.insert(project);
        
        ...
        
        return "redirect:/project/" + project.getId();
    }

    ...
    
    // COMPLEX UPDATE
    @RequestMapping(value = "/project/{projectId}", method = RequestMethod.POST)
    public String editProject(@PathVariable("projectId") int projectId, @ModelAttribute("project") Project projectEdit, ModelMap model) {
        ...
        
        Project project = projectService.getProject(projectId));
        ...[Edit 'project' with fields from 'projectEdit'.  Ex: project.setDescription(projectEdit.getDescription);]
        
        projectService.update(project);
        
        ...
        
        return "redirect:/project/" + project.getId();
    }

    // LARGE-SCALE FETCH
    // NOTE: AdvancedSearchForm is a bean that was bound to a JSP/JSTL form, containing free-text search, checkboxes, etc.
    @RequestMapping(value = "/advancedSearch")
    public String projects(@ModelAttribute("advancedSearch") AdvancedSearchForm advancedSearch, ModelMap model) {
	...
        
        List<Project> projects = projectService.search(advancedSearch);
        
        ...
        
        model.addAttribute("projects", projects);
        
        ...
        
        return "/view/projects.jsp";
    }
@Service
@Transactional
...
public class ProjectService {
	
    @Autowired
    private ProjectDao projectDao;
	
    @Autowired
    private ProjectJooqDao projectJooqDao;

    ...
    
    // COMPLEX READ
    public Project getProject(int id) {
        return projectDao.getProject(id);
    }

    ...
    
    // COMPLEX CREATE
    public void insert(Project project) {
        projectDao.insert(project);
    }

    ...
    
    // COMPLEX UPDATE
    public void update(Project project) {
        project.setLastUpdated(Calendar.getInstance());
        projectDao.update(project);
    }

    ...
    
    // LARGE-SCALE FETCH
    public List<Project> search(AdvancedSearchForm advancedSearch) {
        return projectJooqDao.search(advancedSearch);
    }
@Repository
...
public class ProjectDao {
	
    @Autowired
    private SessionFactory sessionFactory;

    ...
    
    // COMPLEX READ
    public Project getProject(int id) {
        // Note that the simple #get is likely not enough, since there are likely associations that are LAZY.
        // This likely implies HQL + joins, a fetch profile, explicit use of Hibernate#initialize, or a combo.
        return (Project) sessionFactory.getCurrentSession().get(Project.class, id);
    }

    ...
    
    // COMPLEX CREATE
    public void insert(Project project) {
        sessionFactory.getCurrentSession().save(project);
    }

    ...
    
    // COMPLEX UPDATE
    public void update(Project project) {
        sessionFactory.getCurrentSession().update(project);
    }
@Repository
...
public class ProjectJooqDao {
	
    @Autowired
    private DSLContext jooq;

    ...
    
    // LARGE-SCALE FETCH
    public List<Project> search(AdvancedSearchForm advancedSearch) {
        Result<Record> results = jooq.selectDistinct()
                .from(Projects.PROJECTS)
                .join(...)
                ...
                .where(...)
                .orderBy(...)
                .fetch();

        // NOTE: Mapping the jOOQ result can be done in any number of ways!
        // jOOQ DSL includes a fetchInto(Class) object that can map results
        // into POJOs automatically, including support for a JPA annotated
        // primitive fields.  However, since Project contains enums, collections,
        // and other non-primitives, it's easier to build this manually.
        // More recent versions of jOOQ include integration with
        // frameworks like ModelMapper that can help automate and clean up the mapping!
        List<Project> projects = new ArrayList<>();
        ...[Build up each Project using the jOOQ Records.]

        return projects;
    }

Apache Camel Processors Should NEVER Be Stateful

If you have experience with Apache Camel, this one might sound a little obvious.  But, it has recently come up a few times, so it’s worth mentioning. As an example, say you have a route that iterates over paged data and does something with it, and you therefore need to keep track of the pagination. You might use something like the following:

public class FooProcessor implements Processor {
	private int page = 0;
	
	public void process(Exchange exchange) throws Exception {
		List<String> results = new ArrayList<>();
		
		while (page < 5) {
			// Run *anything* involving pagination -- SQL, REST, etc.
			String result = ...;
			results.add(result);
			page++;
		}
		
		exchange.getIn().setBody(results);
	}
}

Spot the issue? At first glance, this looks like it’d work, and it will if the route never executes multiple times at once. But…

In Camel, a Processor is a singleton bean, meaning every flow through a route hits a single instantiation of the Processor object. So, if a route is executed, then executed again before the first flow finishes, “page” will set back to 0 in the middle of the original run. Even worse, both Processor runs then increment “page”, so neither one is able to retrieve the entire result set.  You wind up with something like:

  1. Processor run #1, page = 0
  2. Processor run #1, page = 1
  3. Processor run #2, page = 0
  4. Processor run #1, page = 1
  5. Processor run #2, page = 2
  6. Processor run #1, page = 3
  7. Processor run #2, page = 4

In the end, run #1 will contain pages 0, 1, 1 & 3, while #2 has 0, 2, and 4.  In high throughput applications, or contexts where the Processor execution can take a while, this certainly spells disaster for the end results.

The example is admittedly ridiculous and could be easily fixed by moving the “page” variable into the “process” method itself, rather than at the Class level.  However, the point is that any statefulness within a Processor, Bean, etc. must be avoided!

Why Systems Integration is Hard: Ashley, Bob, Cindy, Dave, Emily, Frank, Gina, and Hank

Imagine you manage a team of corporate event planners, responsible for overseeing each event and ensuring every single detail is arranged and executed.  Your team consists of 8 individuals.  The scale of each event is sufficiently large, requiring you to task each individual.  If any of them fail, the event will be unsatisfactory (at best) or completely unsafe (at worst).

Ashley speaks English.  Ashley refuses to listen to anyone but Gina.

Bob speaks Spanish.  If you ask him to do something, you won’t hear anything from him until he’s completely finished.  So, you’re not always sure if he heard you to begin with.

Cindy speaks Portuguese.  She can understand bits and pieces of Bob’s Spanish, but regularly needs help translating anyway.  If asked to do something, she will immediately respond with “yep, no problem at all” and run off to work on it.  But, she’ll frequently hit a roadblock and stop working on the task.  You won’t find out whether or not her work was completed until a few days later.  By that time, you’ve long since forgotten what you asked and have to go back to your notes to figure it out.

Dave can speak English, but he instead chooses to speak a language he made up entirely on his own and refuses to respond to anything else.  All you have is a set of hastily written notes he jotted down on a napkin.  Most of the ways you know to communicate with him were arrived at by trial and error on past projects, all of which required you to keep your own notes.  The COO firmly believes he’s absolutely critical to the business and frequently directs you to assign him important tasks.  But he’s only available 25% of the time you need him.  The other 75%, he’s drunk at the local bar.

Emily speaks English, but interlaced with words she made up purely because she thinks they’re superior to the standard language.  She refuses to speak verbally and will only communicate using written word on paper.

Frank is obsessed with the “old days” and sticks to Old English.  He’s perfectly capable of becoming effective with modern English, but he’s stuck in his ways.  These days, Frank can’t handle much on his own, but he’s been an employee for a solid 40 years.  He knows everything there is to know about local ordinances and codes, so you must frequently consult with him and pass the information off to the others.

Gina does not understand a single language, but she’s able to perfectly enunciate any sentence you tell her, repeating it to someone else.  However, she refuses to do so unless you first pat her on the back and gently shove her in the correct direction.

Hank spoke English when you first hired him, but he decides to speak nothing but his hobby language (Klingon) in the middle of the project.

thonk

Need help managing Ashley, Bob, Cindy, Dave, Emily, Frank, Gina, and Hank?  I’m all ears.

Apache Karaf on Vagrant (example Vagrantfile)

If you have a team developing OSGi applications for Apache Karaf, Vagrant provides an easy way to ensure everyone is testing local deployments in a consistent context.  Vagrant is a little like Docker, using a layered approach to build up virtual environments.  In this case, we create an Ubuntu “box”, running on a VirtualBox VM, and automatically set it up with everything necessary for Karaf testing. Continue reading

Find Transaction Leaks in Wildfly and JCA

Here’s a quick tip: the easiest way to find transaction leaks in Wildfly/JCA.  In standalone.xml’s JCA subsystem, change

<cached-connection-manager/>
to
<cached-connection-manager debug=”true”/>

That will automatically close un-closed JCA connections and spit out a nice warning/stacktrace, showing exactly where the transaction was opened.

Apache and MariaDB/MySQL Settings for Low-Memory Servers

Gone are the days of requiring large amounts of resources to adequately run a fast, enterprise-grade web server.  I currently run a single DigitalOcean instance (the 1GB memory plan) and host many web platforms with no performance issues, whatsoever.  I thought I’d share the settings that have been working really well in this low-memory environment.  Note that the server is running CentOS 7, but these settings should be applicable for any OS.

For what it’s worth, if you’re interested in a DigitalOcean account, click here to use my referral — you’ll gain $10 in credits when you sign up… Continue reading