Aug 12, 2014

Neo4j read-only transactions still need success()

Since Neo4j 2.0 it has been necessary to wrap read-only database access in transactions just like you needed for write access in 1.X. This has necessitated refactoring of much legacy code, adding transactions around code that previously did not require it. It turns out that there is a subtlety to this, a case when it is easy to make a mistake that does not necessarily cause any immediate problems.

Let's first explain a common pattern that leads to this issue. Consider a method with multiple return statements:
public String getName(Node node) {
    if(node.hasProperty("name")) {
        return node.getProperty("name");
    } else {
        return null;
    }
}
This code is a bit contrived, but it does represent a common pattern, multiple return statements in code that does read-only access to the database. I've seen much more complex cases, but this one suffices to demonstrate the problem. So, what happens if we add a transaction? The easiest is to wrap all code in the method:
public String getName(Node node) {
    try (Transaction tx = node.getDatabase().beginTx()) {
        if(node.hasProperty("name")) {
            return node.getProperty("name");
        } else {
            return null;
        }
    }
}
You might immediately notice I left out the tx.success() call. The reason I did this was two-fold:
  • I would need to add multiple calls, one before each return.
  • If I follow the pattern of calling tx.success() after all database access, I should create a new local variable for the result of the getProperty() call making the code mode complex.
  • It turns out that this code runs fine, so why bother with the tx.success() call?
This last point is the subtle catch. While this code seems to run fine, there are cases when it really does not run fine. Before explaining those cases, let's just look at how the code would look if we were strict with the tx.success() call.
public String getName(Node node) {
    try (Transaction tx = node.getDatabase().beginTx()) {
        if(node.hasProperty("name")) {
            String name = node.getProperty("name");
            tx.success();
            return name;
        } else {
            tx.success();
            return null;
        }
    }
}
Clearly this is noticeably more verbose than the previous code. And since the previous code worked, it is extremely tempting to code like that. And I did. And all hell broke loose and it took a while to figure out why.

It turns out that while a single transaction like this does run fine without the tx.success() call, it does not really work with nested transactions. If you have other code calling your method, and that code also has transactions, and those transactions remember to call tx.success(), then an exception will be raised. The inner transaction, even though it becomes a dummy transaction, will mark the transaction as not having completed successfully, and if the outer transaction does mark success, the database will reject this case, and throw a:
TransactionFailureException
 - Unable to commit transaction
with cause:
RollbackException
 - Failed to commit, transaction rolled back
and no further information.

When I first saw this, I could not understand where it was coming from. I was in the process of updating legacy code by adding more read-only exceptions, and successfully fixing several unit test cases. However, I found that after fixing some unit tests, other apparently unrelated unit tests started failing with this exception. It occurred because the first unit tests added transactions deeper in the call stack, the same methods used by other tests that had already added transactions higher in the stack. Normally it does not matter if you nest transactions. This is a common pattern. So it was not obvious, at first, that the nesting would cause the problem, but it sure did.

To clarify the cases when this problem occurs I wrote a little test code:
private void readNamesWithNestedTransaction(boolean outer, boolean inner) {
    try (Transaction tx_outer = graphDb.beginTx()) {
        try (Transaction tx_inner = graphDb.beginTx()) {
            readNames();
            if (inner) {
                tx_inner.success();
            }
        }
        if (outer) {
            tx_outer.success();
        }
    }
}
This method can be called with two booleans, which control whether tx.success() is called in the inner transaction, outer transaction or both. It turns out only one of the four possible combinations will cause an exception (tested against Neo4j 2.1.2):
Outer
successno-success
Innersuccess
no-successException

The cases where the inner and outer are the same, no exception is thrown because there is no inconsistency. If they are both true (success called for both), the transaction succeeds cleanly. If they are both missing the success call, then the entire transaction rolls back, but since it is a read-only transaction, nothing will actually be done. If the inner transaction is true, and the outer false, the outer ignores the success of the inner transaction, and decides on a full rollback, which again does nothing. However, in the specific case when the inner does not have success(), but the outer does, it marks this as an invalid case, and throws this exception despite being read-only.

My conclusion? While it is not really wrong to skip the success() on read-only transactions, it is certainly a very bad idea. Perhaps you can get away with it on the outer-most main loops of an app, when you are really certain you will not be called by something else, but be very, very wary of forgetting the success call in any methods you expect to be called from elsewhere. Either have no transaction at all (rely on the calling code to add transactions), or accept the higher verbosity that comes with writing the code right.