NHibernate querying – a matter of knowing your scenarios

Skrevet - Saturday, June 26th, 2010 kl. 10:48 | Kategori - * Rants

These past few days, I’ve held a company-specific course on NHibernate. It went really well, and I believe the participants also got those few golden eggs that makes the course worthwhile for them. They had been using NHibernate for some time and have specialists abroad – so, it was with some trepidation I approached the course material.

I focused on relaying the design decisions in NHibernate and then discussed a lot of do’s and dont’s. I think the key thing to learn when using NHibernate is the power of the ISession. Typically, people are locked into a world where they need to express things as you do with SQL – ie. you query the same way that you would in SQL. Behold the famous Customer vs. Order scenario:

Each Order has a Customer and a Customer can have any number of Orders making the association a one-to-many in the database. But how about querying the database efficiently? Most people will first try this given lazy-loading:

using(var session = GetSession())
using(var tx = session.BeginTransaction())
{

var query = session.CreateQuery(“from Order”);
var orders = query.List<Order>();
foreach(var order in orders)
{

Console.WriteLine(“OrderDate {0} with Customer {1}”, order.Id,order.Customer.AccountNumber);

}

}

This will result in the dreaded Select N+1 where N is the number of Customers that have Orders. Worse, it opens equally many database-connections. (To explain – everytime we hit an Order with a Customer that has not yet been loaded, an SQL query is sent for the Customer – if we hit a Customer that has already been fetched NHibernate will take it from the 1st level cache. Okay, most will then try this:


var query = session.CreateQuery(“select o from Order o join fetch o.Customer”);

We have now reached our goal, right? While it is true that we have reduced the number of queries to one and thus avoided Select N+1, we have a potential problem left. The problem with joins is that we will return a potentially much larger result-set (called the Cartesian product). Suppose our Customers are really heavy on data – and each Customer has many Orders – we will effectively be returning the same heavy customer data object many times over. Let’s say each Customer is 10 KB and each order is 1 KB and each Customer has on average 1.000 orders and there are 10.000 Orders in total. We will now be transferring 1 KB * 10.000 (orders) and 10 KB * 10.000 (customers each returned on average 1000 times) making for 110.000 kb or ~ 107 MB…

How to solve this conundrum – SQL cannot help us here, since it will have exactly the same problem? Well, there is one thing left to exploit. The ISession’s IdentityMap:


session.CreateQuery(“from Customer”).List();
var query = session.CreateQuery(“from Order”);

Why does this work? Every time anything is loaded in an ISession, it is stored in the 1st level cache. This when we iterate over Orders and query for the Customer, it will result in a cache-hit every time. Given our previous example, we have reduced this to two queries, resulting in 1 KB * 10.000 (orders) and 10 KB * 10 (customers) = 10.100 KB ~ 10 MB. Only one thing is still bugging us – we open two connections to the database – but we can get rid of this using MultiQuery:

using(var session = GetSession())
using(var tx = session.BeginTransaction())
{

var multiQuery = session.CreateMultiQuery();
multiQuery.Add(“from Customer”);
multiQuery.Add(“from Order”);
var list = multiQuery.List();
foreach(var order in ((IEnumerable)list[1]).OfType<Order>())
{

Console.WriteLine(“OrderDate {0} with Customer {1}”, order.Id,order.Customer.AccountNumber);

}

}

As with all performance optimizations – this is a contrived example – you will rarely run into such improvements – and you need to know your scenarios to decide which is the better thing to do – basically what are we trying to reduce? The amount of data – the amount of connections and what possibilities do we have. Also, in a real world scenario, you would use NH Profiler or SQL Profiler to determine that it is actually a problem before deciding to performance optimize in the first place. Optimizations come at a price… reduced readability of the code and thus increased maintenance.

Feed | Trackback |

Post a Comment