Filtering by joining is difficult to optimise


Joining data is one of the most complex parts of
SQL optimisation. Especially when you are dealing with large tables and you are
trying to visualise what set processing the query engine will do.

You might say you don't need to do all that, you can just use the Database
Tuning Adviser. Well you're wrong. The database tuning advisor only helps out
indexing a database. It doesn't help out when you have bad SQL. Do you have an
app that fires a query for each line in an order, or calculates the order totals
for all customers but only returns one of them

All these situations are about database querying design. The two examples
above are quite simple to see and fix, but what happens when you just have a
very large query that takes ages.You've got covering indexes, and key aligned
indexes as much as possible but its still slow and you are only returning a few
rows. Well what is most likely happening is that within the query you are
processing more data than you need to. Trying to resolve this is very complex
and oftens comes down to joining.

There are a number of ways to reduce the number of rows processed during
a query,

1. use a where clause with a literal

2. join to another table

option 1 is very easy for the engine to process and can often mean the rows
never make it out of the storage operator, index seek, index scan. The second is
more difficult. If you have two tables with 1 million rows and when joined you
only get 10 records. The Query processor has to likely process all rows in both
tables.You maybe lucky and have key aligned indexes (indexes where the key
columns are in the same order and match those of the join clause) which means
you can do a merge join, if you haven;t got key aligned indexes then you will
likely have a hash join. In both case you still have to read all the data. What
happens when you add a 3rd table into the equation. that makes things more

The summary is that even though you only need 10 rows if you are joining
tables to filter it is likely you have to process all dat in both/all tables. If
you can't do anything about the query/table design you need lots of memory and
if your data is greater than your memory you need fast storage


Leave a Reply

Your email address will not be published. Required fields are marked *