INSERT and UPDATE loading practice – The impact of forwarding pointers.

Ever since I’ve been working with databases I have come
across the following pattern of programming. The pattern is an INSERT and LOTS
of UPDATEs. The reason for this pattern is reasonable, and is generally due to
the desired data set being complex to generate from the source database.

For this reason a base set of data
is generated and then updated by a number of updates.

I’ve seen this is two main
scenarios, integration and warehousing. The former is due to a target system
wanting something that the soruce system had in a different structure or a
differnt level and the latter generally wants calculated data like number of
children, or name of last dentist visited.

You might think these should easily
be solved by one query, and if there was only on of these bits of infomation
required that would be easy. However more often than not there are many, and
even worse they are depenent, i.e. given the last dentist someone visted, give
me the number of people that have sued said dentist.

So logically people break this down
into manageable chunks. And do something like,

 

INSERT
INTO CustomerDetails

SELECT
somecolums

FROM
Customers

 

 

ALTER
sometable ADD NumberOfOrders int,
NumberOfChildren int, NumberOfPeopleSued
int
, LastDentistVisited varchar(100)

 

UPDATE
CustomerDetails

   SET NumberOfChildren = (SELECT COUNT(1) FROM ….)

 

UPDATE
CustomerDetails

   SET LastDentistVisited = (SELECT COUNT(1) FROM ….)

 

 

UPDATE
CustomerDetails

   SET NumberOfPeopleSued= (SELECT TOP 1 Surname FROM
….)

 

The reason
for this post is that this approach is right for 1 reason and wrong for many
more.

It is right
because it is good practice to break small problems down into manageable
understandable chunks as this aids understanding and means code is easier to
support in the future.

It is wrong
for many reasons of performance and one functional reason. I will cover the
latter in a later post but will mention the performance reasons now.

1. The
addition of the columns means each row has to be adjusted to fit the new
columns. With fixed width columns this can be very expensive as columns have to
be moved about. If you have a need for this the columns should be added in the
insert statement. To get the correct type, use cast, i.e. CAST(NULL as int) or
CAST(NULL as varchar(24))

2. The second
reason is that for each UPDATE each row has to be updated. If you have N updates
that means each page is touched at least N times. Which is not good for
performance. If you have to do this the solution is to try and batch up dates to
do it in one pass, by using derived tables and CASE statements. i.e.

UPDATE
<table>

   SET Col1 = CASE WHEN <some criteria>

                  
THEN <a new value>
ELSE Col1 END

     , Col2 = CASE WHEN
<Some other criteria>
           
       THEN <some other
new value> ELSE Col2 END

3. The final
reason is one of forward pointers. Even if you add your additional columns by
adding them in the INSERT statement, if any of them are variable length then
they will result in FORWARDED RECORDS when you start doing an update. Whats a
forwarded record, well it occurs when a row has to move because it no longer
fits on the page it was put to start with. Its just like when you move house and
redirect your mail. SQL Server maintains that your record is still located in
its original location but in that location is a pointer to another location, in
another page in your database. Why is this bad. Because it means SQL server has
to perform two IOs to be able to read the data, and also it has to do more work
to update the data. More work equals worse performance.

So lets have
a look at forward pointers.

In this
example we are going to create a table, and put 1 row in it. We will then put
another row in the table that fills the page.

use
tempdb

go

drop
table heap

go

create
table heap
(id int
not null, v1
varchar(5000))

go

insert
into heap(id,v1)

select
1, ''

go

select
object_id,
partition_number, alloc_unit_type_desc, record_count,
avg_record_size_in_bytes,
forwarded_record_count

from
sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')

go

insert
into heap(id,v1)

select
2, REPLICATE('#',4000)

go

select
object_id,
partition_number, alloc_unit_type_desc, record_count,
avg_record_size_in_bytes,
forwarded_record_count

from
sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')

go

select
* from
heap

cross
apply sys.fn_PhysLocCracker(%%physloc%%)

 

What you should see is that after
the first insert the DMV
dm_db_index_physical_stats  is reporting 1 row
and no forwarded records. After the second insert we get 2 tows and no forwarded
records. Not we are also using a SQL 2008 undocumented feature
%%physloc%% and fn_PhysLocCracker.
 Both of these are undocumented, but
are very handy as they show the physical location of a row, the File, Page and
Slot.  There is also a scalar
function as well. Thanks to Paul Randal for the pointer
http://www.sqlskills.com/blogs/paul/post/SQL-Server-2008-New-(undocumented)-physical-row-locator-function.aspx
 .
What we will see is that this returns the original location and not any
forwarded location.

Now if we
update the first row so its too big to fit on the page what do you think
happens.

update
heap

set
v1= REPLICATE('@',5000)

where
id = 1

go

select
* from
heap

cross
apply sys.fn_PhysLocCracker(%%physloc%%)

go

select
object_id,
partition_number, alloc_unit_type_desc, record_count,
avg_record_size_in_bytes,
forwarded_record_count

from
sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')

 

Well you will see that we now have
a forwarded record, but %%PhysLoc%% is still saying its in the same place. If we
use DBCC IND we can see that the table now has 3 pages, an allocation page and
two data pages.

dbcc
ind(tempdb,
heap,1)

go

dbcc
traceon (3604)
go
dbcc page (tempdb,1,???,3)
dbcc page
(tempdb,1,???,3)

Using DBCC PAGE we can look at the
original page and you will see something like this. (note the ??? change these
to the page numbers returned from DBCC IND where PageType =
1)

Slot
0 Offset 0x101a Length 9

 

Record
Type = FORWARDING_STUB       
Record Attributes =                 
Record Size = 9

 

Memory
Dump @0x0000000011A1B01A

 

0000000000000000:   04000900 00010000
00†††††††††††††††††..    
……       

Forwarding
to  =  file 1 page 2304 slot 0

This highlights that the row has
moved to page 2304 (your page will probably be
different)

If we look at that page (which will
be the second DBCC PAGE above) you will find your
row.

So what happens if we move house
again, what does SQL Server do. To check it out we need to add two rows, 1 fills
up the first page and one fills up the second page. If we then update the first
row so that its too big for its new page, it should
move.

 

insert
into heap(id,v1) select 3, REPLICATE('#',4000)

insert
into heap(id,v1) select 4, REPLICATE('#',1000)

go

select
* from heap

cross
apply sys.fn_PhysLocCracker(%%physloc%%)

go

select
object_id,
partition_number, alloc_unit_type_desc, record_count,
avg_record_size_in_bytes,
forwarded_record_count

from
sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')

go

update
heap

set
v1= REPLICATE('@',8000)

where
id = 1

go

select
* from heap

cross
apply sys.fn_PhysLocCracker(%%physloc%%)

go

select
object_id,
partition_number, alloc_unit_type_desc, record_count,
avg_record_size_in_bytes,
forwarded_record_count

from
sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')

go

dbcc
ind(tempdb,
heap,1)

Doing the insert we will see that
the 4th row now appears on the second page but the other three appear to be on
the first page. After the update nothing changes and the forward record count is
still 1. So whats happened.

Well if we look at the first page
again, using DBCC PAGE you will find that the page number that the forward
pointer points to has changed to a new page. If you run DBCC IND you will see
you now have a 3rd data page. Interestingly if you have a look at the second
page, you will see that the slot 0 when the old forwarded record resided is now
not being shown, thats because there is nothing in it. Slot 0 is now
free.

So the bottom line is that if you
are resorting to use such an approach doing an INSERT and then an UPDATE
forwarding records can cause you a problem, not just during reading but also
when updating because data has to be moved. If you have to use this approach
then you are best to preallocate space in the table this ensures that the row
doesn’t move and performance will improve. I’ve seen 50% improvements in systems
by allocating defaults to varchar columns using replicate for
instance.

So what should you
do.

Well whilst you might be able to
right 1 query that achieves your goal, it is probably only supportable by the
person that wrote it or someone from MENSA. So breaking it up is a good thing,
but the use of updates is a bad thing. You will probably find that it is quicker
to perform 5 insert statements and then joined them all together rather than 1
INSERT and 4 UPDATES. If you use SELECT INTO it will be a bulk logged operation,
you can then add a clustered index on your key which should then allow the final
query to be an efficient MERGE  JOIN.

If you can you could use INSERT
INTO with the CLUSTERED INDEX already inplace and use trace flag 610
http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/23/minimal-logging-changes-in-sql-server-2008-part-2.aspx This would normally be the quickest option.

ps. One nice feature in SQL 2008 is that the ALTER TABLE <heap>
REBUILD, removes forward pointers.


SQL Server Usergroup in London on Sept 10th – All things BI and SQL 2008

I have teamed up with the guys that run the PASS chapter
in London to do a SQL 2008 and BI event.

Ashwani Roy from IMGroup has kindly volunteered to do the BI side and I will
be doing the SQL 2008 stuff.

If you are based in London then why not come along, the meeting is in Canary
Wharf.

For more details go to http://www.sqlpass.org.uk/Events/September2009Meeting/tabid/59/Default.aspx

I look forward to seeing you there.


Error 1309 uninstalling Adventure Works sample

Just came across a really nice feature of the Adventure
Works sample uninstall. I had accidentally installed the case sensitive colation
db and I'm not that picky and so wanted to replace with the insensitive one. You
know the one that doesn't appreciate its 3am and you still haven't finished your
demos, when it does something unexpected.

Anyway I tried to uninstall but it failed because the data files were in use.
No surprise there, so I deleted the databas, but it still failed because I think
it still expected the files to be there.

So I did a repair and still no luck.

What I narrowed it down to was permissions. The permissions on the LDF file
where just me. So it seems that the uninstall needed some extra permissions. Now
it could be because I am on a laptop and not connected to the domain.

Anyway I added everyone to the permissions on the LDF file and the data
folder and everything worked fine.

I have removed those permissions now so we should be back to normal
now.


Software developer conference in Dublin

What:
epicenter 2009:
http://epicenter.ie epicenter
is three days of software excellence – it's a conference, it's an expo, it's a
place to hang out! It's for software architects, CTO's and developers of all
knowledge and experience levels.

 

epicenter
is organised by IrishDev.com – News, Events and Jobs for Irish Software
Developers.

 

epicenter
is three days of software excellence – it's a conference, it's an expo, it's a
place to hang out! It's for software architects, CTO's and developers of all
knowledge and experience levels.

 

epicenter
is organised by IrishDev.com – News, Events and Jobs for Irish Software
Developers.

 

epicenter is three days of software
excellence – it's a conference, it's an expo, it's a place to hang out! It's for
software architects, CTO's and developers of all knowledge and experience
levels.

When: Wednesday
26th through to Friday 28th August 2009

Where: Trinity College, Dublin

Cost:  Online prices 1-Day €124.00 / 2-Day €183 / 3-Day
€220.00

 

What to
expect:

The event has four streams covering Java, Web, Open Source and
Microsoft.  Each stream has two tracks (except Open Source): expect to
choose from four sessions per track (i.e. eight per stream!) 

 

Keep up to
date:

Follow epicenter2009 on Twitter:
http://twitter.com/epicenter2009 or
on Facebook:
http://facebook.com/irishdev

 


Win a free registration and Hotel for SQLBits V

We've just announced a competition to win a free ticket
for all 3 days of SQLBits. Plus we will put you up in the Hotel on the Thursday
and Friday night. So you can choose whatever pre con you want to attend, spend the day getting
up to speed with SQL 2008 and then spend the Saturday learning about everything else you wanted to
know about SQL. Thats all worth £500 +

All you have to do is take a mug shot i.e. a photo of your mug from SQLBits
IV. If you don't have a mug then you can download a picture you can stick on a
mug, I'll also be giving some out at events I'm doing in September

The prize is for the most interesting picture, maybe your mug flying a plane
or skydiving from 13,000 feet. who knows.

To enter post your pics to http://www.sqlbits.com/Competition.aspx


SQL Bits session subsmission

Don't forget to submit sessions for SQLBits V in
November.

For anyone that hasn't spoken before as a speaker you will get exclusive
access to the speaker dinner, we will cover one nights hotel and provide you
with, the best of all, a SQLBits polo shirt.

Its easy to submit a session just complete a profile http://sqlbits.com/Profile/SpeakerProfile.aspx and
then submit sessions http://sqlbits.com/Profile/MySessions.aspx if
you don't like what you've submitted you can always change it later. (well until
its chosen that is).

Submission will close at the end of the month so get submitting
soon.


Publisher cannot open file – Publisher 2007 error

I've just spent the last 30 minutes trying to fix a
problem my Dad was experiencing. As with most people in IT I am my parent's IT
support.

He writes the local church magazine and has done so for many years. Its all
done in Microsoft Publisher.

He's just come back from holiday and found that he can't open any of his
Publisher documents. He just gets "Publisher cannot open the file" error. not
very helpful.

So where does he go as a home user?  Me.

After managing to get on his machine I starting bing bonging away and find a
thread that discusses this. Odly there are some posts from 2007 and some from
2009. (probably due to it being such a generic error message).

Any way the fix is to get hotfix 972566 http://support.microsoft.com/kb/972566/ which
reports this to be an issue after you install SP2. Being a good update citizen
Dad has auto update turned on.

Now this is an ondemand hotfix that means you have to request it. The steps
are as follows.

  1. Find that you need the hotfix (Bing for "Publisher
    cannot open file" 2007 error http://www.bing.com/search?q=%22Publisher+cannot+open+file%22+-+Publisher+2007+error+1100009&form=OSDSRC and
    you will find lots of pages but no link to the KB
  2. Find the KB page (http://support.microsoft.com/kb/972566/)
  3. Find the button on the KB page (http://support.microsoft.com/hotfix/KBHotfix.aspx?kbnum=972566&kbln=en-us)

  4. Complete the form to get the hotfix downloaded
  5. Find the hotfix in you email, (often its in your spam folder)
  6. Open it
  7. Download the hotfix from Microsoft and save it somewhere
  8. Try an remember where you saved it and open it (its a self extracting exe)
  9. Specify a folder (It nicely defaulted to c:\, I hope it doesn't do that
    for normal users as they won't have writes to the C drive)

  10. Copy the obscure password from the email to the extractor and start the
    extraction

  11. Find the folder you specified in 9. and run the hotfix file.
  12. Acknolwedge the EULA and everything
  13. All done.

Now this is the process for the all ondemand hotfixes I am aware of. Its nice
that you don't have to contact support to request the hotfix but for a consumer
these steps are awful. I think they are acceptable for an sysadmin person who is
used to the process but for the normal consumer its a complete nightmare.

Finding the update is difficult enough and I would have expected it on the
top issues page, a similar issue is on the page but from 2007 http://support.microsoft.com/ph/11374#tab0