INSERT and UPDATE loading practice – The impact of forwarding pointers.

Ever since I’ve been working with databases I have come across the following programming pattern: an INSERT followed by LOTS of UPDATEs. The reason for the pattern is understandable, and is generally that the desired data set is complex to generate from the source database.


For this reason a base set of data is generated and then refined by a number of updates.


I’ve seen this in two main scenarios: integration and warehousing. The former is due to a target system wanting something the source system holds in a different structure or at a different level, and the latter generally wants calculated data like number of children, or name of last dentist visited.


You might think these should easily be solved by one query, and if only one of these bits of information were required that would be easy. However, more often than not there are many, and even worse they are dependent, i.e. given the last dentist someone visited, give me the number of people that have sued said dentist.


So, logically, people break this down into manageable chunks and do something like:


 


INSERT INTO CustomerDetails
SELECT somecolumns
FROM Customers

ALTER TABLE CustomerDetails ADD NumberOfOrders int, NumberOfChildren int, NumberOfPeopleSued int, LastDentistVisited varchar(100)

UPDATE CustomerDetails
   SET NumberOfChildren = (SELECT COUNT(1) FROM ….)

UPDATE CustomerDetails
   SET LastDentistVisited = (SELECT TOP 1 Surname FROM ….)

UPDATE CustomerDetails
   SET NumberOfPeopleSued = (SELECT COUNT(1) FROM ….)


 


The reason for this post is that this approach is right for one reason and wrong for many more.


It is right because it is good practice to break a big problem down into manageable, understandable chunks; this aids understanding and means the code is easier to support in the future.


It is wrong for several performance reasons and one functional reason. I will cover the latter in a later post but will go through the performance reasons now.


1. The addition of the columns means each row has to be adjusted to fit the new columns. With fixed-width columns this can be very expensive as data has to be moved about. If you need the extra columns, add them in the insert statement instead. To get the correct type, use CAST, i.e. CAST(NULL AS int) or CAST(NULL AS varchar(24)).
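
One way to read that, as a minimal sketch reusing the example above (using SELECT INTO so the columns are created together with the rows; the column list is illustrative):

SELECT somecolumns
     , CAST(NULL AS int)          AS NumberOfOrders     -- typed NULLs define the new
     , CAST(NULL AS int)          AS NumberOfChildren   -- columns with the correct type
     , CAST(NULL AS int)          AS NumberOfPeopleSued
     , CAST(NULL AS varchar(100)) AS LastDentistVisited
  INTO CustomerDetails
  FROM Customers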


2. The second reason is that each UPDATE has to touch every row. If you have N updates that means each page is touched at least N times, which is not good for performance. If you have to do this, the solution is to batch the updates into one pass by using derived tables and CASE statements, i.e.


UPDATE <table>
   SET Col1 = CASE WHEN <some criteria>
                   THEN <a new value> ELSE Col1 END
     , Col2 = CASE WHEN <some other criteria>
                   THEN <some other new value> ELSE Col2 END


3. The final reason is one of forward pointers. Even if you add your additional columns in the INSERT statement, if any of them are variable length then they will result in FORWARDED RECORDS when you start doing the updates. What's a forwarded record? It occurs when a row has to move because it no longer fits on the page it was put on to start with. It's just like when you move house and redirect your mail: SQL Server maintains that your record is still located in its original location, but in that location is a pointer to another location, on another page in your database. Why is this bad? Because it means SQL Server has to perform two IOs to read the data, and it also has to do more work to update the data. More work equals worse performance.


So let's have a look at forward pointers.


In this example we are going to create a table and put one row in it. We will then put another row in the table that fills the page.


use tempdb
go
drop table heap
go
create table heap (id int not null, v1 varchar(5000))
go
insert into heap(id,v1)
select 1, ''
go
select object_id, partition_number, alloc_unit_type_desc, record_count, avg_record_size_in_bytes, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')
go
insert into heap(id,v1)
select 2, REPLICATE('#',4000)
go
select object_id, partition_number, alloc_unit_type_desc, record_count, avg_record_size_in_bytes, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')
go
select * from heap
cross apply sys.fn_PhysLocCracker(%%physloc%%)


 


What you should see is that after the first insert the DMV dm_db_index_physical_stats is reporting 1 row and no forwarded records. After the second insert we get 2 rows and still no forwarded records. Note we are also using two undocumented SQL 2008 features, %%physloc%% and fn_PhysLocCracker. They are very handy as they show the physical location of a row: the file, page and slot. There is also a scalar function as well. Thanks to Paul Randal for the pointer http://www.sqlskills.com/blogs/paul/post/SQL-Server-2008-New-(undocumented)-physical-row-locator-function.aspx . What we will see is that this returns the original location and not any forwarded location.


Now, if we update the first row so it's too big to fit on the page, what do you think happens?


update heap
set v1= REPLICATE('@',5000)
where id = 1
go
select * from heap
cross apply sys.fn_PhysLocCracker(%%physloc%%)
go
select object_id, partition_number, alloc_unit_type_desc, record_count, avg_record_size_in_bytes, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')


 


Well, you will see that we now have a forwarded record, but %%physloc%% is still saying it's in the same place. If we use DBCC IND we can see that the table now has three pages: an allocation page and two data pages.


dbcc ind(tempdb, heap,1)
go
dbcc traceon (3604)
go
dbcc page (tempdb,1,???,3)
dbcc page (tempdb,1,???,3)


Using DBCC PAGE we can look at the original page, and you will see something like this (note: change the ??? placeholders to the page numbers returned from DBCC IND where PageType = 1).


Slot 0 Offset 0x101a Length 9

Record Type = FORWARDING_STUB        Record Attributes =                  Record Size = 9

Memory Dump @0x0000000011A1B01A

0000000000000000:   04000900 00010000 00†††††††††††††††††..     ……

Forwarding to  =  file 1 page 2304 slot 0


This highlights that the row has moved to page 2304 (your page number will probably be different).


If we look at that page (which will be the second DBCC PAGE above) you will find your row.


So what happens if we move house again? What does SQL Server do? To check it out we need to add two rows: one fills up the first page and one fills up the second page. If we then update the first row so that it's too big for its new page, it should move again.


 


insert into heap(id,v1) select 3, REPLICATE('#',4000)
insert into heap(id,v1) select 4, REPLICATE('#',1000)
go
select * from heap
cross apply sys.fn_PhysLocCracker(%%physloc%%)
go
select object_id, partition_number, alloc_unit_type_desc, record_count, avg_record_size_in_bytes, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')
go
update heap
set v1= REPLICATE('@',8000)
where id = 1
go
select * from heap
cross apply sys.fn_PhysLocCracker(%%physloc%%)
go
select object_id, partition_number, alloc_unit_type_desc, record_count, avg_record_size_in_bytes, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')
go
dbcc ind(tempdb, heap,1)


Doing the inserts we will see that the 4th row now appears on the second page, but the other three appear to be on the first page. After the update nothing appears to change and the forwarded record count is still 1. So what's happened?


Well, if we look at the first page again using DBCC PAGE, you will find that the page number the forward pointer points to has changed to a new page. If you run DBCC IND you will see you now have a third data page. Interestingly, if you have a look at the second page, you will see that slot 0, where the old forwarded record resided, is no longer shown; that's because there is nothing in it. Slot 0 is now free.


So the bottom line is that if you are resorting to this approach of an INSERT followed by UPDATEs, forwarded records can cause you a problem, not just during reading but also when updating, because data has to be moved. If you do have to use this approach then you are best to preallocate space in the table; this ensures that the rows don't move, and performance will improve. I've seen 50% improvements in systems just by giving varchar columns defaults padded out with REPLICATE, for instance.
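
As an illustration of the pre-allocation trick (table and column names assumed), padding the variable-length column to its full width at insert time means the later UPDATE can overwrite it in place:

INSERT INTO CustomerDetails (CustomerId, LastDentistVisited)
SELECT CustomerId
     , REPLICATE(' ', 100)   -- reserve the full 100 bytes now so the row never has to move
  FROM Customers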


So what should you do?


Well, whilst you might be able to write one query that achieves your goal, it is probably only supportable by the person that wrote it or someone from MENSA. So breaking it up is a good thing, but the use of updates is a bad thing. You will probably find that it is quicker to perform five INSERT statements and then join them all together, rather than one INSERT and four UPDATEs. If you use SELECT INTO it will be a bulk logged operation; you can then add a clustered index on your key, which should allow the final query to be an efficient merge join.
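
A rough sketch of that shape (the table and column names are assumed): build the base set and each calculated set separately with SELECT INTO, cluster each piece on the join key, and stitch them together in one final query.

-- Base set and one of the calculated sets, each built with SELECT INTO
-- (bulk logged under the simple or bulk-logged recovery model).
SELECT CustomerId, Surname
  INTO #Base
  FROM Customers

SELECT CustomerId, COUNT(1) AS NumberOfOrders
  INTO #Orders
  FROM Orders
 GROUP BY CustomerId

-- Clustering both pieces on the join key lets the optimiser use a merge join.
CREATE CLUSTERED INDEX cix_Base   ON #Base (CustomerId)
CREATE CLUSTERED INDEX cix_Orders ON #Orders (CustomerId)

-- One final pass joins the pieces into the target table.
SELECT b.CustomerId, b.Surname, o.NumberOfOrders
  INTO CustomerDetails
  FROM #Base b
  LEFT JOIN #Orders o ON o.CustomerId = b.CustomerId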


If you can, you could use INSERT INTO with the clustered index already in place and use trace flag 610 http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/23/minimal-logging-changes-in-sql-server-2008-part-2.aspx. This would normally be the quickest option.
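
A minimal sketch of that option (table and column names assumed; see the linked post for the exact conditions under which the insert is minimally logged):

-- Trace flag 610 (SQL 2008) allows minimal logging for inserts into a table
-- that already has its clustered index.
DBCC TRACEON (610)

CREATE TABLE CustomerDetails (CustomerId int NOT NULL, Surname varchar(100))
CREATE CLUSTERED INDEX cix_CustomerDetails ON CustomerDetails (CustomerId)

-- Insert in clustered key order into the empty, indexed table.
INSERT INTO CustomerDetails (CustomerId, Surname)
SELECT CustomerId, Surname
  FROM Customers
 ORDER BY CustomerId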


PS. One nice feature in SQL 2008 is that ALTER TABLE <heap> REBUILD removes forward pointers.
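
For the demo heap above that is just:

-- SQL 2008: rebuilding the heap rewrites the rows and removes the forwarding pointers.
ALTER TABLE heap REBUILD
go
select object_id, record_count, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id('tempdb'),object_id('heap'),0,null,'detailed')
-- forwarded_record_count should now be back to 0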







SQL Server Usergroup in London on Sept 10th – All things BI and SQL 2008

I have teamed up with the guys that run the PASS chapter in London to do a SQL 2008 and BI event.


Ashwani Roy from IMGroup has kindly volunteered to do the BI side and I will be doing the SQL 2008 stuff.


If you are based in London then why not come along? The meeting is in Canary Wharf.


For more details go to http://www.sqlpass.org.uk/Events/September2009Meeting/tabid/59/Default.aspx


I look forward to seeing you there.








Error 1309 uninstalling Adventure Works sample

Just came across a really nice feature of the Adventure Works sample uninstall. I had accidentally installed the case-sensitive collation version of the database, and I'm not that picky, so I wanted to replace it with the case-insensitive one. You know, the one that doesn't appreciate that it's 3am and you still haven't finished your demos when it does something unexpected.


Anyway, I tried to uninstall but it failed because the data files were in use. No surprise there, so I deleted the database, but it still failed because I think it still expected the files to be there.


So I did a repair and still no luck.


What I narrowed it down to was permissions. The permissions on the LDF file were just me. So it seems that the uninstall needed some extra permissions. Now, it could be because I am on a laptop and not connected to the domain.


Anyway, I added Everyone to the permissions on the LDF file and the data folder and everything worked fine.


I have removed those permissions so we should be back to normal now.








Software developer conference in Dublin

What: epicenter 2009: http://epicenter.ie epicenter is three days of software excellence – it's a conference, it's an expo, it's a place to hang out! It's for software architects, CTO's and developers of all knowledge and experience levels.


 


epicenter is organised by IrishDev.com – News, Events and Jobs for Irish Software Developers.


 





When: Wednesday 26th through to Friday 28th August 2009



Where: Trinity College, Dublin



Cost:  Online prices 1-Day €124.00 / 2-Day €183 / 3-Day €220.00


 


What to expect:
The event has four streams covering Java, Web, Open Source and Microsoft.  Each stream has two tracks (except Open Source): expect to choose from four sessions per track (i.e. eight per stream!) 


 


Keep up to date:
Follow epicenter2009 on Twitter:
http://twitter.com/epicenter2009 or on Facebook: http://facebook.com/irishdev


 








Win a free registration and Hotel for SQLBits V

We've just announced a competition to win a free ticket for all 3 days of SQLBits. Plus we will put you up in the hotel on the Thursday and Friday night. So you can choose whatever pre-con you want to attend, spend the day getting up to speed with SQL 2008, and then spend the Saturday learning about everything else you wanted to know about SQL. That's all worth £500+.


All you have to do is take a mug shot, i.e. a photo of your mug from SQLBits IV. If you don't have a mug then you can download a picture you can stick on a mug; I'll also be giving some out at events I'm doing in September.


The prize is for the most interesting picture, maybe your mug flying a plane or skydiving from 13,000 feet. Who knows?


To enter post your pics to http://www.sqlbits.com/Competition.aspx







SQLBits session submission

Don't forget to submit sessions for SQLBits V in November.


For anyone that hasn't spoken before: as a speaker you will get exclusive access to the speaker dinner, we will cover one night's hotel, and we will provide you with, best of all, a SQLBits polo shirt.


It's easy to submit a session: just complete a profile http://sqlbits.com/Profile/SpeakerProfile.aspx and then submit sessions http://sqlbits.com/Profile/MySessions.aspx. If you don't like what you've submitted you can always change it later (well, until it's chosen, that is).


Submissions close at the end of the month, so get submitting soon.







Publisher cannot open file – Publisher 2007 error

I've just spent the last 30 minutes trying to fix a problem my Dad was experiencing. As with most people in IT, I am my parents' IT support.


He writes the local church magazine and has done so for many years. It's all done in Microsoft Publisher.


He's just come back from holiday and found that he can't open any of his Publisher documents. He just gets a "Publisher cannot open the file" error. Not very helpful.


So where does he go as a home user?  Me.


After managing to get on his machine I started bing-bonging away and found a thread that discusses this. Oddly there are some posts from 2007 and some from 2009 (probably due to it being such a generic error message).


Anyway, the fix is to get hotfix 972566 http://support.microsoft.com/kb/972566/, which reports this to be an issue after you install SP2. Being a good update citizen, Dad has auto update turned on.


Now, this is an on-demand hotfix, which means you have to request it. The steps are as follows.


  1. Find that you need the hotfix (Bing for "Publisher cannot open file" 2007 error http://www.bing.com/search?q=%22Publisher+cannot+open+file%22+-+Publisher+2007+error+1100009&form=OSDSRC and you will find lots of pages but no link to the KB)
  2. Find the KB page (http://support.microsoft.com/kb/972566/)
  3. Find the button on the KB page (http://support.microsoft.com/hotfix/KBHotfix.aspx?kbnum=972566&kbln=en-us)
  4. Complete the form to get the hotfix downloaded
  5. Find the hotfix in your email (often it's in your spam folder)
  6. Open it
  7. Download the hotfix from Microsoft and save it somewhere
  8. Try and remember where you saved it and open it (it's a self-extracting exe)
  9. Specify a folder (it nicely defaulted to c:\, I hope it doesn't do that for normal users as they won't have rights to the C drive)
  10. Copy the obscure password from the email to the extractor and start the extraction
  11. Find the folder you specified in step 9 and run the hotfix file
  12. Acknowledge the EULA and everything
  13. All done.

Now, this is the process for all the on-demand hotfixes I am aware of. It's nice that you don't have to contact support to request the hotfix, but for a consumer these steps are awful. I think they are acceptable for a sysadmin who is used to the process, but for the normal consumer it's a complete nightmare.


Finding the update is difficult enough, and I would have expected it on the top issues page; a similar issue is on the page but from 2007: http://support.microsoft.com/ph/11374#tab0