Use the right data for SharePoint testing
Thanks to Dario Mratovich, Microsoft Consultant, for his awesome recommendations on preparing test data. The following section is cited from his “Capacity Planning Testing for SharePoint 2007”:
Make sure you have adequate sample data. This tends to be a very common stumbling block – sites are built with only a tiny percentage of the content that production will have. Not enough sites, not enough content, not exercising a broad enough sample of your dataset, not enough users – any of these can fatally skew your test results.
Another common problem is not having enough content for a reasonable search corpus. Many people try uploading the same document many times – sometimes hundreds or thousands of times – thinking that a different file name makes each copy unique. Unfortunately, in that scenario the duplicate-detection step in search can start taking significantly longer than it otherwise would, so this too can unfairly reduce your query throughput.
A document that is uploaded multiple times skews the way SharePoint performs duplicate detection: search calculates a hash based on the contents of a document – it doesn’t look at the filename! So uploading the same document 30,000 times, even with a different filename each time, will cause search retrieval to become slower and slower as SharePoint tries to resolve the duplicate documents.
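The practical consequence is that generated test documents need to differ in their contents, not just their names. As a minimal sketch (the output folder, file count, and word list below are arbitrary assumptions, not part of any particular tool), a few lines of PowerShell can produce documents that all hash differently:

    # Generate sample documents whose contents differ, so that search
    # will not treat them as duplicates. All paths and counts below are
    # illustrative placeholders – adjust them for your own test corpus.
    $outDir = "C:\TestData"
    New-Item -ItemType Directory -Path $outDir -Force | Out-Null

    $words  = "capacity","planning","sharepoint","corpus","index","query","crawl","throughput"
    $random = New-Object System.Random

    for ($i = 1; $i -le 1000; $i++) {
        # Assemble a body of randomly chosen words so every file hashes differently.
        $body = 1..500 | ForEach-Object { $words[$random.Next($words.Length)] }
        Set-Content -Path (Join-Path $outDir ("Doc{0:D5}.txt" -f $i)) -Value ($body -join " ")
    }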
In all likelihood you will need tools to populate the sample data. Some tools you can start with are on CodePlex at http://www.codeplex.com/sptdatapop. You will probably end up writing additional tools for other data population tasks, or ones that work in combination with these tools.
Using PowerShell to script the creation of objects and test data is also very useful.
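To illustrate, here is a hedged sketch that uploads the files generated above into a document library through the SharePoint 2007 server object model; the site URL, library name, and source folder are assumptions, and the script must run on a server in the farm:

    # Upload the generated sample files into a document library via the
    # SharePoint 2007 server object model (run on a farm server).
    # The URL, library name, and source folder are placeholders.
    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint") | Out-Null

    $site = New-Object Microsoft.SharePoint.SPSite("http://intranet/sites/loadtest")
    $web  = $site.OpenWeb()
    $list = $web.Lists["Shared Documents"]

    Get-ChildItem "C:\TestData" -Filter *.txt | ForEach-Object {
        # SPFileCollection.Add(url, byte[], overwrite) creates each file in the library.
        $bytes = [System.IO.File]::ReadAllBytes($_.FullName)
        $list.RootFolder.Files.Add($_.Name, $bytes, $true) | Out-Null
    }

    $web.Dispose()
    $site.Dispose()

Disposing the SPWeb and SPSite objects explicitly matters here, because the object model holds unmanaged resources that PowerShell will not release promptly on its own.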