I’ve been involved in designing and documenting a 5-day stress test for a customer and wanted to share a couple of findings and tips with the community.
First of all: LoadGen is proving itself again and again! The documentation is neither complete nor entirely accurate (and most samples I’ve seen in blogs simply don’t work; maybe a next post?), but once you’ve figured that out and got it up and running, it’s easy to tweak it to the needs of the particular customer you’re working for.
Now during stress testing there are a number of things you’d want to monitor. The most important ones are CPU and memory usage on all machines, and disk IO and queue lengths (and maybe the average lock wait time and lock timeouts/sec) on SQL. Of course the BizTalk counters for Spoolsize, Documents processed/sec, Orchestrations processed/sec and Orchestration persistence points/sec are important to monitor as well. Windows’ Perfmon is the ideal tool for this. You can see (and log) all counters in one console, and once the stress test is over you can analyze the logs with the same tool. This way you can detect memory leaks, CPU uptrends and SQL ‘drowning’ issues.
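To make "detecting an uptrend" a bit more concrete, here is a minimal sketch in Python that fits a straight line through one counter in a Perfmon log and returns its slope per hour. It assumes the log was saved or converted to CSV, that the first column holds timestamps in the form 01/15/2008 10:00:00.000, and that empty samples show up as blanks; the function name counter_slope_per_hour is my own invention for illustration. A slope that stays clearly positive over a multi-day run (for example on Private Bytes of a host instance) is a hint to look closer, not proof of a leak.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout of the first column in a Perfmon CSV log.
PERFMON_TIME_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def counter_slope_per_hour(csv_text, counter_suffix):
    """Return the least-squares slope (counter units per hour) of one counter.

    counter_suffix is matched against the end of each column header,
    e.g. "Private Bytes" matches "\\\\SRV1\\Process(BTSNTSvc)\\Private Bytes".
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("log is empty")
    header, data = rows[0], rows[1:]

    # Find the first column whose header ends with the requested counter name.
    col = next(
        (i for i, name in enumerate(header) if i > 0 and name.endswith(counter_suffix)),
        None,
    )
    if col is None:
        raise ValueError("counter not found in log: " + counter_suffix)

    points = []
    for row in data:
        try:
            stamp = datetime.strptime(row[0], PERFMON_TIME_FORMAT)
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip blank or malformed samples
        points.append((stamp, value))
    if len(points) < 2:
        raise ValueError("need at least two valid samples")

    # Express time as hours since the first sample, then fit y = a + b * x.
    first = points[0][0]
    xs = [(stamp - first).total_seconds() / 3600.0 for stamp, _ in points]
    ys = [value for _, value in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

You would call it with the text of the log file and the counter you care about, for example counter_slope_per_hour(open("stress.csv").read(), "Private Bytes"), where stress.csv is a placeholder file name.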
After running the test, the hard part starts: Analyzing the results.
From a BizTalk perspective, you’d want to know if there is an uptrend in response times and/or a downtrend in message throughput. I still think that HAT is the ideal tool here. Just take the ‘last 100 orchestrations’ query, remove the top xxx from it so that it returns all rows, and execute it. Then save the results to Excel and start doing your analysis.
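If you prefer a script over Excel formulas for this analysis, the following is a rough sketch in Python. It assumes the query results were saved as CSV with one row per orchestration instance, and that the start and end times sit in columns named StartTime and EndTime in the format 2008-01-15 10:00:00. Those column names and that format are assumptions for illustration; adjust them to whatever your export actually contains. The function hourly_stats (a made-up name) groups instances by the hour in which they started and reports the count and the average duration per hour.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def hourly_stats(csv_text, start_col="StartTime", end_col="EndTime",
                 fmt="%Y-%m-%d %H:%M:%S"):
    """Return a list of (hour, instance_count, average_duration_seconds).

    The column names and the timestamp format are assumptions about the
    exported file, not something the export is guaranteed to use.
    """
    durations_by_hour = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.strptime(row[start_col], fmt)
        end = datetime.strptime(row[end_col], fmt)
        # Bucket each instance by the hour in which it started.
        hour = start.replace(minute=0, second=0, microsecond=0)
        durations_by_hour[hour].append((end - start).total_seconds())

    return [
        (hour, len(durations), sum(durations) / len(durations))
        for hour, durations in sorted(durations_by_hour.items())
    ]
```

Plotting or simply reading the per-hour count and average duration side by side makes it easy to spot whether throughput drops or response times creep up as the test progresses.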
Now there’s one thing that’s quite important if your tests run for more than 1 day: Disable the SQL Agent job “DTA Purge and Archive (BizTalkDTADb)” (by default, this job continuously archives and purges all tracked messages that are more than 1 day old) and make sure there’s enough disk space to keep all the handled messages in the DTA database. This is crucial for being able to analyze ALL the message traffic. Otherwise you end up restoring a couple of DTA backups...
One last thing I’d like to remind you of: Don’t forget to also log all the counters you see in Perfmon. It has happened to quite a few people (including me) that they added a counter to the Perfmon console and forgot to also log it...
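A cheap safeguard is to check a short trial log before starting the real run. The sketch below, again in Python and again assuming a CSV log whose first row holds the counter paths, compares that header against the list of counters you intend to analyze afterwards. The name missing_counters is hypothetical, and the counter names you pass in are whatever you decided to monitor.

```python
import csv
import io


def missing_counters(csv_text, wanted):
    """Return the entries of `wanted` that do not appear in the log header.

    A wanted counter counts as logged when some column header ends with it
    (case-insensitive), so "Spool Size" matches any machine and instance.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    # Skip the first column: it holds the timestamp, not a counter.
    logged = [name.lower() for name in header[1:]]
    return [
        counter
        for counter in wanted
        if not any(name.endswith(counter.lower()) for name in logged)
    ]
```

An empty result means every counter on your list is present in the log; anything else is the list of counters you are watching in the console but not recording.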