THE SQL Server Blog Spot on the Web


Paul Nielsen

10 lessons from 35K tps

I’ve spent the last 6 months on a contract designing and developing a high-transaction database – a couple of orders of magnitude higher than the average OLTP SQL Server database. The SLA specified a hard transaction duration, and the primary fetch feature required some complex business logic. That level of concurrency drives out a few lessons. Some reinforced what I already believed, and some lessons were a surprise.

I’m under a strict NDA not to reveal the client or the industry, but I have permission to discuss some of the technology. Let me just say that if you live in the industrialized world, I’d bet $100 that your data has gone through this system at least once.

Lesson 1: At 35K tps, it’s difficult to configure a stable test or production platform. We had more trouble just getting the servers stable than we did developing the code. A good system administrator who knows how to properly configure every component – network, servers, routers, SAN – is worth their weight in gold. And at this tps, don’t try iSCSI.

Lesson 2: We moved all the business logic into the database design and stored procedures to use set-based code. The middle tier was mostly a pass-through from the mainframe, handling the manual partitioning and the connection pooling. The creative data-driven design allowed for flexibility and performed great.

Lesson 3: I developed both a denormalized and a normalized version of the schema and the stored procedures using the same API. The denormalized version violated first normal form and repeated columns (part1, part2, part3, part4). It seemed like a reasonable idea at the time because a single clustered seek could fetch all the data without any other reads, and the number of parts was fixed. However, several other descriptive columns were added for each “part”. The amount of code required to normalize the parts inside the proc, and the extra indexes, cost the denormalized solution; the result was that the normalized version actually outperformed the denormalized version by several precious milliseconds. ("Ah, my Precious," said Gollum)
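To make the trade-off concrete, here is a minimal sketch of the two shapes (table and column names are hypothetical – the real schema is under NDA):

```sql
-- Denormalized: repeated columns violate 1NF. A single clustered seek fetches
-- everything, but each new per-part attribute multiplies the column count.
CREATE TABLE dbo.ItemDenorm (
    ItemID INT NOT NULL PRIMARY KEY CLUSTERED,
    Part1 VARCHAR(20) NULL, Part1Desc VARCHAR(50) NULL,
    Part2 VARCHAR(20) NULL, Part2Desc VARCHAR(50) NULL,
    Part3 VARCHAR(20) NULL, Part3Desc VARCHAR(50) NULL,
    Part4 VARCHAR(20) NULL, Part4Desc VARCHAR(50) NULL
);

-- Normalized: parts move to a child table. The clustered index on
-- (ItemID, PartNo) still serves the fetch with a single range seek.
CREATE TABLE dbo.Item (
    ItemID INT NOT NULL PRIMARY KEY CLUSTERED
);
CREATE TABLE dbo.ItemPart (
    ItemID   INT         NOT NULL REFERENCES dbo.Item(ItemID),
    PartNo   TINYINT     NOT NULL,
    PartCode VARCHAR(20) NOT NULL,
    PartDesc VARCHAR(50) NULL,
    CONSTRAINT PK_ItemPart PRIMARY KEY CLUSTERED (ItemID, PartNo)
);
```

Each new descriptive column costs the denormalized table four columns (and possibly four indexes), while the normalized child table absorbs it as one.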

Lesson 4: When developing the upsert proc prior to tuning the indexes, I first trusted that the If Exists(Select…) check would catch any existing item and prohibit duplicates. Nada. In a short time there were thousands of duplicates, because the same item would hit the upsert at the same millisecond and both transactions would see a not exists and perform the insert. After much testing, the solution was to use the unique index, catch the error, and retry, allowing the transaction to see the row and perform an update instead of an insert.
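A minimal sketch of that pattern, assuming a hypothetical dbo.Item table with a unique index on ItemKey (the real proc is under NDA):

```sql
-- Sketch: attempt the insert, let the unique index reject the duplicate,
-- catch the key violation, and retry as an update.
CREATE PROCEDURE dbo.ItemUpsert
    @ItemKey VARCHAR(20),
    @Value   VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        INSERT dbo.Item (ItemKey, Value)
        VALUES (@ItemKey, @Value);
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() IN (2601, 2627)  -- unique index / unique constraint violation
            UPDATE dbo.Item                -- the row now exists; update it instead
            SET Value = @Value
            WHERE ItemKey = @ItemKey;
        ELSE
        BEGIN
            DECLARE @msg NVARCHAR(2048);
            SELECT @msg = ERROR_MESSAGE();
            RAISERROR(@msg, 16, 1);        -- re-raise anything unexpected (no THROW pre-2012)
        END
    END CATCH
END
```

The point is that the unique index, not the If Exists check, is what actually serializes the two racing transactions.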

Lesson 5: This app requires that several hundred “items” are passed into the stored procedure for processing as a set. I chose to input the data as a comma-delimited VarChar(max), originally using Erland’s iterative string2set function to parse the string into a set inside the proc. It worked well enough. The plan was to substitute a CLR function near the end of development. When the time came, the CLR function ran blazingly fast on my notebook, but under stress it was painfully sloooooow. I didn’t understand at all why. This is exactly what we were told by Microsoft the CLR was for. Pass string data in, manipulate it, pass it back – no data access – sounds great, right? In a few emails to my friend Adam, he told me he had the same issues a while back and the CLR simply couldn't keep up with the marshalling of the VarChar(max).

So back to the T-SQL function, and time to make it faster. I switched to the table-of-numbers solution and found that with test data using many short strings (‘abc,qwe,wer,ert’ repeated 9,000 times) it outperformed the iterative function by over 2x. But when double-checking with our real data, the iterative function ran faster by an order of magnitude. The reason was that our data has much longer strings, and the iterative solution quickly jumped to the next comma, cut the string, and moved on.
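For reference, the classic table-of-numbers split looks roughly like this (a generic sketch, not Paul's exact function; it assumes a dbo.Numbers table holding integers 1..N, where N is at least the longest input string):

```sql
-- Numbers-table split: find every position where a comma precedes a token,
-- then cut from that position to the next comma.
CREATE FUNCTION dbo.SplitCSV (@list VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
    SELECT SUBSTRING(@list, n.Number,
               CHARINDEX(',', @list + ',', n.Number) - n.Number) AS Item
    FROM dbo.Numbers AS n
    WHERE n.Number <= LEN(@list)
      AND SUBSTRING(',' + @list, n.Number, 1) = ',';  -- token starts here
```

Note the WHERE clause examines every character position of the string, which is exactly why long strings with sparse commas favored the iterative function: it hops from comma to comma instead of testing each character.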

Lesson 6: Agile iterative development with unit testing is a blast! During the development I maintained two scripts: one that created the database, tables, and procs, and a second one with test data and unit test calls. During the crunch of development, I would grasp the next 10-15% of the user requirements (some of which were a bit complex) and then build out those requirements. When it passed the tests and the client could see that it was working we moved on to the next level of complexity – rip it up and rebuild better and stronger with the next iteration. But then as the rush of development ended and it moved into integration testing, I had to conform to source code control. A file for every proc is a good thing for IT control, but it sure slowed down my development.

Lesson 7: At this transaction level the nonclustered indexes can get 97% fragmented in a day. Be very careful when choosing nonclustered indexes.
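One way to watch for this in SQL Server 2005 and later is the physical-stats DMV:

```sql
-- Fragmentation per nonclustered index; values near 97% after a day
-- would argue for dropping or redesigning the index.
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.index_id > 1  -- nonclustered indexes only
ORDER BY ips.avg_fragmentation_in_percent DESC;
```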

Lesson 8: The code went through several rounds of fine tuning and searching for ways to reduce any millisecond possible. The overall distribution of my time was about ½ design and code, ¼ unit testing, and ¼ integration and performance testing/tuning.

Lesson 9: Despite the best efforts to agree on an API between the middle tier and the database, little things like agreeing on the return codes and error handling, and formatting of dates were only discovered during integration testing.

Lesson 10: At 35K tps ADO.Net connection pooling is harder than it sounds.

So one day of knowledge transfer left on the contract. It’s about to go live and under a full production load it’s performing about 33% faster than required. The last thing to say about this project is that the folks were great to work with – I had a blast!

Published Wednesday, December 12, 2007 11:29 PM by Paul Nielsen



Denis Gobo said:

Congratulations on a job well done. Did you do Scrum?

December 13, 2007 6:28 AM

Adam Machanic said:

So you found that an iSCSI system didn't work; what did you use for the I/O system, and what were the HA and DR requirements?  I think some more detail here (perhaps another post) would be extremely interesting.  What was the configuration of each tested system, and where did you see pitfalls?

December 13, 2007 7:59 AM

jamiet said:

Excellent info.  Thanks Paul! Building highly-transactional systems isn't a topic I can claim to be knowledgeable in but is one I would love to learn more about. This will be a great reference.



December 13, 2007 8:51 AM

roman said:

Paul, in Lesson 5, did you mean to say that the CLR could NOT keep up?

December 14, 2007 9:42 AM

Paul Nielsen said:

Roman, Exactly. At a small test load the CLR was 4x faster than the T-SQL, but under the heavy transaction load the CLR was 10x slower than T-SQL. We re-compiled, re-configured and re-tested several times because we just couldn't believe it.

December 14, 2007 9:54 AM

Adam Machanic said:

Let's be a bit more specific: CLR could not keep up for THIS PARTICULAR TASK.  Just wanted to make sure people understand that there certainly are use cases for the technology, just not string splitting :-)

December 14, 2007 9:55 AM

Denis Gobo said:


What kind of box(es) does this run on (CPU, RAM, and 32 or 64 bit)?

Also, did you use partitioning functions, and have you separated nonclustered indexes from clustered indexes by putting them on different filegroups?


December 14, 2007 10:08 AM

Kevin Boles said:

I would be VERY interested in a more in-depth session on your experiences and findings.  Perhaps a 500 level PASS presentation?  Or a series of blog posts?  I consider myself an expert in performance tuning but I know I could learn a LOT of useful things from your work!!

December 14, 2007 10:32 AM

Paul Nielsen said:

I did the database design and development. Others were involved in the physical box configuration.

We started with iSCSI but it couldn't keep up with the estimated load using the SQLIO simulator. So they switched to a very nice Hitachi SAN. At first it couldn't keep up either, even with 70GB of cache. Then Hitachi spent a few days making LUN adjustments and it works great.

Each server is a 16-CPU quad-core server with 48GB RAM, clustered, running SQL Server 2005 Enterprise 64-bit. We partitioned the data across four servers. A config table tells the .NET middle tier which server to send the items to, using a very simple scheme. Sometimes the Query proc can't completely solve the request locally and must fetch data from another server or two, in which case it calls a remote stored proc on the other server.
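The routing scheme could look something like this sketch (table, column, and the partitioning key are hypothetical; the actual scheme is not disclosed):

```sql
-- Hypothetical config table the middle tier reads to route each item
-- to the server that owns its partition.
CREATE TABLE dbo.PartitionConfig (
    KeyRangeStart CHAR(1) NOT NULL PRIMARY KEY,  -- e.g. first character of the item key
    ServerName    SYSNAME NOT NULL
);

-- Lookup the middle tier performs before sending the call:
SELECT ServerName
FROM dbo.PartitionConfig
WHERE KeyRangeStart = LEFT(@ItemKey, 1);
```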

Another area that I was not directly involved with, but was a challenge, was deleting half a million rows per day in a job without affecting 7x24 performance. The solution was to batch the deletes and loop until completed.
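The batched-delete loop could be sketched like this (table, column, retention window, and batch size are hypothetical):

```sql
-- Delete in small chunks so each transaction stays short and the
-- 7x24 workload isn't blocked behind one massive delete.
DECLARE @rows INT;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000)            -- TOP on DELETE requires SQL Server 2005+
    FROM dbo.ItemHistory
    WHERE CreatedDate < DATEADD(DAY, -30, GETDATE());

    SET @rows = @@ROWCOUNT;      -- loop until nothing left to delete

    WAITFOR DELAY '00:00:01';    -- brief pause lets other transactions through
END
```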

December 14, 2007 12:05 PM

Paul Nielsen said:

Another issue with the CLR was that it changed the query execution plan. The T-SQL function estimated 1 row output and the CLR estimated 1,000 rows output. Until I noticed the changed query plan and fixed that, the CLR was looking real bad, but the real culprit was an index scan.

December 14, 2007 12:07 PM

noeld said:

Lesson 4: The correct answer is to use appropriate locking hints:

December 14, 2007 5:49 PM

Paul Nielsen said:

Noeld, Dan's solution is good SQL for most applications. It’s well written SQL and I considered nearly the same thing. So thanks for asking about that.

Under a very high transaction load, the holdlock solution simply won’t perform. At 35K tps, Begin Tran and holdlock start looking as evil as cursors.

Seriously, at the start of the project we identified locking contention as one of the biggest risks and EVERY line of code is written to reduce locking (but NOT to eliminate locking - NO nolocks anywhere!)

December 14, 2007 8:59 PM

Adam Machanic said:

Paul, did you try the UPDATE ... ROWCOUNT ... INSERT merge pattern:

UPDATE x
SET y = @z
WHERE pk = @pk

IF @@ROWCOUNT = 0
 INSERT x (y, pk)
 SELECT @z, @pk

December 14, 2007 9:32 PM

netbob said:

AIX involved anywhere by chance?

December 15, 2007 9:32 AM

Linchi Shea said:

Is the transaction rate measured at the app server layer or at the per-database server level? In other words, is 35K tps an aggregate over the four servers? Also, what is the average size of a transaction in terms of I/O MBs and # of I/Os?

December 15, 2007 11:58 AM

Linchi Shea said:

In addition, on each server, is this against a single database or multiple databases? If multiple, how many on each server?

December 15, 2007 12:01 PM

Gany said:


Any chance of posting the code for your CLR function and the TSQL version?

December 17, 2007 4:56 AM

Gerald Hanweck said:

Reply to Adam Machanic:   I believe UPDATE...ROWCOUNT...INSERT could produce duplicates without locking if another transaction performs a (duplicate) insert immediately after UPDATE returns ROWCOUNT=0, but before the INSERT is reached.

December 17, 2007 8:59 AM

Adam Machanic said:

Gerald:  Absolutely correct.  It is certainly no substitute for locking and/or catching a key violation.  What's great is that it does 50% less work than the IF EXISTS(...) pattern, if the row already exists.

December 17, 2007 9:47 AM

Kevin Boles said:

If you use SERIALIZABLE isolation level, will that cause range locks or some other locking that would prevent the duplicate entry when the other spid's INSERT fires?

December 17, 2007 9:58 AM

JJ B said:

You wrote: "A file for every proc is a good thing for IT control, but it sure slowed down my development."

I'm not sure what you mean here.  I'm guessing that you mean that if you have to store every stored proc in a separate file, then it takes longer to open up all those files in correct order and run them when you are re-creating the database.  If that is what you mean, I wanted to share my solution for this:  

I wrote a mini MS Access app which stores the paths and names of files in a convenient order.  The files include all the stored procs, UDFs, views, triggers etc for the database.  When I need to re-create my database, I click a button in the app which creates an MS Word Master Document.  That master document links to all the files in order and effectively acts like it has all the text embedded in one single file.  All I have to do is work with one (or in bigger systems, say 2 or 3) files of one large script.  The bigger file can be used both when re-creating databases (copy all the code at once into Enterprise Manager) and when I need to do some global code searching or searching and replacing.

I get the best of both worlds:  both individual files for procs and larger/single scripts.  

Just a thought.

- JJ from Eugene

December 17, 2007 12:02 PM

Dataland said:

Paul Nielsen has written an informative post that chronicles lessons learned from developing a high transaction (35K tps) database system. Here are some topics that he touched, which I’ve always advocated ...


December 17, 2007 12:10 PM

CK said:

Were you checking for the violation of the unique index and retrying the transaction within TSQL code or from ADO?  Can you provide an example?


December 17, 2007 4:59 PM

Sean said:

Great article, Paul.

I read this article a few days ago and I could have sworn that you mentioned a tool that you used to benchmark.  Skimming through it now, however, I wasn't able to locate anywhere that you mentioned anything.  On that note, would you mind letting us know which tool --if any-- you used to perform the benchmark?

Thank you...

December 17, 2007 10:57 PM

Jimmy said:

Hi Paul..

just want to ask, i'm newbie in SQL.

How do you manage it when the implementation phase runs together with the development phase? For example, you change a hundred tables (with relations) out of a thousand total tables on the development server and want to implement them on the production server.

I have a problem with making too many database changes (tables/SPs) on the production server, which slows down both environments.

Even software like DBComparer from EMS cannot solve it well. Thousands of tables with relations is just too much.

For now I do it manually and it takes a lot of time.

December 18, 2007 11:32 AM

foobar said:

Excellent article.  Thank you for this.

December 18, 2007 7:28 PM

homer said:

Oh so now you have transactions on computers now? so what is the application, what application servers, and what did you really learn that is generally useful - i think you can reduce your 10 points to 1 or 2 to save me and my family and friends some reading - so what are the two things?

December 18, 2007 7:34 PM

Kevin Boles said:

I wonder if ApexSQL's Edit, with its very nice source code integration features, would have been helpful for you Paul.  It also has some unit-testing functionality baked in as well.  Ashton Hobbs has done some really nice work with the app.

I agree about the potential for SERIALIZABLE causing throughput headaches due to the locking overhead.  I would bet it would be especially problematic if you have activity that is hot-spot in nature.  I have used SERIALIZABLE a few times in production systems to great effect but never in an intensive application such as this.

As to the CLR I wonder about two things:  Did you use built in string functions or code your own?  and two, did you try 'chunking' out pieces of the varchar(max) into smaller strings for manipulation to cut the overhead down and speed things along?  

December 18, 2007 9:15 PM

Paul Nielsen said:

Adam, Yes, the [Update @@RowCount] was my first inclination and I still like that solution (assuming you add a unique index and a try-catch), especially because the “happy path” for this app favors updates. The consensus among the client DBAs was that the If Exists() method was cleaner code; since they have to maintain it, it was rightfully their call.

Kevin Boles:I did think hard about using serializable isolation and while I think it would prevent upsert dups, I was scared of the locking overhead.

netbob: The mainframe called a .NET middle tier using WCF, which called SQL Server using ADO with SNAC. No AIX that I know of, but then I try to stay inside SQL Server as much as I can.

Linchi: The 35K tps refers to the total number of database transactions, which were spread across four servers. I don’t know the I/O details, sorry. Each server hosts 1 database of about 100GB, so the database is small; it was the transaction rate that made this app fun. TempDB is at about 20GB.

Gany: The CLR code is very simple – just a .Split. I used Erland’s T-SQL function; it was the best I could find.

JJ B: (Hi JJ!) When I was using 2 scripts on my notebook in my own environment modifications and testing was very smooth and effortless. Storing each object in a separate file in Visual Source Safe added source code integrity, but significantly slowed my development time and testing time.

CK: all the logic for the upsert is inside the stored procedure.

December 18, 2007 10:49 PM

Paul Nielsen said:

Sean: Another consultant used a MSFT tool to test throughput on the hardware prior to installing SQL Server. I think it was called the SQLIOSimulator.

For my benchmarking as I tested the code and tuned the queries and indexes, I used the trusty SQL Trace and Profiler. I absolutely love the simplicity and detail of Trace/Profiler. Btw, if you want to get cozy with Profiler, there’s a good chapter on it in Kalen’s new book. Huh Adam?

December 18, 2007 10:52 PM

Paul Nielsen said:

Hi Homer, I don’t expect my family to read my blog, so to save you the trouble, if I had to reduce it from 10 lessons to 2 or 3, I’d have to say:  

1)  Just like developing a normal sized database, a good schema/set-based code/indexing/API is still the foundation. But at that transaction rate, the game becomes all about locks.

2)  Agile Iterative development rocks!

December 18, 2007 10:56 PM

Paul Nielsen said:

Hi Kevin, I’m sure the Apex tool is great if it’s as nice as the other tools from them that I’ve tried. But I teach and write about the Microsoft UI and, to be honest, my brain isn’t big enough to keep up with multiple UIs, so I just use Management Studio.

In the CLR code I just used a one-line .Split method.

December 18, 2007 11:09 PM

Paul Nielsen said:


I never ever use the graphical tools for schema design or changes. I only write scripts. When I can I script the database, tables, triggers, and procs in one script, and sample data and unit test proc EXECs in another script. Sometimes I split the test data and proc test calls into two scripts.

But as soon as there’s a production database, this no longer works and every change has to be scripted as an alter command. So in SourceSafe, there will be the original create scripts and every alter script.

December 18, 2007 11:13 PM

Denis Gobo said:

A year in review, The 21 + 1 best blog posts on SQLBlog Best posts according to me, it might have been

December 27, 2007 4:11 PM

Jared said:

I especially liked Lesson 3.

All developers should perform the exercise of writing code for denormalized vs. normalized schemas.

Normalized always wins on DML.

January 9, 2008 5:05 PM


Alex said:

Thanks for great article Paul.

Any lessons about index clustering in such a busy environment?

October 9, 2008 4:18 PM


Todd said:


I realize this is an old session, but I hope you still monitor it, as I could really use your advice.

I am embarking upon a new venture that looks to be quite promising but will require a highly transactional database design. I'd like to get your thoughts on some things.  Please contact me via email when you get a chance and I will explain (



August 16, 2011 4:43 PM

Andy Welch said:

I'm surprised by Lesson 4. Surprised you had to learn this...seems at odds with the rest of your experience?

I'd argue this is fairly fundamental knowledge. An upsert using IF NOT EXISTS simply cannot work without the unique index in a multi-user system – prior to the release of an atomic upsert feature in the move from SQL 2005 to SQL 2008.
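The atomic upsert feature referred to here is presumably MERGE, added in SQL Server 2008. Worth noting: even MERGE is not concurrency-safe for upserts on its own; it still needs a HOLDLOCK hint (or the unique-index-plus-retry approach from the post). A sketch against the same hypothetical dbo.Item table:

```sql
-- SQL Server 2008+ upsert via MERGE. Without WITH (HOLDLOCK), two concurrent
-- MERGEs can both miss the row and both insert, just like IF EXISTS.
MERGE dbo.Item WITH (HOLDLOCK) AS target
USING (SELECT @ItemKey AS ItemKey, @Value AS Value) AS source
    ON target.ItemKey = source.ItemKey
WHEN MATCHED THEN
    UPDATE SET Value = source.Value
WHEN NOT MATCHED THEN
    INSERT (ItemKey, Value) VALUES (source.ItemKey, source.Value);
```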

cheers for the article though


July 9, 2013 7:36 PM

Jeff said:

Question Paul, you said you switched from iSCSI to a Hitachi SAN, but isn't iSCSI just how you connect your drives? Would you please explain more what you did before/after, and whether you can share any of the SAN configuration. Thanks!

October 21, 2014 4:10 PM

Denver Naidoo said:

I'm impressed at 35K tps in 2007. Nice one Paul Nielsen.

April 13, 2015 6:01 AM

About Paul Nielsen

Paul Nielsen believes SQL is the romance language of data. As such he’s a hands-on database developer, Microsoft SQL Server MVP, trainer, and author of the SQL Server Bible series (Wiley). As a data architect, he developed the concepts of Smart Database Design and Nordic – an open source O/R dbms for SQL Server. He lives in Colorado Springs.
