<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'Performance', 'hyperthreading', and 'HT'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Performance,hyperthreading,HT&amp;orTags=0</link><description>Search results matching tags 'Performance', 'hyperthreading', and 'HT'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Performance impact: hyperthreading for reporting queries</title><link>http://sqlblog.com/blogs/linchi_shea/archive/2012/01/05/performance-impact-hyperthreading-for-reporting-queries.aspx</link><pubDate>Thu, 05 Jan 2012 05:36:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:40817</guid><dc:creator>Linchi Shea</dc:creator><description>&lt;P&gt;There are a lot of questions on hyperthreading, but not a lot of answers.&amp;nbsp; There is no shortage of opinions, but very few are based on significant first hand experience or solid test data. &lt;/P&gt;
&lt;P&gt;We know that the hyperthreading technology in the older generations of the Intel processors was not well received by the SQL Server community. Hyperthreading in the newer generations of the Intel’s Nehalem-based Xeon processors, however, is decidedly better implemented, and it appears to be much better received by the SQL Server community.&lt;/P&gt;
&lt;P&gt;I have done some tests with the Intel Westmere-EX processors, exploring how its hyperthreading feature may influence various types of SQL Server workloads. And my experience from these tests is generally positive. In fact, I have not seen any significantly negative performance impact that could be directly attributed to enabling hyperthreading. I plan to post some of my empirical observations. The intent is not to settle any issues. Rather in keeping with my own tradition, it is to contribute some data points to this very important subject.&lt;/P&gt;
&lt;P&gt;In this post, I focus on the performance impact of enabling hyperthreading on reporting queries. Since there is a large number of variables that could heavily influence the results of any hyperthreading related test, quite a bit of attention is paid to ensure repeatability of the test results. &lt;/P&gt;
&lt;P&gt;The tests are set up as follows:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The tests use a single common query (shown below). The focus is not on the single query performance, but how streams of this query perform.&lt;/LI&gt;
&lt;LI&gt;There are two key test variables: &lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;&lt;STRONG&gt;Degree of parallelism&lt;/STRONG&gt;&lt;/EM&gt;. It is controlled with the option (maxdop) clause appended to the query.&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;&lt;STRONG&gt;Number of concurrent query streams&lt;/STRONG&gt;&lt;/EM&gt;.&amp;nbsp; It is controlled through a C# program by spawning as many threads, each of which drives the query in a serial loop for a configurable amount of time.&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;There are three hyperthreading (HT) test configurations:&lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;&lt;STRONG&gt;40 core with HT&lt;/STRONG&gt;&lt;/EM&gt;. That is 80 logical cpus. HT is enabled in the BIOS.&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;&lt;STRONG&gt;40 cores without HT&lt;/STRONG&gt;&lt;/EM&gt;. That is 40 logical cpus. HT is disabled in the BIOS.&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;&lt;STRONG&gt;20 cores with HT&lt;/STRONG&gt;&lt;/EM&gt;. That is 40 logical cpus. HT is enbaled in the BIOS.&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;The query execution plans are logged by the test driver and verified to be constant across each test run and each query execution.&lt;/LI&gt;
&lt;LI&gt;The performance statistics of each query execution, regardless of its maxdop setting or which stream it is in, is captured by the test driver and loaded into a table for analysis and overall metric calculation.&lt;/LI&gt;
&lt;LI&gt;The test database consists of a 200-million-row main table tran_stats, exactly as described in &lt;A href="http://sqlblog.com/blogs/linchi_shea/archive/2011/12/15/performance-impact-cartesian-product-comes-to-the-rescue.aspx"&gt;this previous post&lt;/A&gt;. The SQL scripts for creating and populating this table are in the attached zip file. The test database also includes a dimension table which basically is the #tmp table in &lt;A href="http://sqlblog.com/blogs/linchi_shea/archive/2011/12/15/performance-impact-cartesian-product-comes-to-the-rescue.aspx"&gt;this same previous post.&lt;/A&gt; The scripts to create and populate this dimension table are also included in the attached zip file.&lt;/LI&gt;
&lt;LI&gt;The test results are summarized with a query processing throughput metric, &lt;U&gt;Queries per Hour&lt;/U&gt;, representing the number of queries processed in an hour for the given test configuration.&lt;/LI&gt;
&lt;LI&gt;All tests are run while the data are fully cached. The storage subsystem is not a factor in these tests.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Essentially, with this test workload we want to see how it behaves when we increase the load level by increasing the number of concurrent query streams for a given maxdop setting, with and without hyperthreading on a DL580 G7 with four E7-4870 (10 core) processors and 264GB of RAM. The OS is Windows 2008 R2 Enterprise x64 Edition with SP2 and DBMS is SQL Server 2008 R2 Enterprise x64 Edition (10.50.1600).&lt;/P&gt;
&lt;P&gt;The main test query (without the maxdop option clause) is as follows:&lt;/P&gt;
&lt;P&gt;&lt;FONT size=2 face="Courier New"&gt;select COUNT(*) from ( &lt;BR&gt;&amp;nbsp;&amp;nbsp; select id_server_config,id_thread_name,id_tran_type,id_tran_status,id_users,id_cpus, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*) as cnt &lt;BR&gt;&amp;nbsp; from tran_stats t1 join dimension t2 &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; on t1.server_config = t2.server_config and &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t1.thread_name = t2.thread_name and &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t1.tran_type&amp;nbsp;&amp;nbsp; = t2.tran_type and&amp;nbsp; &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t1.tran_status = t2.tran_status and &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t1.users&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = t2.users and &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t1.cpus&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = t2.cpus &lt;BR&gt;group by id_server_config,id_thread_name,id_tran_type,id_tran_status,id_users,id_cpus &lt;BR&gt;) as v&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;I pick this particular query because it is a rather plain reporting query. Nothing exotic and something you’d see often in the real world. I also happen to be playing with it recently for other purposes.&lt;/P&gt;
&lt;P&gt;By the way, in the spirit of full disclosure, if there is interest in the actual test driver program, I can post it here.&lt;/P&gt;
&lt;P&gt;So, how do the results look like?&lt;/P&gt;
&lt;P&gt;&lt;FONT size=3&gt;&lt;STRONG&gt;40 cores with HT vs. 40 cores without HT&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;For maxdop = 2, the query throughput results are summarized in the following chart.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop2_40cores_HT_vs_40cores_4C196FC8.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop2_40cores_HT_vs_40cores border=0 alt=maxdop2_40cores_HT_vs_40cores src="http://sqlblog.com/blogs/linchi_shea/maxdop2_40cores_HT_vs_40cores_thumb_4AD7679A.png" width=521 height=377&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;Most notably, when the load level is stressed with 64 concurrent streams, the system manages to push through ~478 queries per hour with HT. Compare that to ~374 queries per hour without HT at the same load level. That is about 28% increase in the processing power (for this specific configuration, always remember that!). That’s a substantial improvement.&lt;/P&gt;
&lt;P&gt;Also note that for streams 1 through 16, there is little to no difference whether or not HT is enabled. That is expected. After all, if the queries are not competing for processors, whether you have 40 or 80 processors should not make a big difference. This also means that if you just run your test with a single query, you probably won’t see any difference, one way or another. And if you draw a conclusion based on that single query result, the conclusion is likely to be misleading.&lt;/P&gt;
&lt;P&gt;The next three charts show the results for maxdop=4, 8, and 20, respectively. The results look similar to that for maxdop=2 in terms of the general trend and pattern, though the extent of performance improvement varies. The improvement is generally between 8% and 25%.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop4_40cores_HT_vs_40cores_0D4A09D1.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop4_40cores_HT_vs_40cores border=0 alt=maxdop4_40cores_HT_vs_40cores src="http://sqlblog.com/blogs/linchi_shea/maxdop4_40cores_HT_vs_40cores_thumb_08D3890A.png" width=508 height=367&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop8_40cores_HT_vs_40cores_3654A8CD.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop8_40cores_HT_vs_40cores border=0 alt=maxdop8_40cores_HT_vs_40cores src="http://sqlblog.com/blogs/linchi_shea/maxdop8_40cores_HT_vs_40cores_thumb_03F0D54E.png" width=508 height=367&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop20_40cores_HT_vs_40cores_785B180E.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop20_40cores_HT_vs_40cores border=0 alt=maxdop20_40cores_HT_vs_40cores src="http://sqlblog.com/blogs/linchi_shea/maxdop20_40cores_HT_vs_40cores_thumb_45F7448F.png" width=510 height=369&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;&lt;FONT size=3&gt;&lt;STRONG&gt;40 cores without HT vs. 20 cores with HT&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;In this case, they both appear as 40 logical processors at the OS level and to the SQL Server instance except that one has 40 cores and the other has 20 cores. So which configuration should have better performance capacity?&lt;/P&gt;
&lt;P&gt;You probably would pick 40 cores without HT. And you’d be correct. The following charts are the supporting evidence.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop2_40cores_vs_20cores_HT_5E86E1DF.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop2_40cores_vs_20cores_HT border=0 alt=maxdop2_40cores_vs_20cores_HT src="http://sqlblog.com/blogs/linchi_shea/maxdop2_40cores_vs_20cores_HT_thumb_2C230E60.png" width=513 height=371&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop4_40cores_vs_20cores_HT_1C83034F.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop4_40cores_vs_20cores_HT border=0 alt=maxdop4_40cores_vs_20cores_HT src="http://sqlblog.com/blogs/linchi_shea/maxdop4_40cores_vs_20cores_HT_thumb_552DAD5C.png" width=514 height=372&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop8_40cores_vs_20cores_HT_02AECD20.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop8_40cores_vs_20cores_HT border=0 alt=maxdop8_40cores_vs_20cores_HT src="http://sqlblog.com/blogs/linchi_shea/maxdop8_40cores_vs_20cores_HT_thumb_504AF9A0.png" width=513 height=371&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/linchi_shea/maxdop20_40cores_vs_20cores_HT_2BB96C1C.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=maxdop20_40cores_vs_20cores_HT border=0 alt=maxdop20_40cores_vs_20cores_HT src="http://sqlblog.com/blogs/linchi_shea/maxdop20_40cores_vs_20cores_HT_thumb_7955989C.png" width=513 height=371&gt;&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;When pushed, 40 cores without HT outperforms 20 cores with HT substantially by as much as ~48%.&lt;/P&gt;
&lt;P&gt;So overall in these tests, the performance impact of hyperthreading on the query throughput is positive, and in some cases very substantial.&lt;/P&gt;
&lt;P&gt;To reiterate, I’d be careful--if at all--extrapolating the observations here. Hopefully, though, as the evidence accumulates from the community, we can have a better comfort level in how we use this feature effectively.&lt;/P&gt;</description></item></channel></rss>