Performance troubleshooting on Microsoft SQL Server

Oct. 6, 2017 via ADMIN Magazine

Use Microsoft SQL Server diagnostic tools to troubleshoot bottlenecks and other performance problems. If Microsoft SQL Server in your IT center supports infrastructure services such as virtualization and IP address management (IPAM), as well as databases for applications and business processes, a performance bottleneck in the SQL Server area is almost always a critical issue. For one thing, delays to data queries can be very specifically translated into dollars and cents from the management perspective. Moreover, all areas of IT – server, storage, network, and applications themselves – are involved in accessing SQL Server. As the person responsible for the affected SQL Server, you would thus do well to be proactive from the outset by identifying any problems as quickly as possible and helping others understand any difficulties. However, before you pounce on SQL Server, make sure the problem lies within your sphere of influence. A typical SQL-based application has several layers – the application itself, with the application layer and the client front end – each of which can be the source of poor performance. A further sticking point could be the network between client and server. If the SQL queries provide a large volume of data, transmitting the data to the client could be a cause of the perceived delay – even if SQL Server has processed the results quickly. The database also can be the cause of poor performance. If you have a table with many records, incorrect or missing indexing cannot possibly be compensated for by the available hardware resources and will affect the performance of the selection queries. SQL Server makes very complex use of the hardware resources provided to it, and configuration errors could easily occur. Also, a typical SQL Server is likely to provide databases for numerous applications in several instances. What is rarely tested before rolling out a new application or a major upgrade of an existing application is whether the use of resources still leaves enough room to maneuver. For further diagnostics, it is important to find out where the performance bottleneck originated. Thus, you should first ask the appropriate department where the information causing the suspected performance bottleneck comes from. If you only have one user complaining about a "slow database server," it is quite possible that the application they use has a performance problem that is not related to SQL Server, or that the application is simply badly programmed. Tools for Troubleshooting The most important tools to use to diagnose problems in the SQL Server environment are provided by SQL Server and Windows themselves. For example, SQL Management Studio provides access to the server's system tables and logs, as well as diagnostic tools such as SQL Activity Monitor, SQL Server Profiler, and Database Tuning Advisor. PerfMon provides information about the performance behavior of various SQL Server and Windows components. If you need to generate data collector sets and evaluate PerfMon results, the free PAL (Performance Analysis for Logs) proves to be an excellent choice [1] . Windows Task Manager also provides a valuable service, although the Process Explorer by Microsoft Sysinternals [2] is considerably more powerful. If a secondary, network-specific diagnosis is required, Wireshark or Microsoft Message Analyzer (formerly NetMon) jump into the breach. DiskPart and FSUtil give you the lowdown on some typical misconfigurations in the disk subsystem. If the affected SQL Servers are virtualized, you need a good basic understanding of how resources are allocated on the hypervisor. In this case, you also need tools to measure the performance data from there. Depending on the direction in which your troubleshooting takes you, it might also be necessary to visualize the results (e.g., to present to other teams or the management). Spreadsheet programs such as Microsoft Excel are well suited to visualizing tabular data, but you can also import the data into SQL Server and access Reporting Services or even Microsoft Power BI. If the results of your diagnosis force you and your database developers to optimize the database schema and queries, I recommend the popular Unified Plan Explorer [3] tool, which the creator, SQL Sentry, now offers free of charge. Determine the Baseline, Identify Bottlenecks To begin, it is a good idea to know the performance behavior of your SQL Server. Record a baseline for the key resources (CPU, RAM, network, hard disk) – that is, a performance profile of the server in a state considered to be normal in ongoing operations. You can use PerfMon for this step, or you could use your monitoring system, assuming that it is designed for long-term storage of performance indicators. If you are then confronted with questions about possible changes in SQL Server's performance in the future, you can quickly compare the new data with the historic data. You can quickly examine the behavior of an interactive application with the help of SQL Server Profiler. To begin, make sure that the time on the user's workstation and on SQL Server are in sync to the extent possible. Launch Profiler and select the events to be tracked so that TSQL events are recorded. If you know that the application in question tends to work with stored procedures rather than SQL instructions, add these events, too. Filter the trace when recording, preferably by user, workstation, database, or application if these parameters are unambiguously known. Let the user perform the action that seems slow to them, and make a precise note of the timing from the user's perspective. In the Profiler trace, you will be able to identify clearly when the user input raises a query on SQL Server and how long it takes from the perspective of SQL Server. If a considerable amount of time occurs between user input and the SQL query (or between the end of the SQL query and the output in the application), the problem, at least partly, lies with the application, which needs to be addressed by its developers. The same thing applies if you notice in Profiler that the submitted queries cause errors on SQL Server. One more question needs to be clarified as soon as possible: Does the observed performance bottleneck only occur when reading from the database? Only when writing? Or for any kind of access? This information will be enormously helpful in the further diagnosis. Slow SQL Server If the bottleneck is SQL Server, then you need to draw up a plan for further diagnosis. SQL Server uses four types of resources: CPU, RAM, storage, and network. The following sequence is recommended for performance troubleshooting: Network: Problems with the network can also cause bottlenecks in the CPU or RAM. CPU. RAM: Not enough RAM will almost always lead to increased disk I/O levels. Disk: If you have ruled out the other factors as the cause, take a closer look at the disks. A bottleneck in one resource does not automatically mean that the problem can be resolved by providing more of that resource. Before checking anything else, make sure SQL Server has no obvious problems or misconfigurations. The following list can serve as a starting point, and you can add your own known incidents: Status of hardware, virtualization, and operating system (OS): Do suspicious entries exist in the Event Viewer? For virtualized servers, is the CPU, RAM, or both oversubscribed on the hypervisor? Configuration and status of network connections: Check the teaming, speed, and flow control. Has a tempdb database filled the system logs, disk logs, or hard disk? Have you defined and enabled exclusions from the antivirus scan [4] ? Is the maximum available RAM for the SQL instance (or the sum total of all instances in case of several instances) configured such that the OS still has enough RAM left? For Windows Server 2012, the OS should have at least 2GB of RAM available, or 4GB on a physical system with a large amount of RAM. In Hyper-V virtualization does the dynamic memory configuration comply with best practices [5] ? Have you taken into account and checked the alignment of disk volumes [6] [7] ? Are the volumes formatted with a 64KB file allocation unit? Does the distribution of the database files (on SQL Server 2016 and for tempdb instances) correspond to the performance profile of the disk volumes? Are full backups and database or index reorganizations running at peak times? If the default CPU configuration of SQL Server has been changed, does the currently active configuration still match the number of instances and their use? If you have eliminated all possible sources of error, but the observed problem still persists, you need to perform more in-depth performance troubleshooting. 1 2 Next» var a2a_config = a2a_config || {}; a2a_config.linkname = "Performance troubleshooting on Microsoft SQL Server"; a2a_config.linkurl = "http://www.admin-magazine.com/Archive/2017/40/Performance-troubleshooting-on-Microsoft-SQL-Server"; Buy this article as PDF Express-Checkout as PDF Price $2.95 (incl. VAT) Buy ADMIN Magazine SINGLE ISSUES Print Issues Digital Issues SUBSCRIPTIONS Print Subs Digisubs TABLET & SMARTPHONE APPS US / Canada UK / AustraliaEvaluating Performance Details You should gain a quick overview of the server's resource consumption and compare it with the baseline, assuming you have one. To do so, launch PerfMon and add the following performance counters to the real-time display (Figure 1 ): Memory Available MBytes and Page Faults/sec Network Adapter Bytes Sent/sec and Bytes Received/sec Physical Disk: Avg. Disk Queue Length for each volume Processor Information: % User time and % Processor Time System: Processor Queue Length TCPv4: Segments/sec and Segments Retransmitted/sec TCPv6: The same as TCPv4 if TCPv6 is used in production. Now is the time to find out which version is used – some systems automatically agree to use TCPv6 if it is technically possible. Figure 1: A small selection of counters in PerfMon allows a quick overview. Even better, define a data collector with these values to log data over a slightly longer period of time. Watch out for the following phenomena: Network: Bytes Sent/sec and Bytes Received/sec approaching the available bandwidth. Keep in mind that the bandwidth is specified in bits per second. Thus, for a Gigabit Ethernet interface, the critical value is 131,072 bytes per second. If the measured values are permanently close to the upper limit, then the diagnosis is available in the network bandwidth, but it is also critical to look at the size of the returned results. Network/TCP: Retransmitted segments make up too large a proportion of the total transferred segments on your network. Typical values fluctuate between 0.5 and 2 percent. Ask your network specialists to be sure. CPU: The utilization of CPU on a single physical server that approaches the 100 percent mark is not fundamentally worrying, but if you see the CPU queue length permanently exceeding the limit of 1, you should be worried. In virtualized environments, this value can also be distorted by CPU scheduling on the hypervisor. RAM: A system with no free RAM is not good. If the Available MBytes value is clearly and permanently less than 256, action is required. You can also check the rate of Page Faults/sec , which then shoots up. Because this metric has no generally accepted thresholds, you should compare it with your own baseline, if possible. Physical disk queue length: If this classic metric is permanently greater than 1, you definitely have a problem, even though you cannot be sure at this juncture where the currently observed performance bottleneck originates. Observe performance for a while and remember the troubleshooting sequence "network, CPU, RAM, disk" to help you to exclude normal parameters at an early stage and concentrate on what is essential. Conclusions Armed with the right tools, you can identify and resolve performance issues accurately with SQL Server. In the second article of this series, I will take a look at the network, RAM usage, and the hard disk drives. Finally, I will look into visualizing and evaluating the results. Infos PAL on GitHub: https://github.com/clinthuffman/PAL Process Explorer: https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer Plan Explorer: https://www.sentryone.com/plan-explorer Antivirus exclusions: https://blog.sqlauthority.com/2015/05/19/sql-server-antivirus-exclusions-best-practices-with-sql-server/ Dynamic memory: https://msdn.microsoft.com/en-us/library/hh372970.aspx MSDN: Partition alignment: https://blogs.msdn.microsoft.com/jimmymay/2014/03/14/disk-partition-alignment-it-still-matters-dpa-for-windows-server-2012-sql-server-2012-and-sql-server-2014/ TechNet: Partition alignment: https://technet.microsoft.com/en-us/library/dd758814.aspx «Previous 1 2 var a2a_config = a2a_config || {}; a2a_config.linkname = "Performance troubleshooting on Microsoft SQL Server"; a2a_config.linkurl = "http://www.admin-magazine.com/Archive/2017/40/Performance-troubleshooting-on-Microsoft-SQL-Server"; Buy this article as PDF Express-Checkout as PDF Price $2.95 (incl. VAT) Buy ADMIN Magazine SINGLE ISSUES Print Issues Digital Issues SUBSCRIPTIONS Print Subs Digisubs TABLET & SMARTPHONE APPS US / Canada UK / Australia

http://www.admin-magazine.com/Archive/2017/40/Performance-troubleshooting-on-Microsoft-SQL-Server...