Whether it is customer interaction in an online shop, access to resources on a website or the internal use of software applications: databases are a mainstay of numerous business processes. Even small problems at the database level can result in slower processes or outright downtime. According to Gartner, the average cost of IT downtime is $5,600 per minute. Although the figure depends significantly on the size and type of business, even small businesses can face costs of up to $100,000 per hour of downtime, while for larger businesses the costs can exceed $1 million per hour. Slowdowns, by contrast, cause less damage (estimated at roughly one fifth of the hourly downtime cost), but they occur about ten times as often. These costs add up very quickly, so being able to identify such events early and quickly, determine their cause, resolve them and prevent them in the future is of tremendous value to any business.
Databases interact with a variety of other processes that can affect their performance. Companies therefore use various tools and methods to identify problems that can impair database performance or cause outages. These may appear good enough for day-to-day operations, but used in the wrong way they can end up generating even more effort and expense. It is therefore worthwhile for companies to take a critical look at their monitoring solutions:
Application Performance Management (APM) Tools
Today, there is a plethora of tools on the market that allow easy, good-quality monitoring of application performance and provide a high-level view of the health of IT environments. However, the majority of users report that while APM tools can point in the right direction, they fail to identify the root causes of problems on the data platform. Additional data must therefore be collected manually in order to troubleshoot and permanently resolve issues.
Because APM tools sometimes do not reach the depth of database analysis required to eliminate the underlying performance problems, root cause analysis usually takes longer and long-term optimization becomes more difficult.
Custom Scripts
Experienced database administrators (DBAs) tend to have a collection of custom scripts that they have either found online or written themselves. These scripts are typically used to complement other tools such as APM, fill in missing features, or solve ad hoc problems.
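The exact form varies from DBA to DBA, but a minimal sketch of the kind of ad hoc check such a script library typically contains is a query against SQL Server's dynamic management views to see what is currently executing and what it is waiting on (the column selection and the session-ID filter below are illustrative choices, not a recommendation):

```sql
-- Ad hoc check: which requests are currently running, what are they
-- waiting on, and which statement are they executing?
SELECT
    r.session_id,
    r.status,
    r.wait_type,
    r.wait_time,          -- ms spent waiting so far
    r.cpu_time,           -- ms of CPU consumed
    r.logical_reads,
    t.text AS statement_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50;  -- rough heuristic to skip most system sessions
```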
If performance monitoring relies primarily on a script library, limitations remain: only in exceptional cases can scripts provide a complete picture of the IT environment. Many scripts are developed for a specific issue and become redundant after a single use. Those that offer longer-term value, on the other hand, are often difficult to maintain. As the environment grows and new technologies are adopted, maintaining the scripts can quickly become a full-time job. Given the low probability that the scripts offer the required granularity or provide significant historical detail to get to the root of a problem, this is an unnecessarily high effort.
SQL Server Wait Statistics
A ‘resource wait time’ accumulates when a process running on SQL Server has to wait for a specific resource to become available. Wait statistics accordingly show where critical bottlenecks are building up within SQL Server. Some IT pros may be tempted to focus solely on the wait statistics to understand how their databases are performing. However, this can lead to completely wrong conclusions, especially at the query level. It is a bit like looking at a single car in a traffic jam: the car itself is moving and therefore appears to be running perfectly. What escapes this view, however, is the truck ahead trying to turn around.
Nevertheless, they are at least a very good starting point to get a feel for the performance profile of the server and to find out where problems can occur.
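A minimal sketch of such a starting point is a query against the sys.dm_os_wait_stats dynamic management view, which exposes the waits accumulated since the last restart (the list of filtered-out benign wait types below is illustrative and far from complete):

```sql
-- Top waits accumulated since SQL Server was last restarted
-- (or since the statistics were last cleared).
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms,                               -- time spent waiting for CPU
    wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (                               -- ignore benign background waits
    'SLEEP_TASK', 'LAZYWRITER_SLEEP', 'CHECKPOINT_QUEUE',
    'XE_TIMER_EVENT', 'REQUEST_FOR_DEADLOCK_SEARCH', 'WAITFOR'
)
ORDER BY wait_time_ms DESC;
```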
Database Performance Monitoring (DPM) Tools
At first glance, DPM tools seem to be the most effective way to monitor database performance; after all, that is exactly what they are designed for. But they, too, have limitations if their potential is not used to the full.
One problem with DPM tools, for example, can be a lack of detail, especially for counter metrics such as CPU or I/O. Some products, and even home-grown solutions, capture snapshots of this data only once every few minutes so as not to overload the monitored server. Other limitations concern query-level detail: usually only the top N queries are collected or displayed, regardless of the level of activity on the server. Or the focus is on queries ranked by their own wait times rather than by the actual resource consumption of the request, where the root cause is far more likely to be found.
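To illustrate the difference, a query like the following ranks cached statements by the resources they actually consumed rather than by how long they waited; it uses the standard sys.dm_exec_query_stats view, and the choice of CPU time as the ranking metric and the TOP (10) cut-off are merely examples:

```sql
-- Cached statements ranked by total CPU consumed, not by wait time.
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,   -- total_worker_time is in microseconds
    qs.total_logical_reads,
    SUBSTRING(st.text,
              (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```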
Scalability is also often a challenge. Most DPM tools limit the number of servers that can be monitored with a single product installation, and because they are typically backed by a SQL Server database that stores all of the collected data, that repository itself becomes a bottleneck. For this reason, these products tend to struggle from around 200-300 monitored SQL Server instances. Larger companies may therefore need to deploy multiple installations to cover all their servers. Some DPM products address this by supporting multiple back-end databases from a single interface, although this involves significant cost and management overhead.
Therefore, it is important to ensure that the tools used are running optimally and bringing maximum benefit. Essentially, companies can use the following criteria to assess whether their database monitoring solution is sufficiently effective:
- Does the solution provide enough detail and accuracy to quickly resolve and prevent problems?
- Can it scale with expected data growth?
- Does the solution work in all environments where it is needed?
- Does it contribute satisfactorily to solving problems, or does it end up creating more problems than it solves?
- What about support? Is help from experienced engineers available if something goes wrong?
Critically evaluating existing solutions at regular intervals helps to develop effective and efficient processes. In particular, it prevents the phenomenon in which the solutions used appear to be good enough, but ultimately cause unnecessary costs through slowdowns or failures.