Who Has Busy Files? Identifying Biggest Performance Users and Bottlenecks (Part 3)

FEATURED AUTHOR
Merrill Aldrich

This is Part 3 of a guest blogging series by longtime DBA, blogger, and all-around SQL Server enthusiast Merrill Aldrich. (Part 1 of this series is available here, and Part 2 is available here.) You can find Merrill on Twitter as @onupdatecascade.

This is the third part of a blog series here at Data Education about how to use out-of-the-box SQL Server features to track the IO demands of all your SQL Server database files.

In this installment, we will enhance the SSIS package we built last time so that it can loop over all the SQL Server instances in your environment and collect data from each. We will also add a rollup at the end to aggregate the low-level counter information into minimums, maximums, and averages over time. The resulting package will look like this:

Iterating Instances in SQL Server

Step 1: Add a Loop Container for the Data Collection

The next concept we’ll tackle is to loop over a list of SQL instances to do the data collection from multiple servers. To create a loop, we’ll use the Integration Services Foreach Loop Container. A loop container is just what its name implies: It’s a container into which you can place tasks, and it will repeat those tasks in a loop based on some rules.

From the toolbox, drag a Foreach Loop Container onto an empty area of the Control Flow surface. Expand it so that there’s enough room inside, then select all of the tasks we created in the last installment and drag them, as a group, into the container. Set the name of the container to “Foreach Instance.” You should end up with this:

Foreach Container

A Foreach Loop can iterate over several different kinds of collections. What we want to do is get a list of servers and databases from a table, and then run this process once for each row from that table. Getting SSIS to do that is slightly counter-intuitive, so please bear with me here as I try to explain. We need to:

  1. Query a central table for a list of monitored SQL Server instances and admin database names.
  2. Store that list in memory in SSIS somewhere. We’ll use a Recordset for that, and the whole Recordset will be stored in a single variable as an object.
  3. For each (note that key word) row in that in-memory Recordset, we want to pull out the name of the SQL instance, and the name of the admin database. For these two names we’ll use two additional variables.
  4. In the loop, we want to use those two names to set the connection manager to connect to that combination of SQL instance and admin database. To get this changing connection behavior, we will use expressions in the connection manager, which will change its database connection string for each iteration of the loop.

So, three variables: one for the Recordset, one for the current SQL Server instance, and one for the current admin database to collect from. Those are the three variables we created in the last blog installment: InstanceList (of type Object, so it can store a Recordset), plus CurrentSourceServer and CurrentSourceDB (each of type String). The Foreach Loop will run once for each row in InstanceList, and each pass will set CurrentSourceServer and CurrentSourceDB to the values from the current row.

Step 2: Create a Database Table for your Instance List

In your central admin database, first make a table to store the instance list:

CREATE TABLE dbo.AdminDBs(
    ServerName sysname NOT NULL,
    DBName sysname NOT NULL
)

Then populate that table with one or two test SQL instances and admin database names. (Tip: Things are not likely to work the first time through; pros use test instances!) Instance and database names should be exactly in the form you would use to connect to them.
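
For example, a couple of test rows might look like this. (The instance and database names here are placeholders; substitute your own.)

-- Hypothetical instance names; use the exact names you would type to connect
INSERT INTO dbo.AdminDBs ( ServerName, DBName )
VALUES ( N'TESTSQL01', N'DBAdmin' ),
       ( N'TESTSQL02\INST1', N'DBAdmin' ) ;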

Step 3: Add a Dataflow to Load the Instance List

Back in SSIS, add a dataflow at the top of the package, and name it “Get SQL Instance List”:

Get SQL Instance List

Open that new data flow. Create a data source that pulls straight from the table in Step 2, and then create a Recordset Destination to receive the results:

Get Instance List Dataflow
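
If your data source is an OLE DB Source using a SQL command, the query can be as simple as pulling both columns straight from the table:

-- Column order matters: the Foreach Loop maps columns by position later on
SELECT  ServerName,
        DBName
FROM    dbo.AdminDBs ;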

The Recordset Destination will ask you to pick a variable to store the Recordset. Edit it, and choose the User::InstanceList variable for the VariableName property. The net effect: the data source produces rows, those rows are gathered into an in-memory Recordset, and that Recordset is stored as an object in User::InstanceList:

Recordset Editor

Next, in the same dialog box, look at the Input Columns tab, and make sure the two columns from the source table are selected:

Recordset Editor

Let’s take a moment to recap: We have a table with SQL Instance and admin database names. We have a dataflow “Get SQL Instance List,” and in that dataflow the data source pulls all the rows from that table. We then have a Recordset destination that receives those rows, and assigns the Recordset to the variable User::InstanceList in Integration Services.

Step 4: Drive the Foreach Loop Using the Recordset

Return to the Control Flow, and now let’s plug that Foreach Loop into the Recordset. Edit the Foreach Loop, and look at the Collection panel.

  1. Set the Enumerator to Foreach ADO Enumerator, which, loosely, is SSIS speak for “Use a Recordset.”
  2. Under ADO object source variable, choose the variable that will contain the Recordset:

Foreach Loop Editor

Setting the ADO object source variable will cause the Foreach Loop to iterate over the rows in the Recordset and execute once for each row. However, we need one more step to actually pull the values out of each row so they can drive the data connection. On the next panel, Variable Mappings, we can map each column in the Recordset, positionally, to a variable in the package. Set the mapping like this:

Foreach Loop Editor

With the mapping in place, during each iteration of the loop, the values from the current Recordset row will each be loaded into those SSIS variables.
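
For reference, the mapping is positional: index 0 is the first column returned by the source query, index 1 the second. Assuming ServerName comes first and DBName second, the mapping is:

Variable                     Index
User::CurrentSourceServer    0
User::CurrentSourceDB        1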

Step 5: Drive the Connection Manager from the Variables

Now that we have those two local variables set up to change from one source server to the next, we need the connection manager to change its connection information too. For that we can tie the CurrentSourceServer and CurrentSourceDB variables to the connection manager properties with expressions. Basically we just need the ServerName and InitialCatalog properties of the connection manager to change to the values of those variables as the loop runs. That will cause the connection manager to connect to the different servers.

Highlight the DBAdmin_Source connection manager and look at the Properties window. Expand the Expressions section, and add one expression for ServerName and one for InitialCatalog, as shown below:

Connection Manager Properties
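
In the expression editor, each property expression is simply a reference to the matching variable, in SSIS expression syntax:

Property           Expression
ServerName         @[User::CurrentSourceServer]
InitialCatalog     @[User::CurrentSourceDB]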

Step 6: Test

That should get the package iterating over a collection of instances. At this point it probably makes sense to try it, watch, and see what you get. We’ve made a lot of changes, so it might take some trial and error to get everything running smoothly.
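
One quick sanity check, assuming the raw collection table from the last installment is dbo.TrackFileIO, is to confirm that rows are arriving from every instance in your list:

-- Expect one group per monitored instance, with a recent latest_sample
SELECT  ServerName,
        COUNT(*) AS sample_count,
        MAX(AsOfDate) AS latest_sample
FROM    dbo.TrackFileIO
GROUP BY ServerName
ORDER BY ServerName ;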

Step 7: Add a Rollup

The raw IO counter information may be useful for a short time, but over a longer period it can grow quite large, and it probably contains too much detail to be worth saving at its original granularity. The last step will be to aggregate it into a summary table, preserving max, min, and average values over longer time periods. I have an example here that uses a 15-minute interval, but the process could be adapted to longer blocks of time, depending on what level of detail makes sense for your environment.

At a high level, we want to:

  1. Compute max, min, and average values for the counters for all samples, for all physical files.
  2. Group all that by each 15-minute block of time.
  3. Merge the results into a summary table containing the existing computed aggregates from the past.

Keep that basic process in mind here because the aggregate query is a bit of a beast.

First, in my implementation I have two tables: one to store a list of all the physical files from all the instances, and a second to store the aggregated performance data:

CREATE TABLE dbo.AllFiles(
      ServerName nvarchar(128) NOT NULL,
      Physical_Name nvarchar(260) NOT NULL,
    CONSTRAINT PK_AllFiles PRIMARY KEY CLUSTERED 
    (
      ServerName ASC,
      Physical_Name ASC
    )
) ;

CREATE TABLE dbo.TrackFileIO_15MinAgg(
      AsOfDate datetime NOT NULL,
      ServerName nvarchar(128) NOT NULL,
      database_name sysname NOT NULL,
      physical_name nvarchar(260) NOT NULL,
      sample_duration_sec decimal(4, 2) NOT NULL,
      number_of_samples int NOT NULL,
      max_num_of_reads bigint NOT NULL,
      min_num_of_reads bigint NOT NULL,
      avg_num_of_reads bigint NOT NULL,
      max_avg_num_of_reads_per_sec bigint NOT NULL,
      min_avg_num_of_reads_per_sec bigint NOT NULL,
      avg_avg_num_of_reads_per_sec bigint NOT NULL,
      max_num_of_KB_read bigint NOT NULL,
      min_num_of_KB_read bigint NOT NULL,
      avg_num_of_KB_read bigint NOT NULL,
      max_avg_num_of_KB_read_per_sec bigint NOT NULL,
      min_avg_num_of_KB_read_per_sec bigint NOT NULL,
      avg_avg_num_of_KB_read_per_sec bigint NOT NULL,
      max_io_stall_read_ms bigint NOT NULL,
      min_io_stall_read_ms bigint NOT NULL,
      avg_io_stall_read_ms bigint NOT NULL,
      max_avg_io_stall_read_ms_per_sec bigint NOT NULL,
      min_avg_io_stall_read_ms_per_sec bigint NOT NULL,
      avg_avg_io_stall_read_ms_per_sec bigint NOT NULL,
      max_num_of_writes bigint NOT NULL,
      min_num_of_writes bigint NOT NULL,
      avg_num_of_writes bigint NOT NULL,
      max_avg_num_of_writes_per_sec bigint NOT NULL,
      min_avg_num_of_writes_per_sec bigint NOT NULL,
      avg_avg_num_of_writes_per_sec bigint NOT NULL,
      max_num_of_KB_written bigint NOT NULL,
      min_num_of_KB_written bigint NOT NULL,
      avg_num_of_KB_written bigint NOT NULL,
      max_avg_num_of_KB_written_per_sec bigint NOT NULL,
      min_avg_num_of_KB_written_per_sec bigint NOT NULL,
      avg_avg_num_of_KB_written_per_sec bigint NOT NULL,
      max_io_stall_write_ms bigint NOT NULL,
      min_io_stall_write_ms bigint NOT NULL,
      avg_io_stall_write_ms bigint NOT NULL,
      max_avg_io_stall_write_ms_per_sec bigint NOT NULL,
      min_avg_io_stall_write_ms_per_sec bigint NOT NULL,
      avg_avg_io_stall_write_ms_per_sec bigint NOT NULL,
      max_io_stall bigint NOT NULL,
      min_io_stall bigint NOT NULL,
      avg_io_stall bigint NOT NULL,
      max_avg_io_stall_per_sec bigint NOT NULL,
      min_avg_io_stall_per_sec bigint NOT NULL,
      avg_avg_io_stall_per_sec bigint NOT NULL,
    CONSTRAINT PK_TrackFileIO_15MinAgg PRIMARY KEY CLUSTERED 
    (
      AsOfDate ASC,
      ServerName ASC,
      physical_name ASC
    )
) ;

Next, I have a stored procedure with merge statements that takes the raw counter data, groups on the time interval, and then merges into the summary table. In the text here I am presenting a shortened version, cutting out repetitive column lists and including some explanatory comments. (You can also download the full text of the procedure if you want.)

CREATE PROCEDURE dbo.RollUp_TrackFileIO
AS
BEGIN

WITH    
  -- This table expression creates a "calendar table"-like set of the datetimes
  -- for 15 minute intervals from the past day:
  fifteenMinInterval ( AsOfDate )
          AS ( SELECT   CONVERT( datetime,
                              CONVERT ( char(13), 
                                DATEADD( HOUR, -26, GETDATE() ), 120 ) 
                                + ':00'
                              , 120 )
                              AsOfDate
               UNION ALL
               SELECT   DATEADD( MINUTE, 15, AsOfDate ) AsOfDate
               FROM     fifteenMinInterval
               WHERE    AsOfDate <= GETDATE()
             ) ,
  -- This one computes the max, min and average of the IO counters 
  -- within each of the time blocks defined by the intervals above:
  AggregatedFileIO AS (           
    SELECT  fifteenMinInterval.AsOfDate,
        TrackFileIO.ServerName,
        TrackFileIO.database_name,
        TrackFileIO.physical_name,
        10.0 AS sample_duration_sec, -- This value is a fixed 10 seconds in 
                                     -- the polling jobs that do the data collection
        COUNT(*) AS number_of_samples,
        MAX(num_of_reads) AS max_num_of_reads,
        MIN(num_of_reads) AS min_num_of_reads,
        AVG(num_of_reads) AS avg_num_of_reads,
        MAX(avg_num_of_reads_per_sec) AS max_avg_num_of_reads_per_sec,
        MIN(avg_num_of_reads_per_sec) AS min_avg_num_of_reads_per_sec,
        AVG(avg_num_of_reads_per_sec) AS avg_avg_num_of_reads_per_sec,
        
        /* ... Insert all the other counter columns here 
            following the same pattern ... */
        
        MAX(avg_io_stall_per_sec) AS max_avg_io_stall_per_sec,
        MIN(avg_io_stall_per_sec) AS min_avg_io_stall_per_sec,
        AVG(avg_io_stall_per_sec) AS avg_avg_io_stall_per_sec
        
    FROM    fifteenMinInterval
    INNER JOIN dbo.TrackFileIO 
      ON fifteenMinInterval.AsOfDate >= dbo.TrackFileIO.AsOfDate
                  AND dbo.TrackFileIO.AsOfDate > DATEADD( minute, -15, fifteenMinInterval.AsOfDate )
    WHERE TrackFileIO.AsOfDate >= DATEADD( hour, -24, GETDATE() )
    GROUP BY fifteenMinInterval.AsOfDate,
        TrackFileIO.ServerName,
        TrackFileIO.database_name,
        TrackFileIO.physical_name
        
        )
MERGE INTO dbo.TrackFileIO_15MinAgg AS existing
        USING AggregatedFileIO AS new
        ON new.AsOfDate = existing.AsOfDate
                  AND new.ServerName = existing.ServerName
                  AND new.physical_name = existing.physical_name
            WHEN MATCHED AND
                  (   new.number_of_samples != existing.number_of_samples
                        OR new.sample_duration_sec != existing.sample_duration_sec
                        OR new.max_num_of_reads != existing.max_num_of_reads
                        OR new.min_num_of_reads != existing.min_num_of_reads
                        OR new.avg_num_of_reads != existing.avg_num_of_reads
                        OR new.max_avg_num_of_reads_per_sec != existing.max_avg_num_of_reads_per_sec
                        OR new.min_avg_num_of_reads_per_sec != existing.min_avg_num_of_reads_per_sec
                        OR new.avg_avg_num_of_reads_per_sec != existing.avg_avg_num_of_reads_per_sec
                        
                        /* ... Insert all the other counter columns here 
                            following the same pattern ... */
            
                        OR new.max_io_stall != existing.max_io_stall
                        OR new.min_io_stall != existing.min_io_stall
                        OR new.avg_io_stall != existing.avg_io_stall
                        OR new.max_avg_io_stall_per_sec != existing.max_avg_io_stall_per_sec
                        OR new.min_avg_io_stall_per_sec != existing.min_avg_io_stall_per_sec
                        OR new.avg_avg_io_stall_per_sec != existing.avg_avg_io_stall_per_sec 
                  ) THEN 
                  UPDATE SET 
                        number_of_samples = new.number_of_samples,
                        max_num_of_reads = new.max_num_of_reads,
                        min_num_of_reads = new.min_num_of_reads,
                        avg_num_of_reads = new.avg_num_of_reads,
                        max_avg_num_of_reads_per_sec = new.max_avg_num_of_reads_per_sec,
                        min_avg_num_of_reads_per_sec = new.min_avg_num_of_reads_per_sec,
                        avg_avg_num_of_reads_per_sec = new.avg_avg_num_of_reads_per_sec,
                        
                        /* ... Insert all the other counter columns here 
                            following the same pattern ... */
            
                        max_avg_io_stall_per_sec = new.max_avg_io_stall_per_sec,
                        min_avg_io_stall_per_sec = new.min_avg_io_stall_per_sec,
                        avg_avg_io_stall_per_sec = new.avg_avg_io_stall_per_sec
                  WHEN NOT MATCHED THEN
                  INSERT (
                        AsOfDate,
                        ServerName,
                        database_name,
                        physical_name,
                        sample_duration_sec,
                        number_of_samples,
                        max_num_of_reads,
                        min_num_of_reads,
                        avg_num_of_reads,
                        
                        /* ... Insert all the other counter columns here 
                            following the same pattern ... */

                        max_avg_io_stall_per_sec,
                        min_avg_io_stall_per_sec,
                        avg_avg_io_stall_per_sec 
                  ) VALUES ( 
                        AsOfDate,
                        ServerName,
                        database_name,
                        physical_name,
                        sample_duration_sec,
                        number_of_samples,
                        max_num_of_reads,
                        min_num_of_reads,
                        avg_num_of_reads,
                        max_avg_num_of_reads_per_sec,
                        min_avg_num_of_reads_per_sec,
                        avg_avg_num_of_reads_per_sec,
                        
                        /* ... Insert all the other counter columns here 
                            following the same pattern ... */
            
                        max_avg_io_stall_per_sec,
                        min_avg_io_stall_per_sec,
                        avg_avg_io_stall_per_sec
         )              
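      -- The recursive interval CTE produces more rows than the default limit
      -- of 100 (26 hours of 15-minute blocks is about 105), so raise the cap: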
      OPTION  ( MAXRECURSION 10000 ) ;
      
MERGE INTO dbo.AllFiles
    USING 
        ( SELECT DISTINCT
                    ServerName,
                    physical_name
          FROM      dbo.TrackFileIO_15MinAgg
        ) AS names
    ON AllFiles.ServerName = names.ServerName
        AND AllFiles.Physical_Name = names.physical_name
    WHEN NOT MATCHED 
        THEN INSERT ( ServerName,
                      Physical_Name 
                    )
          VALUES    ( names.ServerName,
                      names.physical_name 
                    ) ;
            
END 

Once it’s working, that summary stored procedure can be called after each collection run by appending it to the very end of our SSIS package. Return to BIDS, and add an Execute SQL task to the end of the package, like this:

Iterating Instances Overview SQL Server

In the editor for the SQL Task, just call the rollup procedure:

Rollup Editor SQL Server
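
The SQLStatement for the task needs nothing more than the call itself:

EXEC dbo.RollUp_TrackFileIO ;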

Summary

At this point we have:

  1. a central list of SQL instances and admin databases to drive data collection.
  2. an Integration Services package that can iterate over those instances, dynamically connect to each, and move rows of performance data to a central warehouse, all without modification when the instance list changes.
  3. a technique for rolling up the collected data, via CTEs and Merge, into a summary table.
  4. a reasonably fault tolerant setup, where data is retained on a 24-hour sliding window, so that all the processes are re-runnable, and most can catch up in the event of an error.

I hope you will agree that we now have not only useful performance data, but also useful tools and skills for a wide variety of multi-server data solutions.
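
As a small preview, even before we build a report, a quick query against the summary table (a sketch; adjust the metric to taste) can already surface the files with the worst IO stalls:

-- Top 10 files by worst observed IO stall rate over the aggregated history
SELECT TOP 10
        ServerName,
        physical_name,
        MAX(max_avg_io_stall_per_sec) AS worst_io_stall_per_sec
FROM    dbo.TrackFileIO_15MinAgg
GROUP BY ServerName, physical_name
ORDER BY worst_io_stall_per_sec DESC ;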

In Part 4 of this Data Education blog post series, we’ll take a look at how to make a Reporting Services report that will summarize the IO counter information, to show the busiest files in the enterprise at a glance.