Who Has Busy Files? Identifying Biggest Performance Users and Bottlenecks (Part 3)
This is the third part of a blog series here at Data Education about how to use out-of-the-box SQL Server features to track the IO demands of all your SQL Server database files.
In this installment, we are going to enhance the SSIS package we left off with last time to enable looping over all the SQL Server instances in your environment and collecting data from each. We will also add a rollup at the end to aggregate the low-level counter information into minimums, maximums and averages over time. The resulting package will look like this:
Step 1: Add a Loop Container for the Data Collection
The next concept we’ll tackle is to loop over a list of SQL instances to do the data collection from multiple servers. To create a loop, we’ll use the Integration Services Foreach Loop Container. A loop container is just what its name implies: It’s a container into which you can place tasks, and it will repeat those tasks in a loop based on some rules.
From the toolbox, drag a Foreach Loop Container onto an empty area of the Control Flow surface. Expand it so that there’s enough room inside, then select all of the tasks we created in the last installment and drag them inside as a group. Set the name of the container to “Foreach Instance.” You should end up with this:
A Foreach Loop can iterate over several different kinds of collections. What we want to do is get a list of servers and databases from a table, and then run this process once for each row from that table. Getting SSIS to do that is slightly counter-intuitive, so please bear with me here as I try to explain. We need to:
- Query a central table for a list of monitored SQL Server instances and admin database names.
- Store that list in memory in SSIS somewhere. We’ll use a Recordset for that, and the whole Recordset will be stored in a single variable as an object.
- For each (note that key word) row in that in-memory Recordset, we want to pull out the name of the SQL instance, and the name of the admin database. For these two names we’ll use two additional variables.
- In the loop, we want to use those two names to set the connection manager to connect to that combination of SQL instance and admin database. To get this changing connection behavior, we will use expressions in the connection manager, which will change its database connection string for each iteration of the loop.
So, three variables: one for the Recordset, one for the current SQL Server instance, and one for the current admin database to collect from. Those are the three variables we created in the last blog installment: InstanceList (of type Object, so it can store a Recordset), plus CurrentSourceServer and CurrentSourceDB (each of type String). The Foreach Loop will run once for each row in InstanceList, and each pass will set CurrentSourceServer and CurrentSourceDB to the values from the current row.
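If you need to recreate them, the setup looks like this:

Name                  Data Type
InstanceList          Object
CurrentSourceServer   String
CurrentSourceDB       String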
Step 2: Create a Database Table for your Instance List
In your central admin database, first make a table to store the instance list:
CREATE TABLE dbo.AdminDBs (
    ServerName sysname NOT NULL,
    DBName sysname NOT NULL
);
Then populate that table with one or two test SQL instances and admin database names. (Tip: Things are not likely to work the first time through; pros use test instances!) Instance and database names should be exactly in the form you would use to connect to them.
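For example, a couple of test rows might look like this; the server and database names below are placeholders, so substitute your own:

INSERT INTO dbo.AdminDBs ( ServerName, DBName )
VALUES ( N'TESTSQL01', N'DBAdmin' ),        -- placeholder default instance
       ( N'TESTSQL02\INST1', N'DBAdmin' );  -- placeholder named instance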
Step 3: Add a Dataflow to Load the Instance List
Back in SSIS, add a dataflow at the top of the package, and name it “Get SQL Instance List”:
Open that new data flow. Create a data source that pulls straight from the table in Step 2, and then create a Recordset Destination to receive the results:
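The source query itself can be as simple as selecting both columns from the table created in Step 2:

SELECT ServerName, DBName
FROM dbo.AdminDBs
ORDER BY ServerName, DBName;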
The Recordset Destination will want you to set a variable to store the Recordset. Edit it, and choose the User::InstanceList variable for the VariableName property. The result is that the Data Source will produce rows, the rows will be placed into an in-memory Recordset, and the Recordset will in turn be stored as an object in User::InstanceList.
Next, in the same dialog box, look at the Input Columns tab, and make sure the two columns from the source table are selected:
Let’s take a moment to recap: We have a table with SQL instance and admin database names. We have a dataflow “Get SQL Instance List,” and in that dataflow the data source pulls all the rows from that table. We then have a Recordset Destination that receives those rows and assigns the Recordset to the variable User::InstanceList in Integration Services.
Step 4: Drive the Foreach Loop Using the Recordset
Return to the Control Flow, and now let’s plug that Foreach Loop into the Recordset. Edit the Foreach Loop, and look at the Collection panel.
- Set the Enumerator to Foreach ADO Enumerator, which, loosely, is SSIS-speak for “Use a Recordset.”
- Under ADO object source variable, choose the variable that will contain the Recordset, User::InstanceList:
Setting the ADO object source variable will cause the Foreach Loop to iterate over the rows in the Recordset and execute once for each row. However, we need an additional step to actually fetch the data from inside those rows to change the data connection. On the next panel, Variable Mappings, we can map each column in the Recordset, positionally, to a variable in the package. Set the mapping like this:
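The mapping is positional: index 0 refers to the first column in each Recordset row (the server name), and index 1 to the second (the database name). Using the variable names from above, the mappings are:

Variable                    Index
User::CurrentSourceServer   0
User::CurrentSourceDB       1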
With the mapping in place, during each iteration of the loop, the values from the current Recordset row will each be loaded into those SSIS variables.
Step 5: Drive the Connection Manager from the Variables
Now that we have those two local variables set up to change from one source server to the next, we need the connection manager to change its connection information too. For that we can tie the CurrentSourceServer and CurrentSourceDB variables to the connection manager properties with expressions. Basically, we just need the ServerName and InitialCatalog properties of the connection manager to change to the values of those variables as the loop runs. That will cause the connection manager to connect to the different servers.
Highlight the DBAdmin_Source connection manager and look at the Properties window. Expand the Expressions section, and add an expression for ServerName and an expression for InitialCatalog, as:
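Each property expression is simply a reference to the corresponding variable. In SSIS expression syntax, and assuming the variable names used above, the two expressions are:

ServerName       @[User::CurrentSourceServer]
InitialCatalog   @[User::CurrentSourceDB]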
Step 6: Test
That should get the package iterating over a collection of instances. At this point it probably makes sense to try it, watch, and see what you get. We’ve made a lot of changes, so it might take some trial and error to get everything running smoothly.
Step 7: Add a Rollup
The raw IO counter information may be useful for a short time, but over a longer period it can grow quite large, and it probably contains too much detail to be worth saving at its original granularity. The last step will be to aggregate it together into a summary table, preserving max, min, and average values for longer time periods. I have an example here that uses a 15-minute interval, but this process could be adapted to longer blocks of time—whatever level of detail makes sense for your environment.
At a high level, we want to:
- Compute max, min, and average values for the counters for all samples, for all physical files.
- Group all that by each 15-minute block of time.
- Merge the results into a summary table containing the existing computed aggregates from the past.
Keep that basic process in mind here because the aggregate query is a bit of a beast.
First, in my implementation I have two tables: one to store a list of all the physical files from all the instances, and a second to store the aggregated performance data:
CREATE TABLE dbo.AllFiles (
    ServerName nvarchar(128) NOT NULL,
    Physical_Name nvarchar(260) NOT NULL,
    CONSTRAINT PK_AllFiles PRIMARY KEY CLUSTERED ( ServerName ASC, Physical_Name ASC )
);

CREATE TABLE dbo.TrackFileIO_15MinAgg (
    AsOfDate datetime NOT NULL,
    ServerName nvarchar(128) NOT NULL,
    database_name sysname NOT NULL,
    physical_name nvarchar(260) NOT NULL,
    sample_duration_sec decimal(4, 2) NOT NULL,
    number_of_samples int NOT NULL,
    max_num_of_reads bigint NOT NULL,
    min_num_of_reads bigint NOT NULL,
    avg_num_of_reads bigint NOT NULL,
    max_avg_num_of_reads_per_sec bigint NOT NULL,
    min_avg_num_of_reads_per_sec bigint NOT NULL,
    avg_avg_num_of_reads_per_sec bigint NOT NULL,
    max_num_of_KB_read bigint NOT NULL,
    min_num_of_KB_read bigint NOT NULL,
    avg_num_of_KB_read bigint NOT NULL,
    max_avg_num_of_KB_read_per_sec bigint NOT NULL,
    min_avg_num_of_KB_read_per_sec bigint NOT NULL,
    avg_avg_num_of_KB_read_per_sec bigint NOT NULL,
    max_io_stall_read_ms bigint NOT NULL,
    min_io_stall_read_ms bigint NOT NULL,
    avg_io_stall_read_ms bigint NOT NULL,
    max_avg_io_stall_read_ms_per_sec bigint NOT NULL,
    min_avg_io_stall_read_ms_per_sec bigint NOT NULL,
    avg_avg_io_stall_read_ms_per_sec bigint NOT NULL,
    max_num_of_writes bigint NOT NULL,
    min_num_of_writes bigint NOT NULL,
    avg_num_of_writes bigint NOT NULL,
    max_avg_num_of_writes_per_sec bigint NOT NULL,
    min_avg_num_of_writes_per_sec bigint NOT NULL,
    avg_avg_num_of_writes_per_sec bigint NOT NULL,
    max_num_of_KB_written bigint NOT NULL,
    min_num_of_KB_written bigint NOT NULL,
    avg_num_of_KB_written bigint NOT NULL,
    max_avg_num_of_KB_written_per_sec bigint NOT NULL,
    min_avg_num_of_KB_written_per_sec bigint NOT NULL,
    avg_avg_num_of_KB_written_per_sec bigint NOT NULL,
    max_io_stall_write_ms bigint NOT NULL,
    min_io_stall_write_ms bigint NOT NULL,
    avg_io_stall_write_ms bigint NOT NULL,
    max_avg_io_stall_write_ms_per_sec bigint NOT NULL,
    min_avg_io_stall_write_ms_per_sec bigint NOT NULL,
    avg_avg_io_stall_write_ms_per_sec bigint NOT NULL,
    max_io_stall bigint NOT NULL,
    min_io_stall bigint NOT NULL,
    avg_io_stall bigint NOT NULL,
    max_avg_io_stall_per_sec bigint NOT NULL,
    min_avg_io_stall_per_sec bigint NOT NULL,
    avg_avg_io_stall_per_sec bigint NOT NULL,
    CONSTRAINT PK_TrackFileIO_15MinAgg PRIMARY KEY CLUSTERED ( AsOfDate ASC, ServerName ASC, physical_name ASC )
);
Next, I have a stored procedure with merge statements that takes the raw counter data, groups on the time interval, and then merges into the summary table. In the text here I am presenting a shortened version, cutting out repetitive column lists and including some explanatory comments. (You can also download the full text of the procedure if you want.)
CREATE PROCEDURE dbo.RollUp_TrackFileIO
AS
BEGIN

    WITH
    -- This table expression creates a "calendar table"-like set of the datetimes
    -- for 15-minute intervals from the past day:
    fifteenMinInterval ( AsOfDate ) AS (
        SELECT CONVERT( datetime,
                   CONVERT( char(13), DATEADD( HOUR, -26, GETDATE() ), 120 ) + ':00',
                   120 ) AS AsOfDate
        UNION ALL
        SELECT DATEADD( MINUTE, 15, AsOfDate ) AS AsOfDate
        FROM fifteenMinInterval
        WHERE AsOfDate <= GETDATE()
    ),
    -- This one computes the max, min and average of the IO counters
    -- within each of the time blocks defined by the intervals above:
    AggregatedFileIO AS (
        SELECT fifteenMinInterval.AsOfDate,
               TrackFileIO.ServerName,
               TrackFileIO.database_name,
               TrackFileIO.physical_name,
               10.0 AS sample_duration_sec, -- This value is a fixed 10 seconds in
                                            -- the polling jobs that do the data collection
               COUNT(*) AS number_of_samples,
               MAX(num_of_reads) AS max_num_of_reads,
               MIN(num_of_reads) AS min_num_of_reads,
               AVG(num_of_reads) AS avg_num_of_reads,
               MAX(avg_num_of_reads_per_sec) AS max_avg_num_of_reads_per_sec,
               MIN(avg_num_of_reads_per_sec) AS min_avg_num_of_reads_per_sec,
               AVG(avg_num_of_reads_per_sec) AS avg_avg_num_of_reads_per_sec,
               /* ... Insert all the other counter columns here following the same pattern ... */
               MAX(avg_io_stall_per_sec) AS max_avg_io_stall_per_sec,
               MIN(avg_io_stall_per_sec) AS min_avg_io_stall_per_sec,
               AVG(avg_io_stall_per_sec) AS avg_avg_io_stall_per_sec
        FROM fifteenMinInterval
        INNER JOIN dbo.TrackFileIO
            ON fifteenMinInterval.AsOfDate >= dbo.TrackFileIO.AsOfDate
            AND dbo.TrackFileIO.AsOfDate > DATEADD( MINUTE, -15, fifteenMinInterval.AsOfDate )
        WHERE TrackFileIO.AsOfDate >= DATEADD( HOUR, -24, GETDATE() )
        GROUP BY fifteenMinInterval.AsOfDate,
                 TrackFileIO.ServerName,
                 TrackFileIO.database_name,
                 TrackFileIO.physical_name
    )
    MERGE INTO dbo.TrackFileIO_15MinAgg AS existing
    USING AggregatedFileIO AS new
        ON new.AsOfDate = existing.AsOfDate
        AND new.ServerName = existing.ServerName
        AND new.physical_name = existing.physical_name
    WHEN MATCHED AND (
           new.number_of_samples != existing.number_of_samples
        OR new.sample_duration_sec != existing.sample_duration_sec
        OR new.max_num_of_reads != existing.max_num_of_reads
        OR new.min_num_of_reads != existing.min_num_of_reads
        OR new.avg_num_of_reads != existing.avg_num_of_reads
        OR new.max_avg_num_of_reads_per_sec != existing.max_avg_num_of_reads_per_sec
        OR new.min_avg_num_of_reads_per_sec != existing.min_avg_num_of_reads_per_sec
        OR new.avg_avg_num_of_reads_per_sec != existing.avg_avg_num_of_reads_per_sec
        /* ... Insert all the other counter columns here following the same pattern ... */
        OR new.max_io_stall != existing.max_io_stall
        OR new.min_io_stall != existing.min_io_stall
        OR new.avg_io_stall != existing.avg_io_stall
        OR new.max_avg_io_stall_per_sec != existing.max_avg_io_stall_per_sec
        OR new.min_avg_io_stall_per_sec != existing.min_avg_io_stall_per_sec
        OR new.avg_avg_io_stall_per_sec != existing.avg_avg_io_stall_per_sec
    )
    THEN UPDATE SET
        number_of_samples = new.number_of_samples,
        max_num_of_reads = new.max_num_of_reads,
        min_num_of_reads = new.min_num_of_reads,
        avg_num_of_reads = new.avg_num_of_reads,
        max_avg_num_of_reads_per_sec = new.max_avg_num_of_reads_per_sec,
        min_avg_num_of_reads_per_sec = new.min_avg_num_of_reads_per_sec,
        avg_avg_num_of_reads_per_sec = new.avg_avg_num_of_reads_per_sec,
        /* ... Insert all the other counter columns here following the same pattern ... */
        max_avg_io_stall_per_sec = new.max_avg_io_stall_per_sec,
        min_avg_io_stall_per_sec = new.min_avg_io_stall_per_sec,
        avg_avg_io_stall_per_sec = new.avg_avg_io_stall_per_sec
    WHEN NOT MATCHED THEN INSERT (
        AsOfDate, ServerName, database_name, physical_name,
        sample_duration_sec, number_of_samples,
        max_num_of_reads, min_num_of_reads, avg_num_of_reads,
        max_avg_num_of_reads_per_sec, min_avg_num_of_reads_per_sec, avg_avg_num_of_reads_per_sec,
        /* ... Insert all the other counter columns here following the same pattern ... */
        max_avg_io_stall_per_sec, min_avg_io_stall_per_sec, avg_avg_io_stall_per_sec
    ) VALUES (
        new.AsOfDate, new.ServerName, new.database_name, new.physical_name,
        new.sample_duration_sec, new.number_of_samples,
        new.max_num_of_reads, new.min_num_of_reads, new.avg_num_of_reads,
        new.max_avg_num_of_reads_per_sec, new.min_avg_num_of_reads_per_sec, new.avg_avg_num_of_reads_per_sec,
        /* ... Insert all the other counter columns here following the same pattern ... */
        new.max_avg_io_stall_per_sec, new.min_avg_io_stall_per_sec, new.avg_avg_io_stall_per_sec
    )
    OPTION ( MAXRECURSION 10000 );

    -- Keep the list of all known physical files up to date:
    MERGE INTO dbo.AllFiles
    USING ( SELECT DISTINCT ServerName, physical_name
            FROM dbo.TrackFileIO_15MinAgg ) AS names
        ON AllFiles.ServerName = names.ServerName
        AND AllFiles.Physical_Name = names.physical_name
    WHEN NOT MATCHED THEN
        INSERT ( ServerName, Physical_Name )
        VALUES ( names.ServerName, names.physical_name );

END
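Before wiring the procedure into the package, it’s worth a quick smoke test from Management Studio: run it, then look at the most recent rows in the summary table:

EXEC dbo.RollUp_TrackFileIO;

SELECT TOP (20) *
FROM dbo.TrackFileIO_15MinAgg
ORDER BY AsOfDate DESC;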
Once it’s working, that summary stored procedure can be called after each set of data is collected, by appending it to the very end of our SSIS package. Return to BIDS and SSIS, and add an Execute SQL task to the end of the package, like this:
In the editor for the SQL Task, just call the rollup procedure:
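The statement is just a call to the procedure we created above:

EXEC dbo.RollUp_TrackFileIO;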
At this point we have:
- a central list of SQL instances and admin databases to drive data collection.
- an Integration Services package that can iterate over those instances, dynamically connect to each, and move rows of performance data to a central warehouse, and need not be modified if the instance list changes.
- a technique for rolling up the collected data, via CTEs and Merge, into a summary table.
- a reasonably fault-tolerant setup, where data is retained over a 24-hour sliding window, so that all the processes are re-runnable and most can catch up in the event of an error.
I hope you will agree that not only do we have useful performance data, we also have useful tools and skills for a wide variety of multi-server data solutions.
In Part 4 of this Data Education blog post series, we’ll take a look at how to make a Reporting Services report that will summarize the IO counter information, to show the busiest files in the enterprise at a glance.