Wednesday, September 21, 2016

SQL-Server Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 2 - Multiple Simultaneous Running-Totals

SET-BASED Multiple Simultaneous Running Subtotals in SQL

Have you ever wanted to perform a set-based running-subtotals and/or running-totals operation in SQL-Server (i.e., have row-level accumulators store a running total or running subtotal?) Want the solution to not use cursors? Don't have access to SQL-Server 2012 or newer yet?

Here I present a cursorless method of implementing such functionality in a SQL-Server Query, Stored-Procedure, or User Defined Function (UDF) that works with all versions of SQL-Server (SQL-Server 2005, 2008, 2008r2, 2012, 2014, 2016...) without requiring SQL2012-specific OVER / PARTITION BY functionality. I have seen running totals done before, with varied methods, but what I have not seen much about elsewhere is how to do running subtotals. The code that appears below does both running totals and running subtotals. It is easily adapted to do multi-column break-value running subtotals, as noted in the code comments.

This example builds off of Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 1, which introduces the techniques shown here (and more), and includes more discussion of the solution(s) — see the comments in the example source code.

SQL-Server 2012 (or newer) Note: once SQL-Server 2012 became widely available, this example was updated to show how to implement the running totals and running subtotals using SQL-Server 2012 OVER and PARTITION BY (windowing functionality) clauses that are ANSI standard SQL available to Microsoft Transact-SQL query writers now. See the lower portion of the source code for a solution using these modern SQL features. Included are performance (run-time) comparison values for both solution methods.

Note: for this example, I used the SQL Server 2012 Adventureworks sample database, which provided me with a test case of approximately 113,000 rows against which I perform the running-subtotal (by product level) logic and do my row-level accumulator manipulations (using SET logic). This query is relatively efficient and averaged just under 3 seconds to execute withing a VM (virtual machine) on my single-CPU Intel Core-i7 development desktop machine.

SQL-Server Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2016
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: SET-BASED Running-Subtotals Technique #2 - Multiple Simultaneous Running-Totals
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DECLARE @starttime DATETIME = GetDate();

--Accumulator variables
DECLARE @RunningTotal           MONEY = 0.0;
DECLARE @RunningTranDtSubtotal  MONEY = 0.0;
DECLARE @RunningProductSubtotal MONEY = 0.0;

--Our break-level-value variables (Data-Types match those of the columns we are comparing to)
--Initialize these to some values that will NOT exist in each break-column's actual data.
DECLARE @SubTotalBreakValue1    CHAR(8) = '18010101';
DECLARE @SubTotalBreakValue2    INT = -1;


--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @ResultsTable TABLE
(
    UniqueID                INT IDENTITY NOT NULL PRIMARY KEY,
    SubtotalBreakColumn1    CHAR(8),
    SubtotalBreakColumn2    INT,
    ActualCost              SMALLMONEY,
    RunningTranDtSubtotal   MONEY,
    RunningProductSubtotal  MONEY,
    RunningTotal            MONEY
);

--Insert all values to be totaled, into our work table in the REQUIRED BREAK-LEVEL(S) ORDER
INSERT INTO @ResultsTable
(
    SubtotalBreakColumn1,
    SubtotalBreakColumn2,
    ActualCost,
    RunningTranDtSubtotal,
    RunningProductSubtotal,
    RunningTotal
)
SELECT
    dbo.[udfConvertDateToString](TransactionDate),
    ProductID,
    ActualCost,
    0,
    0,
    0
FROM
    Production.TransactionHistory
ORDER BY
    TransactionDate, ProductID    --Insert into table in our subtotal-break order (IMPORTANT!)
;

--**********************************************************************************************
-- ALGORITHM EXPLANATION:
-- See SET-BASED RUNNING SUBTOTALS Example 1, which this builds upon as discussed 
-- in the comments therein.
--**********************************************************************************************
UPDATE
    @ResultsTable
SET
    @RunningTotal           = RunningTotal          = @RunningTotal + ActualCost,
    @RunningTranDtSubtotal  = RunningTranDtSubtotal =
        CASE 
            WHEN @SubTotalBreakValue1 = SubtotalBreakColumn1 
                THEN @RunningTranDtSubtotal + ActualCost
            ELSE ActualCost 
        END,
    @RunningProductSubtotal = RunningProductSubtotal =
    CASE 
        WHEN ( @SubTotalBreakValue1 = SubtotalBreakColumn1 AND 
                @SubTotalBreakValue2 = SubtotalBreakColumn2  ) 
            THEN @RunningProductSubtotal + ActualCost 
        ELSE ActualCost 
    END,
    @SubTotalBreakValue1    = SubtotalBreakColumn1,
    @SubTotalBreakValue2    = SubtotalBreakColumn2
;

--Output the results... format to fit printout better.
SELECT
    UniqueID,
    SubtotalBreakColumn1                    AS 'DateBreak',
    CONVERT(CHAR(6),SubtotalBreakColumn2)   AS 'ProdID',
    ActualCost,
    RunningTranDtSubtotal                   AS 'RunningDtSub',
    RunningProductSubtotal                  AS 'RunningProdSub',
    RunningTotal
FROM
    @ResultsTable;


PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('Custom Multiple-Running-Subtotals / Total Example', @starttime, DEFAULT);
--**********************************************************************************************
--Using the AdventureWorks DB (with SQL2005), you'll see a large set with data like this:
--**********************************************************************************************
--UniqueID DateBreak ProdID ActualCost   RunningDtSub      RunningProdSub    RunningTotal
---------- --------- ------ ------------ ----------------- ----------------- ---------------------
--1        20030901  3      0.00         0.00              0.00              0.00
--2        20030901  316    0.00         0.00              0.00              0.00
--...
--...
--25       20030901  534    0.00         0.00              0.00              0.00
--26       20030901  707    20.994       20.994            20.994            20.994
--27       20030901  707    20.994       41.988            41.988            41.988
--28       20030901  707    20.994       62.982            62.982            62.982
--...
--...
--75       20030901  707    34.99        1105.684          1105.684          1105.684
--76       20030901  707    34.99        1140.674          1140.674          1140.674
--77       20030901  708    34.99        1175.664          34.99             1175.664
--78       20030901  708    34.99        1210.654          69.98             1210.654
--79       20030901  708    34.99        1245.644          104.97            1245.644
--...
--...
--3942     20030901  999    388.7928     1674280.9813      7775.856          1674280.9813
--3943     20030901  999    388.7928     1674669.7741      8164.6488         1674669.7741
--3944     20030901  999    388.7928     1675058.5669      8553.4416         1675058.5669
--3945     20030902  3      0.00         0.00              0.00              1675058.5669
--3946     20030902  316    0.00         0.00              0.00              1675058.5669
--...
--...
--113437   20040903  929    37.086      1086.456          37.086             27307057.0617
--113438   20040903  930    43.0395     1129.4955         43.0395            27307100.1012
--113439   20040903  931    34.3455     1163.841          34.3455            27307134.4467
--113440   20040903  931    34.8705     1198.7115         69.216             27307169.3172
--113441   20040903  932    39.7635     1238.475          39.7635            27307209.0807
--113442   20040903  932    39.2385     1277.7135         79.002             27307248.3192
--113443   20040903  948    82.8345     1360.548          82.8345            27307331.1537


--**********************************************************************************************
-- END: SET-BASED Running-Subtotals Technique #2 - Multiple Simultaneous Running-Totals
--
-- Average run-time against source data count of 113K rows in AdventureWorks2012 are decent: 
-- 2980ms (3.0 seconds)
--**********************************************************************************************



--**********************************************************************************************
--**********************************************************************************************
-- BEGIN: alternative SET-BASED Running-Subtotal and Running-Totals Technique using 
-- "OVER" (windowing) functionality introduced in SQL-Server 2012.
--**********************************************************************************************
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

--DECLARE @starttime DATETIME = GetDate();
SET  @starttime = GetDate();

SELECT
    dbo.[udfConvertDateToString](TransactionDate) AS DateBreak, --show without zero'd HHMMSS info
    ProductID AS ProdID, 
    ActualCost, 
    SUM(ActualCost) OVER
        (
        PARTITION BY TransactionDate
        ORDER BY TransactionDate, ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningDtSub,
    SUM(ActualCost) OVER
        (
        PARTITION BY ProductID
        ORDER BY TransactionDate, ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningProdSub,
    SUM(ActualCost) OVER
        (
        PARTITION BY @@SERVERNAME
        ORDER BY TransactionDate, ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningTotal
FROM
    Production.TransactionHistory
ORDER BY
    --Use @@SERVERNAME as first column in order by so we can create a "partition" of the ENTIRE set for Running-TOTAL.
    --ORDER BY requires non-constant-values for columns, and @@SERVERNAME yields a "constant" (without being flagged as such) 
    @@SERVERNAME,
    TransactionDate, 
    ProductID;

PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('OVER / PARTITION BY Multiple-Running-Subtotals / Total Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds 
--**********************************************************************************************
-- END: "OVER" (WINDOWING FUNCTINO) BASED Runing-Total/Running-Subtotals Technique, 
-- which produces same result (NOTE: requires SQL-Server 2012+)
--
-- 12200ms (1.2 sec) average : FAST!
-- Being "native" functionality in SQL-Server 2012, it is not surprising this approach produces
-- the SPEEDIEST EXECUTION TIME. 
--
-- But, for a non-native and/or pre-SQL2012 solution, our original approach we laid out (above)
-- is perhaps as performant as possible while being relatively easy to implement. 
--**********************************************************************************************


Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

No comments: