Thursday, April 16, 2009

Generate restore scripts from backup history


 

Over the years I've found this to be a very handy stored procedure for generating the SQL Server
T-SQL RESTORE commands needed to recreate a recovered database from the latest backup info:
dumps to disk (full/differential/log) and the history tables in MSDB.

I cannot emphasize enough that the MSDB database has to be an integral part of your backup strategy. I recommend configuring it in FULL recovery mode and including it in your log backups.
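For example, a minimal sketch (the backup paths are just placeholders):

-- put MSDB in full recovery mode and back it up along with its log
ALTER DATABASE msdb SET RECOVERY FULL;
BACKUP DATABASE msdb TO DISK = 'SeparateDrive:\path\msdb_full.bak';
BACKUP LOG msdb TO DISK = 'SeparateDrive:\path\msdb_log.trn';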


CREATE PROCEDURE [dbo].[usp_GenerateRestoreScripts] @DBName VARCHAR(100)
AS

SET NOCOUNT ON -- suppress 'rows affected' messages; we only want the generated T-SQL in the Messages tab of SSMS

/*
Script creates the T-SQL to restore a database with info from MSDB.
It helps by creating RESTORE commands constructed from the last FULL backup, the last DIFFERENTIAL backup
and all the required TRANSACTION LOG backups taken after that.
Neat when you have a high frequency of differential or log backups.

The variable @DBName should be set to the name of the database you want to restore.

!!! BE AWARE: include MSDB in your backup plan for this T-SQL script to work in all circumstances !!!
I usually include MSDB in the log backup schedule (set the db to full recovery mode).
*/

DECLARE @lastFullBackup INT, @lastFullBackupPath VARCHAR(2000), @lastDifferentialBackup INT, @lastDifferentialBackupPath VARCHAR(2000)
DECLARE @i INT, @logBackupPath VARCHAR(1000)

-- remove temp object that might exist
IF OBJECT_ID('tempdb..#MSDBBackupHistory') IS NOT NULL
    DROP TABLE #MSDBBackupHistory

CREATE TABLE #MSDBBackupHistory (
    id INT IDENTITY(1,1),
    backup_start_date DATETIME,
    backup_type CHAR(1),
    physical_device_name VARCHAR(1000))

INSERT INTO #MSDBBackupHistory (backup_start_date, backup_type, physical_device_name)
    SELECT BS.backup_start_date, BS.type, RTRIM(BMF.physical_device_name)
    FROM msdb..backupset BS JOIN msdb..backupmediafamily BMF ON BMF.media_set_id = BS.media_set_id
    WHERE BS.database_name = @DBName
    ORDER BY BS.backup_start_date -- oldest backup first: the most recent backup gets the highest id

-- get the last Full backup info
SET @lastFullBackup = (SELECT MAX(id) FROM #MSDBBackupHistory WHERE backup_type = 'D')
SET @lastFullBackupPath = (SELECT physical_device_name FROM #MSDBBackupHistory WHERE id = @lastFullBackup)

-- restore the Full backup
PRINT 'RESTORE DATABASE ' + @DBName
PRINT 'FROM DISK=''' + @lastFullBackupPath + ''''

-- if there's no backup (differential or log) after it, we set 'with recovery'
IF (@lastFullBackup = (SELECT MAX(id) FROM #MSDBBackupHistory))
    PRINT 'WITH RECOVERY'
ELSE
    PRINT 'WITH NORECOVERY'

PRINT 'GO'
PRINT ''

-- get the last Differential backup (it must be taken after the last Full backup)
SET @lastDifferentialBackup = (SELECT MAX(id) FROM #MSDBBackupHistory WHERE backup_type = 'I' AND id > @lastFullBackup)
SET @lastDifferentialBackupPath = (SELECT physical_device_name FROM #MSDBBackupHistory WHERE id = @lastDifferentialBackup)

-- when there's a differential backup after the last full backup, create the restore T-SQL commands
IF (@lastDifferentialBackup IS NOT NULL)
BEGIN
    -- restore the last differential backup
    PRINT 'RESTORE DATABASE ' + @DBName
    PRINT 'FROM DISK=''' + @lastDifferentialBackupPath + ''''

    -- if no backup (log) was made after it, we set 'with recovery'
    IF (@lastDifferentialBackup = (SELECT MAX(id) FROM #MSDBBackupHistory))
        PRINT 'WITH RECOVERY'
    ELSE
        PRINT 'WITH NORECOVERY'

    PRINT 'GO'
    PRINT '' -- new line for readability
END

-- construct the required TRANSACTION LOG restores
IF (@lastDifferentialBackup IS NULL) -- no diff backup made?
    SET @i = @lastFullBackup + 1         -- search for log dumps after the last full
ELSE
    SET @i = @lastDifferentialBackup + 1 -- search for log dumps after the last diff

-- script T-SQL restore commands from the log backup history
WHILE (@i <= (SELECT MAX(id) FROM #MSDBBackupHistory))
BEGIN
    SET @logBackupPath = (SELECT physical_device_name FROM #MSDBBackupHistory WHERE id = @i)
    PRINT 'RESTORE LOG ' + @DBName
    PRINT 'FROM DISK=''' + @logBackupPath + ''''

    -- if it's the last transaction log, set 'with recovery'
    IF (@i = (SELECT MAX(id) FROM #MSDBBackupHistory))
        PRINT 'WITH RECOVERY'
    ELSE
        PRINT 'WITH NORECOVERY'

    PRINT 'GO'
    PRINT '' -- new line for readability

    SET @i = @i + 1 -- try to find the next log entry
END

-- remove temp objects
IF OBJECT_ID('tempdb..#MSDBBackupHistory') IS NOT NULL
    DROP TABLE #MSDBBackupHistory
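Calling it is then just a matter of passing the database name (the name below is only an example):

-- the generated RESTORE script appears in the Messages tab of SSMS
EXEC dbo.usp_GenerateRestoreScripts @DBName = 'AdventureWorks'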

How to use SQL Server T-SQL cursors

The general advice within the SQL Server community is: do not use T-SQL cursors at all.

Coding guidelines:
- Avoid cursors and try to solve the problem with set-based operations.
- Base them on read-only tables or local temporary objects (#temptable).
- Clean up! Always CLOSE and DEALLOCATE cursors.
- Specify the LOCAL option.
- Specify the FAST_FORWARD clause on DECLARE CURSOR. This opens an optimized forward-only, read-only cursor.
- Never use cursors in OLTP environments.
- Never use them as the source of an update: UPDATE table SET col1 = 2 WHERE CURRENT OF the_cursor;
- Cursors are memory intensive; base them on smaller sets (fewer than 5,000-10,000 rows; your site could differ: test!).

Template:

    DECLARE @col1 INT, @col2 INT, @col3 INT -- adjust the types to your columns

    DECLARE the_cursor CURSOR FAST_FORWARD LOCAL
    FOR
        SELECT col1, col2, col3
        FROM dbo.mytable

    OPEN the_cursor

    FETCH NEXT FROM the_cursor INTO @col1, @col2, @col3
    WHILE (@@FETCH_STATUS <> -1)
    BEGIN
        -- process the fetched row here

        FETCH NEXT FROM the_cursor INTO @col1, @col2, @col3
    END

    CLOSE the_cursor
    DEALLOCATE the_cursor
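As a concrete (if trivial) instance of the template, the sketch below just walks sys.databases and prints each name; it assumes nothing about your own schema:

    DECLARE @name SYSNAME

    DECLARE the_cursor CURSOR FAST_FORWARD LOCAL
    FOR
        SELECT name
        FROM sys.databases

    OPEN the_cursor

    FETCH NEXT FROM the_cursor INTO @name
    WHILE (@@FETCH_STATUS <> -1)
    BEGIN
        PRINT @name
        FETCH NEXT FROM the_cursor INTO @name
    END

    CLOSE the_cursor
    DEALLOCATE the_cursor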
 



TIP:
If you do need cursors in high-load OLTP environments because of some complex calculation that can't be done set-based, take the following approach:

Copy the required result set into a temporary object. Retrieve only the rows and columns you need, but do include all the columns of the primary key.

CREATE TABLE #temptable (CalcValue int, pkCol int)

INSERT INTO #temptable (CalcValue, pkCol)
SELECT 0, PrimaryKeyCol
FROM dbo.HighLoadOLTPtable
WHERE -- your where clause here


Base your cursor on the temp object.

Loop the cursor, perform your calculation, and store the result in the corresponding temp object row:

UPDATE #temptable SET CalcValue=complex_calculated_value
WHERE pkCol=pk_fields_as_fetched_by_cursor

When done looping, close/deallocate the cursor and update the high-load OLTP source table(s) set-based by primary key; use a BEGIN TRANSACTION / COMMIT if required:

UPDATE dbo.HighLoadOLTPtable
SET CalculatedValue = #temptable.CalcValue
FROM dbo.HighLoadOLTPtable, #temptable
WHERE dbo.HighLoadOLTPtable.PrimaryKeyCol = #temptable.pkCol
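Putting the pieces together, a minimal sketch of the middle step (the cursor loop over the temp table) could look like this. The calculation is just a placeholder for your own complex logic; #temptable, pkCol and CalcValue are the example names used above:

DECLARE @pkCol INT, @CalcValue INT

DECLARE calc_cursor CURSOR FAST_FORWARD LOCAL
FOR SELECT pkCol FROM #temptable

OPEN calc_cursor
FETCH NEXT FROM calc_cursor INTO @pkCol
WHILE (@@FETCH_STATUS = 0)
BEGIN
    -- placeholder for the complex calculation that can't be done set based
    SET @CalcValue = @pkCol * 2

    UPDATE #temptable SET CalcValue = @CalcValue WHERE pkCol = @pkCol

    FETCH NEXT FROM calc_cursor INTO @pkCol
END
CLOSE calc_cursor
DEALLOCATE calc_cursor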

How to configure SQL Server TempDB


 

As a general guideline, create one data file for each CPU on the server.
Note that a dual-core CPU is considered to be two CPUs; logical processors (hyperthreading) do not count.

Create only one log file.
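For example, on a (hypothetical) four-CPU server you could add three extra data files next to the default tempdev file; the file names, paths and sizes below are placeholders:

ALTER DATABASE TempDB ADD FILE (NAME=tempdev2, FILENAME='SeparateDrive:\path\tempdb2.ndf', SIZE=1024MB, FILEGROWTH=100MB);
ALTER DATABASE TempDB ADD FILE (NAME=tempdev3, FILENAME='SeparateDrive:\path\tempdb3.ndf', SIZE=1024MB, FILEGROWTH=100MB);
ALTER DATABASE TempDB ADD FILE (NAME=tempdev4, FILENAME='SeparateDrive:\path\tempdb4.ndf', SIZE=1024MB, FILEGROWTH=100MB);
GO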

Do not use autoshrink on TempDB

If you're using SQL Server 2008, set Page Verify to CHECKSUM. Investigate whether you can switch off the 'Auto Create Statistics' and 'Auto Update Statistics' properties; it can speed up the creation of objects in TempDB. Be careful, however: measure the impact these changes can have on production! Change and measure only one parameter at a time.
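A sketch of switching these properties off (only after you have measured the impact):

ALTER DATABASE TempDB SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE TempDB SET AUTO_UPDATE_STATISTICS OFF;
GO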

After installation, move the TempDB database to its own disks, separate from those used by user databases:

ALTER DATABASE TempDB
MODIFY FILE (NAME=tempdev, FILENAME='SeparateDrive:\path\tempdb.mdf');
GO

ALTER DATABASE TempDB
MODIFY FILE (NAME=templog, FILENAME='SeparateDrive:\path\TempLog.ldf');
GO


Restart the SQL Server service (not the Windows server) for the changes to take effect.

Pre-allocate the data and log file sizes (create the data files with the same size); do not rely on small autogrowth steps.

If you use autogrowth, set it to a reasonable size in MB, not a percentage.

TempDB file size       FILEGROWTH increment
0 to 100 MB            10 MB
100 to 200 MB          20 MB
200 to 1000 MB         50 to 75 MB
1 GB or more           150 to 250 MB
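A sketch of pre-sizing the files and setting an MB-based growth increment (the sizes are only illustrative):

ALTER DATABASE TempDB MODIFY FILE (NAME=tempdev, SIZE=2048MB, FILEGROWTH=200MB);
ALTER DATABASE TempDB MODIFY FILE (NAME=templog, SIZE=1024MB, FILEGROWTH=200MB);
GO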

Measure TempDB usage over time with fn_VirtualFileStats
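For example (assuming SQL Server 2005 or later, where NULL stands for 'all files'):

-- cumulative I/O statistics per TempDB file since the last restart
SELECT * FROM sys.fn_virtualfilestats(DB_ID('tempdb'), NULL);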

New releases of your vendor's software could have a different impact on TempDB load.

Thursday, April 9, 2009

UPDATE STATISTICS undocumented options

If you read the Books Online page describing the UPDATE STATISTICS command, you will see that there are some undocumented options.


 

UPDATE STATISTICS table | view
    [
        {
            { index | statistics_name }
          | ( { index | statistics_name } [ ,...n ] )
        }
    ]
    [ WITH
        [
            [ FULLSCAN ]
            | SAMPLE number { PERCENT | ROWS }
            | RESAMPLE
            | <update_stats_stream_option> [ ,...n ]
        ]
        [ [ , ] [ ALL | COLUMNS | INDEX ]
        [ [ , ] NORECOMPUTE ]
    ] ;

<update_stats_stream_option> ::=
    [ STATS_STREAM = stats_stream ]
    [ ROWCOUNT = numeric_constant ]
    [ PAGECOUNT = numeric_constant ]

 
 

<update_stats_stream_option>

    This syntax is for internal use only and is not supported. Microsoft reserves the right to change this syntax at any time.


 

There is a very good reason why these options are undocumented. They are meant for testing and debugging purposes, and should never ever be used on production systems.


 

However, these options can also be extremely helpful for creating examples and sample scripts that demonstrate various features and behaviors of the Query Optimizer and the related query plan shapes. We decided to explain what ROWCOUNT and PAGECOUNT do so that we can use these commands in examples we'll be posting in the future. Feel free to use these options on development systems for experimental and educational purposes, but please do not play with fire and do not use them in production!


 

As the name of these options suggest, ROWCOUNT and PAGECOUNT alter the internal metadata of the specified table or index by overriding the counters containing the row and page counts of the object. These counters are in turn read by the Query Optimizer when processing queries that access the table and/or index in question. These commands can basically cheat the Optimizer into thinking that a table or index is extremely large.


 

SQL Server's Query Optimization process is structured in multiple stages. Further optimization stages consider a progressively larger and more sophisticated set of possible tree transformations and query optimizations. Later stages of optimization are only entered when the estimated cost of the query is sufficiently high, in order to avoid wasting precious CPU cycles on simple queries that do not need that level of sophistication anyway. The multiple optimization stages are a means to produce efficient query plans without consuming excessive amounts of CPU. Typically, in order to make the Optimizer "think" a lot and enter these later stages, it is necessary to have big tables with a large number of rows and pages, which in turn take time and space to populate. Using ROWCOUNT and PAGECOUNT allows us to exercise these code paths with relatively simple scripts that do not require an extremely complex setup phase.


 

Here is an example. When running this simple script on your SQL 2005 instance you will likely see a different query plan for the two selects before and after updating the statistics. The recompile option is used to ensure that the query plans are regenerated. From the statistics profile, you'll also see very different estimated row counts and consequently costs.


 

use tempdb
go

create table t1(i int, j int)
go

create table t2(h int, k int)
go

set statistics profile on
go

select distinct(i) from t1
go

select * from t1, t2 where i = k order by j + k
go

update statistics t1 with rowcount = 10000, pagecount = 10000
update statistics t2 with rowcount = 100000, pagecount = 100000
go

select distinct(i) from t1 option (recompile)
go

select * from t1, t2 where i = k order by j + k option (recompile)
go
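If you want to put the row and page counters of the test tables back in line with reality afterwards (an addition of mine, not part of the original example), DBCC UPDATEUSAGE recalculates them from the actual allocations:

dbcc updateusage (tempdb, 't1') with count_rows
dbcc updateusage (tempdb, 't2') with count_rows
go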


 

How to Read Statistics Profile

In SQL Server, "Statistics Profile" is a mode in which a query is run where you can see the number of invocations of each physical plan operator in the tree. Instead of running a query and just printing the output rows, this mode also collects and returns per-operator row counts. Statistics Profile is used by the SQL Query Optimizer Team to identify issues with a plan which can cause the plan to perform poorly. For example, it can help identify a poor index choice or poor join order in a plan. Oftentimes, it can help identify the needed solution, such as updating statistics (as in the histograms and other statistical information used during plan generation) or perhaps adding a plan hint. This document describes how to read the statistics profile information from a query plan so that you can also debug plan issues.
A simple example query demonstrates how to retrieve the statistics profile output from a query:

use northwind
set statistics profile on
select * from customers c inner join orders o on c.customerid = o.customerid;

The profile output has a number of columns and is a bit tricky to print in a regular document. The key piece of information it prints is the plan, which looks like this:

StmtText
------------------------------------------------------------------------------------------------------
select * from customers c inner join orders o on c.customerid = o.customerid
  |--Hash Match(Inner Join, HASH:([c].[CustomerID])=([o].[CustomerID]),
       |--Clustered Index Scan(OBJECT:([nwind].[dbo].[Customers].[aaaaa_PrimaryKey] AS [c]))
       |--Clustered Index Scan(OBJECT:([nwind].[dbo].[Orders].[aaaaa_PrimaryKey] AS [o]))
 

Other pieces of useful information are the estimated row count and the actual row count for each operator and the estimated and actual number of invocations of this operator. Note that the actual rows and # of executions are physically listed as early columns, while the other columns are listed later in the output column list (so you typically have to scroll over to see them).

Rows                 Executes
-------------------- --------------------
1078                 1
1078                 1
91                   1
1078                 1 

EstimatedRows

------------------------
1051.5834
1051.5834
91.0
1078.0

EstimateExecutions
------------------------
NULL
1.0
1.0
1.0
 

Other fields, such as the estimated cost of the subtree, the output columns, the average row size, also exist in the output (but are omitted for space in this document).

Note: The output from statistics profile is typically easiest to read if you set the output from your client (Query Analyzer or SQL Server Management Studio) to output to text, using a fixed-width font. You can then see the columns pretty easily and you can even move the results into a tool like Excel if you want to cluster the estimates and actual results near each other by rearranging the columns (SSMS also can let you do this).

There are a few key pieces of information needed to understand the output from Statistics profile. First, query plans are generally represented as trees. In the output, children are printed below their parents and are indented:

StmtText
------------------------------------------------------------------------------------------------------
select * from customers c inner join orders o on c.customerid = o.customerid
  |--Hash Match(Inner Join, HASH:([c].[CustomerID])=([o].[CustomerID]),
       |--Clustered Index Scan(OBJECT:([nwind].[dbo].[Customers].[aaaaa_PrimaryKey] AS [c]))
       |--Clustered Index Scan(OBJECT:([nwind].[dbo].[Orders].[aaaaa_PrimaryKey] AS [o]))

In this example, the scan of Customers and Orders are both below a hash join operator. Next, the "first" child of the operator is listed first. So, the Customers Scan is the first child of the hash join. Subsequent operators follow, in order. Finally, query execution plans are executed in this "first" to "last" order. So, for this plan, the very first row is returned from the Customers table. (A more detailed discussion of operators will happen later in the article). Notice that the output from the graphical showplan is similar but slightly different:
 

In this tree representation, both Scans are printed to the right on the screen, and the first child is above the other operators. In most Computer Science classes, trees are printed in a manner transposed from this:

    Hash Join
    /     \
Customers Orders
 

The transposition makes printing trees in text easier. In this classical view, the tree is evaluated left to right with rows flowing bottom to top.

With an understanding of the nuances of the query plan display, it is possible to understand what happens during the execution of this query plan. The query returns 1078 rows. Not coincidentally, there are also 1078 orders in this database. Since there is a foreign key relationship between Orders and Customers, every order must match exactly one customer. So, the 91 rows in Customers are matched against the 1078 rows in Orders to return the result.

The query estimates that the join will return 1051.5834 rows. First, this is a bit less than the actual (1078) but is not a substantial difference. Given that the Query Optimizer is making educated guesses based on sampled statistical information that may itself be out-of-date, this estimate is actually pretty good. Second, the number is not an integer because we use floating point for our estimates to improve accuracy on estimates we make. For this query, the number of executions is 1 for both the estimate and actual. This won't always be the case, but it happens to be true for this query because of the way hash joins work. In a hash join, the first child is scanned and a hash table is built. Once the hash join is built, the second child is then scanned and each row probes the hash table to see if there is a matching row.
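As a side note (my addition, not part of the original text), you can also compare the two join strategies on the same Northwind query purely with join hints and look at how the Rows and Executes counters change:

select * from customers c inner join orders o on c.customerid = o.customerid option (hash join);
select * from customers c inner join orders o on c.customerid = o.customerid option (loop join);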

Loops join does not work this way, as we'll see in a slightly modified example. 

select * from customers c with (index=1) inner loop join orders o with (index=1) on c.customerid = o.customerid

In this example, I've forced a loop join and the use of clustered indexes for each table. The plan now looks like this:

StmtText
-----------------------------------------------------------------
select * from customers c with (index=1) inner loop join orders o
  |--Nested Loops(Inner Join, WHERE:([nwind].[dbo].[Orders].[Cust
       |--Clustered Index Scan(OBJECT:([nwind].[dbo].[Customers].
       |--Table Spool
            |--Clustered Index Scan(OBJECT:([nwind].[dbo].[Orders

Beyond the different join algorithm, you'll notice that there is now a table spool added to the plan. The spool is on the second child (also called the "inner" child for loops join because it is usually invoked multiple times). The spool scans rows from its child and can store them for future invocations of the inner child. The actual row count and execution count from the statistics profile is a bit different from the previous plan:

Rows Executes
-------------------- --------------------
1078 1
1078 1 <--- Loop Join
91 1 <--- Scan of Customers
98098 91 <--- Spool
1078 1 <--- Scan of Orders

In this plan, the second child of the loop join is scanned 91 times, returning a total of 98098 rows. For the actual executions, the total number of rows is the sum over all invocations of that operator, so it is 91*1078=98098. This means that the inner side of this tree is scanned 91 times. Nested Loops joins require rescans of the inner subtree (Hash Joins do not, as you saw in the first example). Note that, because of the spool, the Orders table itself is scanned only once, and it has only one execution as a result. It isn't hard to see that far more rows are touched in this plan compared to the hash join, and thus it shouldn't be a huge surprise that this plan runs more slowly.

Note: When comparing the estimated vs. actual number of rows, it is important to remember that the actual counts need to be divided by the actual number of executions to get a value that is comparable to the estimated number of rows, since the estimate is the per-invocation estimate. For the spool above, for example, 98098 rows / 91 executions = 1078 rows per invocation.

As a more complicated example, we can try something with a few more operators and see how things work on one of the TPC-H benchmark queries (Query 8, for those who are interested, on a small-scale 100MB database):

SELECT  O_YEAR,
        SUM(CASE WHEN NATION = 'MOROCCO'
                 THEN VOLUME
                 ELSE 0
            END) / SUM(VOLUME) AS MKT_SHARE
FROM (  SELECT  datepart(yy, O_ORDERDATE)          AS O_YEAR,
                L_EXTENDEDPRICE * (1 - L_DISCOUNT) AS VOLUME,
                N2.N_NAME                          AS NATION
        FROM    PART,
                SUPPLIER,
                LINEITEM,
                ORDERS,
                CUSTOMER,
                NATION N1,
                NATION N2,
                REGION
        WHERE   P_PARTKEY      = L_PARTKEY AND
                S_SUPPKEY      = L_SUPPKEY AND
                L_ORDERKEY     = O_ORDERKEY AND
                O_CUSTKEY      = C_CUSTKEY AND
                C_NATIONKEY    = N1.N_NATIONKEY AND
                N1.N_REGIONKEY = R_REGIONKEY AND
                R_NAME         = 'AFRICA' AND
                S_NATIONKEY    = N2.N_NATIONKEY AND
                O_ORDERDATE    BETWEEN '1995-01-01' AND '1996-12-31' AND
                P_TYPE         = 'PROMO BURNISHED NICKEL' AND
                L_SHIPDATE     >= CONVERT(datetime, (1156)*(30), 121) AND
                L_SHIPDATE     <  CONVERT(datetime, ((1185)+(1))*(30), 121)
     ) AS ALL_NATIONS
GROUP BY O_YEAR
ORDER BY O_YEAR

As the queries get more complex, it gets harder to print them in a standard page of text. So, I've truncated the plan somewhat in this example. Notice that the same tree format still exists, and the main operators in this query are Scans, Seeks, Hash Joins, Stream Aggregates, a Sort, and a Loop Join. I've also included the actual number of rows and actual number of executions columns as well.



Rows     Executes  Plan
0        0         Compute Scalar(DEFINE:([Expr1028]=[Expr1026]/[Expr1027]))
2        1         |--Stream Aggregate(GROUP BY:([Expr1024]) DEFINE:([Expr1026]=SUM([par
2        1         |--Nested Loops(Inner Join, OUTER REFERENCES:([N1].[N_REGIONKEY]
10       1         |--Stream Aggregate(GROUP BY:([Expr1024], [N1].[N_REGIONKEY
1160     1         |  |--Sort(ORDER BY:([Expr1024] ASC, [N1].[N_REGIONKEY] A
1160     1         |  |--Hash Match(Inner Join, HASH:([N2].[N_NATIONKEY
25       1         |  |--Clustered Index Scan(OBJECT:([tpch100M].[
1160     1         |  |--Hash Match(Inner Join, HASH:([N1].[N_NATI
25       1         |  |--Index Scan(OBJECT:([tpch100M].[dbo].
1160     1         |  |--Hash Match(Inner Join, HASH:([tpch10
1000     1         |  |--Index Scan(OBJECT:([tpch100M].[
1160     1         |  |--Hash Match(Inner Join, HASH:([t
1160     1         |  |--Hash Match(Inner Join, HAS
1432     1         |  |  |--Hash Match(Inner Join
126      1         |  |  |  |--Clustered Index
0        0         |  |  |  |--Compute Scalar(D
224618   1         |  |  |  |--Clustered I
0        0         |  |  |--Compute Scalar(DEFINE
45624    1         |  |  |--Clustered Index
15000    1         |  |--Index Scan(OBJECT:([tpch10
2        10        |--Clustered Index Seek(OBJECT:([tpch100M].[dbo].[REGION].[

 



I'll point out a few details about the statistics profile output. Notice that the Compute Scalars (also called Projects) return zero for both columns. Since Compute Scalar always returns exactly as many rows as it is given from its child, there isn't any logic to count rows again in this operator simply for performance reasons. The zeros can be safely ignored, and the values for its child can be used instead. Another interesting detail can be seen in the last operator in this printout (the Seek into the Region table). In this operator, there are 10 executions but only 2 rows returned. This means that even though there were 10 attempts to find rows in this index, only two rows were ever found. The parent operator (the Nested Loops near the top) has 10 rows coming from its first (left) child and only 2 rows output by the operator, which matches what you see in the seek. Another interesting tidbit can be found if you look at the estimates for the Seek operator:



Est. # rows    Est. # executes
1.0            20.106487



The SQL Server Query Optimizer will estimate a minimum of one row coming out of a seek operator. This is done to avoid the case when a very expensive subtree is picked due to a cardinality underestimation. If the subtree is estimated to return zero rows, many plans cost about the same and there can be errors in plan selection as a result. So, you'll notice that the estimation is "high" for this case, and some errors could result. You also might notice that we estimate 20 executions of this branch instead of the actual 10. However, given the number of joins that have been evaluated before this operator, being off by a factor of 2 (10 rows) isn't considered to be too bad. (Errors can increase exponentially with the number of joins.)




SQL Server supports executing query plans in parallel. Parallelism can add complexity to the statistics profile output, as there are different kinds of parallelism that have different impacts on the counters for each operator. Parallel scans exist at the leaves of the tree, and these will count all rows from the table into each thread even though each thread only returns a fraction of the rows. The number of executions (the second column in the output) will also have 1 execution for each thread. So, it is typical to just divide the total number of rows by the number of threads to see how many rows were actually returned by the table. Parallel zones higher in the tree usually work the same way. These will have N times more executions (where N is the degree of parallelism) than the equivalent non-parallel query. There are a few cases where we will broadcast one row to multiple threads. If you examine the type of the parallelism exchange operation, you can identify these cases and notice that one row becomes multiple rows through the counts in the statistics profile results.



The most common use of the statistics profile output is to identify areas where the Optimizer may be seeing and using incomplete or incorrect information. This is often the root cause of many performance problems in queries. If you can identify areas where the estimated and actual cardinality values are far apart, then you likely have found a reason why the Optimizer is not returning the "best" plan. The reasons for the estimate being incorrect can vary, but it can include missing or out-of-date statistics, too low of a sample rate on those statistics, correlations between data columns, or use of operators outside of the optimizer's statistical model, to name a few common cases.