MS SQL Server Devlopment Standards

This blog provides general policies, procedures and guidelines for Microsoft SQL Server 2005 and 2008 object design and query development. This document is currently in DRAFT and will continue to evolve. Most footnote references are currently SQL 2000 related. When these are updated the status of this document will be changed to FINAL.

Global Standards

Object Nomenclature

All object names should reflect the nature and purpose of the object. Object names may be comprised of alphabetic, numeric and underscore (ASCII Code 95) characters only and are limited to 'sysname' allowable length (VARCHAR(30) for SQL6x, NVARCHAR(256) for SQL7, SQL2000 and SQL2005).

No user-defined object is to begin with 'sys', 'sp_', ‘fn_’ or 'xp_' as these prefixes are indicative of system objects.

MS SQL Server and ANSI SQL keywords, reserved words, abbreviations and synonyms are not allowed as object or identifier names (see: Reserved Words - Transact SQL and Appendix I).

The intent of the Nomenclature sections of this document is not to restrict naming conventions but to convey that reasonable, consistent and non-conflicting naming schemes should be considered when designing database objects.

Abbreviations

The table below is a non-exhaustive list of abbreviations for use in object and identifier names.

Abbreviation	Complete Text
Acct	Account
Acctg	Accounting
Addr	Address
Code	Code
Descr	Description
EIN	Employer ID Number
Flag	Flag
Fmt	Format
Frm	Form
ID	Identifier
Idx \| ix	Index
Img	Image
Info	Information
Key	Key
loc	Location
mbr	Member
mkt	Market
mktg	Marketing
mod	Modify/Modified
msg	Message
no	Number
pgm	Program
recd	Received
ssn	Social Security Number
taxid	SSN/EIN (No Distinction)
temp (or tmp)	Temporary
txt	Text
url	Web Address (URL)
usr	User
val	Value
vnd	Vendor
clnt	Client
org	Organization
db	Database
tbl	Table
vw	View
fk	Foreign Key
uq or uk	Unique Key
col	Column
pk	Primary Key
dept	Department
mgt	Management
mgr	Manager
udf_	User-defined function

xref	Cross-Reference
xact	Transaction
xfr	Transfer
xmt	Transmit

Object standards

Databases

Naming Convention

Database names should reflect the application for which the database serves as the datastore

As an example, a database serving as the datastore for an application used in a study of earthquakes might be called "Seismo". A database for a similar study of volcanoes might be called "Vulcan". The shared database used by both "Seismo" and "Vulcan" that contains geographic mappings and information might be called "GeoMaps_Common".

Tables

Naming Convention

Table names should reflect the content and/or purpose of the data contained within.

1. Table name in the form: descriptive (single descriptive name)

a. descriptive is a character label that describes the content and/or purpose of the data contained within (e.g. a table containing the basic metrics of volcanoes such as name, height, location, type… might be called "MainMetrics").

b. Title case is recommended for all table names.

2. The precision of the sysname type limits the table name length.

Restrictions/Caveats/Best Methods

1. Multi-database queries require fully qualified table names.

2. Tables should be schema qualified when referenced in queries.³¹

3. Cryptic or ‘coded’ table names must be avoided.

4. Table aliases should be intuitively meaningful. Aliases such as ‘a’, ‘x’, or ‘f1’ must be avoided.

5. Use table variables or derived tables in lieu of temporary tables whenever possible.⁹

Columns

Naming Convention

Column names should reflect the content and/or purpose of the data contained within.

a. Column name in the form: descriptive (single descriptive name)

i. descriptive is a character label that describes the content and/or purpose of the data contained within (e.g. a column containing the heights of volcanoes might be called "Height").

ii. The precision of the sysname data type limits the column name length.

Restrictions/Caveats/Best Methods

1. Multi-table queries require fully qualified or aliased column names.

2. Cryptic or ‘coded’ column names must be avoided.

3. Generic and non-descriptive column names (i.e. ID, GUID, Description, Name, ParentID without a corresponding ChildID) must avoided.

4. Column Data Types.

a. VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) should be used in lieu of TEXT, NTEXT and IMAGE data types, respectively. These data types will be deprecated.

b. The usage of the CHAR(n) data type should be restricted to fixed-length values or variable length values of less than seven (7) to ten (10) characters. VARCHAR(n) should be used for longer variable length columns.

c. Use CHAR() or VARCHAR() in lieu of NCHAR() or NVARCHAR() when values are known to exclude Unicode data.

d. The scale specified for VARCHAR() and NVARCHAR() should be 2 less than a binary power (e.g. 254, 510, 1022)

e. Numeric data types, excluding BIT, binary and varbinary, should be used exclusively for values upon which arithmetic operations will be performed (i.e. Social Security Numbers and Zip Codes are not numeric values). Integer data types are allowable for IDENTITY columns.

f. The scale and precision of numeric data types should be selected according to the data to be stored (e.g. a column storing a “rating” whose values range from one (1) to ten (10) should use the data type TINYINT, not INT).

g. The DECIMAL() data type should be used in lieu of FLOAT any time the scale and precision of the values can be determined.

h. Avoid using datatype(MAX) unless the scale will exceed the specifiable maximum (CHAR()/VARCHAR() = 8000, NCHAR()/NVARCHAR() = 4000) .

i. Place datatype(MAX) on separate filegroups using the TEXTIMAGE_ON <filegroup> clause in the CREATE TABLE statement.

j. The UNIQUEIDENTIFIER datatype populated by NEWID() or a similar mechanism is a poor choice for inclusion as the primary column in a clustered index. Choose a continuously incrementing value.

k. The TIMESTAMP data type cannot be used as a true DATETIME. Although derived from the system clock, one cannot extract a real-time value from these values. The TIMESTAMP data type may be used for versioning purposes as it is of the ROWVERSION base type. Use DATETIME or SMALLDATETIME for columns where actual date or time manipulation or comparison is required.

5. Columns should be created as NOT NULL unless specifically known to require NULL’s.

Miscellaneous

CHECK constraints reject False not accept True. CHECK (col1 = col2) will be allowed if either column is NULL.³³

Stored Procedures

Naming Convention

Stored Procedure names should reflect their function or purpose.

1. The general form of a Stored Procedure name should be pre_verbObject (or pre_objectVerb) where:

a. pre_ is either ‘up_’ or “usp_”.

b. verb is an action verb indicating the function to perform (i.e. get, update, delete, list, etc.).

c. Object is the subject of the verb portion of the Stored Procedure name (i.e. Metrics, History, References, etc.).

d. Stored procedures may be prefixed with ‘up_’ or ‘usp_’

2. The precision of the sysname data type limits the SP name length.

Restrictions

1. Stored procedure may not be prefixed with “sp_” or “xp_”.

Miscellaneous

1. Always use 'SET NOCOUNT ON' as the first executable statement in non-interactive SP’s.

a. This statement suppresses the "n records affected" message generated by SET, SELECT, INSERT, UPDATE and DELETE statements. Suppressing these messages will increase the performance of the SP as well as prevent spurious 1608 (A network error was encountered while sending results to the front end) errors as well as reducing network traffic.

b. There is no need to explicitly ‘SET NOCOUNT OFF’ as the scope of the statement is the current session only.

2. All stored procedures must contain documentation.

a. Header Documentation should be placed after the AS clause of the CREATE PROCEDURE statement within Block Comment /* */ (forward-slash, asterisk – asterisk, forward-slash) delimiters. Header Documentation is required for all production SP’s (see Header Documentation Example).

i. Name: the operational name of the SP.

ii. Purpose: a brief description of the functionality of the SP.

iii. Parameters: a description of the names, datatypes and values of parameters passed to/from the SP.

iv. Return Values: a description of the datatypes and values returned by the SP.

v. Author: the name of the original author of the SP.

vi. History: the date of the revision, the initials of the reviser and a description of the revision(s) made. This information should appear before the revision within the code of the SP as well.

vii. Additional miscellaneous header documentation should be included as appropriate.

b. Body Documentation should be included as necessary to describe the functionality of statements or statement blocks that may not be readily apparent to one unfamiliar with the application or purpose for which the SP was written.

c. Single line, in-line or nested documentation (within a statement) should be placed on the line preceding the subject of the documentation or on the same line following the subject of the documentation using ( -- ) as the comment delimiter.

d. Multi-line documentation should be placed immediately before the subject of the documentation within Block Comment /* */ (forward-slash, asterisk – asterisk, forward-slash) delimiters. An example of multi-line follows:

Multi-line comments are typically 3 or more lines.

Placing the begin and end delimiters on lines by themselves and

indenting the comment text improves readability.

3. The GOTO statement should not be used.

4. PRINT statements should be avoided unless the client application is able to respond to/accept the output.^{24, 25}

5. Avoid including business logic in stored procedure code. DBMS’s are optimized for queries not logical branching, string formatting, etc.²³

a. Minimize IF-THEN-ELSE statements.

b. Avoid string formatting, concatenation and conversion.

c. Avoid computations.

Keys

Physical Location

Restrictions/Caveats/Best Methods

1. IDENTITY columns should not be used unless a child relationship (i.e. a foreign key referencing that column) to that column exists.

2. Key values should not be hard-coded into scripts or stored procedures.

3. Carefully examine the implications when considering the use of a unique identifier (GUID) datatype for a key value.^18,
19

Indices

Physical Location

All non-clustered indices should be created in a secondary filegroup on drive sets specifically reserved for that purpose. The current standard for the secondary (index) filegroup name and location is:

Name: databasename_Index

Location: drive:\Index\databasename_Index.ndf (additional drives may be available also)

Naming Convention

Non-clustered indices should begin with the prefix “idx_” or “ix_” followed by the table name and column name(s) delimited by underscore character(s). Clustered indices should begin with the prefix “cidx_”, “cdx_” or “cx_”.

Miscellaneous

User-Defined Data Types

Naming Convention

All User-Defined Data Types should begin with the prefix “udt_” or “ut_”.

Restrictions

Miscellaneous

User-Defined Functions

Naming Convention

All User-Defined Functions should begin with the prefix “udf_” or “uf_”.

Restrictions

User-Defined Function may not be prefixed with “fn_”.

Miscellaneous

Views

Naming Convention

All views should begin with the prefix “vw_”.

Miscellaneous

Views definitions should include result set column name(s). (e.g. CREATE VIEW vw_myView (col1, col2,… AS SELECT…)

Restrictions

The use of the TOP and ORDER BY clauses should be avoided in views. This syntax is not ANSI standard, may be deprecated in SQL Server and does not guarantee the order of the result set.³⁴

Cursors

Naming Convention

Cursors names should be declared either prefixed with “cur” or “cur_” or suffixed with “_cursor”.

Restrictions

Miscellaneous

1. Cursors should be avoided whenever possible. There are very few instances that cursor operations cannot be replaced by equivalent INSERT, SELECT, UPDATE or DELETE statements. The use of Common Table Expressions should be considered in lieu of cursors when constructing recursive or iterative processes. Modern RDBMS’s are set based systems not ISAM file management systems.^{1,
6, 7, 11, 13, 15, 17, 21}

2. Cursors in client programs require maintaining state on the database server. Maintaining the client state uses memory and/or temporary database resources for each client or instance of a cursor. This is one reason cursors lack scalability.^{1,
2, 5}

3. Cursors in client programs perform multiple requests to the database server to service the client unnecessarily burdening the network. This is another reason cursors lack scalability.^{1, 2}

4. When used, cursors should be defined as FAST_FORWARD or INSENSITIVE and FOR READ ONLY whenever possible. When used for positional updates/deletes the least amount of locking to accomplish the task will result in more efficient execution.¹⁷

Roles

Rules

Defaults

SQL-99 Adherance

SET & SELECT

The SELECT statement should be used for queries only. The SET statement should be used for the assignment of a value to a variable.

CAST() replaces CONVERT()

In all cases other than formatted datetime conversion to CHAR-based values (e.g. CONVERT(CHAR(10), GETDATE(), 112) the CAST() function should be used for datatype conversion.

JOIN Syntax

Prior versions of SQL Server allowed JOIN conditions ( =, *=, =*) to be specified in the WHERE clause of a query. This syntax is no longer supported and will lead to inaccurate results. The JOIN clause and associated syntax should be used explicitly.

The use of RIGHT OUTER JOIN should be minimized. All queries utilizing RIGHT joins can be rewritten as LEFT OUTER JOINs.³²

NULL

NULL can exist as one of four (4) possible conditions or values: missing, unknown, miscellaneous or inapplicable. Consider the following example:

…I am looking at a carton of Easter eggs and I want to know their colors. If I see an empty hole, I have a missing egg… If I see a foil-wrapped egg, I have an unknown color value in my set. If I see a multicolored egg, I have a miscellaneous value in my set. If I see a cue ball, I have an inapplicable value in my set.³

Additionally, NULL will propagate to the result in arithmetic, logical and concatenation operations (i.e. 5 + NULL = NULL, ‘Hello’ + NULL = NULL and X AND NULL = NULL).^{20, 32}

System table queries

Whenever possible, INFORMATION_SCHEMA views, Dynamic Management views and system functions (e.g. DATABASEPROPERTY(), OBJECTPROPERTY()) should be used in lieu of direct queries against the system tables. See BOL topics “Information Schema Views” and “Meta Data Functions” for additional information. Queries against the system tables are not allowed in business application queries.

GETDATE() vs. CURRENT_TIMESTAMP

Use the ANSI Standard CURRENT_TIMESTAMP in lieu of GETDATE().

ISNULL() vs. COALESCE()

Use the ANSI Standard COALESCE() function in lieu of ISNULL(). COALESCE() functions similarly to ISNULL() and allows the inclusion of multiple values.

ORDER BY/GROUP BY

The use of positional numbers in the ORDER BY and GROUP BY clauses of SQL statements is deprecated in SQL-92 (i.e. this feature will no longer be supported in SQL-99) and should not be used in new code and should be updated to column references in existing code.⁴

The use of TOP 100 PERCENT…ORDER BY does not guarantee a presentation order. ORDER BY is only effective in the outermost query block.³⁴

Rules

SQL Server rules may be implemented through the use CHECK constraints. As CHECK constraints are defined in the SQL-99 standard they are the preferred method. Rules have been deprecated in SQL2005.

! - used as ‘not’

The character “!” (ASCII Code 33) is not an acceptable operator modifier. !<, !> and != are expressed as >=, <= and <>, respectively.³⁷

PERFORMANCE¹⁰

Dynamic SQL vs. Stored Procedure

Avoid the use of dynamic SQL wherever possible. The additional overhead, risk and hard-coupling to the application negates its flexibility in most cases.^26,
30

COUNT()

The COUNT() function is optimized for COUNT(*) in “SELECT COUNT(*)…” versus “SELECT COUNT(column_name)…” or “SELECT COUNT(constant)” queries.⁸

EXISTS() vs. COUNT()

The EXISTS() functions returns after the first occurrence matching the query criteria whereas COUNT() must compute an exact answer for the criteria.⁸

Formatting data in queries

Avoid including data formatting functions in queries. Presentation is the function of the User Interface. The additional processing required for formatting data is better spent at the client, as the database is not optimized for these functions. Eliminating formatting from the query allows greater flexibility in the use of the query.

Functions in WHERE clauses

The use of functions in the WHERE clause may preclude the use of indices in the execution plan. It is better to manipulate the predicate value prior to its inclusion in the query.⁸

Example:

SELECT * FROM aTable WHERE DATEDIFF(day, theDate, CURRENT_TIMESTAMP) < 3 - vs. -

DECLARE @afterDate DATETIME
SET @afterDate = DATEADD(day, -3, CURRENT_TIMESTAMP)
SELECT * FROM aTable WHERE theDate < @afterDate

Search Arguments (SARGs)

IN and NOT IN

The use of IN and NOT IN, when used in conjunction with a query, should be avoided. Utilizing JOIN syntax to accomplish the task will result in improved performance particularly on larger tables/result sets.^{8, 12}

NULL

NULL values in columns of fixed length datatypes occupy the entire scale of the column just as any other data in a fixed length (i.e. CHAR(), DECIMAL(), etc) column. Consider variable length datatypes for columns of scale greater than approximately eight to sixteen bytes.¹⁴

IS NULL in a WHERE clause will preclude the use of an index forcing a table scan.¹⁴

The use of NULL may lead to convoluted, and consequently, inefficient or buggy DML.¹⁴

Ideally, NULL should be avoided.¹⁴

Qualifying Objects

To avoid excessive query/SP recompiles, fully qualify all objects.¹⁶

UDF’s

Avoid using UDF’s to perform query operations as they tend to become ‘cursor-like’ in operation. Avoid using UDF’s in the SELECT or WHERE clauses of queries which return large numbers of rows. Execution plans of queries containing UDF’s do not consider the actual cost of the UDF. The results generated for SET SHOW_STATISTICS_IO ON does not include the I/O associated with UDF execution incorporated into queries.^{27, 28}

EAV, OTLT and MUCK tables

Avoid Entity-Attribute Value (EAV), One True Lookup Table (OTLT) and Massively Unified Code-Key (MUCK) tables. The utilization of these forms lead to non-relational generalization resulting in exceedingly complex queries and very limited scalability.^{35, 36}

Appendix

Header Documentation Example

Name: dbo.usp_dbMaint9

Author: Arturo T. de Lamerens

Version: MS SQL Server 2005

Function: Perform automated cyclical DATABASE, DIFFERENTIAL & TRANSACTION LOG backups for all databases present on a SQL Server.

Returns: Nothing

Called by: SQL Agent Task engine

Usage: EXEC dbo.usp_backup8 <parmList>

Strategy: Utilizing 2 backup devices differentiated by suffixing with zero (0) or one (1); perform a cyclical backup strategy. All data required for the execution of the backups are extracted from the SQL Server system functions/views. DBA/SysOp intervention should be required only in the event of a failure.

1. On Sunday before 00:30 (12:30am)

a. Perform a DBCC CHECKDB

<…snip…>

Issues: 1. The transition from week 53 to week 1 will re-use the same backup device; this behavior has been defeated in code.

2. Ultimately, the time required to complete the backups will exceed the Scheduled Task interval. This will require the segmentation of this SP by some means -- research to circumvent this issue is on-going.

History: 2001.02.27 ATL - Global modifications for SQL8 (SQL2000)

2001.03.01 ATL - Added path/file existence verification.

- Added DATABASEPROPERTYEX('dbname', 'Recovery') checking in lieu of system table queries.

2001.03.09 ATL - Enhanced type verification for @logTbl via udf_objectType

Additional Information

Solutions by Design: The Physical Design, SQL Server Magazine

Solutions by Design: Logical Modeling, SQL Server Magazine

Solutions by Design: Entity Modeling, SQL Server Magazine

Solutions by Design: Process Modeling, SQL Server Magazine

Solutions by Design: Data Modeling, SQL Server Magazine

Database 121 - Introduction to SQL, www.free-ed.net.

TechWare

MS SQL Server Devlopment Standards

Global Standards

Object Nomenclature

Abbreviations

Object standards

Databases

Naming Convention

Tables

Naming Convention

Restrictions/Caveats/Best Methods

Columns

Naming Convention

Restrictions/Caveats/Best Methods

Miscellaneous

Stored Procedures

Naming Convention

Restrictions

Miscellaneous

Keys

Physical Location

Restrictions/Caveats/Best Methods

Indices

Physical Location

Naming Convention

Miscellaneous

User-Defined Data Types

Naming Convention

Restrictions

Miscellaneous

User-Defined Functions

Naming Convention

Restrictions

Miscellaneous

Views

Naming Convention

Miscellaneous

Restrictions

Cursors

Naming Convention

Restrictions

Miscellaneous

Roles

Rules

Defaults

SQL-99 Adherance

SET & SELECT

CAST() replaces CONVERT()

JOIN Syntax

NULL

System table queries

GETDATE() vs. CURRENT_TIMESTAMP

ISNULL() vs. COALESCE()

ORDER BY/GROUP BY

Rules

! - used as ‘not’

PERFORMANCE10

Dynamic SQL vs. Stored Procedure

COUNT()

EXISTS() vs. COUNT()

Formatting data in queries

Functions in WHERE clauses

Search Arguments (SARGs)

IN and NOT IN

NULL

Qualifying Objects

UDF’s

EAV, OTLT and MUCK tables

Appendix

Header Documentation Example

Additional Information

Labels

Comments

Post a Comment

Popular posts from this blog

ODI KM Adding Order by Option

Synchronous and Asynchronous execution in ODI

Oracle Data Integrator Tools:OdiSqlUnload, OdiFileCopy ,OdiZip

PERFORMANCE¹⁰