Skip to main content

MS SQL Server Devlopment Standards



This blog provides general policies, procedures and guidelines for Microsoft SQL Server 2005 and 2008 object design and query development.  This document is currently in DRAFT and will continue to evolve.  Most footnote references are currently SQL 2000 related.  When these are updated the status of this document will be changed to FINAL.

Global Standards

Object Nomenclature

All object names should reflect the nature and purpose of the object.  Object names may be comprised of alphabetic, numeric and underscore (ASCII Code 95) characters only and are limited to 'sysname' allowable length (VARCHAR(30) for SQL6x, NVARCHAR(256) for SQL7, SQL2000 and SQL2005).
No user-defined object is to begin with 'sys', 'sp_', ‘fn_’ or 'xp_' as these prefixes are indicative of system objects.
MS SQL Server and ANSI SQL keywords, reserved words, abbreviations and synonyms are not allowed as object or identifier names (see: Reserved Words - Transact SQL and Appendix I).
The intent of the Nomenclature sections of this document is not to restrict naming conventions but to convey that reasonable, consistent and non-conflicting naming schemes should be considered when designing database objects.

Abbreviations

The table below is a non-exhaustive list of abbreviations for use in object and identifier names.

Abbreviation
Complete Text
Acct
Account
Acctg
Accounting
Addr
Address
Code
Code
Descr
Description
EIN
Employer ID Number
Flag
Flag
Fmt
Format
Frm
Form
ID
Identifier
Idx | ix
Index
Img
Image
Info
Information
Key
Key
loc
Location
mbr
Member
mkt
Market
mktg
Marketing
mod
Modify/Modified
msg
Message
no
Number
pgm
Program
recd
Received
ssn
Social Security Number
taxid
SSN/EIN (No Distinction)
temp (or tmp)
Temporary
txt
Text
url
Web Address (URL)
usr
User
val
Value
vnd
Vendor
clnt
Client
org
Organization
db
Database
tbl
Table
vw
View
fk
Foreign Key
uq or uk
Unique Key
col
Column
pk
Primary Key
dept
Department
mgt
Management
mgr
Manager
udf_
User-defined function


xref
Cross-Reference
xact
Transaction
xfr
Transfer
xmt
Transmit

Object standards

Databases

Naming Convention

Database names should reflect the application for which the database serves as the datastore
As an example, a database serving as the datastore for an application used in a study of earthquakes might be called "Seismo".  A database for a similar study of volcanoes might be called "Vulcan".  The shared database used by both "Seismo" and "Vulcan" that contains geographic mappings and information might be called "GeoMaps_Common". 

Tables

Naming Convention

Table names should reflect the content and/or purpose of the data contained within.
1.     Table name in the form: descriptive (single descriptive name)
a.     descriptive is a character label that describes the content and/or purpose of the data contained within (e.g. a table containing the basic metrics of volcanoes such as name, height, location, type… might be called "MainMetrics").
b.    Title case is recommended for all table names.
2.     The precision of the sysname type limits the table name length.

Restrictions/Caveats/Best Methods

1.     Multi-database queries require fully qualified table names.
2.     Tables should be schema qualified when referenced in queries. 31
3.     Cryptic or ‘coded’ table names must be avoided.
4.     Table aliases should be intuitively meaningful.  Aliases such as ‘a’, ‘x’, or ‘f1’ must be avoided.
5.     Use table variables or derived tables in lieu of temporary tables whenever possible.9

Columns

Naming Convention

Column names should reflect the content and/or purpose of the data contained within.
a.     Column name in the form: descriptive (single descriptive name)
i.      descriptive is a character label that describes the content and/or purpose of the data contained within (e.g. a column containing the heights of volcanoes might be called "Height").
ii.     The precision of the sysname data type limits the column name length.

Restrictions/Caveats/Best Methods


1.     Multi-table queries require fully qualified or aliased column names.
2.     Cryptic or ‘coded’ column names must be avoided.
3.     Generic and non-descriptive column names (i.e. ID, GUID, Description, Name, ParentID without a corresponding ChildID) must avoided.
4.     Column Data Types.
a.     VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) should be used in lieu of TEXT, NTEXT and IMAGE data types, respectively.  These data types will be deprecated.
b.    The usage of the CHAR(n) data type should be restricted to fixed-length values or variable length values of less than seven (7) to ten (10) characters.  VARCHAR(n) should be used for longer variable length columns.
c.     Use CHAR() or VARCHAR() in lieu of NCHAR() or NVARCHAR() when values are known to exclude Unicode data.
d.    The scale specified for VARCHAR() and NVARCHAR() should be 2 less than a binary power (e.g. 254, 510, 1022)
e.     Numeric data types, excluding BIT, binary and varbinary, should be used exclusively for values upon which arithmetic operations will be performed (i.e. Social Security Numbers and Zip Codes are not numeric values).  Integer data types are allowable for IDENTITY columns.
f.     The scale and precision of numeric data types should be selected according to the data to be stored (e.g. a column storing a “rating” whose values range from one (1) to ten (10) should use the data type TINYINT, not INT).
g.    The DECIMAL() data type should be used in lieu of FLOAT any time the scale and precision of the values can be determined.
h.     Avoid using datatype(MAX) unless the scale will exceed the specifiable maximum (CHAR()/VARCHAR() =  8000, NCHAR()/NVARCHAR() = 4000) .
i.      Place datatype(MAX) on separate filegroups using the TEXTIMAGE_ON <filegroup> clause in the CREATE TABLE statement.
j.      The UNIQUEIDENTIFIER datatype populated by NEWID() or a similar mechanism is a poor choice for inclusion as the primary column in a clustered index.  Choose a continuously incrementing value.
k.     The TIMESTAMP data type cannot be used as a true DATETIME.  Although derived from the system clock, one cannot extract a real-time value from these values.  The TIMESTAMP data type may be used for versioning purposes as it is of the ROWVERSION base type.  Use DATETIME or SMALLDATETIME for columns where actual date or time manipulation or comparison is required.
5.     Columns should be created as NOT NULL unless specifically known to require NULL’s.

Miscellaneous

CHECK constraints reject False not accept True.  CHECK (col1 = col2) will be allowed if either column is NULL.33

Stored Procedures

Naming Convention

Stored Procedure names should reflect their function or purpose.
1.     The general form of a Stored Procedure name should be pre_verbObject (or pre_objectVerb) where:
a.     pre_ is either ‘up_’ or “usp_”.
b.    verb is an action verb indicating the function to perform (i.e. get, update, delete, list, etc.).
c.     Object is the subject of the verb portion of the Stored Procedure name (i.e. Metrics, History, References, etc.).
d.    Stored procedures may be prefixed with ‘up_’ or ‘usp_’
2.     The precision of the sysname data type limits the SP name length. 

Restrictions

1.     Stored procedure may not be prefixed with “sp_” or “xp_”.

Miscellaneous

1.     Always use 'SET NOCOUNT ON' as the first executable statement in non-interactive SP’s.
a.     This statement suppresses the "n records affected" message generated by SET, SELECT, INSERT, UPDATE and DELETE statements.  Suppressing these messages will increase the performance of the SP as well as prevent spurious 1608 (A network error was encountered while sending results to the front end) errors as well as reducing network traffic.
b.    There is no need to explicitly ‘SET NOCOUNT OFF’ as the scope of the statement is the current session only.
2.     All stored procedures must contain documentation.
a.     Header Documentation should be placed after the AS clause of the CREATE PROCEDURE statement within Block Comment /*   */ (forward-slash, asterisk – asterisk, forward-slash) delimiters.  Header Documentation is required for all production SP’s (see Header Documentation Example).
i.      Name:  the operational name of the SP.
ii.     Purpose:  a brief description of the functionality of the SP.
iii.    Parameters:  a description of the names, datatypes and values of parameters passed to/from the SP.
iv.    Return Values:  a description of the datatypes and values returned by the SP.
v.     Author:  the name of the original author of the SP.
vi.    History:  the date of the revision, the initials of the reviser and a description of the revision(s) made.  This information should appear before the revision within the code of the SP as well.
vii.   Additional miscellaneous header documentation should be included as appropriate.
b.    Body Documentation should be included as necessary to describe the functionality of statements or statement blocks that may not be readily apparent to one unfamiliar with the application or purpose for which the SP was written.
c.     Single line, in-line or nested documentation (within a statement) should be placed on the line preceding the subject of the documentation or on the same line following the subject of the documentation using ( -- ) as the comment delimiter. 
d.    Multi-line documentation should be placed immediately before the subject of the documentation within Block Comment /*   */ (forward-slash, asterisk – asterisk, forward-slash) delimiters.  An example of multi-line follows:
/*
    Multi-line comments are typically 3 or more lines.
    Placing the begin and end delimiters on lines by themselves and
    indenting the comment text improves readability.
*/
3.     The GOTO statement should not be used.
4.     PRINT statements should be avoided unless the client application is able to respond to/accept the output. 24, 25
5.     Avoid including business logic in stored procedure code.  DBMS’s are optimized for queries not logical branching, string formatting, etc. 23
a.     Minimize IF-THEN-ELSE statements.
b.    Avoid string formatting, concatenation and conversion.
c.     Avoid computations.

Keys

Physical Location

Restrictions/Caveats/Best Methods

1.     IDENTITY columns should not be used unless a child relationship (i.e. a foreign key referencing that column) to that column exists.
2.     Key values should not be hard-coded into scripts or stored procedures.
3.     Carefully examine the implications when considering the use of a unique identifier (GUID) datatype for a key value.18, 19

Indices

Physical Location

All non-clustered indices should be created in a secondary filegroup on drive sets specifically reserved for that purpose.  The current standard for the secondary (index) filegroup name and location is:
            Name:  databasename_Index
            Location:  drive:\Index\databasename_Index.ndf (additional drives may be available also)
           

Naming Convention

Non-clustered indices should begin with the prefix “idx_” or “ix_” followed by the table name and column name(s) delimited by underscore character(s).  Clustered indices should begin with the prefix “cidx_”, “cdx_” or “cx_”.

Miscellaneous


User-Defined Data Types

Naming Convention

All User-Defined Data Types should begin with the prefix “udt_” or “ut_”.

Restrictions

Miscellaneous

User-Defined Functions

Naming Convention

All User-Defined Functions should begin with the prefix “udf_” or “uf_”.

Restrictions

User-Defined Function may not be prefixed with “fn_”.

Miscellaneous

Views

Naming Convention

All views should begin with the prefix “vw_”.

Miscellaneous

Views definitions should include result set column name(s).  (e.g. CREATE VIEW vw_myView (col1, col2,… AS SELECT…)

Restrictions

The use of the TOP and ORDER BY clauses should be avoided in views.  This syntax is not ANSI standard, may be deprecated in SQL Server and does not guarantee the order of the result set.34

Cursors

Naming Convention

Cursors names should be declared either prefixed with “cur” or “cur_” or suffixed with “_cursor”.

Restrictions

Miscellaneous

1.     Cursors should be avoided whenever possible.  There are very few instances that cursor operations cannot be replaced by equivalent INSERT, SELECT, UPDATE or DELETE statements.  The use of Common Table Expressions should be considered in lieu of cursors when constructing recursive or iterative processes.  Modern RDBMS’s are set based systems not ISAM file management systems.1, 6, 7, 11, 13, 15, 17, 21
2.     Cursors in client programs require maintaining state on the database server.  Maintaining the client state uses memory and/or temporary database resources for each client or instance of a cursor.  This is one reason cursors lack scalability.1, 2, 5
3.     Cursors in client programs perform multiple requests to the database server to service the client unnecessarily burdening the network.  This is another reason cursors lack scalability. 1, 2
4.     When used, cursors should be defined as FAST_FORWARD or INSENSITIVE and FOR READ ONLY whenever possible.  When used for positional updates/deletes the least amount of locking to accomplish the task will result in more efficient execution.17

Roles

Rules

Defaults

SQL-99 Adherance

SET & SELECT

The SELECT statement should be used for queries only.  The SET statement should be used for the assignment of a value to a variable.

CAST() replaces CONVERT()

In all cases other than formatted datetime conversion to CHAR-based values (e.g. CONVERT(CHAR(10), GETDATE(), 112) the CAST() function should be used for datatype conversion.

JOIN Syntax

Prior versions of SQL Server allowed JOIN conditions ( =, *=, =*) to be specified in the WHERE clause of a query.  This syntax is no longer supported and will lead to inaccurate results.  The JOIN clause and associated syntax should be used explicitly.
The use of RIGHT OUTER JOIN should be minimized.  All queries utilizing RIGHT joins can be rewritten as LEFT OUTER JOINs.32

NULL

NULL can exist as one of four (4) possible conditions or values:  missing, unknown, miscellaneous or inapplicable.  Consider the following example: 
…I am looking at a carton of Easter eggs and I want to know their colors.  If I see an empty hole, I have a missing egg…  If I see a foil-wrapped egg, I have an unknown color value in my set.  If I see a multicolored egg, I have a miscellaneous value in my set.  If I see a cue ball, I have an inapplicable value in my set.3
Additionally, NULL will propagate to the result in arithmetic, logical and concatenation operations (i.e. 5 + NULL = NULL, ‘Hello’ + NULL = NULL and X AND NULL = NULL).20, 32

System table queries

Whenever possible, INFORMATION_SCHEMA views, Dynamic Management views and system functions (e.g. DATABASEPROPERTY(), OBJECTPROPERTY()) should be used in lieu of direct queries against the system tables.  See BOL topics “Information Schema Views” and “Meta Data Functions” for additional information.  Queries against the system tables are not allowed in business application queries.

GETDATE() vs. CURRENT_TIMESTAMP

Use the ANSI Standard CURRENT_TIMESTAMP in lieu of GETDATE().

ISNULL() vs. COALESCE()

Use the ANSI Standard COALESCE() function in lieu of ISNULL().  COALESCE() functions similarly to ISNULL() and allows the inclusion of multiple values.

ORDER BY/GROUP BY

The use of positional numbers in the ORDER BY and GROUP BY clauses of SQL statements is deprecated in SQL-92 (i.e. this feature will no longer be supported in SQL-99) and should not be used in new code and should be updated to column references in existing code.4
The use of TOP 100 PERCENT…ORDER BY does not guarantee a presentation order.  ORDER BY is only effective in the outermost query block. 34

Rules

SQL Server rules may be implemented through the use CHECK constraints.  As CHECK constraints are defined in the SQL-99 standard they are the preferred method.  Rules have been deprecated in SQL2005.

!  - used as ‘not’

The character “!” (ASCII Code 33) is not an acceptable operator modifier.  !<, !> and != are expressed as >=, <= and <>, respectively.37

PERFORMANCE10

Dynamic SQL vs. Stored Procedure

Avoid the use of dynamic SQL wherever possible.  The additional overhead, risk and hard-coupling to the application negates its flexibility in most cases. 26, 30

COUNT()

The COUNT() function is optimized for COUNT(*) in “SELECT COUNT(*)…” versus “SELECT COUNT(column_name)…” or “SELECT COUNT(constant)” queries.8

EXISTS() vs. COUNT()

The EXISTS() functions returns after the first occurrence matching the query criteria whereas COUNT() must compute an exact answer for the criteria.8

Formatting data in queries

Avoid including data formatting functions in queries.  Presentation is the function of the User Interface.  The additional processing required for formatting data is better spent at the client, as the database is not optimized for these functions.  Eliminating formatting from the query allows greater flexibility in the use of the query.

Functions in WHERE clauses

The use of functions in the WHERE clause may preclude the use of indices in the execution plan.  It is better to manipulate the predicate value prior to its inclusion in the query.8
Example: 
            SELECT * FROM aTable WHERE DATEDIFF(day, theDate, CURRENT_TIMESTAMP) < 3                          - vs. -
DECLARE @afterDate DATETIME
SET @afterDate = DATEADD(day, -3, CURRENT_TIMESTAMP)
SELECT * FROM aTable WHERE theDate < @afterDate

Search Arguments (SARGs)

IN and NOT IN

The use of IN and NOT IN, when used in conjunction with a query, should be avoided.  Utilizing JOIN syntax to accomplish the task will result in improved performance particularly on larger tables/result sets.8, 12

NULL

NULL values in columns of fixed length datatypes occupy the entire scale of the column just as any other data in a fixed length (i.e. CHAR(), DECIMAL(), etc) column.  Consider variable length datatypes for columns of scale greater than approximately eight to sixteen bytes.14
IS NULL in a WHERE clause will preclude the use of an index forcing a table scan.14
The use of NULL may lead to convoluted, and consequently, inefficient or buggy DML.14
Ideally, NULL should be avoided.14

Qualifying Objects

To avoid excessive query/SP recompiles, fully qualify all objects.16

UDF’s

Avoid using UDF’s to perform query operations as they tend to become ‘cursor-like’ in operation.  Avoid using UDF’s in the SELECT or WHERE clauses of queries which return large numbers of rows.  Execution plans of queries containing UDF’s do not consider the actual cost of the UDF.  The results generated for SET SHOW_STATISTICS_IO ON does not include the I/O associated with UDF execution incorporated into queries.27, 28

EAV, OTLT and MUCK tables

Avoid Entity-Attribute Value (EAV), One True Lookup Table (OTLT) and Massively Unified Code-Key (MUCK) tables.  The utilization of these forms lead to non-relational generalization resulting in exceedingly complex queries and very limited scalability.35, 36

 

Appendix

Header Documentation Example

/*
Name:       dbo.usp_dbMaint9
Author:      Arturo T. de Lamerens
Version:    MS SQL Server 2005
Function:   Perform automated cyclical DATABASE, DIFFERENTIAL & TRANSACTION LOG backups for all databases present on a SQL Server.
Returns:    Nothing
Called by:  SQL Agent Task engine
Usage:       EXEC dbo.usp_backup8 <parmList>
Strategy:   Utilizing 2 backup devices differentiated by suffixing with zero (0) or one (1); perform a cyclical backup strategy.  All data required for the execution of the backups are extracted from the SQL Server system functions/views.  DBA/SysOp intervention should be required only in the event of a failure.
                  1. On Sunday before 00:30 (12:30am)
                       a. Perform a DBCC CHECKDB
                  <…snip…>
Issues:      1.  The transition from week 53 to week 1 will re-use the same backup device; this behavior has been defeated in code.
             2.  Ultimately, the time required to complete the backups will exceed the Scheduled Task interval.  This will require the segmentation of this SP by some means -- research to circumvent this issue is on-going.
History:    2001.02.27 ATL - Global modifications for SQL8 (SQL2000)
                2001.03.01 ATL - Added path/file existence verification.
                                           - Added DATABASEPROPERTYEX('dbname', 'Recovery') checking in lieu of system table queries.
                2001.03.09 ATL - Enhanced type verification for @logTbl via udf_objectType
*/

Additional Information

Database 121 - Introduction to SQL, www.free-ed.net.

Comments

Popular posts from this blog

ODI KM Adding Order by Option

You can add Order by statement to queries by editing KM.I have edited IKM SQL Control Append to provide Order by.  1) Add an option to KM named USE_ORDER_BY, its type is Checkbox and default value is False. This option determines you want an order by statement at your query. 2)Add second option to KM named ORDER_BY, type is Text. You will get order by values to your query by this option. 3) Editing Insert New Rows detail of KM. Adding below three line code after having clause. That's it! <% if (odiRef.getOption("USE_ORDER_ BY").equals("1")) { %> ORDER BY <%=odiRef.getOption("ORDER_BY" )%> <%} %>  If USE_ORDER_BY option is not used, empty value of ORDER_BY option get error. And executions of KM appears as such below; At this execution, I checked the KM to not get errors if ORDER_BY option value is null. There is no prove of ORDER BY I'm glad.  Second execution to get  Ord

Creating Yellow Interface in ODI

Hello everyone! In Oracle data integrator (ODI), an  interface  is an object which populates one datastore, called the  target , with data coming from one or more other datastores, known as  sources . The fields of the source datastore are linked to those in the target datastore using the concept of  Mapping . Temporary interfaces used in ODI are popularly known as  Yellow Interfaces . It is because ODI generates a yellow icon at the time of creation of a yellow interface as opposed to the blue icon of a regular interface. The advantage of using a yellow interface is to avoid the creation of  Models each time you need to use it in an interface. Since they are temporary, they are not a part of the data model and hence don’t need to be in the Model. So let’s begin and start creating our yellow interface! Pre-requisites : Oracle 10g Express Edition with *SQL Plus, Oracle Data Integrator 11g. Open *SQL Plus and create a new table  Sales  in Oracle. You can use any existing ta

Running Count in Talend Open Studio

Most Talend components keep a count of the records processed using variables like NB_LINE or NB_LINE_OK.  But these are only available after all processing is completed.  Define your own counter variable to keep a running count for use in a tMap. Variables like tFilterRow.NB_LINE or tAccessOutput.NB_LINE_INSERTED can be used to report the number of affected lines after a subjob's processing.  However, it may be of use to get the current line index for use in a tMap.  The index variables used to form NB_LINE aren't available during processing; they're only written out the globalMap at the end of processing. In this example, staging records are loaded from Excel to Access.  The order in which the Excel records are read is preserved in a database column called DISPLAY_SEQ_NB.  Note that there is an auto-increment column used for record ID in the Access table.  This could be used to infer a loading order, but this job uses a separate column to keep the ID as a meaningless surr