This blog provides general policies, procedures and guidelines for Microsoft
SQL Server 2005 and 2008 object design and query development. This document is currently in DRAFT and will
continue to evolve. Most footnote
references are currently SQL 2000 related.
When these are updated the status of this document will be changed to
FINAL.
Global Standards
Object Nomenclature
All object names should reflect the nature and purpose of the
object. Object names may be comprised of
alphabetic, numeric and underscore (ASCII Code 95) characters only and are
limited to 'sysname' allowable length (VARCHAR(30) for SQL6x, NVARCHAR(256) for
SQL7, SQL2000 and SQL2005).
No user-defined object is to begin with 'sys', 'sp_', ‘fn_’ or
'xp_' as these prefixes are indicative of system objects.
MS SQL Server and ANSI SQL keywords, reserved words, abbreviations
and synonyms are not allowed as object or identifier names (see: Reserved Words -
Transact SQL and Appendix I).
The intent of the Nomenclature sections of this document is not
to restrict naming conventions but to convey that reasonable, consistent and
non-conflicting naming schemes should be considered when designing database
objects.
Abbreviations
The table below is a non-exhaustive list of abbreviations for use
in object and identifier names.
Abbreviation
|
Complete Text
|
Acct
|
Account
|
Acctg
|
Accounting
|
Addr
|
Address
|
Code
|
Code
|
Descr
|
Description
|
EIN
|
Employer ID Number
|
Flag
|
Flag
|
Fmt
|
Format
|
Frm
|
Form
|
ID
|
Identifier
|
Idx | ix
|
Index
|
Img
|
Image
|
Info
|
Information
|
Key
|
Key
|
loc
|
Location
|
mbr
|
Member
|
mkt
|
Market
|
mktg
|
Marketing
|
mod
|
Modify/Modified
|
msg
|
Message
|
no
|
Number
|
pgm
|
Program
|
recd
|
Received
|
ssn
|
Social Security Number
|
taxid
|
SSN/EIN (No Distinction)
|
temp (or tmp)
|
Temporary
|
txt
|
Text
|
url
|
Web Address (URL)
|
usr
|
User
|
val
|
Value
|
vnd
|
Vendor
|
clnt
|
Client
|
org
|
Organization
|
db
|
Database
|
tbl
|
Table
|
vw
|
View
|
fk
|
Foreign Key
|
uq or uk
|
Unique Key
|
col
|
Column
|
pk
|
Primary Key
|
dept
|
Department
|
mgt
|
Management
|
mgr
|
Manager
|
udf_
|
User-defined function
|
xref
|
Cross-Reference
|
xact
|
Transaction
|
xfr
|
Transfer
|
xmt
|
Transmit
|
Object standards
Databases
Naming Convention
Database names should reflect the application for which the
database serves as the datastore
As an example, a database serving as the datastore for an
application used in a study of earthquakes might be called
"Seismo". A database for a
similar study of volcanoes might be called "Vulcan". The shared database used by both
"Seismo" and "Vulcan" that contains geographic mappings and
information might be called "GeoMaps_Common".
Tables
Naming Convention
Table names should reflect the content and/or purpose of the data
contained within.
1. Table
name in the form: descriptive (single
descriptive name)
a. descriptive is a character label that
describes the content and/or purpose of the data contained within (e.g. a table
containing the basic metrics of volcanoes such as name, height, location, type…
might be called "MainMetrics").
b. Title
case is recommended for all table names.
2. The
precision of the sysname type limits the table name length.
Restrictions/Caveats/Best Methods
1. Multi-database
queries require fully qualified table names.
2. Tables
should be schema qualified when referenced in queries. 31
3. Cryptic
or ‘coded’ table names must be avoided.
4. Table
aliases should be intuitively meaningful.
Aliases such as ‘a’, ‘x’, or ‘f1’ must be avoided.
5. Use
table variables or derived tables in lieu of temporary tables whenever
possible.9
Columns
Naming Convention
Column names should reflect the content and/or purpose of the
data contained within.
a. Column
name in the form: descriptive (single descriptive name)
i. descriptive
is a character label that describes the content and/or purpose of the data
contained within (e.g. a column containing the heights of volcanoes might be
called "Height").
ii. The
precision of the sysname data type limits the column name length.
Restrictions/Caveats/Best Methods
1.
Multi-table queries require fully qualified or
aliased column names.
2.
Cryptic or ‘coded’ column names must be avoided.
3.
Generic and non-descriptive column names (i.e.
ID, GUID, Description, Name, ParentID without a corresponding ChildID) must
avoided.
4.
Column Data Types.
a. VARCHAR(MAX),
NVARCHAR(MAX) and VARBINARY(MAX) should be used in lieu of TEXT, NTEXT and
IMAGE data types, respectively. These
data types will be deprecated.
b. The
usage of the CHAR(n) data type should be restricted to fixed-length values or
variable length values of less than seven (7) to ten (10) characters. VARCHAR(n) should be used for longer variable
length columns.
c. Use
CHAR() or VARCHAR() in lieu of NCHAR() or NVARCHAR() when values are known to
exclude Unicode data.
d. The
scale specified for VARCHAR() and NVARCHAR() should be 2 less than a binary
power (e.g. 254, 510, 1022)
e. Numeric
data types, excluding BIT, binary and varbinary, should be used exclusively for
values upon which arithmetic operations will be performed (i.e. Social Security
Numbers and Zip Codes are not numeric values).
Integer data types are allowable for IDENTITY columns.
f. The
scale and precision of numeric data types should be selected according to the
data to be stored (e.g. a column storing a “rating” whose values range from one
(1) to ten (10) should use the data type TINYINT, not INT).
g. The
DECIMAL() data type should be used in lieu of FLOAT any time the scale and
precision of the values can be determined.
h. Avoid
using datatype(MAX) unless the scale will exceed the specifiable maximum (CHAR()/VARCHAR()
= 8000, NCHAR()/NVARCHAR() = 4000) .
i. Place
datatype(MAX) on separate filegroups using the TEXTIMAGE_ON <filegroup>
clause in the CREATE TABLE statement.
j. The
UNIQUEIDENTIFIER datatype populated by NEWID() or a similar mechanism is a poor
choice for inclusion as the primary column in a clustered index. Choose a continuously incrementing value.
k. The
TIMESTAMP data type cannot be used as a true DATETIME. Although derived from the system clock, one
cannot extract a real-time value from these values. The TIMESTAMP data type may be used for
versioning purposes as it is of the ROWVERSION base type. Use DATETIME or SMALLDATETIME for columns
where actual date or time manipulation or comparison is required.
5.
Columns should be created as NOT NULL unless
specifically known to require NULL’s.
Miscellaneous
CHECK constraints reject False not accept True. CHECK (col1 = col2) will be allowed if either
column is NULL.33
Stored Procedures
Naming Convention
Stored Procedure names should reflect their function or purpose.
1.
The general form of a Stored Procedure name should
be pre_verbObject (or pre_objectVerb) where:
a. pre_ is either ‘up_’ or “usp_”.
b. verb is an action verb indicating the
function to perform (i.e. get, update, delete, list, etc.).
c. Object is the subject of the verb
portion of the Stored Procedure name (i.e. Metrics, History, References, etc.).
d. Stored
procedures may be prefixed with ‘up_’ or ‘usp_’
2.
The precision of the sysname data type limits
the SP name length.
Restrictions
1.
Stored procedure may not be prefixed with “sp_”
or “xp_”.
Miscellaneous
1.
Always use 'SET NOCOUNT ON'
as the first executable statement in non-interactive SP’s.
a. This
statement suppresses the "n records affected" message generated by
SET, SELECT, INSERT, UPDATE and DELETE statements. Suppressing these messages will increase the
performance of the SP as well as prevent spurious 1608 (A network error was
encountered while sending results to the front end) errors as well as reducing
network traffic.
b. There
is no need to explicitly ‘SET NOCOUNT OFF’ as the scope of the statement is the
current session only.
2.
All stored procedures must contain
documentation.
a. Header
Documentation should be placed after the AS clause of the CREATE PROCEDURE
statement within Block Comment /* */ (forward-slash,
asterisk – asterisk, forward-slash) delimiters.
Header Documentation is required for all production SP’s (see Header Documentation Example).
i. Name: the operational name of the SP.
ii. Purpose: a brief description of the functionality of
the SP.
iii. Parameters: a description of the names, datatypes and
values of parameters passed to/from the SP.
iv. Return
Values: a description of the datatypes
and values returned by the SP.
v. Author: the name of the original author of the SP.
vi. History: the date of the revision, the initials of the
reviser and a description of the revision(s) made. This information should appear before the
revision within the code of the SP as well.
vii. Additional
miscellaneous header documentation should be included as appropriate.
b. Body
Documentation should be included as necessary to describe the functionality of
statements or statement blocks that may not be readily apparent to one
unfamiliar with the application or purpose for which the SP was written.
c. Single
line, in-line or nested documentation (within a statement) should be placed on
the line preceding the subject of the documentation or on the same line
following the subject of the documentation using ( -- ) as the comment
delimiter.
d. Multi-line
documentation should be placed immediately before the subject of the
documentation within Block Comment /*
*/ (forward-slash, asterisk – asterisk, forward-slash) delimiters. An example of multi-line follows:
/*
Multi-line comments are typically 3 or more
lines.
Placing the begin and end delimiters on
lines by themselves and
indenting the comment text improves
readability.
*/
3.
The GOTO statement should not be used.
4.
PRINT statements should be avoided unless the
client application is able to respond to/accept the output. 24, 25
5.
Avoid including business logic in stored
procedure code. DBMS’s are optimized for
queries not logical branching, string formatting, etc. 23
a. Minimize
IF-THEN-ELSE statements.
b. Avoid
string formatting, concatenation and conversion.
c. Avoid
computations.
Keys
Physical Location
Restrictions/Caveats/Best Methods
1.
IDENTITY columns should not be used unless a
child relationship (i.e. a foreign key referencing that column) to that column
exists.
2.
Key values should not be hard-coded into scripts
or stored procedures.
3.
Carefully examine the implications when
considering the use of a unique identifier (GUID) datatype for a key value.18,
19
Indices
Physical Location
All non-clustered indices should be created in a secondary
filegroup on drive sets specifically reserved for that purpose. The current standard for the secondary
(index) filegroup name and location is:
Name: databasename_Index
Location: drive:\Index\databasename_Index.ndf
(additional drives may be available also)
Naming Convention
Non-clustered indices should begin with the prefix “idx_” or “ix_”
followed by the table name and column name(s) delimited by underscore
character(s). Clustered indices should
begin with the prefix “cidx_”, “cdx_” or “cx_”.
Miscellaneous
User-Defined Data Types
Naming Convention
All User-Defined Data Types should begin with the prefix “udt_”
or “ut_”.
Restrictions
Miscellaneous
User-Defined Functions
Naming Convention
All User-Defined Functions should begin with the prefix “udf_” or
“uf_”.
Restrictions
User-Defined Function may not be prefixed with “fn_”.
Miscellaneous
Views
Naming Convention
All views should begin with the prefix “vw_”.
Miscellaneous
Views definitions should include result set column name(s). (e.g. CREATE VIEW vw_myView (col1, col2,… AS
SELECT…)
Restrictions
The use of the TOP and ORDER BY clauses should be avoided in
views. This syntax is not ANSI standard,
may be deprecated in SQL Server and does not guarantee the order of the result
set.34
Cursors
Naming Convention
Cursors names should be declared either prefixed with “cur” or “cur_”
or suffixed with “_cursor”.
Restrictions
Miscellaneous
1.
Cursors should be avoided whenever
possible. There are very few instances
that cursor operations cannot be replaced by equivalent INSERT, SELECT, UPDATE
or DELETE statements. The use of Common
Table Expressions should be considered in lieu of cursors when constructing
recursive or iterative processes. Modern
RDBMS’s are set based systems not ISAM file management systems.1,
6, 7, 11, 13, 15, 17, 21
2.
Cursors in client programs require maintaining
state on the database server.
Maintaining the client state uses memory and/or temporary database
resources for each client or instance of a cursor. This is one reason cursors lack scalability.1,
2, 5
3.
Cursors in client programs perform multiple requests
to the database server to service the client unnecessarily burdening the
network. This is another reason cursors
lack scalability. 1, 2
4.
When used, cursors should be defined as
FAST_FORWARD or INSENSITIVE and FOR READ ONLY whenever possible. When used for positional updates/deletes the
least amount of locking to accomplish the task will result in more efficient
execution.17
Roles
Rules
Defaults
SQL-99 Adherance
SET & SELECT
The SELECT statement should be used for queries only. The SET statement should be used for the
assignment of a value to a variable.
CAST() replaces CONVERT()
In all cases other than formatted datetime conversion to
CHAR-based values (e.g. CONVERT(CHAR(10), GETDATE(), 112) the CAST() function
should be used for datatype conversion.
JOIN Syntax
Prior versions of SQL Server allowed JOIN conditions ( =, *=, =*)
to be specified in the WHERE clause of a query.
This syntax is no longer supported and will lead to inaccurate
results. The JOIN clause and associated
syntax should be used explicitly.
The use of RIGHT OUTER JOIN should be minimized. All queries utilizing RIGHT joins can be
rewritten as LEFT OUTER JOINs.32
NULL
NULL can exist as one of four (4) possible conditions or
values: missing, unknown, miscellaneous
or inapplicable. Consider the following
example:
…I am looking at a carton of Easter
eggs and I want to know their colors. If
I see an empty hole, I have a missing egg… If I see a foil-wrapped egg, I have an unknown
color value in my set. If I see a
multicolored egg, I have a miscellaneous value in my set. If I see a cue ball, I have an inapplicable
value in my set.3
Additionally, NULL will propagate to the result in arithmetic, logical
and concatenation operations (i.e. 5 + NULL = NULL, ‘Hello’ + NULL = NULL and X
AND NULL = NULL).20, 32
System table queries
Whenever possible, INFORMATION_SCHEMA views, Dynamic Management
views and system functions (e.g. DATABASEPROPERTY(), OBJECTPROPERTY()) should
be used in lieu of direct queries against the system tables. See BOL topics “Information Schema Views” and
“Meta Data Functions” for additional information. Queries against the system tables are not
allowed in business application queries.
GETDATE() vs. CURRENT_TIMESTAMP
Use the ANSI Standard CURRENT_TIMESTAMP in lieu of GETDATE().
ISNULL() vs. COALESCE()
Use the ANSI Standard COALESCE() function in lieu of
ISNULL(). COALESCE() functions similarly
to ISNULL() and allows the inclusion of multiple values.
ORDER BY/GROUP BY
The use of positional numbers in the ORDER BY and GROUP BY
clauses of SQL statements is deprecated in SQL-92 (i.e. this feature will no
longer be supported in SQL-99) and should not be used in new code and should be
updated to column references in existing code.4
The use of TOP 100 PERCENT…ORDER BY does not guarantee a
presentation order. ORDER BY is only
effective in the outermost query block. 34
Rules
SQL Server rules may be implemented through the use CHECK
constraints. As CHECK constraints are
defined in the SQL-99 standard they are the preferred method. Rules have been deprecated in SQL2005.
! - used as ‘not’
The character “!” (ASCII Code 33) is not an acceptable operator
modifier. !<, !> and != are
expressed as >=, <= and <>, respectively.37
PERFORMANCE10
Dynamic SQL vs. Stored Procedure
Avoid the use of dynamic SQL wherever possible. The additional overhead, risk and
hard-coupling to the application negates its flexibility in most cases. 26,
30
COUNT()
The COUNT() function is optimized for COUNT(*) in “SELECT
COUNT(*)…” versus “SELECT COUNT(column_name)…” or “SELECT COUNT(constant)” queries.8
EXISTS() vs. COUNT()
The EXISTS() functions returns after the first occurrence
matching the query criteria whereas COUNT() must compute an exact answer for
the criteria.8
Formatting data in queries
Avoid including data formatting functions in queries. Presentation is the function of the User
Interface. The additional processing
required for formatting data is better spent at the client, as the database is
not optimized for these functions. Eliminating
formatting from the query allows greater flexibility in the use of the query.
Functions in WHERE clauses
The use of functions in the WHERE clause may preclude the use of
indices in the execution plan. It is
better to manipulate the predicate value prior to its inclusion in the query.8
Example:
SELECT * FROM
aTable WHERE DATEDIFF(day, theDate, CURRENT_TIMESTAMP) < 3 - vs. -
DECLARE @afterDate DATETIME
SET @afterDate = DATEADD(day, -3, CURRENT_TIMESTAMP)
SELECT * FROM aTable WHERE theDate < @afterDate
SET @afterDate = DATEADD(day, -3, CURRENT_TIMESTAMP)
SELECT * FROM aTable WHERE theDate < @afterDate
Search Arguments (SARGs)
IN and NOT IN
The use of IN and NOT IN, when used in conjunction with a query,
should be avoided. Utilizing JOIN syntax
to accomplish the task will result in improved performance particularly on
larger tables/result sets.8, 12
NULL
NULL values in columns of fixed length datatypes occupy the
entire scale of the column just as any other data in a fixed length (i.e.
CHAR(), DECIMAL(), etc) column. Consider
variable length datatypes for columns of scale greater than approximately eight
to sixteen bytes.14
IS NULL in a WHERE clause will preclude the use of an index
forcing a table scan.14
The use of NULL may lead to convoluted, and consequently,
inefficient or buggy DML.14
Ideally, NULL should be avoided.14
Qualifying Objects
To avoid excessive query/SP recompiles, fully qualify all
objects.16
UDF’s
Avoid using UDF’s to perform query operations as they tend to
become ‘cursor-like’ in operation. Avoid
using UDF’s in the SELECT or WHERE clauses of queries which return large
numbers of rows. Execution plans of
queries containing UDF’s do not consider the actual cost of the UDF. The results generated for SET
SHOW_STATISTICS_IO ON does not include the I/O associated with UDF execution incorporated
into queries.27, 28
EAV, OTLT and MUCK tables
Avoid Entity-Attribute Value (EAV), One True Lookup Table (OTLT)
and Massively Unified Code-Key (MUCK) tables.
The utilization of these forms lead to non-relational generalization
resulting in exceedingly complex queries and very limited scalability.35, 36
Appendix
Header Documentation Example
/*
Name: dbo.usp_dbMaint9
Author: Arturo
T. de Lamerens
Version: MS SQL
Server 2005
Function: Perform automated cyclical DATABASE,
DIFFERENTIAL & TRANSACTION LOG backups for all databases present on a SQL
Server.
Returns: Nothing
Called
by: SQL Agent Task engine
Usage: EXEC dbo.usp_backup8 <parmList>
Strategy: Utilizing 2 backup devices differentiated by
suffixing with zero (0) or one (1); perform a cyclical backup strategy. All data required for the execution of the
backups are extracted from the SQL Server system functions/views. DBA/SysOp intervention should be required
only in the event of a failure.
1.
On Sunday before 00:30 (12:30am)
a. Perform a DBCC CHECKDB
<…snip…>
Issues: 1.
The transition from week 53 to week 1 will re-use the same backup device;
this behavior has been defeated in code.
2.
Ultimately, the time required to complete the backups will exceed the
Scheduled Task interval. This will require
the segmentation of this SP by some means -- research to circumvent this issue
is on-going.
History:
2001.02.27 ATL - Global modifications for SQL8 (SQL2000)
2001.03.01 ATL - Added path/file existence verification.
- Added
DATABASEPROPERTYEX('dbname', 'Recovery') checking in lieu of system table
queries.
2001.03.09 ATL - Enhanced type verification for @logTbl via
udf_objectType
*/
Additional Information
Solutions
by Design: The Physical Design, SQL
Server Magazine
Solutions
by Design: Logical Modeling, SQL
Server Magazine
Solutions
by Design: Entity Modeling, SQL
Server Magazine
Solutions
by Design: Process Modeling, SQL
Server Magazine
Solutions
by Design: Data Modeling, SQL Server Magazine
Database 121 - Introduction to SQL, www.free-ed.net.
Comments
Post a Comment