Standardizing the T-SQL code across an organization is essential to facilitating effective and efficient code reviews, troubleshooting T-SQL script, supporting joint development efforts and ensuring smooth hand-off of projects from one group to another. Standardization helps keep code quality, readability and performance in check, while making the code easy to maintain.
In this post, we will discuss the following key aspects of SQL Best Practices standardization:
Proper formatting is extremely important for code readability (code review, hand-off & joint development). It is just as vital for code migration between different Databases as well as code maintainability and troubleshooting.
Where should the focus be (decide your own standard and follow)
One of the primary focus areas in SQL formatting is text case. Common areas of focus here and are keywords, data types, user-defined objects, etc. One must decide which components of the script / query will be in CAPS, and which will be in lower case. Should all keywords be uppercase, and data types all lower case, while user-defined objects are in camel case (compound words where the first letter of an identifier is lowercase and the first letter of each subsequent concatenated word is capitalized). Such as table names for example?
The following is an example of such a text case standardization.
IF OBJECT_ID (“ProductDocs”,“U”)IS NOT NULL
DROP TABLE ProductDocs
GO
CREATE TABLE ProductDocs
(
DocID intNOT NULL IDENTITY,
DocTitle nvarchar (50) NOT NULL,
DocFileName nvarchar (400) NOT NULL,
CONSTRAINT PK_ProductDocs_DocID PRIMARYKEYCLUSTERED (DocID ASC)
)
GO
The placement of commas is also very important in SQL as they can cause syntax errors or alternatively increase readability, depending on where they are used. Should commas be used to end the line or begin the line? Should spaces or tabs ever be used before or after commas? This is what needs to be decided. Regardless of whichever approach is used, consistency in their usage must be maintained throughout the code. For example:
SELECT FirstName
,MiddleName
,LastName
,City
,StateProvinceName
FROM HumanResources.vEmployee
WHERE JobTitle LIKE“Production Technician%”
ORDERBY StateProvinceName, City
This is by no means limited to just SELECT lists and ORDER BY clauses, commas can also be used in DDL statements, as depicted below:
IF OBJECT_ID (“ProductDocs”,“U”)IS NOT NULL
DROP TABLE ProductDocs
GO
CREATE TABLE ProductDocs
(
DocID int NOT NULL IDENTITY,
DocTitle nvarchar (50) NOT NULL,
DocFileName nvarchar (400) NOT NULL,
CONSTRAINT PK_ProductDocs_DocID PRIMARY KEY CLUSTERED (DocID ASC)
)
GO
Another formatting consideration is deciding when to use Space and linefeed. For example, should they be used after each column, for each Reserve Word or for each SQL clause, etc.? The practice of indentation must also be standardized, whether it”s from the left or right; indentation must be consistent throughout the code to enhance readability. A good example for this is the sample given below.
SELECT FirstName
,MiddleName
,LastName
,City
,StateProvinceName
FROM HumanResources.vEmployee
WHERE JobTitle LIKE “Production Technician%”
ORDERBY StateProvinceName, City
Comments are also very helpful for query readability and aid in understanding complex logic when reading thorough hundreds of lines of code. One must make it a point to use comments in queries.
A good approach, is to keep multiple line comments in /*….*/ and one line comment in –, as depicted in the code sample below.
/*
Retrieves employee data for Production Technicians
Orders data by state, then city
*/
SELECT FirstName
,MiddleName + LastName –retrieve full name
,City
,dbo.proper_case(StateProvinceName) — User Defined Function
FROM
HumanResources.vEmployee
— Pull Production Technicians only
WHERE JobTitle LIKE “Production Technician%”
ORDER BY StateProvinceName,
City –order by state, then city
/*
CONFIDENTIAL
*/
Aliases should also be used as much as possible, as they are not only helpful in query maintainability and avoiding referencing conflict, but also for performance because when you prefix each object with an alias, the optimizer does not need extra cost to look into the data dictionary for that particular object”s reference.
SELECT
(cnt.FirstName + ” “ + cnt.LastName) FullName
, emp.LoginID
, emp.Title
FROM
HumanResources.Employee emp
INNER JOIN Person.Contact cnt
ON
emp.ContactID = cnt.ContactID
ORDER BY cnt.LastName
It’s also a good practice to use ANSI standard joins instead of old style joins, as this enables you to migrate code easily between different databases. For example:
–old style join
SELECT a.Au_id,
t.Title
FROM TITLES t,
AUTHORS a,
TITLEAUTHOR ta
WHERE a.Au_id = ta.Au_id
AND ta.Title_id = t.Title_id
ANDt.Title LIKE “%Computer%“
–ANSI join syntax
SELECT a.Au_id,
t.Title
FROM AUTHORS a
INNERJOIN TITLEAUTHOR ta
ON a.Au_id = ta.Au_id
INNERJOIN TITLES t
ON ta.Title_id = t.Title_id
WHERE t.Title LIKE “%Computer%“
Avoid using Select * in queries. Since the more data you choose to read from tables, the slower your query will become. Large data size increases the time it takes for disk operations. Also, in cases where the database server is separate from the web server, fetching large data chunks in queries will result in longer network delays, as the data has to be transferred between the servers. For example:
Use:
SELECT customer_id, last_name, first_name, street, city FROM customer
Rather than:
SELECT * FROM customer;
Sample MySQL Code
// do I have any users from Alabama?
// what NOT to do:
$r = mysql_query(“SELECT
* FROM user WHERE state = “Alabama””);
if (mysql_num_rows($r) >
0) {
// …
}
// much better:
$r = mysql_query(“SELECT
1 FROM user WHERE state = “Alabama” LIMIT 1″);
if (mysql_num_rows($r) >
0) {
// …
}
UNIONs perform a DISTINCT operation which requires extra cost. This must be avoided in order to facilitate faster query execution. For example:
Use:
SELECT employeeID, firstname, lastname
FROM names WHERE
dept = “prod”
UNION ALL
SELECT employeeID, firstname, IF EXISTS
(SELECT org_idastname
FROM names WHERE
city = “Orlando”
Rather than:
SELECT employeeID, firstname, lastname
FROM names WHERE
dept = “prod”
UNION
SELECT employeeID,
firstname, lastname FROM
names WHERE city = “Orlando”
Another best practice is to use Exists when you want to check the existence of records, rather than COUNT (*). For example:
Use:
IF EXISTS(SELECT org_id
FROM dbo.[PatReq.Organization] AS org
WHERE org.ID = 1)
PRINT “Record Exists”
Rather than:
DECLARE @n INT
SELECT @n= Count(org_id)
FROM dbo.[PatReq.Organization] AS org
WHERE org.ID = 1
IF @n > 0
PRINT “Record Exists”
Always use table aliases in column names, in order to avoid situations where the optimizer has to search for columns in the data dictionary. Using this approach will result in faster query execution. For example:
Use:
SELECT
(cnt.FirstName
+ ” “ + cnt.LastName) AS FullName,
emp.LoginID,
emp.Title
FROM HumanResources.Employee AS emp
INNERJOIN Person.ContactAS cnt
ON emp.ContactID = cnt.Contact_ID
Rather than:
SELECT
(FirstName + ” “ + LastName) AS FullName,
LoginID,
Title
FROM HumanResources.Employee
INNER JOIN Person.Contact
ON ContactID = Contact_ID
Using the BETWEEN function is much more efficient than using IN. Therefore when searching for records in sequence, it”s always preferable to use BETWEEN rather than IN, as depicted below.
Use:
SELECT org.*
FROM dbo.[PatReq.Organization] AS org
WHERE org.ID
BETWEEN 12654 AND 12657
Rather than:
SELECT org.*
FROM dbo.[PatReq.Organization] AS org
WHERE org.ID
IN(12654,12655,12656,12657)
Using column numbers in Order By clauses incurs additional CPU cost, so it”s better not to use them in conjunction. The following is a good example of this scenario:
Use:
SELECT OrderID, OrderDate
FROM Orders
ORDER BY
OrderDate
Rather than:
SELECT OrderID, OrderDate
FROM Orders
ORDER BY 2
If you are already working on unique rows don”t use the DISTINCT keyword, as this too incurs additional CPU cost. For example:
Use:
SELECT d.dept_id, d.dept
FROM dept;
Rather than:
SELECT DISTINCT
d.dept_id, d.dept
FROM dept;
Its always best to use sargable conditions in Where clause instead of non-sargable search conditions, as non-sargable search conditions have a time cost associated with them. Therefor a best practice of query optimization is to use sargable conditions wherever possible. For example:
Use:
=, >, >=, <, =<, BETWEEN, LIKE ‘literal%’
Rather Than:
<>, !=, !>, !<, NOT EXISTS, NOT IN, NOT LIKE, LIKE ‘%literal%’
Use:
SELECT org.Name
FROM dbo.[PatReq.Organization] AS org
WHERE org.Name LIKE “SAMA%”
Rather Than:
SELECT org.Name
FROM dbo.[PatReq.Organization] AS org
WHERE org.Name LIKE “%SAMA%”
For proper index usage it is best to avoid using arithmetic operators on Where clauses. For example:
Use:
SELECT org.ID, org.ProcessedDate
FROM dbo.[PatReq.Organization] AS org
WHERE org.ProcessedDate = “2002-04-04” + 2
Rather Than:
SELECT org.ID, org.ProcessedDate
FROM dbo.[PatReq.Organization] AS org
WHERE org.ProcessedDate + 2 = “2002-04-06”
It”s also best to avoid using Function on WHERE clauses, as doing so will not utilize the index properly. The following code sample depicts the best practice to use in this situation.
Use:
SELECT org.ID, org.ProcessedDate
FROM dbo.[PatReq.Organization]AS org
WHERE org.ProcessedDate >= “2002-04-01”
AND org.ProcessedDate < “2002-05-01
Rather Than:
SELECT org.ID, org.ProcessedDate
FROM dbo.[PatReq.Organization] AS org
WHERE DATEPART(yy,org.ProcessedDate) = “2002”
AND DATEPART(mm,org.ProcessedDate) = “4”
Sometimes we can avoid an extra row scan by replacing NOT IN with LEFT OUTER JOIN. For example, if we
Use:
SELECT org.*
FROM dbo.[PatReq.Organization] AS org
LEFT OUTER
JOIN [dbo].[PatReq.Document] doc
ON org.DocumentId = doc.Idextension
WHERE doc.Idextension IS NULL
Rather Than:
SELECT org.*
FROM dbo.[PatReq.Organization] AS org
WHERE org.DocumentId
NOT IN
(SELECT doc.Idextension
FROM [dbo].[PatReq.Document] doc)
If Derive has fewer rows than the Join table and only one column in Select, then it is best to avoid using JOIN and use a sub-query instead, as depicted in the sample below:
Use:
SELECT pd.Idextension
,(SELECT TOP 1 PO.Name
FROM [dbo].[PatReq.Organization] PO
WHERE PO.DocumentId = PD.IdExtension)
AS RepresentedOrganization
FROM [PatReq.Document] PD
Rather Than:
SELECT pd.Idextension
, PO.Name
FROM [PatReq.Document] PD
INNER JOIN [dbo].[PatReq.Organization] PO
ON PO.DocumentId = PD.IdExtension
IF you have duplicate rows and want to fetch just one row, its best practice to use TOP or LIMIT, as depicted below:
SELECT TOP 1 org_id
FROM dbo.[PatReq.Organization] AS org
WHERE org.ID = 1
As a Big Data solution provider, Folio3 has extensive expertise in query optimization and performance management solutions. For more information about our Big Data services, please visit here or contact us for details.
USA408 365 4638
1301 Shoreway Road, Suite 160,
Belmont, CA 94002
Whether you are a large enterprise looking to augment your teams with experts resources or an SME looking to scale your business or a startup looking to build something.
We are your digital growth partner.
Tel:
+1 408 365 4638
Support:
+1 (408) 512 1812
COMMENTS ()
Tweet