best practices of sql

Upload: mutthumh

Post on 06-Apr-2018

222 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 Best Practices of SQL

    1/12

    BEST PRACTICES OF SQL

    BEST PRACTICES IN CODING SQL QUERIES:

    Always include a WHERE clause in your SELECT statement to narrow the number of rows

    returned. If you don't use a WHERE clause, then SQL Engine will perform a table scan of

    your table and return all of the rows. In some case you may want to return all rows, and not

    using a WHERE clause is appropriate in this case. But if you don't need all the rows returned,

    use a WHERE clause to limit the number of rows returned.

    Don't be afraid to make liberal use of in-line and block comments in your SQL code, they will

    not affect the performance of your application.

    If possible, avoid using cursors. They generally use a lot of resources and reduce the

    performance and scalability of your applications. If you need to perform row-by-row

    operations, try to find another method to perform the task. Some options are to perform the

    task at the client, use tempdb tables, use derived tables, use a correlated sub-query, or use

    the CASE statement.

    When using the UNION statement, keep in mind that by default it performs the equivalent of a

    SELECT DISTINCT on the results set. If you know that there will not be any duplication of

    rows created as a result of using the UNION statement, then use the UNION ALL statementinstead. This variation of the statement does not look for duplicate rows and runs much faster

    than the UNION, which does look for duplicate rows, whether or not there are any.

    Carefully evaluate whether your query needs the DISTINCT clause or not. The DISTINCT

    clause slows down virtually every query it is in. Some developers automatically add this

    clause to every one of their SELECT statements, even when it is not necessary. This is a bad

    habit that should be stopped. In addition, keep in mind that in some cases, duplicate results in

    a query are not a problem. If this is the case, then don't use a DISTINCT clause.

    In your queries, don't return column data you don't need. For example, you should not use

    SELECT * to return all the columns from a table if you don't need all the data from each

    column. In addition, using SELECT * prevents the use of covered indexes, further potentially

    hurting query performance.

    Try to avoid WHERE clauses that contains search arguments in the WHERE clause, such as

    "IS NULL", "OR", "", "!=", "!>", "!

  • 8/3/2019 Best Practices of SQL

    2/12

  • 8/3/2019 Best Practices of SQL

    3/12

    Of the search criterions in the WHERE clause, at least one of them should be based on ahighly selective column that has an index.

    If at least one of the search criterions in the WHERE clause is not highly selective,consider adding indexes to all of the columns referenced in the WHERE clause.

    If none of the column in the WHERE clause are selective enough to use an index on theirown, consider creating a covering index for this query.

    A query with one or more OR clauses can sometimes be rewritten as a series of queries that

    are combined with a UNION ALL statement, in order to boost the performance of the query.

    For example, let's take a look at the following query:

    SELECT EmpNo, EmpName

    FROM EMP

    WHERE DeptNo = 10 OR Salary < 10000 OR MgrNo = 1010

    This query has three separate conditions in the WHERE clause. In order for this query to use

    an index, then there must be an index on all three columns found in the WHERE clause.

    This same query can be written using UNION ALL instead of OR, like this example:

    SELECT EmpNo, EmpName FROM EMP WHERE DeptNo = 10

    UNION ALL

    SELECT EmpNo, EmpName FROM EMP WHERE Salary < 10000

    UNION ALL

    SELECT EmpNo, EmpName FROM EMP WHERE MgrNo = 1010

    Views are often convenient to use, especially for restricting users from seeing data they

    should not see, but they aren't good for performance. So if database performance is the goal,

    avoid using views.

    Here's why. When the Query Optimizer gets a request to run a view, it runs it just as if youhad run the view's SELECT statement from the Query Analyzer. In fact, a view runs slightlyslower than the same SELECT statement run from the Query Analyzer-- but we probablywould not notice the difference--because of the additional overhead caused by the view.Unlike stored procedures, views offer no pre-optimization.

    Instead of embedding SELECT statements in a view, put them in a stored procedure insteadfor optimum performance. Not only do you get the added performance boost, you can alsouse the stored procedure to restrict user access to table columns they should not see.

    Try to avoid nesting views (referring to a view from within a view). While this is not prohibited,

    it makes it more difficult to identify the source of any performance problems. A better idea is

    to create separate views instead of nesting them.

    Don't use DISTINCT or ORDER BY in your SELECT statements unless you really need them.Both options can add a lot of additional overhead to your query, and they aren't always

    needed for the application.

    If SELECT statement includes an IN option along with a list of values to be tested in the

    query, order the list of values so that the most frequently found values are placed at the first

    of the list, and the less frequently found values are placed at the end of the list. This can

    speed performance because the IN option returns true as soon as any of the values in the list

    produce a match. The sooner the match is made, the faster the query completes.

  • 8/3/2019 Best Practices of SQL

    4/12

  • 8/3/2019 Best Practices of SQL

    5/12

    getting immediate feedback, the user gets the impression that the application is fast,even though in the background, it is not.

    Don't repeatedly reuse the same function to calculate the same result over and over within

    SQL code. For example, if you need to reuse the value of the length of a string over and over

    within your code, perform the LEN function once on the string, and this assign the result to a

    variable, and then use this variable, over and over, as needed in your code. Don't recalculatethe same value over and over again by reusing the LEN function each time you need the

    value, as it wastes Database Server resources and hurts performance.

    Slow queries can be caused by a wide variety of reasons.

    Some include:

    Lack of useful indexes

    Lack of useful I/O striping

    Out-of-date statistics or lack of useful statistics

    Lack of physical memory

    Slow network connection

    SQL queries transferring large amounts of data from the server to the client

    Blocked locks or deadlocks

    Performing OLTP and OLAP queries on the same physical server.

    Poorly designed query.

  • 8/3/2019 Best Practices of SQL

    6/12

    BEST PRACTICES IN CODING STORED PROCEDURES

    Whenever a client application needs to send SQL to Database Server, send it in the form of astored procedure instead of a script or embedded Transact-SQL. This not only reduces networktraffic (only the EXECUTE or CALL is issued over the network), but it can speed up the SQLbecause the code in the stored procedure residing on the server is already pre-compiled. In

    addition, after a stored procedure is run for the first time, it stays cached in Database Serversmemory where it can potentially be reused, further reducing overhead on the Server.

    Keep in mind that just because we use a stored procedure does not mean that it will run fast. Thecode we use within our stored procedure must be well designed for both speed and reuse.

    By default, every time a stored procedure is executed, a message is sent from the server tothe client indicating the number of rows that were affected by the stored procedure. Rarely isthis information useful to the client. By turning off this default behavior, you can reducenetwork traffic between the server and the client, helping to boost overall performance of yourserver and applications. There are two main ways to turn this feature off. You can also turnthis feature off using a server trace setting, but it is unnecessary as there are easier ways, asdescribed here.

    To turn this feature off on at the stored procedure level, you can include the statement in MS-SQL :

    SET NOCOUNT ON

    at the beginning of each stored procedure you write. This statement should be included inevery stored procedure you write.

    Keep SQL transactions as short as possible. This helps to reduce the number of locks,helping to speed up the overall performance of Database application. Two ways to helpreduce the length of a transaction are to:

    Break up the entire job into smaller steps so each step can be committed as soon aspossible.

    Take advantage of Database Server statement batches, which acts to reduce thenumber of round-trips between the client and server.

    When a stored procedure is first executed (and it does not have the WITH RECOMPILEoption), it is optimized and a query plan is compiled and cached in Database Server buffer. Ifthe same stored procedure is called again from the same connection, it will used the cachedquery plan instead of creating a new one, saving time and boosting performance.

    If the query in the stored procedure is exactly the same each time, then this is a good thing.But if the query is dynamic (the WHERE clauses changes from one execution of the storedprocedure to the next), then this is a bad thing, as the query will not be optimized when it is

    run, and the performance of the query can suffer greatly.

    If we know that our query will vary each time it is run from the stored procedure, you will wantto add the WITH RECOMPILE option when you create the stored procedure. This will forcethe stored procedure to be re-compiled each time it is run, ensuring the query is optimizedeach time it is run.

    Design the application to allow users to cancel running queries. Not doing so may force theuser to reboot the client, which can cause irresolvable performance problems.

  • 8/3/2019 Best Practices of SQL

    7/12

    Before you are done with your stored procedure code, review it for any unused code that youmay have forgotten to remove while you were making changes, and remove it. Unused code

    just adds unnecessary bloat to your stored procedures.

    For best performance, all objects that are called within the same stored procedure should allbe owned by the same owner, preferably admin user. If they are not, then Database Servermust perform name resolution on the objects if the object names are the same but the owners

    are different. When this happens, Database Server cannot use a stored procedure "in-memory plan" over, instead, it must re-execute the stored procedure, which hindersperformance.

    Use of dynamic SQL (using EXECUTE IMMEDIATE in ORACLE and SP_EXECUTESQL inMS-SQL) improves the performance and flexibility.

    Database Server will automatically recompile a stored procedure if any of the following happens:

    If any schema changes occur to any of the objects referenced in the stored procedure.This includes adding or dropping rules, defaults, and constraints.

    New distribution statistics are generated. If database is restored that includes the stored procedure or any of the objects it

    references.

    If the stored procedure is aged out of Database Server's cache. An index used by the execution plan of the stored procedure is dropped. A major number of INSERTS, UPDATES or DELETES are made to a table referenced by

    a stored procedure. The stored procedure includes both DDL (Data Definition Language) and DML (Data

    Manipulation Language) statements, and they are interleaved with each other.

    One hidden performance problem of using stored procedures is when a stored procedurerecompiles too often. Normally, we want a stored procedure to compile once and to be storedin Database Server's cache so that it can be re-used without it having to recompile each timeit is used. This is one of the major benefits of using stored procedures. But in some cases, a

    stored procedure is recompiled much more often than it needs to be recompiled, hurting yourserver's performance. In fact, it is possible for a stored procedure to have to be recompiledwhile it is executing! Here are three potential problems you want to look out for when writingstored procedures.

    Unnecessary Stored Procedure Recompilations Due to Row Modifications and AutomatedStatistics Update

    Unnecessary Stored Procedure Recompilations Due to Mixing DDL and DML Statements inthe Same Stored Procedure.

    To prevent unnecessary stored procedure recompilations, you should include all of your DDLstatements at the first of the stored procedure so they are not intermingled with DMLstatements.

    Unnecessary Stored Procedure Recompilations Due to Specific Temporary Table OperationsImproper use of temporary tables in a stored procedure can force them to be recompiledevery time the stored procedure is run. Here's how to prevent this from happening:

    Any references to temporary tables in your stored procedure should only refer totables created by that stored procedure, not to temporary tables created outside

  • 8/3/2019 Best Practices of SQL

    8/12

    your stored procedure, or in a string executed using either the sp_executesql or theEXECUTE statement.

    All of the statements in your stored procedure that include the name of a temporarytable should appear syntactically after the temporary table.

    The stored procedure should not declare any cursors that refer to a temporarytable.

    Any statements in a stored procedure that refer to a temporary table shouldprecede any DROP TABLE statement found in the stored procedure.

    The stored procedure should not create temporary tables inside a control-of-flowstatement.

    Stored procedures can better boost performance if they are called via Transaction Server(TS) (like Microsoft Transaction Server (MTS), all application servers provides the transactionservers), instead of being called directly from your application. A stored procedure can bereused from the procedure cache only if the connection settings calling the stored procedureare the same.

    Avoid nesting stored procedures, although it is perfectly legal to do so. Nesting not onlymakes debugging more difficult, it makes it much more difficult to identify and resolveperformance-related problems.

    When calling a stored procedure from your application, it is important that we call it using itsfully qualified name.

    Such as:

    database_name.user_name.myProcedure in MS-SQL

    schema_name.myProcedure in Oracle

    instead of:

    myProcedure

    Using fully qualified names helps to eliminate any potential confusion about which storedprocedure you want to run, helping to prevent bugs and other potential problems. But moreimportantly, doing so allows Database Server to access the stored procedures execution planmore directly, and in turn, speeding up the performance of the stored procedure.

    If a stored procedure needs to return only a single value, and not a RecordSet or ResultSet,consider returning the single value as an output statement. While output statements aregenerally used for error-checking, they can actually be used for any reason you like.Returning a single value as at output statement is faster than returning a single value as partof a recordset.

    Avoid coding nested if-then-else blocks.

    Handle the exceptions properly and react based on the severity. It is advisable to handle the

    exception and sending the appropriate status to the application, instead propagating theexception to the application.

    Or, another way is catch the exception and raise an appropriate exception that gives fullinformation regarding the exception.

    Try to modularize the stored procedures for each application. Encapsulate the storedprocedure depending on the sub-system identified as part of the system analysis.

  • 8/3/2019 Best Practices of SQL

    9/12

    Try to optimize using the implicit cursors within the stored procedures. Otherwise, the storedprocedures unnecessarily fill the buffer memory of Database Server.

  • 8/3/2019 Best Practices of SQL

    10/12

    BEST PRACTICES IN USING INDEXES

    Please consider these tips as general guidelines. As with any general guideline, there areexceptions. Because of this, it is a good idea to test out various indexing strategies for the mostcommon queries run against your database.

    Use of these indexing is once again depends on the context and scope of your application.

    Indexes should be considered on all columns that are frequently accessed by the WHERE,ORDER BY, GROUP BY, TOP, and DISTINCT clauses. Without an index, each of theseoperations will require a table scan of your table, potentially hurting performance.

    Please keep in mind the word "considered". An index created to support the speed of aparticular query may not be the best index for another query on the same table. Sometimesyou have to balance indexes to attain acceptable performance on all the various queries thatare run against a table.

    Don't automatically add indexes on a table because it seems like the right thing to do. Onlyadd indexes if you know that they will be used by the queries run against the table.

    As a rule of thumb, every table should have at least a index. Generally, but not always, the

    index should be on a column that monotonically increases - such as an identity column, orsome other column where the value is increasing - and is unique. In many cases, the primarykey is the ideal column for an index.

    Don't over index your transaction tables (most accessible tables), as every additional indexincreases the time it takes to perform INSERTS, UPDATES, and DELETES. There must be afine line drawn between having the ideal number of indexes (for SELECTs) and the idealnumber for data modifications.

    Don't accidentally add the same index twice on a table. For example, you add a primary keyto a column, which of course creates an index to enforce what you want to happen. Addingunique index will not give more performance to you, rather reduces the performance as queryoptimizer need to consider two indexes on the same column.

    Drop indexes that are never used by the Query Optimizer. Unused indexes slow datamodifications, causes unnecessary I/O reads when reading pages, and wastes space in your

    database, increasing the amount of time it takes to backup and restore databases. Keep the "width" (number of columns) of your indexes as narrow as possible, especially when

    creating composite (multi-column) indexes. This reduces the size of the index and reducesthe number of disk I/O reads required to read the index, boosting performance.

    If possible, try to create indexes on columns that have integer values instead of characters,REAL, FLOAT unless it is required, as it hurts the performance.

    An index is generally only useful to a query if the WHERE clause of the query matches thecolumn(s) that are leftmost in the index. So if you create a composite index, such as "EmpNo,DeptNo" on EMP table, then a query such as "WHERE EmpNo = 1000" will use the index, butthe query "WHERE DeptNo = 10" will not use the index.

    Even if the WHERE clause in a query does not specify the first column of an available index(which normally disqualifies the index from being used), if the index is a composite index andcontains all of the columns referenced in the query, the query optimizer can still use the

    index, because the index is a covering index. When you create an index with a composite key, the order of the columns in the key is

    important. Try to order the columns in the key as to enhance selectivity, with the mostselective columns to the leftmost of the key. If this is not considered, then most of the timesthe index will not be used for the intended purpose.

    Try to avoid having composite indexes (index containing multiple-columns) as much aspossible as we can.

    If you have two or more tables that are frequently joined together, then the columns used forthe joins should have an appropriate index.

  • 8/3/2019 Best Practices of SQL

    11/12

    If you want to boost the performance of a query that includes an AND operator in the WHEREclause, consider the following:

    Of the search criterions in the WHERE clause, at least one of them should bebased on a highly selective column that has an index.

    If at least one of the search criterions in the WHERE clause is not highly selective,

    consider adding indexes to all of the columns referenced in the WHERE clause. If none of the column in the WHERE clause are selective enough to use an index

    on their own, consider creating a covering index for this query.

    To make complex queries easier to analyze, consider breaking them down into their smallerconstituent parts. One way to do this is to simply create lists of the key components of thequery, such as:

    List all of the columns that are to be returned List all of the columns that are used in the WHERE clause List all of the columns used in the JOINs (if applicable) List all the tables used in JOINs (if applicable)

    Once you have the above information organized in this easy-to-comprehend form, it ismust easier to identify those columns that could potentially make use of indexes whenexecuted.

  • 8/3/2019 Best Practices of SQL

    12/12

    MISCELLANEOUS:

    The following are some of the best practices to be considered in general for optimum databasedesign and implementation.

    Don't run a screensaver on your production Database Server, it can unnecessarily use CPU

    cycles that should be going to your application. The only exception to this is the "blankscreen" screensaver, which is OK to use.

    Don't run Production Database Server on the same physical server that you are running otherapplications that consume more RAM and CPU cycles. Have a dedicated Database Serverin the Production environment.

    Sometimes, a user thread has to wait until the resources it needs are available. Hopefully,this won't happen often, but it is a fact of life. But sometimes, long waits can indicate potentialperformance problems than can be corrected. Long waits can be caused by blocking locks,slow I/O, and other factors.

    For best performance, don't mix production databases and development (test or staging)databases on the same physical server. This not only serves to better separate the twofunctions (production and development), but prevents developers from using up serverresources that could be better used by production users.