Best Practices for Writing Optimizable SQL Code

Writing SQL code that can be easily optimized is critical to ensure that database queries are efficient, scalable, and perform well even as data volume increases. Poorly optimized queries can lead to performance bottlenecks, slow response times, and negative user experiences. In this article, we’ll look at best practices to help you write SQL code that is easy to optimize and that delivers top performance.

Use Indexes Wisely

Indexes are one of the most powerful tools for speeding up SQL queries, especially for large datasets. An index allows the database to find rows more efficiently, but excessive indexing can negatively affect performance on INSERT, UPDATE, and DELETE operations.

Best Practice:

  • Index columns that are frequently used in WHERE clauses, JOIN conditions, and sorting operations (ORDER BY).
  • Focus on indexing primary keys and foreign keys to speed up join operations.
CREATE INDEX idx_customer_id ON orders (customer_id);
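
For queries that filter on one column and sort by another, a single composite index can serve both steps. A hypothetical example (index name and column order are assumptions):

CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

With this index, a query such as WHERE customer_id = 123 ORDER BY order_date can read rows already in sorted order, skipping a separate sort step.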

What to Avoid:

  • Avoid indexing columns that are updated very frequently, since every write must also maintain the index, and columns with very few distinct values (e.g., a status flag), where an index adds little selectivity.
  • Don’t create too many indexes on a single table, as it adds overhead to write operations.

Avoid Using SELECT *

When you use SELECT *, all columns are retrieved from the table, even those you don’t need. This results in unnecessary data transfer, which can slow down query execution, especially when querying large tables or joining multiple tables.

Best Practice:

  • Explicitly specify only the columns you need to retrieve.
SELECT first_name, last_name, email FROM customers;
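
Listing columns explicitly can also enable index-only scans when an index covers every requested column. A hypothetical covering index for the query above:

CREATE INDEX idx_customers_contact ON customers (first_name, last_name, email);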

What to Avoid:

  • Avoid retrieving unnecessary columns by using SELECT *.
SELECT * FROM customers;

This may result in additional I/O and memory usage, leading to performance issues, especially with large tables or when joined with other large tables.

Use WHERE Clauses to Filter Early

Filtering data early in query execution can significantly improve performance by reducing the number of rows processed. Make sure WHERE conditions are as selective as possible and use indexed columns for filtering.

Best Practice:

  • Use indexed columns in the WHERE clause to speed up filtering.
  • Combine conditions to filter out as much data as possible early in the query.
SELECT order_id, order_date
FROM orders
WHERE customer_id = 123 AND order_status = 'shipped';
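
Both predicates above are equality filters, so one composite index (hypothetical name) can satisfy them together:

CREATE INDEX idx_orders_customer_status ON orders (customer_id, order_status);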

What to Avoid:

  • Avoid filtering with non-indexed columns or performing operations (like functions) on columns inside the WHERE clause. This can prevent the use of indexes.
SELECT order_id, order_date
FROM orders
WHERE YEAR(order_date) = 2023;

This query forces the database to apply the YEAR() function to each row, preventing it from using any index on order_date.
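
A sargable rewrite expresses the same filter as a range on the raw column, so an index on order_date (if one exists) can be used:

SELECT order_id, order_date
FROM orders
WHERE order_date >= '2023-01-01'
  AND order_date < '2024-01-01';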

Optimize JOIN Operations

Joins are often used in SQL queries, but can be a source of inefficiency if not optimized properly. The key to efficient joins is to ensure that the columns to be joined are indexed and that the join conditions are selective.

Best Practice:

  • Ensure that the columns used in JOIN conditions are indexed.
  • Use inner joins (INNER JOIN) when possible, as they are generally more efficient than outer joins.
SELECT c.first_name, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date > '2023-01-01';

What to Avoid:

  • Avoid joining large tables without filtering. Always include a WHERE clause to limit the dataset before or after the join.
  • Avoid Cartesian joins by always including proper JOIN conditions.
SELECT c.first_name, o.order_date
FROM customers c, orders o;

This will result in a Cartesian product of both tables, which can lead to massive performance issues when joining large datasets.

Leverage Aggregate Functions and Grouping Efficiently

When using aggregate functions (e.g. COUNT(), SUM(), AVG()), consider the size of the dataset you are aggregating. Aggregating large amounts of data can reduce query performance, so use indexes and GROUP BY wisely.

Best Practice:

  • Aggregate data in smaller batches where possible.
  • Ensure that the columns in the GROUP BY clause are indexed to improve performance.
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY customer_id;

What to Avoid:

  • Avoid aggregating on unfiltered datasets, which can lead to high memory usage and slower query times.
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

This query aggregates all records in the orders table, even those that may not be relevant. Applying a WHERE clause would optimize the query.
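
If this filtered aggregation runs often, a composite index covering both the filter and the grouped column (hypothetical, assuming none exists yet) can let the engine answer the query from the index alone:

CREATE INDEX idx_orders_date_customer ON orders (order_date, customer_id);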

Limit the Use of Subqueries and Use CTEs Instead

Subqueries can be inefficient, especially when nested within the WHERE or FROM clauses. Instead, use Common Table Expressions (CTEs) to improve readability and, often, execution as well.

Best Practice:

  • Use CTEs for clarity and, in many engines, better performance, especially when dealing with complex queries.
WITH RecentOrders AS (
    SELECT customer_id, order_date
    FROM orders
    WHERE order_date >= '2023-01-01'
)
SELECT c.first_name, r.order_date
FROM customers c
JOIN RecentOrders r ON c.customer_id = r.customer_id;

What to Avoid:

  • Avoid deeply nested subqueries, which can be difficult to optimize and lead to poor performance.
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date >= '2023-01-01'
);

CTEs are often easier for the query optimizer to handle and avoid repeating the same subquery. Behavior varies by engine, though: some databases (notably PostgreSQL before version 12) materialize CTEs as optimization fences, so verify the effect in the execution plan.
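
Where the IN list grows large, an equivalent EXISTS semi-join is also worth comparing in the execution plan, since some optimizers handle it better:

SELECT c.first_name, c.last_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2023-01-01'
);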

Limit the Results with LIMIT or Pagination

If your query is likely to return a large dataset, consider using LIMIT or pagination to reduce the result set size and avoid overloading the system with too much data at once.

Best Practice:

  • Use LIMIT to reduce the number of rows returned by the query.
SELECT first_name, last_name
FROM customers
WHERE country = 'USA'
LIMIT 100;
  • Implement pagination for queries that return large datasets. Use OFFSET and LIMIT to fetch specific chunks of data.
SELECT first_name, last_name
FROM customers
WHERE country = 'USA'
LIMIT 50 OFFSET 100;
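
Keep in mind that OFFSET still scans and discards all skipped rows, which gets slow on deep pages. A keyset (seek) pagination sketch, assuming customer_id is the indexed sort key and 1500 is the last id seen on the previous page (hypothetical value):

SELECT customer_id, first_name, last_name
FROM customers
WHERE country = 'USA'
  AND customer_id > 1500
ORDER BY customer_id
LIMIT 50;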

What to Avoid:

  • Avoid retrieving entire large datasets in one query, especially in applications where users only need a portion of the data at a time.
SELECT first_name, last_name
FROM customers
WHERE country = 'USA';

This query could retrieve millions of records, resulting in unnecessary overhead.

Analyze Query Execution Plans

Always analyze the query execution plan to understand how the database engine handles the query. This will help you identify performance bottlenecks such as full table scans or inefficient joins.

Best Practice:

  • Use the EXPLAIN statement to analyze the execution plan of your queries. It will show how the database optimizer processes the query, whether indexes are used, and which operations are the most expensive.
EXPLAIN SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
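
Many engines also offer EXPLAIN ANALYZE (PostgreSQL, MySQL 8.0.18+), which actually executes the query and reports real row counts and timings, so run it with care on slow or data-modifying statements:

EXPLAIN ANALYZE
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;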

What to Avoid:

  • Ignoring the execution plan can result in missed optimization opportunities, such as identifying slow table scans or missing indexes.

Use Batch Processing for Large Data Modifications

When modifying large datasets, such as bulk INSERT, UPDATE, or DELETE operations, breaking them into smaller batches can help reduce the load on the database and prevent locking issues.

Best Practice:

  • Break large operations into smaller chunks using transactions or batching.
UPDATE orders
SET order_status = 'shipped'
WHERE order_date < '2023-01-01'
  AND order_status <> 'shipped'
LIMIT 1000;
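
Repeat the statement until it affects zero rows; the status check ensures each batch moves on to fresh rows. Note that UPDATE ... LIMIT is MySQL-specific syntax. On engines without it (e.g., PostgreSQL), a keyed subquery gives the same batching; a sketch assuming order_id is the primary key:

UPDATE orders
SET order_status = 'shipped'
WHERE order_id IN (
    SELECT order_id
    FROM orders
    WHERE order_date < '2023-01-01'
      AND order_status <> 'shipped'
    LIMIT 1000
);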

What to Avoid:

  • Avoid making massive modifications in a single query, which can lead to lock contention and slow down the entire database.
UPDATE orders
SET order_status = 'shipped'
WHERE order_date < '2023-01-01';

Conclusion

Writing optimized SQL code requires a balance between readability, performance, and scalability. A proactive approach to optimization not only improves the performance of your queries but also ensures that your application can handle growing data sets and increased complexity without bottlenecks.
