Understanding Query Optimization Fundamentals
Query optimization is selecting the most efficient way to execute a SQL query. When you submit a query, the database optimizer evaluates multiple execution plans and chooses the fastest one.
The Optimization Pipeline
Three components work together: the parser checks syntax, the optimizer determines the best execution plan, and the execution engine runs it. Understanding this pipeline helps you recognize where optimization opportunities exist.
Key Metrics to Track
Focus on these measurements when studying optimization:
- Execution time (how long the query takes)
- I/O operations (disk reads and writes)
- CPU usage
- Memory consumption
For example, adding an index reduces I/O operations but increases storage and slows down INSERT/UPDATE operations.
Core Concepts You Must Know
Table scans versus index seeks represent different access methods. Join algorithms (nested loop, hash join, sort merge) determine how tables combine. Selectivity refers to how many rows a WHERE clause filters out; higher selectivity is better because it reduces the dataset the optimizer must process.
Flashcards help you internalize these relationships and recall them under exam pressure.
Index Strategies and Execution Plans
Indexes are among the most powerful optimization tools available. Understanding when and how to use them correctly changes everything.
Clustered vs. Non-Clustered Indexes
A clustered index determines the physical order of table rows on disk. Each table has exactly one clustered index, usually the primary key. A non-clustered index creates a separate lookup structure without reorganizing the table.
Use a clustered index on your most-searched column because it provides the fastest range queries. Use non-clustered indexes on frequently filtered or joined columns where sequential access isn't critical.
Composite Index Rules
Composite indexes span multiple columns, and column order matters significantly. Place equality predicate columns before range predicate columns. For example, if you query WHERE region = 'North' AND salary > 50000, put region first, then salary.
Reading Execution Plans
Execution plans show exactly how the database will execute your query. They display as visual trees or text showing operations like Clustered Index Seek, Hash Join, Sort, and Filter. The cost percentages indicate relative resource consumption.
Focus optimization efforts on the costliest operations first. Create flashcards comparing different index types, their advantages, disadvantages, and use cases. Include specific examples showing how different indexes affect the same query.
Join Optimization Techniques
Joins combine data from multiple tables, and optimizing them significantly impacts overall performance. Three primary join algorithms exist, each with different performance characteristics.
Nested Loop Joins
Nested loop joins iterate through each row of the outer table and search for matching rows in the inner table. This works well for small datasets or when an index exists on the inner table's join column. With large tables, nested loops become prohibitively slow because they perform many lookups.
Hash Joins
Hash joins build an in-memory hash table from the smaller input and probe it with rows from the larger input. They're efficient for large tables without relevant indexes but require sufficient memory. If memory isn't available, the database must spill hash data to disk, dramatically reducing performance.
Sort-Merge Joins
Sort-merge joins work best when both inputs are already sorted on the join column. They require no additional memory but take time to sort unsorted inputs. Understanding when your database naturally has sorted data is important.
Join Order Optimization
Join order matters tremendously. Filtering tables earliest and reducing their size before joins accelerates subsequent operations. Create flashcards presenting specific query scenarios and asking you to identify the optimal join strategy and order. Include examples showing how join order changes affect execution plans.
Query Rewriting and Optimization Patterns
Sometimes the most effective optimizations involve rewriting queries in logically equivalent but structurally different ways.
Common Rewriting Techniques
Subquery unnesting converts subqueries into joins when possible, allowing the optimizer to consider more execution plans. Predicate pushdown moves filtering conditions as close as possible to data sources, reducing data flowing through subsequent operations.
Key patterns include:
- Replace NOT IN with NOT EXISTS when dealing with nullable columns
- Use UNION ALL instead of UNION when duplicates aren't a concern
- Avoid SELECT * in favor of explicitly listing needed columns
These patterns work because they reduce data volume or allow better optimizer choices.
Handling Correlated Subqueries
Correlated subqueries reference outer query columns and often perform poorly because they execute repeatedly for each outer row. Rewriting these as joins, window functions, or aggregate subqueries typically improves performance dramatically.
View and Materialized View Optimization
View optimization involves understanding how views expand and whether the optimizer can push predicates through them. Materialized views pre-compute results, trading storage for query speed.
Create flashcards showing original query versions alongside optimized versions. Include the reasoning behind each change and expected performance improvement. This helps you recognize optimization opportunities and understand underlying principles.
Statistics, Cardinality Estimation, and Query Analyzer Tools
Query optimizers make decisions based on table and index statistics including row counts, data distribution, and cardinality (distinct value counts). Outdated statistics lead to poor optimizer choices because the optimizer underestimates or overestimates row volumes.
Understanding Cardinality Estimation
Cardinality estimation is notoriously difficult. If a column has 1000 distinct values in a million-row table, the optimizer assumes roughly 1000 rows match any given value (assuming uniform distribution). In reality, data is often skewed. When optimization plans consistently perform worse than expected, statistics misalignment is often the culprit.
Using Query Analyzer Tools
Query analyzer tools like SQL Server's Execution Plan Analyzer, MySQL's EXPLAIN, and PostgreSQL's EXPLAIN ANALYZE provide crucial insights. These tools show estimated versus actual row counts, revealing cardinality estimation errors. Learning to read these tools is essential for practical optimization work.
Key metrics displayed include:
- Estimated Rows and Actual Rows
- I/O Statistics (logical and physical reads)
- Execution time
- Cost percentages
Discrepancies between estimated and actual rows indicate where the optimizer made wrong assumptions. Creating flashcards showing execution plan snippets and asking you to identify performance issues trains you to spot problems quickly.
Include flashcards covering command syntax in EXPLAIN (PostgreSQL), EXPLAIN PLAN (Oracle), and STATISTICS (SQL Server). Practice interpreting different execution plan icons and operations.
