
MySQL SQL Complete Guide: From Beginner to Advanced (With Interview Questions)

Master MySQL SQL from basics to advanced topics: indexes, B+ trees, transactions, MVCC, locks, and query optimization. Includes real-world examples and common interview questions.

Bruce

Tags: MySQL · SQL · Database · Interview · Index · Transaction · Performance

Category: MySQL

5481 words · 2026-01-23


MySQL SQL Complete Guide: From Beginner to Advanced

SQL (Structured Query Language) is the universal language for communicating with databases. Whether you are a backend developer, data analyst, or DevOps engineer, SQL proficiency is a must-have skill. This guide uses MySQL as the reference implementation and walks you through everything from basic syntax to the internals that interviewers love to ask about.

This is a long-form article. Bookmark it and work through one section at a time, using the table of contents to jump to the topics you need.

1. SQL Fundamentals

1.1 What Is SQL?

SQL stands for Structured Query Language and is used to manage relational databases. It lets you:

  • Query data — retrieve information from one or more tables
  • Manipulate data — insert, update, and delete rows
  • Define structure — create and alter databases and tables
  • Control access — manage user privileges and permissions

SQL statements fall into four categories:

| Category | Full Name | Common Statements | Purpose |
| --- | --- | --- | --- |
| DDL | Data Definition Language | CREATE, ALTER, DROP | Define database structure |
| DML | Data Manipulation Language | SELECT, INSERT, UPDATE, DELETE | Manipulate data |
| DCL | Data Control Language | GRANT, REVOKE | Control access |
| TCL | Transaction Control Language | COMMIT, ROLLBACK | Manage transactions |

1.2 Database Operations

-- Create a database
CREATE DATABASE shop DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- List all databases
SHOW DATABASES;

-- Switch to a database
USE shop;

-- Show the current database
SELECT DATABASE();

-- Drop a database (use with caution!)
DROP DATABASE shop;

Best practice: Always use the utf8mb4 character set. It fully supports Unicode, including emoji.

1.3 Table Operations

-- Create a table
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT 'User ID',
    username VARCHAR(50) NOT NULL UNIQUE COMMENT 'Username',
    email VARCHAR(100) NOT NULL COMMENT 'Email address',
    password VARCHAR(255) NOT NULL COMMENT 'Hashed password',
    age TINYINT UNSIGNED DEFAULT 0 COMMENT 'Age',
    status TINYINT DEFAULT 1 COMMENT 'Status: 1=active, 0=disabled',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Users table';

-- Inspect table structure
DESC users;
SHOW CREATE TABLE users;

-- Alter a table
ALTER TABLE users ADD COLUMN phone VARCHAR(20) COMMENT 'Phone number';   -- add column
ALTER TABLE users MODIFY COLUMN phone VARCHAR(15);                       -- change column type
ALTER TABLE users CHANGE phone mobile VARCHAR(15);                       -- rename column
ALTER TABLE users DROP COLUMN mobile;                                    -- drop column

-- Drop a table
DROP TABLE IF EXISTS users;

1.4 Choosing the Right Data Types

Picking the right data type is one of the most important decisions in schema design.

Integer Types

| Type | Bytes | Signed Range | Typical Use |
| --- | --- | --- | --- |
| TINYINT | 1 | -128 to 127 | Status flags, age |
| SMALLINT | 2 | -32,768 to 32,767 | Small counters |
| INT | 4 | ~-2.1 billion to ~2.1 billion | Primary keys, counts |
| BIGINT | 8 | Very large | High-volume primary keys |

String Types

| Type | Length | Typical Use |
| --- | --- | --- |
| CHAR(n) | Fixed, exactly n characters | Fixed-length strings (phone numbers, country codes) |
| VARCHAR(n) | Variable, up to n characters | Variable-length strings (usernames, emails) |
| TEXT | Up to 64 KB | Long text (article bodies) |
| MEDIUMTEXT | Up to 16 MB | Very long text |

Date and Time Types

| Type | Format | Range | When to Use |
| --- | --- | --- | --- |
| DATE | YYYY-MM-DD | 1000-01-01 to 9999-12-31 | Date only |
| DATETIME | YYYY-MM-DD HH:MM:SS | 1000-01-01 to 9999-12-31 | Date and time |
| TIMESTAMP | YYYY-MM-DD HH:MM:SS | 1970 to 2038 | Auto-updating timestamps |

Interview tip: What is the difference between DATETIME and TIMESTAMP?

  • DATETIME uses 8 bytes (5 since MySQL 5.6.4), has a wider range, and stores the literal value with no timezone conversion.
  • TIMESTAMP uses 4 bytes, has a narrower range (through 2038), and converts between UTC and the session time zone.
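The timezone behavior is easy to demonstrate (a minimal sketch — the tz_demo table and the time zones chosen are illustrative):

```sql
-- TIMESTAMP follows the session time zone; DATETIME does not
CREATE TABLE tz_demo (dt DATETIME, ts TIMESTAMP);

SET time_zone = '+08:00';
INSERT INTO tz_demo VALUES ('2024-01-01 12:00:00', '2024-01-01 12:00:00');

SET time_zone = '+00:00';
SELECT dt, ts FROM tz_demo;
-- dt stays 2024-01-01 12:00:00; ts is displayed as 2024-01-01 04:00:00
```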

2. CRUD Operations In Depth

2.1 INSERT — Adding Data

-- Single-row insert
INSERT INTO users (username, email, password, age)
VALUES ('alice', '[email protected]', 'hashed_password', 25);

-- Multi-row insert (recommended for efficiency)
INSERT INTO users (username, email, password, age) VALUES
('bob', '[email protected]', 'hashed_password', 30),
('charlie', '[email protected]', 'hashed_password', 28),
('diana', '[email protected]', 'hashed_password', 35);

-- Upsert: insert or update on duplicate key
INSERT INTO users (id, username, email, password)
VALUES (1, 'alice', '[email protected]', 'new_password')
ON DUPLICATE KEY UPDATE email = VALUES(email), password = VALUES(password);

-- Insert and silently skip duplicates
INSERT IGNORE INTO users (username, email, password)
VALUES ('alice', '[email protected]', 'password');

2.2 SELECT — Querying Data

SELECT is the most frequently used — and most complex — SQL statement.

Basic Queries

-- Select all columns (avoid in production — hurts performance)
SELECT * FROM users;

-- Select specific columns (recommended)
SELECT id, username, email FROM users;

-- Column aliases
SELECT id AS user_id, username AS name FROM users;

-- Remove duplicates
SELECT DISTINCT status FROM users;

-- Limit rows
SELECT * FROM users LIMIT 10;           -- first 10 rows
SELECT * FROM users LIMIT 10 OFFSET 20; -- skip 20, take 10
SELECT * FROM users LIMIT 20, 10;       -- shorthand for the above

WHERE Clauses

-- Comparison operators
SELECT * FROM users WHERE age >= 18;
SELECT * FROM users WHERE status != 0;
SELECT * FROM users WHERE age <> 18;  -- alternative not-equal syntax

-- Logical operators
SELECT * FROM users WHERE age >= 18 AND status = 1;
SELECT * FROM users WHERE age < 18 OR age > 60;
SELECT * FROM users WHERE NOT status = 0;

-- Range queries
SELECT * FROM users WHERE age BETWEEN 18 AND 30;   -- inclusive
SELECT * FROM users WHERE age IN (18, 20, 25, 30);

-- NULL checks
SELECT * FROM users WHERE email IS NULL;
SELECT * FROM users WHERE email IS NOT NULL;

-- Pattern matching
SELECT * FROM users WHERE username LIKE 'alice%';  -- starts with "alice"
SELECT * FROM users WHERE username LIKE '%son';    -- ends with "son"
SELECT * FROM users WHERE username LIKE '%ob%';    -- contains "ob"
SELECT * FROM users WHERE username LIKE 'alic_';   -- _ matches a single character

Performance warning: LIKE '%value' (leading wildcard) cannot use an index and triggers a full table scan!

ORDER BY

-- Ascending (default)
SELECT * FROM users ORDER BY age ASC;

-- Descending
SELECT * FROM users ORDER BY created_at DESC;

-- Multi-column sort
SELECT * FROM users ORDER BY status DESC, created_at DESC;

-- Handling NULLs (MySQL treats NULL as the smallest value)
SELECT * FROM users ORDER BY age IS NULL, age;  -- pushes NULLs to the end

GROUP BY

-- Basic grouping
SELECT status, COUNT(*) AS user_count
FROM users
GROUP BY status;

-- Multi-column grouping
SELECT status, age, COUNT(*) AS count
FROM users
GROUP BY status, age;

-- HAVING filters groups (WHERE filters rows before grouping; HAVING filters after)
SELECT status, COUNT(*) AS user_count
FROM users
GROUP BY status
HAVING user_count > 10;

-- WITH ROLLUP adds a summary row
SELECT status, COUNT(*) AS user_count
FROM users
GROUP BY status WITH ROLLUP;

2.3 UPDATE — Modifying Data

-- Update a single field
UPDATE users SET status = 0 WHERE id = 1;

-- Update multiple fields
UPDATE users SET status = 0, updated_at = NOW() WHERE id = 1;

-- Conditional update
UPDATE users SET status = 0 WHERE last_login < '2024-01-01';

-- Batch update with CASE WHEN
UPDATE users SET status = CASE
    WHEN age < 18 THEN 0
    WHEN age >= 60 THEN 2
    ELSE 1
END
WHERE id IN (1, 2, 3, 4, 5);

-- Limit the number of rows updated
UPDATE users SET status = 0 ORDER BY created_at LIMIT 100;

Safety tip: Always include a WHERE clause with UPDATE and DELETE. Consider enabling sql_safe_updates mode to prevent accidental mass modifications.
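Safe-update mode can be enabled per session; with it on, MySQL rejects UPDATE and DELETE statements that lack a key-based WHERE clause:

```sql
-- Reject UPDATE/DELETE without a WHERE clause that uses a key (or a LIMIT)
SET SESSION sql_safe_updates = 1;

UPDATE users SET status = 0;               -- now fails with error 1175
UPDATE users SET status = 0 WHERE id = 1;  -- still allowed
```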

2.4 DELETE — Removing Data

-- Delete with a condition
DELETE FROM users WHERE status = 0;

-- Delete a limited number of rows
DELETE FROM users WHERE status = 0 ORDER BY created_at LIMIT 100;

-- Truncate the entire table (faster, but cannot be rolled back)
TRUNCATE TABLE users;

-- DELETE vs TRUNCATE
-- DELETE:   row-by-row, can be rolled back, triggers fire, auto-increment continues
-- TRUNCATE: instant wipe, cannot be rolled back, triggers do not fire, auto-increment resets

3. Multi-Table Queries (JOIN)

3.1 Setting Up Sample Tables

-- Orders table
CREATE TABLE orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    status TINYINT DEFAULT 1,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Products table
CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock INT DEFAULT 0
);

-- Order line items
CREATE TABLE order_items (
    id INT PRIMARY KEY AUTO_INCREMENT,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

3.2 JOIN Types Explained

-- INNER JOIN: returns only rows that match in both tables
SELECT u.username, o.id AS order_id, o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN: returns all rows from the left table; NULLs where there is no match on the right
SELECT u.username, o.id AS order_id, o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- RIGHT JOIN: returns all rows from the right table; NULLs where there is no match on the left
SELECT u.username, o.id AS order_id, o.total_amount
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;

-- Find users who have never placed an order
SELECT u.username
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;

-- Self join: a table joined to itself
-- (assumes an employees(id, name, manager_id) table)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

3.3 Multi-Table Joins

-- Three-table join: fetch full order details
SELECT
    u.username,
    o.id AS order_id,
    p.name AS product_name,
    oi.quantity,
    oi.price
FROM orders o
INNER JOIN users u ON o.user_id = u.id
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
WHERE o.status = 1
ORDER BY o.created_at DESC;

Interview tip: What is the difference between INNER JOIN and LEFT JOIN?

  • INNER JOIN returns only rows with matches in both tables.
  • LEFT JOIN returns every row from the left table, filling in NULLs for unmatched right-side columns.

4. Subqueries and Advanced Queries

4.1 Subqueries

-- Scalar subquery (returns a single value)
SELECT * FROM users
WHERE age > (SELECT AVG(age) FROM users);

-- Row subquery (returns one row)
SELECT * FROM users
WHERE (age, status) = (SELECT MAX(age), 1 FROM users);

-- Column subquery (returns one column)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total_amount > 1000);

-- EXISTS subquery (checks for existence)
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- NOT EXISTS (find users with no orders)
SELECT * FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

4.2 UNION

-- UNION (removes duplicates)
SELECT username, email FROM users WHERE status = 1
UNION
SELECT username, email FROM archived_users WHERE status = 1;

-- UNION ALL (keeps duplicates — better performance)
SELECT username, 'active' AS type FROM users WHERE status = 1
UNION ALL
SELECT username, 'inactive' AS type FROM users WHERE status = 0;

4.3 Window Functions (MySQL 8.0+)

Window functions let you perform calculations across a set of rows without collapsing them into groups.

-- ROW_NUMBER(): assigns a unique sequential number to each row
SELECT
    username,
    age,
    ROW_NUMBER() OVER (ORDER BY age DESC) AS `rank`  -- `rank` is a reserved word in MySQL 8.0, hence the backticks
FROM users;

-- RANK(): same rank for ties, then skips numbers
SELECT
    username,
    age,
    RANK() OVER (ORDER BY age DESC) AS `rank`
FROM users;

-- DENSE_RANK(): same rank for ties, no gaps
SELECT
    username,
    age,
    DENSE_RANK() OVER (ORDER BY age DESC) AS `rank`
FROM users;

-- Ranking within groups (rank per status)
SELECT
    username,
    status,
    age,
    ROW_NUMBER() OVER (PARTITION BY status ORDER BY age DESC) AS rank_in_group
FROM users;

-- LAG / LEAD: access previous / next row values
SELECT
    username,
    created_at,
    LAG(created_at, 1) OVER (ORDER BY created_at) AS prev_created,
    LEAD(created_at, 1) OVER (ORDER BY created_at) AS next_created
FROM users;

-- Running total
SELECT
    username,
    total_amount,
    SUM(total_amount) OVER (ORDER BY created_at) AS running_total
FROM orders;

4.4 Common Table Expressions (CTEs)

-- Basic CTE
WITH active_users AS (
    SELECT * FROM users WHERE status = 1
)
SELECT * FROM active_users WHERE age >= 18;

-- Multiple CTEs
WITH
    active_users AS (
        SELECT * FROM users WHERE status = 1
    ),
    user_orders AS (
        SELECT user_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY user_id
    )
SELECT u.username, COALESCE(o.order_count, 0) AS orders
FROM active_users u
LEFT JOIN user_orders o ON u.id = o.user_id;

-- Recursive CTE (e.g., organizational hierarchy)
WITH RECURSIVE org_tree AS (
    -- Base case: top-level nodes
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive step: child nodes
    SELECT e.id, e.name, e.manager_id, t.level + 1
    FROM employees e
    INNER JOIN org_tree t ON e.manager_id = t.id
)
SELECT * FROM org_tree;

5. Indexes Deep Dive (Interview Favorite)

Indexes are the backbone of database performance and one of the most common interview topics.

5.1 What Is an Index?

Think of an index as a book’s table of contents. It lets the database engine jump directly to the rows it needs instead of scanning the entire table.

5.2 Index Types

-- Primary key index
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT
);

-- Unique index
CREATE UNIQUE INDEX idx_email ON users(email);
-- or
ALTER TABLE users ADD UNIQUE INDEX idx_email(email);

-- Regular index
CREATE INDEX idx_username ON users(username);

-- Composite (multi-column) index
CREATE INDEX idx_status_created ON users(status, created_at);

-- Prefix index (for long string columns)
CREATE INDEX idx_email_prefix ON users(email(10));

-- Full-text index
CREATE FULLTEXT INDEX idx_content ON articles(content);

5.3 B+ Tree Internals (Frequently Asked in Interviews)

MySQL InnoDB uses the B+ tree as its index data structure. Here is why it matters:

B+ Tree vs. B Tree:

| Property | B Tree | B+ Tree |
| --- | --- | --- |
| Data storage | All nodes | Leaf nodes only |
| Leaf node linking | None | Doubly linked list |
| Range query efficiency | Low | High |
| Keys per I/O page | Fewer | More |

Why B+ tree wins:

  1. Fewer disk I/O operations — Internal nodes store only keys, so each page holds more entries.
  2. Efficient range scans — Leaf nodes form a sorted linked list; scanning a range is just a sequential traversal.
  3. Consistent performance — Every lookup reaches a leaf node, so time complexity is always O(log n).

Interview question: Why does MySQL use a B+ tree instead of a red-black tree or a hash index?

  • Red-black tree: Much taller, which means more disk I/O per lookup.
  • Hash index: Cannot support range queries or ordered scans.
  • B+ tree: Short and wide structure minimizes I/O; the leaf linked list enables range queries natively.
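The range-scan advantage can be sketched in a few lines of Python — a toy model of linked leaf pages, not InnoDB's actual implementation: once a lookup lands on the first qualifying leaf, the scan simply follows the leaf links, with no re-traversal of the tree per row.

```python
# Toy model of B+ tree leaves: sorted pages chained in a linked list.
# (Illustrative sketch only; InnoDB's real pages are more complex.)

class Leaf:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.next = None  # singly linked here; InnoDB links leaves both ways

def range_scan(first_leaf, low, high):
    """Collect all keys in [low, high] by walking the leaf chain."""
    result, leaf = [], first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result          # keys are globally sorted: we can stop
            if k >= low:
                result.append(k)
        leaf = leaf.next               # sequential hop to the sibling page
    return result

# Three linked leaf pages holding keys 1..12
a, b, c = Leaf([1, 2, 3, 4]), Leaf([5, 6, 7, 8]), Leaf([9, 10, 11, 12])
a.next, b.next = b, c

print(range_scan(a, 3, 10))  # [3, 4, 5, 6, 7, 8, 9, 10]
```

A tree without leaf links (a plain B tree) would have to descend from the root again for each neighboring key; the leaf chain turns a range query into one descent plus a sequential walk.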

5.4 Clustered vs. Secondary Indexes

Clustered index (primary key index):

  • Leaf nodes store the full row data.
  • A table can have only one clustered index.
  • Selection order: primary key > first unique non-null index > hidden row_id.

Secondary index (non-primary key index):

  • Leaf nodes store the primary key value.
  • Queries may require a bookmark lookup ("back to the table"): first traverse the secondary index to get the primary key, then traverse the clustered index to get the full row.

-- Assume users has a primary key on `id` and an index idx_username on `username`

-- This query requires a bookmark lookup
SELECT * FROM users WHERE username = 'alice';
-- Step 1: traverse idx_username to find the primary key for username='alice'
-- Step 2: traverse the clustered index using that primary key to get the full row

-- This query does NOT require a bookmark lookup (covering index)
SELECT id, username FROM users WHERE username = 'alice';
-- The secondary index already contains both id and username — done in one step

5.5 When Indexes Stop Working (Must-Know for Interviews)

All of the following situations cause the optimizer to skip your index:

-- 1. Using a function on an indexed column
SELECT * FROM users WHERE YEAR(created_at) = 2024;  -- index not used
SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';  -- index used

-- 2. Implicit type conversion
-- Assume phone is VARCHAR
SELECT * FROM users WHERE phone = 13800138000;   -- numeric comparison, index skipped
SELECT * FROM users WHERE phone = '13800138000';  -- string comparison, index used

-- 3. Leading wildcard in LIKE
SELECT * FROM users WHERE username LIKE '%alice';  -- index not used
SELECT * FROM users WHERE username LIKE 'alice%';  -- index used

-- 4. OR with a non-indexed column
SELECT * FROM users WHERE username = 'alice' OR age = 25;  -- if age has no index, the whole query goes full scan

-- 5. Composite index violating the leftmost prefix rule
-- Assume index idx_a_b_c(a, b, c)
SELECT * FROM users WHERE a = 1 AND b = 2;  -- index used
SELECT * FROM users WHERE b = 2 AND c = 3;  -- leftmost prefix violated, index not used
SELECT * FROM users WHERE a = 1 AND c = 3;  -- only column a is used

-- 6. Not-equal conditions
SELECT * FROM users WHERE status != 1;  -- likely full table scan

-- 7. IS NOT NULL (depends on data distribution)
SELECT * FROM users WHERE email IS NOT NULL;  -- if most rows are non-NULL, a full scan may be cheaper

5.6 The Leftmost Prefix Rule

This is the governing principle for composite indexes and comes up in almost every database interview:

-- Composite index idx_a_b_c(a, b, c)

-- Index IS used
WHERE a = 1
WHERE a = 1 AND b = 2
WHERE a = 1 AND b = 2 AND c = 3
WHERE a = 1 AND c = 3  -- only column a participates

-- Index is NOT used
WHERE b = 2
WHERE c = 3
WHERE b = 2 AND c = 3

Pro tip: Place the most selective column (highest cardinality) first in your composite index.

5.7 Covering Indexes

When every column in your query is present in the index, the engine never needs to look up the actual table row. This is the fastest possible access pattern:

-- Assume index idx_name_age(name, age)

-- Covering index: all queried columns are in the index
SELECT name, age FROM users WHERE name = 'alice';  -- EXPLAIN shows "Using index"

-- Needs a bookmark lookup: email is not in the index
SELECT name, age, email FROM users WHERE name = 'alice';  -- must access the clustered index for email

Check the Extra column in EXPLAIN output for Using index to confirm a covering index is being used.


6. Transactions and Locks (Interview Essentials)

6.1 ACID Properties

Every transaction must satisfy four properties:

| Property | Meaning | Implementation in InnoDB |
| --- | --- | --- |
| Atomicity | All or nothing — a transaction either fully succeeds or fully rolls back | Undo log |
| Consistency | The database moves from one valid state to another | Guaranteed by the other three properties working together |
| Isolation | Concurrent transactions do not interfere with each other | MVCC + locks |
| Durability | Once committed, data survives crashes | Redo log |

6.2 Transaction Basics

-- Start a transaction
START TRANSACTION;
-- or
BEGIN;

-- Commit
COMMIT;

-- Roll back
ROLLBACK;

-- Savepoints
SAVEPOINT sp1;
ROLLBACK TO sp1;

-- Example: money transfer
START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE user_id = 2;

-- Verify constraints
SELECT balance FROM accounts WHERE user_id = 1;
-- If balance is negative, roll back
-- ROLLBACK;

COMMIT;

6.3 Isolation Levels (High-Frequency Interview Topic)

The SQL standard defines four isolation levels:

| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
| --- | --- | --- | --- |
| READ UNCOMMITTED | Possible | Possible | Possible |
| READ COMMITTED | Prevented | Possible | Possible |
| REPEATABLE READ | Prevented | Prevented | Possible* |
| SERIALIZABLE | Prevented | Prevented | Prevented |

MySQL InnoDB default: REPEATABLE READ. InnoDB largely prevents phantom reads at this level through MVCC and gap locks.

What the three anomalies mean:

  1. Dirty read — Reading data written by an uncommitted transaction (data may vanish if that transaction rolls back).
  2. Non-repeatable read — Reading the same row twice within a transaction and getting different values because another transaction modified it in between.
  3. Phantom read — Running the same query twice within a transaction and getting a different set of rows because another transaction inserted or deleted rows.

-- Check the current isolation level
SELECT @@transaction_isolation;

-- Change the isolation level for the current session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

6.4 MVCC (Multi-Version Concurrency Control)

MVCC is how InnoDB achieves high concurrency without readers blocking writers (or vice versa).

How it works:

  1. Every row has hidden columns: DB_TRX_ID (the transaction ID that last modified the row) and DB_ROLL_PTR (a pointer to the previous version in the undo log).
  2. When a row is updated, the old version is preserved in the undo log, forming a version chain.
  3. When a transaction reads a row, it consults a Read View to decide which version is visible.

Read View contents:

  • creator_trx_id — The transaction ID that created this Read View.
  • m_ids — The list of currently active (uncommitted) transaction IDs.
  • min_trx_id — The smallest ID in m_ids.
  • max_trx_id — The next transaction ID to be assigned.

Visibility rules:

  1. If the row’s trx_id equals creator_trx_id, it is visible (you modified it yourself).
  2. If the row’s trx_id is less than min_trx_id, it is visible (the transaction that wrote it has already committed).
  3. If the row’s trx_id is greater than or equal to max_trx_id, it is not visible (the transaction had not started when this Read View was created).
  4. If the row’s trx_id is in m_ids, it is not visible (the transaction is still in progress).

Key difference between RC and RR:

  • READ COMMITTED (RC): A new Read View is created on every SELECT.
  • REPEATABLE READ (RR): A single Read View is created at the start of the transaction and reused.
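The four visibility rules above can be condensed into a small function. This is an illustrative Python sketch of the algorithm, not InnoDB source; the parameter names mirror the Read View fields described above.

```python
# Sketch of InnoDB's Read View visibility check (illustrative only).
# A row version is visible if it was written by this transaction, or by a
# transaction that had already committed when the Read View was created.

def is_visible(row_trx_id, creator_trx_id, m_ids, min_trx_id, max_trx_id):
    if row_trx_id == creator_trx_id:
        return True    # rule 1: our own modification
    if row_trx_id < min_trx_id:
        return True    # rule 2: committed before the Read View existed
    if row_trx_id >= max_trx_id:
        return False   # rule 3: started after the Read View was created
    return row_trx_id not in m_ids  # rule 4: invisible while still active

# Read View created by trx 100; trx 90 and 100 are still active;
# the next transaction ID to be assigned is 105.
view = dict(creator_trx_id=100, m_ids={90, 100}, min_trx_id=90, max_trx_id=105)

print(is_visible(80, **view))   # True  -- committed long ago
print(is_visible(90, **view))   # False -- still active
print(is_visible(100, **view))  # True  -- our own change
print(is_visible(107, **view))  # False -- not yet started when the view was built
```

If a version is not visible, InnoDB follows DB_ROLL_PTR down the undo-log version chain and applies the same check to each older version until one passes.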

6.5 Lock Mechanisms

Lock Categories

| Dimension | Types | Notes |
| --- | --- | --- |
| Granularity | Table lock, row lock | Row locks offer higher concurrency but more overhead |
| Mode | Shared (S), exclusive (X) | S locks allow concurrent reads; X locks provide exclusive write access |
| Algorithm | Record lock, gap lock, next-key lock | Used to prevent phantom reads |

Row Lock Types (InnoDB)

-- Shared lock (S): multiple transactions can read simultaneously
SELECT * FROM users WHERE id = 1 LOCK IN SHARE MODE;
-- MySQL 8.0+ syntax
SELECT * FROM users WHERE id = 1 FOR SHARE;

-- Exclusive lock (X): only one transaction can read or write
SELECT * FROM users WHERE id = 1 FOR UPDATE;

-- Gap lock: locks the space between index records
-- Activated under RR isolation level for range queries
SELECT * FROM users WHERE age > 20 FOR UPDATE;  -- locks gaps where age > 20

-- Next-key lock = record lock + gap lock
-- This is InnoDB's default row-locking algorithm

Deadlocks

A deadlock occurs when two transactions each hold a lock the other one needs:

-- Transaction A
START TRANSACTION;
UPDATE users SET status = 1 WHERE id = 1;  -- locks row id=1
UPDATE users SET status = 1 WHERE id = 2;  -- waits for row id=2...

-- Transaction B
START TRANSACTION;
UPDATE users SET status = 2 WHERE id = 2;  -- locks row id=2
UPDATE users SET status = 2 WHERE id = 1;  -- waits for row id=1...

-- Deadlock! InnoDB detects it and rolls back one of the transactions.

How to avoid deadlocks:

  1. Always access resources in a consistent order.
  2. Use indexed lookups so locks target specific rows, not entire tables.
  3. Keep transactions short to minimize lock-holding time.
  4. Consider a lower isolation level if your use case allows it.

-- View the latest deadlock information
SHOW ENGINE INNODB STATUS;

-- View current lock waits
SELECT * FROM performance_schema.data_lock_waits;

7. Query Optimization in Practice

7.1 EXPLAIN Execution Plans

The first step in diagnosing slow queries is to examine the execution plan. For a deeper dive, see MySQL EXPLAIN Execution Plan Explained.

EXPLAIN SELECT * FROM users WHERE username = 'alice';

Key fields to watch:

  • type — Access method, ranked best to worst: const > eq_ref > ref > range > index > ALL
  • key — The index actually used
  • rows — Estimated number of rows scanned
  • Extra — Additional info; watch out for Using filesort and Using temporary

7.2 Slow Query Log

-- Check slow query configuration
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- Enable the slow query log
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;  -- log queries taking longer than 1 second

-- Analyze the slow query log with mysqldumpslow
-- mysqldumpslow -s t -t 10 /var/lib/mysql/slow.log

7.3 Common Optimization Techniques

Pagination Optimization

-- Problem: deep pagination is extremely slow
SELECT * FROM orders ORDER BY id LIMIT 1000000, 10;  -- reads 1,000,010 rows and discards the first million

-- Solution 1: deferred join
SELECT * FROM orders o
INNER JOIN (SELECT id FROM orders ORDER BY id LIMIT 1000000, 10) t
ON o.id = t.id;

-- Solution 2: keyset (cursor-based) pagination
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 10;  -- 1000000 = last id seen on the previous page

Batch Updates for Large Tables

-- Problem: updating millions of rows at once causes severe lock contention
UPDATE orders SET status = 2 WHERE created_at < '2024-01-01';

-- Solution: process in batches
UPDATE orders SET status = 2
WHERE created_at < '2024-01-01' AND status != 2
LIMIT 1000;

-- Repeat until affected rows = 0

COUNT Optimization

-- COUNT(*) vs COUNT(1) vs COUNT(column)
-- MySQL treats COUNT(*) and COUNT(1) identically — both count all rows.
-- COUNT(column) counts only non-NULL values in that column.

-- For InnoDB, COUNT(*) always requires scanning rows.
-- Optimization strategies:
-- 1. Use an approximate count: SHOW TABLE STATUS LIKE 'orders';
-- 2. Maintain a counter in a cache (e.g., Redis)
-- 3. Use a summary table
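Strategy 1 can also be done through information_schema (an estimate from optimizer statistics, not an exact count; the shop schema name is from section 1):

```sql
-- Approximate row count without scanning the table
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'shop' AND TABLE_NAME = 'orders';
```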

Sort Optimization

-- Problem: filesort is slow
SELECT * FROM orders WHERE status = 1 ORDER BY created_at DESC LIMIT 10;

-- Solution: create a composite index
CREATE INDEX idx_status_created ON orders(status, created_at);
-- The index is already sorted, so no extra sorting is needed

7.4 SQL Optimization Checklist

| Area | What to Check |
| --- | --- |
| SELECT | Avoid SELECT *; list only the columns you need |
| WHERE | Index your filter columns; avoid functions and implicit type conversions |
| JOIN | Drive from the smaller table; index join columns |
| ORDER BY | Leverage index ordering whenever possible |
| LIMIT | Use keyset pagination or deferred joins for deep pages |
| Subqueries | Consider rewriting as JOINs |
| UNION | Use UNION ALL when deduplication is not needed |

8. Interview Questions You Should Know

8.1 Core Concepts

Q1: What storage engines does MySQL offer? How do InnoDB and MyISAM differ?

| Feature | InnoDB | MyISAM |
| --- | --- | --- |
| Transactions | Yes | No |
| Row-level locking | Yes | Table-level only |
| Foreign keys | Yes | No |
| Crash recovery | Yes | No |
| Full-text search | Since MySQL 5.6 | Yes |
| COUNT(*) speed | Requires scan | Stores row count |

Q2: What are the three normal forms?

  • 1NF — Every column holds atomic (indivisible) values.
  • 2NF — All non-key columns depend on the entire primary key (no partial dependencies).
  • 3NF — No non-key column depends on another non-key column (no transitive dependencies).

In practice, controlled denormalization is common for performance.

Q3: Does VARCHAR(100) use more space than VARCHAR(10)?

  • Disk storage: Identical for the same actual data length.
  • Memory allocation: MySQL may allocate buffers based on the declared maximum.
  • Constraint: Limits the maximum string length at the application level.

Q4: Auto-increment ID or UUID for primary keys?

| Aspect | Auto-Increment | UUID |
| --- | --- | --- |
| Write performance | Sequential inserts (fast) | Random inserts (page splits) |
| Storage | 4–8 bytes | 36 bytes |
| Predictability | Guessable | Not guessable |
| Distributed systems | Requires coordination | Naturally globally unique |

Rule of thumb: use auto-increment for single-node setups; use Snowflake IDs for distributed systems.
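To illustrate the Snowflake approach, here is a toy Python sketch of the classic 41/10/12-bit layout (timestamp, machine ID, per-millisecond sequence). A real implementation must also handle clock rollback and sequence overflow; the epoch and helper below are illustrative choices.

```python
# Toy Snowflake-style ID: 41-bit ms timestamp | 10-bit machine | 12-bit sequence.
# Sketch only -- production generators also guard against clock rollback.
import time

EPOCH = 1288834974657  # a fixed custom epoch in milliseconds

def snowflake(machine_id, sequence, now_ms=None):
    """Pack a 63-bit ID; the caller manages the sequence counter."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return ((now_ms - EPOCH) << 22) | ((machine_id & 0x3FF) << 12) | (sequence & 0xFFF)

a = snowflake(machine_id=1, sequence=0)
b = snowflake(machine_id=1, sequence=1)
print(a < b)  # True -- IDs generated later (or with a higher sequence) sort higher
```

Because the timestamp occupies the high bits, Snowflake IDs are roughly time-ordered, which keeps inserts mostly sequential in the clustered index (unlike random UUIDs).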

8.2 Indexes

Q5: Why B+ tree instead of B tree, hash, or red-black tree?

  • vs. B tree: B+ trees store data only in leaves, so each internal page holds more keys (fewer I/O reads). The leaf linked list is ideal for range scans.
  • vs. Hash: Hash indexes cannot support range queries or ordering.
  • vs. Red-black tree: Much deeper, leading to more disk I/O.

Q6: Clustered vs. non-clustered index?

  • Clustered: Leaf nodes contain the full row. One per table.
  • Non-clustered: Leaf nodes contain the primary key value. Queries may require a bookmark lookup.

Q7: When does an index become useless?

  • Applying a function to the indexed column
  • Implicit type conversion
  • Leading-wildcard LIKE
  • OR with a non-indexed column
  • Violating the leftmost prefix rule on a composite index
  • Not-equal comparisons (in many cases)

Q8: Explain the leftmost prefix rule for a composite index (a, b, c).

  • WHERE a = 1 — uses the index
  • WHERE a = 1 AND b = 2 — uses the index
  • WHERE b = 2 — does not use the index
  • WHERE a = 1 AND c = 3 — only column a is used

8.3 Transactions and Locks

Q9: What is ACID, and how does MySQL guarantee each property?

  • Atomicity: Undo log enables rollback.
  • Consistency: Ensured by the other three properties together.
  • Isolation: MVCC + locking.
  • Durability: Redo log survives crashes.

Q10: What isolation levels does MySQL support? What is the default?

Four levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.

InnoDB defaults to REPEATABLE READ.

Q11: How does MVCC work?

Multi-Version Concurrency Control maintains a version chain (via undo logs) and uses a Read View to determine which row versions are visible. Under RC, a new Read View is created per SELECT. Under RR, the Read View is created once and reused.

Q12: What is a deadlock, and how do you prevent one?

A deadlock happens when two transactions each wait for a lock held by the other.

Prevention strategies:

  • Access rows in a consistent order
  • Keep transactions short
  • Use indexed lookups to minimize lock scope
  • Set a lock wait timeout

8.4 Performance

Q13: How do you find slow queries?

  1. Enable the slow query log.
  2. Run EXPLAIN on the offending query.
  3. Examine the type, key, rows, and Extra fields.

Q14: How do you optimize deep pagination?

-- Deferred join
SELECT * FROM t
INNER JOIN (SELECT id FROM t ORDER BY id LIMIT 1000000, 10) t2
ON t.id = t2.id;

-- Keyset pagination
SELECT * FROM t WHERE id > last_id ORDER BY id LIMIT 10;

Q15: A query is running slowly. What could be wrong?

  1. Missing index or index not being used
  2. Lock contention (blocked by another transaction)
  3. Table is too large for the current query plan
  4. Server resources exhausted (CPU, memory, disk I/O)
  5. Network latency

9. Hands-On Exercises

9.1 SQL Writing Practice

Sample schema:

CREATE TABLE students (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50),
    class_id INT,
    score INT
);

CREATE TABLE classes (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50)
);

Exercises:

-- 1. Average score per class, ordered descending
SELECT c.name AS class_name, AVG(s.score) AS avg_score
FROM classes c
LEFT JOIN students s ON c.id = s.class_id
GROUP BY c.id
ORDER BY avg_score DESC;

-- 2. Top scorer(s) in each class
SELECT s.*
FROM students s
INNER JOIN (
    SELECT class_id, MAX(score) AS max_score
    FROM students
    GROUP BY class_id
) t ON s.class_id = t.class_id AND s.score = t.max_score;

-- 3. Students scoring above the overall average
SELECT *
FROM students
WHERE score > (SELECT AVG(score) FROM students);

-- 4. Rank students within each class using window functions
SELECT
    name,
    class_id,
    score,
    RANK() OVER (PARTITION BY class_id ORDER BY score DESC) AS rank_in_class
FROM students;

-- 5. Find users who logged in for at least 3 consecutive days
WITH ranked_logins AS (
    SELECT
        user_id,
        login_date,
        DATE_SUB(login_date, INTERVAL ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) DAY) AS grp
    FROM login_logs
)
SELECT DISTINCT user_id
FROM ranked_logins
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;

9.2 Real-World Scenario: E-Commerce Analytics

-- 1. Daily order count and revenue
SELECT
    DATE(created_at) AS order_date,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_sales
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY DATE(created_at)
ORDER BY order_date;

-- 2. Each user's first order date and lifetime spend
SELECT
    user_id,
    MIN(created_at) AS first_order_time,
    SUM(total_amount) AS total_spent,
    COUNT(*) AS order_count
FROM orders
GROUP BY user_id;

-- 3. Top 10 customers by total spend
SELECT
    u.id,
    u.username,
    COALESCE(SUM(o.total_amount), 0) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
ORDER BY total_spent DESC
LIMIT 10;

-- 4. Monthly product sales leaderboard
SELECT
    DATE_FORMAT(o.created_at, '%Y-%m') AS month,
    p.name AS product_name,
    SUM(oi.quantity) AS total_quantity
FROM order_items oi
JOIN orders o ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
GROUP BY month, p.id
ORDER BY month, total_quantity DESC;

10. Summary

This guide covered the full spectrum of MySQL SQL knowledge:

| Section | Key Topics |
| --- | --- |
| Fundamentals | Database and table operations, data type selection |
| CRUD | INSERT, SELECT, UPDATE, DELETE |
| Multi-table queries | JOIN types, subqueries |
| Advanced features | Window functions, CTEs |
| Indexes | B+ tree internals, index invalidation, covering indexes |
| Transactions | ACID, isolation levels, MVCC, locking |
| Optimization | EXPLAIN, slow query log, pagination strategies |

Recommendations for continued learning:

  1. Practice relentlessly — Work through the LeetCode SQL 50 problem set.
  2. Understand the internals — Knowing how to write SQL is not enough; understanding how indexes and transactions work under the hood sets you apart.
  3. Make EXPLAIN a habit — Run it on every non-trivial query before deploying to production.
