FXJ Wiki

Back

How MySQL Preserves Consistency: Transactions, Locks, MVCC, and Logs
30 min read
English interview prep / mysql / log / lock

How MySQL Preserves Consistency: Transactions, Locks, MVCC, and Logs

Previously recorded content about MySQL in Apple Notes
views | comments

Content Included#

  • MySQL Transactions and Logs

MySQL Transactions, Locks, MVCC, and Logs#

Transactions#

ACID Properties#

The ACID properties and how they are guaranteed.

  • Atomicity: Guaranteed by undo log.
  • Consistency: Guaranteed by the other three properties.
  • Isolation: Guaranteed by locking mechanisms / MVCC.
  • Durability: Guaranteed by redo log.

Transaction Isolation Levels#

  1. Read Uncommitted: Changes made by a transaction before it commits can be seen by other transactions.
  2. Read Committed: Changes made by a transaction can only be seen by other transactions after it commits.
  3. Repeatable Read: Data seen during a transaction’s execution is consistent with the data seen when the transaction started. This is the default isolation level for MySQL’s InnoDB engine.
  4. Serializable: Acquires read and write locks on records. When multiple transactions perform read/write operations on a record and a read-write conflict occurs, subsequent transactions must wait for the previous transaction to complete before continuing execution.

Generally speaking, using Repeatable Read (the default) can largely avoid phantom read issues (though they can still occur). The Serializable isolation level impacts performance. Phantom reads are basically resolved through the following two methods:

  1. For ordinary SELECT statements (snapshot reads), MVCC resolves phantom reads.
  2. For SELECT…FOR UPDATE statements (current reads), next-key locks (record lock + gap lock) resolve phantom reads.

Implementation methods:

  • For the Read Uncommitted isolation level, because uncommitted changes from other transactions can be read, the latest data is read directly.
  • For the Serializable isolation level, concurrent access is prevented by adding read/write locks.
  • For the Read Committed and Repeatable Read isolation levels, they are implemented using Read Views. Their difference lies in when the Read View is created. Think of a Read View as a data snapshot, like a camera capturing the scenery at a specific moment. The Read Committed isolation level re-generates a Read View before each statement execution, whereas the Repeatable Read isolation level generates a Read View when the transaction starts and uses it throughout the entire transaction.

Commands to start a transaction in MySQL:

  1. BEGIN / START TRANSACTION: The transaction is not truly considered started until the first SELECT statement is executed.
  2. START TRANSACTION WITH CONSISTENT SNAPSHOT: Starts the transaction immediately.

Dirty Read / Non-repeatable Read / Phantom Read#

  • Dirty Read: A transaction reads data that has been modified by another uncommitted transaction.
  • Non-repeatable Read: In the same transaction, reading the same data multiple times results in different values.
  • Phantom Read: In the same transaction, querying the number of records that meet a certain condition multiple times results in different counts.

MVCC (Important)#

How does Read View work in MVCC? Read View:

After creating a Read View, the record’s trx_id falls into one of these three categories:

This method of controlling concurrent transaction access to the same record through a “version chain” is called MVCC (Multi-Version Concurrency Control).

Simply put, the MVCC logic chain can be remembered like this:

  • Each row record carries trx_id and roll_pointer.
  • trx_id indicates the transaction ID that last modified this row.
  • roll_pointer points from the current version to the previous undo log version.
  • Thus, a record forms a version chain through undo logs.
  • During a snapshot read, a transaction uses its own Read View to traverse the version chain backwards until it finds a version visible to it.

Add a few points most likely to be asked in follow-up questions from Apple Notes:

  • undo log is not just for rollback; it’s also the true source of MVCC historical versions.
  • For DELETE, InnoDB does not physically delete immediately. It first marks the record for deletion, which is later cleaned up by the purge thread.
  • For UPDATE:
    • If the primary key column is updated, it’s essentially treated as “delete old row + insert new row”.
    • If a non-primary key column is updated, the old value is recorded in the undo log. During rollback or snapshot reads, the historical version can be accessed along the version chain.
  • undo pages themselves also go into the Buffer Pool. True persistence still relies on redo log as the fallback.
Read View Visibility Determination#
  • trx_id < min_trx_id: The transaction that modified this version had already committed when the snapshot was created. The current version is visible.
  • trx_id >= max_trx_id: This transaction ID was assigned after the snapshot was created. The current version is not visible; need to look for an older version.
  • min_trx_id <= trx_id < max_trx_id:
    • If trx_id is in the list of active transactions, it means this transaction hadn’t committed when the snapshot was taken. The version is not visible.
    • Otherwise, the transaction had already committed, so the version is visible.
How does MVCC implement Read Committed / Repeatable Read?#

For Read Committed:

  • A new Read View is regenerated before each SELECT statement execution.

For Repeatable Read:

  • A Read View is generated when the transaction starts (upon the first SELECT or BEGIN), and this Read View remains valid throughout the entire transaction lifecycle without being regenerated.
In which scenario can MVCC not completely prevent phantom reads?#
## Transaction A-----------------
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t_stu where id = 5;
Empty set (0.01 sec)

## Transaction B-----------------
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t_stu values(5, 'Xiaomei', 18);
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

## Transaction A-----------------
mysql> update t_stu set name = 'Xiaolin Coding' where id = 5;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from t_stu where id = 5;
+----+----------------+------+
| id | name           | age  |
+----+----------------+------+
|  5 | Xiaolin Coding |   18 |
+----+----------------+------+
1 row in set (0.00 sec)
sql

Attention: The main reason is that MVCC only supports SELECT. It’s ineffective when UPDATE is involved…

However, phantom reads can be completely resolved using MVCC + next-key lock!

The InnoDB storage engine resolves phantom reads at the RR level through MVCC and Next-key Lock:

  1. Executing ordinary SELECT: Data is read using MVCC snapshot reads. In snapshot reads, the RR isolation level generates a Read View only upon the first query in the transaction and uses it until the transaction commits. Therefore, updates and inserts made by other transactions after the Read View is generated are not visible to the current transaction, achieving repeatable reads and preventing “phantom reads” under snapshot reads.

  2. Executing current reads like SELECT...FOR UPDATE / LOCK IN SHARE MODE, INSERT, UPDATE, DELETE: Under current reads, the latest data is always read. If another transaction inserts a new record that falls within the current transaction’s query scope, a phantom read would occur! InnoDB uses Next-key Lock to prevent this. When a current read is executed, it locks the records that are read and also locks the gaps between them, preventing other transactions from inserting data within the query scope. Preventing insertion prevents phantom reads.

graph TD
    MVCC[MVCC Mechanism] --> UndoLog[Undo Log Version Chain]
    MVCC --> ReadView[ReadView Consistent View]
    UndoLog --> |Constructs historical versions| RowVersions[Row Multi-Version Chain]
    ReadView --> |Determines visibility| Visibility[Data Visibility Rules]
    
    RowVersions --> Row1[Current Version: trx_id=200]
    RowVersions --> Row2[Historical Version 1: trx_id=150]
    RowVersions --> Row3[Historical Version 2: trx_id=100]
    
    ReadView --> RV1[Active Transaction List: 180,220]
    ReadView --> RV2[Minimum Active Transaction ID: 180]
    ReadView --> RV3[Maximum Allocated Transaction ID: 250]
mermaid

Locking Mechanism#

  • Global Locks
  • Table-level Locks
    • Table Locks
    • Metadata Locks (MDL)
    • Intention Locks
    • AUTO-INC Locks
  • Row-level Locks
    • Record Lock: Locks a single index record.
    • Gap Lock: Locks a gap between index records (open interval).
    • Next-Key Lock: Combination of Gap Lock and Record Lock (usually seems to be left-open, right-closed).
    • Insert Intention Lock

The previous explanation from Gemini was quite good.

Usage Scenarios#

This section directly brings back the lock details from Apple Notes, focusing on: How Record Lock, Gap Lock, and Next-Key Lock are applied for different index types + equality/range queries.

First, remember the command to observe locks:

select * from performance_schema.data_locks\G;
sql
  • Unique Index Equality Query
    • Record exists: next-key lock degenerates into a record lock.
    • Record does not exist: Degenerates into a gap lock. Because locks are placed on indexes, a non-existent record itself cannot be record-locked.
  • Unique Index Range Query
    • id > target: Scans to the right using next-key intervals like (target, next]. The final segment might become (last, +∞].
    • id >= target: The starting point might first degrade to a record lock, but subsequent intervals are still locked according to the range.
    • id < target / id <= target: Key point is whether the right boundary crosses the query’s upper bound. Once crossed, the final segment often degrades to a gap lock, as the main goal is to prevent phantom reads caused by inserts within the interval.
  • Non-Unique Index Equality Query
    • This usually locks not only the matched records but also adds gap/next-key locks.
    • Typical reason: If only the currently matched rows are locked, a new record with the same index value but a different primary key could still be inserted, changing the result set on the next query.
    • The note’s example age = 22 for update demonstrates a combination: a next-key lock for (21,22], a gap lock for (22,39), and record locks on the primary keys of the matched rows.
  • Non-Unique Index Range Query
    • next-key lock generally does not degrade easily. The core goal is to ensure “re-querying this range yields the same result set”.
  • Query Without an Index
    • Leads to a full table scan. All records encountered along the way are locked. UPDATE/DELETE without an index behave the same. Therefore, such statements are both slow and tend to significantly widen the lock scope.

Deadlock Supplement#

The deadlock example mentioned in Apple Notes essentially illustrates the conflict between gap locks and insert intention locks.

  • gap lock and gap lock do not conflict with each other.
  • However, if two transactions both acquire a gap lock on a certain interval and then each tries to insert data into that interval, they will request an insert intention lock.
  • The insert intention lock conflicts with the gap lock held by the other transaction. Thus, both wait for each other, creating a deadlock.

So, for this question, don’t just say “a deadlock occurred”. Explain more clearly:

  • Why could they coexist during the query phase?
  • Why did they block each other once the insertion phase started?
  • Why does InnoDB eventually have to roll back one of the transactions?

Logs#

  • redo log
  • binlog
  • undo log
  • Buffer pool

For this part, Heima’s video content seems decent and can be referenced.

undo log#

  • Inserting a record: Records the primary key value. For rollback, simply delete this inserted record.
  • Deleting a record: First marks it for deletion while retaining the old record information. For rollback, restore the record. Actual physical deletion is handled by the purge thread.
  • Updating a record:
    • Updating a primary key column is essentially treated as “delete old row + insert new row”.
    • Updating a non-primary key column logs the old value, allowing a reverse update during rollback.

undo log is a logical log.

Functions:

  • Guarantees transaction atomicity, enabling transaction rollback.
  • Implements MVCC via Read View + undo log.
  • Chains historical versions using trx_id + roll_pointer.

Buffer Pool#

The cache pool.

One of the core structures that handles read/write performance in the InnoDB engine.

  • When reading data, if the data page is already in the buffer pool, it’s read directly from memory.
  • When modifying data, the in-memory page is updated first, and the page is marked as “dirty”.
  • Dirty pages are not flushed to disk immediately. A background thread writes them back to disk at an appropriate time.

Common scenarios triggering dirty page flush:

  • The redo log is almost full.
  • Buffer Pool space is low, requiring dirty page eviction.
  • A background thread performs periodic flushing.
  • Before MySQL shuts down normally, it tries to flush dirty pages as much as possible.

Apple Notes also added a few often-overlooked structures:

  • free list: Quickly obtain an available clean cache page without scanning the entire memory area.
  • flush list: Strings dirty pages together separately, allowing the background thread to traverse directly when flushing.
  • LRU list: Manages hot and cold pages.

Moreover, InnoDB doesn’t use a simple LRU; it divides the LRU list into young and old regions:

  • Solves read-ahead failure: Pages brought in by read-ahead are first placed in the old region, preventing them from immediately occupying hot spots.
  • Solves Buffer Pool pollution: Pages from large scans must spend enough time in the old region before qualifying for promotion to young.

The Buffer Pool contains not just data pages, but also:

  • Data pages
  • Index pages
  • Insert buffer pages
  • undo pages
  • Adaptive hash index
  • Lock information

When querying a single record, InnoDB doesn’t just load that one row; it loads the entire page into the cache, then locates the record via the page directory.

redo log#

To prevent data loss during a power failure, InnoDB uses WAL (Write-Ahead Logging):

  • First, update the in-memory page.
  • Concurrently, write the page modification as a redo log entry.
  • Later, at an appropriate time, flush the dirty page back to disk.

redo log is a physical log: It records “which tablespace, which page, which offset, was changed to what value”.

Its two core values are:

  • Crash recovery: Can redo the modifications of committed transactions that hadn’t been flushed to disk yet.
  • Provides durability guarantees for in-memory modifications like undo pages and data pages.

Points often discussed alongside redo log:

  • redo log buffer: Buffers log entries in memory to reduce frequent disk flushes.
  • page cache: The operating system’s filesystem page cache.
  • innodb_flush_log_at_trx_commit: Controls the flush strategy at transaction commit time.

binlog#

binlog belongs to the MySQL Server layer, not the InnoDB layer.

It is a logical log, commonly used for:

  • Master-slave replication.
  • Point-in-time recovery.
  • Auditing changes.

In an interview, explaining these points is usually sufficient:

  • redo log handles crash recovery and ensures InnoDB durability.
  • binlog records “what this change did” for the MySQL Server layer’s use.

Why the Two-Phase Commit for redo log and binlog?#

If only one log were written before committing, problems would occur:

  • redo log exists, binlog missing: The master can recover, but master-slave replication would lose the transaction.
  • binlog exists, redo log missing: A slave might have applied this change, but if the master crashes and recovers, it might miss this data.

Therefore, InnoDB uses a two-phase commit:

  1. Write the redo log to the PREPARE state.
  2. Write the binlog.
  3. Mark the redo log as COMMIT.

This way, even if a crash occurs midway, the state of redo log + binlog can be used to determine whether the transaction should be recovered or not.

Optimization!#

JavaGuide:

  • Read-Write Separation
  • Database/Table Sharding
    • Solving Master-Slave Delay
    • Conditions warranting sharding
  • Data Tiering (Hot/Cold Separation)
  • SQL Performance Optimization?
    • Disk I/O optimization related to logs (see notes)
  • Caching Mechanisms?
How MySQL Preserves Consistency: Transactions, Locks, MVCC, and Logs
https://fxj.wiki/en/blog/interview-mysql-2
Author 玛卡巴卡
Published at 2025年6月5日
Comment seems to stuck. Try to refresh?✨