Skip to content
/ server Public

Conversation

@Thirunarayanan
Copy link
Member

@Thirunarayanan Thirunarayanan commented Jan 21, 2026

Problem:
=======
An assertion failure occurs in InnoDB during consecutive
ALTER TABLE operations using the COPY algorithm. The crash
happens during the table rename phase because the
source table (dict_table_t::n_ref_count) is non-zero, despite
the thread holding an exclusive Metadata Lock (MDL).

Reason:
========
When ALTER IGNORE TABLE is executed via the COPY algorithm,
it generates undo logs for every row inserted into the
intermediate table (e.g., #sql-alter-...). The background Purge
Thread, responsible for cleaning up these undo logs, attempts
to take an MDL on the table to prevent the table from
being dropped while in use.

Race condition:
==================
First ALTER: Creates #sql-alter-, copies data, and renames it to t1.

Purge Activation: The Purge thread picks up the undo logs from step 1.
It takes an MDL on the temporary name (#sql-alter-) and increments
the table's n_ref_count.

Identity Shift: InnoDB renames the physical table object to t1, but
the Purge thread still holds a reference to this object.

Second ALTER: Starts a new copy process. When it attempts to rename
the "new" t1 to a backup name, it checks if n_ref_count == 0.
Because the Purge thread is still "pinning" the object to
clean up logs from the first ALTER, the count is > 0,
    triggering the assertion failure.

Solution:
========
By having the Server take an MDL on the temporary table name that maps
to the same internal InnoDB lock as the target table, we ensure that
the Purge Thread and the DDL thread are serialized.

dict_table_t::parse_tbl_name() : Extracts database and table
components from full table names

mysql_alter_table(): MDL acquistion on temporary tables

innobase_rename_table(), delete_table(): Added a check to
verify MDL ownership

InnoDB purge thread now takes MDL on table name directly
instead of relying on mdl_name. Removed mdl_name from
dict_table_t

innodb_has_mdl(): This function verifies whether current
thread hold MDL on the given table. It does the following:

1) Returns true for inplace intermediate temporary tables(#sql-ib)
2) Returns true for FTS Auxiliary Tables becuase the Purge system is
explicitly stopped during DDL changes to Full-Text tables,
these are inherently safe from the Purge and Rename race condition.
3) Returns true for partition names. This is because when InnoDB
operates on a partition while holding an MDL on the base table name,
but the scenarios like adding a partition where a base table
MDL might not be held in the same way.
4) Returns true only if the thread holds an MDL_EXCLUSIVE lock.
5) Returns FALSE if the thread does not hold the appropriate MDL.

@Thirunarayanan Thirunarayanan requested a review from dr-m January 21, 2026 17:08
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@svoj
Copy link
Contributor

svoj commented Jan 21, 2026

This looks suspicious to my eye. I didn't dive deep into this, from what I understand it is needed to hold purge from accessing intermediate tables. It would be good if commit comment would state the problem clearly and explain the solution. I mean not something like "we fix assertion failure", but rather something like "purge thread may prematurely clear intermediate table history".

Registering temporary tables in MDL subsystem may lead to conflicts between threads, even though names are quite unique.

Problem:
=======
An assertion failure occurs in InnoDB during consecutive
ALTER TABLE operations using the COPY algorithm. The crash
happens during the table rename phase because the
source table (dict_table_t::n_ref_count) is non-zero, despite
the thread holding an exclusive Metadata Lock (MDL).

Reason:
========
When ALTER IGNORE TABLE is executed via the COPY algorithm,
it generates undo logs for every row inserted into the
intermediate table (e.g., #sql-alter-...). The background Purge
Thread, responsible for cleaning up these undo logs, attempts
to take an MDL on the table to prevent the table from
being dropped while in use.

Race condition:
==================
First ALTER: Creates #sql-alter-, copies data, and renames it to t1.

Purge Activation: The Purge thread picks up the undo logs from step 1.
It takes an MDL on the temporary name (#sql-alter-) and increments
the table's n_ref_count.

Identity Shift: InnoDB renames the physical table object to t1, but
the Purge thread still holds a reference to this object.

Second ALTER: Starts a new copy process. When it attempts to rename
the "new" t1 to a backup name, it checks if n_ref_count == 0.
Because the Purge thread is still "pinning" the object to
clean up logs from the first ALTER, the count is > 0,
triggering the assertion failure.

Solution:
========
By having the Server take an MDL on the temporary table name that maps
to the same internal InnoDB lock as the target table, we ensure that
the Purge Thread and the DDL thread are serialized.

dict_table_t::parse_tbl_name() : Extracts database and table
components from full table names

mysql_alter_table(): MDL acquistion on temporary tables

innobase_rename_table(), delete_table(): Added a check to
verify MDL ownership

InnoDB purge thread now takes MDL on table name directly
instead of relying on mdl_name. Removed mdl_name from
dict_table_t

innodb_has_mdl(): This function verifies whether current
thread hold MDL on the given table. It does the following:

1) Returns true for inplace intermediate temporary tables(#sql-ib)
2) Returns true for FTS Auxiliary Tables becuase the Purge system is
explicitly stopped during DDL changes to Full-Text tables,
these are inherently safe from the Purge and Rename race condition.
3) Returns true for partition names. This is because when InnoDB
operates on a partition while holding an MDL on the base table name,
but the scenarios like adding a partition where a base table
MDL might not be held in the same way.
4) Returns true only if the thread holds an MDL_EXCLUSIVE lock.
5) Returns FALSE if the thread does not hold the appropriate MDL.
@Thirunarayanan Thirunarayanan changed the title MDEV-36436 Acquire MDL on temporary table during DDL MDEV-36436 Acquire MDL on temporary name of persistent table during DDL Jan 23, 2026
@dr-m
Copy link
Contributor

dr-m commented Jan 23, 2026

This looks suspicious to my eye. I didn't dive deep into this, from what I understand it is needed to hold purge from accessing intermediate tables. It would be good if commit comment would state the problem clearly and explain the solution. I mean not something like "we fix assertion failure", but rather something like "purge thread may prematurely clear intermediate table history".

Registering temporary tables in MDL subsystem may lead to conflicts between threads, even though names are quite unique.

I already pointed out to @Thirunarayanan that the title is misleading. The InnoDB purge subsystem does not deal with any temporary tables at all. This is about temporary names of persistent tables during DDL operation. This comment includes a test case that only involves a persistent table that is being rebuilt twice:

--source include/have_innodb.inc
create table t1(f1 int not null, f2 int default null,
                f3 int not null)engine=innodb;
insert into t1 values(1, 1, 1);
show create table t1;
ALTER IGNORE TABLE `test`.`t1` CHANGE IF EXISTS `f2` f2 BOOL NULL DEFAULT NULL;
ALTER IGNORE TABLE test.t1 CHANGE f3 f3 bool null default null;
show create table t1;
drop table t1;

@Thirunarayanan Thirunarayanan marked this pull request as draft January 23, 2026 12:12
@svoj
Copy link
Contributor

svoj commented Jan 23, 2026

The problem is #sql-alter is a valid table identifier. Yes, it gets escaped, but still users may create tables with such names:

source include/have_metadata_lock_info.inc;
create table `#sql-alter-3d2867-8`(a INT);
lock table `#sql-alter-3d2867-8` read;
select * from information_schema.metadata_lock_info;
unlock tables;
drop table `#sql-alter-3d2867-8`;

Why do you have to do any logging/trx processing for such tables? IIUC correctly the only undo record that should be there is to drop the table if something goes wrong. But that's handled by the ddl log facility.

@dr-m
Copy link
Contributor

dr-m commented Jan 23, 2026

The problem is #sql-alter is a valid table identifier. Yes, it gets escaped, but still users may create tables with such names

Furthermore, names like #mysql50#sql-alter can clash at the file system level. It is a bad design.

Why do you have to do any logging/trx processing for such tables?

In ALTER IGNORE TABLE, we rely on an undo log record so that in case there is a duplicate key error, the last inserted record will be rolled back from every index where it had been successfully inserted. Like I pointed out in MDEV-36436, we can actually discard the last written undo log record after each successful insert. In that way, the ALTER IGNORE TABLE will not leave behind any undo log records that the InnoDB purge subsystem would look for.

Originally, this idea started within MDEV-36493 #4548 and I suggested that the same approach could be used to fix this bug. I now realize that we could address also the CREATE OR REPLACE TABLE problem if copy_data_between_tables() wrote the data to a table with a temporary name and if InnoDB discarded the undo log record near the end of each successfully completed ha_innobase::write_row(). In that way, we would not leave behind any such garbage for the purge subsystem to pick up later. Unfortunately, the currently suggested atomic CREATE OR REPLACE TABLE implementation would first rename the table and then copy the data while using the persistent table name. I suggested a change to that but I assume that some additional step would be needed to validate any foreign key constraints (MDEV-16356).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants