Skip to content
dbm.lt
Open menu
Back to blog
sql-server in-memory-oltp troubleshooting performance

The In-Memory OLTP filegroup you can never delete (and the CPU it quietly burns)

A database that once used a memory-optimized table, migrated back to disk, and was left with a MEMORY_OPTIMIZED_DATA filegroup it can't get rid of — and an XTP background thread burning CPU with zero in-memory tables. Reproduced on SQL Server 2019 and 2025: why it happens, every removal trick that fails (with the exact error numbers), and the only fix that actually works.

Some years ago a team tried SQL Server’s In-Memory OLTP (memory-optimized tables) to speed up a hot table. It didn’t pan out, so the table was migrated back to a normal disk-based table and dropped. Job done — except it wasn’t. Long after, one of six production servers started showing periodic 100% CPU spikes in Redgate SQL Monitor, and the culprit was a background thread doing work for a feature the database no longer used. The memory-optimized table was gone, but the filegroup it required was still there — and, as it turns out, you can’t remove it.

I reproduced the whole thing in Docker (on SQL Server 2019 to match the affected servers, and again on 2025), confirmed each symptom, and worked through every removal trick to find the one that actually works. The harness is in the repo; here’s the story.

The orphan

To use a memory-optimized table you must first give the database a special filegroup:

ALTER DATABASE [App] ADD FILEGROUP [MemOptFG] CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE [App] ADD FILE (NAME='MemOptContainer',
    FILENAME='...\MemOptContainer') TO FILEGROUP [MemOptFG];

That filegroup (type FX) is what hosts the In-Memory OLTP engine’s checkpoint files. Drop your memory-optimized tables and the filegroup stays — and so does the engine. Here’s the orphaned state, straight from the reproduction:

USE [App];
SELECT name, type_desc FROM sys.filegroups WHERE type = 'FX';
-- MemOptFG  MEMORY_OPTIMIZED_DATA_FILEGROUP
SELECT COUNT(*) AS inmem_tables FROM sys.tables WHERE is_memory_optimized = 1;
-- 0

Zero memory-optimized tables, but a fully armed XTP subsystem. And it’s not idle:

SELECT session_id, command, status, wait_type, cpu_time
FROM sys.dm_exec_requests WHERE command = 'XTP_OFFLINE_CKPT';
-- 14  XTP_OFFLINE_CKPT  background  WAIT_XTP_OFFLINE_CKPT_NEW_LOG  ...

That XTP_OFFLINE_CKPT background thread is the offline checkpoint worker. Its job is to tail the transaction log and harvest changes for memory-optimized tables. There are none — but it doesn’t know that, so it keeps reading the log forever. Meanwhile checkpoint files accumulate, including the telltale WAITING FOR LOG TRUNCATION state (sys.dm_db_xtp_checkpoint_files).

Why it burns CPU

The thread’s cost scales with how much log the database generates, because that’s what it scans. In the repro, a single ordinary insert into a disk-based table — 200,000 rows, nothing memory-optimized in sight — moved the thread’s cpu_time by +90 ms on its own. Multiply that by a busy OLTP database generating log around the clock and you get exactly the production pattern: the spikes were worst on the busiest servers (most log) and trivial on the quiet ones. The feature is doing real work scanning the log; it just has nothing to show for it.

You also can’t just kill it: KILL <session_id> on a system background thread returns Msg 6107 (“Only user processes can be killed”).

The wall: everything that doesn’t remove it

This is where it gets frustrating. The obvious move is to drop the filegroup. You can’t.

ALTER DATABASE [App] REMOVE FILE [MemOptContainer];
-- Msg 41802: Cannot drop the last memory-optimized container 'MemOptContainer'.

ALTER DATABASE [App] REMOVE FILEGROUP [MemOptFG];
-- Msg 5042: The filegroup 'MemOptFG' cannot be removed because it is not empty.

A container can’t be dropped if it’s the last one; the filegroup can’t be dropped while it has a container. Classic deadlock of rules. The internet’s favourite workaround is “add a second container, drop the original, then you can drop the filegroup.” It doesn’t work — I tested it end to end:

StepResult
Add MemOptContainer2OK
Remove the original MemOptContainerOK — you can drop a non-last container
Remove the now-last MemOptContainer2Msg 41802 — cannot drop the last container
Remove the filegroupMsg 5042 — still not empty

You can shuffle which container is last, but you can never get to zero. The filegroup is permanent for the life of the database.

What about going around it?

  • SET OFFLINE WITH ROLLBACK IMMEDIATE then SET ONLINE does kill the thread — and it respawns with a new session id the moment the database comes back, because the filegroup is still there. (In production this “fix” also meant kicking off 800+ active sessions; the relief lasted until the next log activity.)
  • DBCC CLONEDATABASE produces a clone that still has the FX filegroup. So does a BACKUP / RESTORE — they recreate the exact file layout. You cannot back up your way out of it.
  • Log backups + CHECKPOINT (to drain the checkpoint files) is at best partial: in the repro the WAITING FOR LOG TRUNCATION count actually grew (the backups generate their own log), and the thread kept running throughout. It doesn’t retire the engine.

This isn’t a bug to be worked around — it’s a design limitation. Once a database has a memory-optimized filegroup, that’s a one-way door.

The fix that works

The only supported way to truly remove the filegroup is to move the data into a new database that never had it:

-- a clean database: PRIMARY + LOG only, no memory-optimized filegroup
CREATE DATABASE [App_Clean] ON PRIMARY (...) LOG ON (...);
-- migrate schema (scripted WITHOUT the filegroup) + data, then verify:
SELECT COUNT(*) FROM App_Clean.sys.filegroups WHERE type = 'FX';   -- 0
SELECT COUNT(*) FROM sys.dm_exec_requests
WHERE command = 'XTP_OFFLINE_CKPT' AND database_id = DB_ID('App_Clean');   -- 0

In the repro that yields a database with zero FX filegroups and no XTP thread — the spikes are gone because the engine is gone. Then you cut over with a rename swap.

The catch is scale. For a multi-terabyte production database you can’t just SELECT … INTO and flip a switch, and — as above — backup/restore and CLONEDATABASE all drag the filegroup along. So you seed the clean database another way: stand up a fresh (no-filegroup) database and load it via Always On / log shipping or a staged bulk copy plus a short final-delta sync, so the actual cutover is seconds of downtime, not a full-size copy window. It’s a migration project, not an ALTER. That’s the real cost of the leftover.

Lessons

  • CONTAINS MEMORY_OPTIMIZED_DATA is effectively a permanent decision for the life of the database. Add it deliberately, not “to try the feature.” If you might back out, prototype in a throwaway database.
  • The XTP offline-checkpoint thread runs whether or not you have in-memory tables, and its CPU tracks your log volume — so an unused filegroup is not free on a busy server.
  • command = 'XTP_OFFLINE_CKPT' in sys.dm_exec_requests is the stable signal to look for (wait types vary by build — I saw WAIT_XTP_OFFLINE_CKPT_NEW_LOG, the incident saw PWAIT_DIRECTLOGCONSUMER_GETNEXT).
  • If you’re already stuck with one, plan the rebuild — there’s no ALTER that saves you. The dead end persists on SQL Server 2025, though the failure mode changed: instead of 2019’s instant Msg 41802, the 2025 REMOVE FILE kicks off an asynchronous In-Memory OLTP undeployment that, in my testing, hung for minutes without completing (filegroup still there; concurrent file operations then report Msg 41879, “undeployment in progress”). Same outcome, less honest about it.

Reproduced on SQL Server 2019 and 2025 Developer in Docker. The harness builds the orphaned state, runs the diagnostics, fails its way through every removal trick (capturing the exact Msg numbers), and rebuilds into a clean database — so you can see each receipt yourself.


Need help with something like this?

I take on focused SQL Server engagements — performance, migrations, and CI/CD.

Book a 30-min consult