Why devmem2 is a Loaded Gun: Solving RCU Stalls with mmreg

In embedded Linux, devmem2 is the standard “hammer” for toggling registers. It mmaps /dev/mem for raw access. On modern SoCs, however, this tool is a “blind” operator: if you access an address the kernel hasn’t initialized at the bus level, the system doesn’t just return an error, it hangs.

The most frustrating aspect of a devmem2 crash is the lack of immediate feedback. There is no segmentation fault and no immediate kernel oops. The system simply stops. To understand why, we have to look at the difference between a software exception and a hardware deadlock.

A single read is all it takes:

sudo devmem2 0xA0000000 w

[16864.533687] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[16864.539802] rcu: 0-...!: (1 GPs behind) idle=1614/1/0x4000000000000000 softirq=508243/508244 fqs=950
[16864.549026] rcu: (detected by 1, t=60002 jiffies, g=818749, q=103 ncpus=4)
[16874.556802] rcu: rcu_preempt kthread timer wakeup didn't happen for 55965 jiffies! g818749 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[16874.568371] rcu: Possible timer handling issue on cpu=2 timer-softirq=250813
[16874.575506] rcu: rcu_preempt kthread starved for 55968 jiffies! g818749 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
[16874.586121] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[16874.595252] rcu: RCU grace-period kthread stack dump:
[16874.600422] rcu: Stack dump where RCU GP kthread last ran:
...

The “Silent Death”: Anatomy of a Hardware Deadlock

When you access an invisible address (one not present in the system’s current device tree or memory map), you trigger a failure that no software can log. A single typo in the address is enough. mmap is blind: it maps the page without returning an error, and the trouble only starts when read_volatile performs the actual load.

The Request: The CPU issues a Load instruction for a physical address.
The Black Hole: Because the address isn’t registered, the AXI interconnect has no routing path. The request hits a “closed gate” at the bridge.
The Absolute Deadlock: Because the CPU is physically waiting for a hardware handshake, it is stuck in an instruction that cannot complete.
- No Interrupts: The core cannot stop to handle timer interrupts.
- No Context Switching: The scheduler cannot move the “stuck” devmem2 process to the background.
- No Logging: Since the core is frozen, it cannot execute the code required to write a “Bus Error” message to the system log.

The kernel doesn’t even know something is wrong yet. It is not a software crash; it is a hardware-level infinite wait. The system stays perfectly silent while dead.
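The split between a mapping that succeeds and a load that never returns can be sketched in a few lines. This is a minimal Python illustration, not the source of devmem2 or mmreg. The helper name read_word is hypothetical, and Python makes no promise about the width of the underlying bus access, so treat it as a model of the control flow only.

```python
import mmap
import os
import struct


def read_word(path: str, phys_addr: int) -> int:
    """Map the page containing phys_addr and read one 32-bit little-endian word."""
    page = mmap.ALLOCATIONGRANULARITY
    base = phys_addr & ~(page - 1)      # mmap offsets must be page aligned
    offset = phys_addr - base
    fd = os.open(path, os.O_RDONLY | os.O_SYNC)
    try:
        # Step 1: the mapping. For /dev/mem this only sets up page tables;
        # nothing is sent to the device, so no error can come back yet.
        with mmap.mmap(fd, page, mmap.MAP_SHARED, mmap.PROT_READ, offset=base) as window:
            # Step 2: the load. On /dev/mem this is where the bus request is
            # issued, and where the core would stall if nothing answers.
            return struct.unpack_from("<I", window, offset)[0]
    finally:
        os.close(fd)
```

Run against a regular file, the function is harmless. Pointed at /dev/mem with an address that nothing decodes, step 1 still succeeds and step 2 is the line that never returns.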

The RCU Stall - The Only Event: The only reason we get any information at all is because of the other cores in the SoC. In a multi-core system, Core 0 might be frozen, but Core 1 is still running. Eventually, Core 1 notices that Core 0 has failed to check in for its periodic “heartbeat” (the RCU Grace Period).

Only then—usually 20 to 60 seconds later—does the healthy core realize the system is deadlocked and trigger the RCU Stall message. By the time you see that event, the “crime” happened a long time ago, and the system is already unrecoverable.
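To put a number on that delay, the “detected by” line from the log above can be decoded. The sketch below is a hypothetical helper, not part of any kernel tooling. Converting jiffies to seconds depends on the kernel’s tick rate (CONFIG_HZ), which the log does not state, so the rate is passed in as an explicit assumption.

```python
import re

# Matches the "(detected by <cpu>, t=<n> jiffies" fragment of an RCU stall report.
STALL_RE = re.compile(r"\(detected by (\d+), t=(\d+) jiffies")


def parse_stall(line: str, hz: int):
    """Return (detecting_cpu, seconds) or None if the line is not a stall report.

    hz is the assumed kernel tick rate; it is not recoverable from the log line.
    """
    match = STALL_RE.search(line)
    if match is None:
        return None
    cpu = int(match.group(1))
    jiffies = int(match.group(2))
    return cpu, jiffies / hz


LOG_LINE = "[16864.549026] rcu: (detected by 1, t=60002 jiffies, g=818749, q=103 ncpus=4)"
```

With a tick rate of 1000, parse_stall(LOG_LINE, 1000) reports that CPU 1 raised the alarm after roughly 60 seconds. At a tick rate of 250 the same jiffy count would mean about four minutes, so check your kernel configuration before reading the number as wall-clock time.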

Software vs. Hardware Fault

In a standard software crash (like a segfault), fork() is a great safety net because the kernel can kill the child process without affecting the parent. But when dealing with memory-mapped hardware, you aren’t just dealing with software—you are dealing with the system bus.

Why `fork()` Fails to Protect

The hardware deadlock we’re discussing happens at the Electrical/Logic layer, which sits “underneath” the concept of processes.

The Shared Bus: Both the parent and the child process use the same physical CPU cores and the same AXI bus.
The Deadlock is Total: When the child process calls read_volatile on an invalid address, the CPU core issues a load on the bus. If the bridge is closed, the hardware logic on that CPU core stalls waiting for a response.
The Kernel is Part of the Deadlock: Since the CPU core is physically waiting for a hardware handshake, it cannot switch back to the kernel to “kill” the child. The scheduler is frozen.
The Result: The entire core is locked. Because the kernel relies on all cores to be responsive for things like RCU synchronization and memory management, the whole OS eventually grinds to a halt (the RCU Stall).

When a program accesses invalid virtual memory, the MMU raises a Page Fault. The kernel catches this “event,” may log it, and delivers SIGSEGV (signal 11), which by default kills the process. This is a software-controlled failure.

However, when you access a physical address behind a closed AXI bridge, the failure happens at the Electrical/Logic layer. The CPU issues a “Read Request” on the physical wires and sits in a “Wait State,” expecting a “Valid” or “Error” signal from the interconnect. Because the bridge is uninitialized, it never sends any signal back.
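The software half of this contrast is easy to reproduce safely. The sketch below forks a child that dereferences a null pointer through ctypes, then shows that the parent survives and can read the cause of death. The function name run_faulting_child is hypothetical. It demonstrates only the case where fork() works; the bus hang cannot be demonstrated without actually freezing a core.

```python
import ctypes
import os
import signal


def run_faulting_child() -> int:
    """Fork a child that segfaults; return the signal number that killed it (0 if none)."""
    pid = os.fork()
    if pid == 0:
        # Child: read from address 0. The MMU raises a page fault and the
        # kernel answers with SIGSEGV, so the next line is never reached.
        ctypes.string_at(0)
        os._exit(0)
    # Parent: still scheduled and still able to collect the child's status,
    # because the fault was handled entirely in software.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return 0


if __name__ == "__main__":
    killed_by = run_faulting_child()
    print("child killed by", signal.Signals(killed_by).name)
```

The parent gets a status back only because the kernel kept running on that core. In the AXI hang described above, the core executing the child never returns to the kernel, so there is no status to collect.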

Software vs. Hardware “Crashes”

| Feature | Software Crash (Segfault) | Hardware Deadlock (AXI Hang) |
| --- | --- | --- |
| Detection | Detected by CPU Memory Management Unit (MMU) | No detection; the bus simply never responds |
| Isolation | Process-level isolation (Signal 11) | No isolation; the physical bus is shared |
| Fork Strategy | Works: Parent stays alive if child dies | Fails: The bus hang freezes the CPU core itself |
| Recovery | Instant (Kernel cleans up process) | None (Requires a hardware reset/reboot) |

The `mmreg` Solution: “Hardware-Aware” Access

mmreg (version 0.1.2 and later) solves this by acting as a Software Address Decoder. Before it ever touches the hardware, it validates the user’s input against the system’s “blueprints”: the Device Tree and the Kernel Resource Map.

The Multi-Layer Validation

mmreg doesn’t just check if an address “looks” okay; it cross-references kernel subsystems to ensure the path to that memory is actually open:

The Resource Map (/proc/iomem): It checks if the kernel has already “claimed” or acknowledged the memory range.
The Object Model (/sys/bus/platform/devices/): It crawls the system’s platform devices to find matching memory ranges (reg properties) that the kernel knows about.
Device State Validation: It checks for dependency blockers like the waiting_for_supplier file that indicate a device is registered but not yet active.
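The first of these checks, the resource map lookup, can be sketched as follows. This is an illustration of the idea in Python, not mmreg’s implementation. The names parse_iomem and find_owner are hypothetical, and the SAMPLE text is made up for the example rather than captured from a real board.

```python
import re

# One /proc/iomem line: optional indentation, "start-end : name", addresses in hex.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def parse_iomem(text: str):
    """Parse /proc/iomem-style text into a list of (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        match = IOMEM_LINE.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            regions.append((start, end, match.group(3).strip()))
    return regions


def find_owner(regions, addr: int):
    """Return the name of the smallest region containing addr, or None."""
    hits = [r for r in regions if r[0] <= addr <= r[1]]
    if not hits:
        return None
    # Nested entries overlap their parents; prefer the most specific one.
    return min(hits, key=lambda r: r[1] - r[0])[2]


# Illustrative sample only; real contents differ per board.
SAMPLE = """\
00000000-7fffffff : System RAM
  00210000-015dffff : Kernel code
a0000000-a000ffff : a0000000.gpio
"""
```

On a real system the text would come from reading /proc/iomem. Note that unprivileged readers see every address as zero there, so the lookup is only meaningful when run as root.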

Handling “Zombie” Devices

One of the most dangerous states in Linux is a device that is registered but not active. You might see a device in /sys/bus/platform/devices/, but if its waiting_for_supplier file is present, the device is blocked on a dependency: it is waiting for a clock, power domain, or another driver to wake it up.

Standard devmem2 will attempt to read this “ghost” memory, resulting in a system hang. mmreg identifies the presence of these system-level blockers and warns the user that the hardware is not yet in a “Ready” state.
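A scan for such blockers might look like the sketch below. It follows the rule stated above, treating the presence of waiting_for_supplier as the warning sign. The function name blocked_devices is hypothetical, and the directory is a parameter so the logic can be tried against a scratch directory instead of live sysfs.

```python
from pathlib import Path


def blocked_devices(devices_root: str = "/sys/bus/platform/devices"):
    """Return sorted names of platform devices that have a waiting_for_supplier file."""
    root = Path(devices_root)
    if not root.is_dir():
        # No platform bus view available (for example, not running on Linux).
        return []
    return sorted(
        dev.name
        for dev in root.iterdir()
        if (dev / "waiting_for_supplier").exists()
    )


if __name__ == "__main__":
    for name in blocked_devices():
        print("not ready:", name)
```

A tool built on this would refuse or warn for any target address that falls inside a device returned by the scan, since the device is listed but the hardware behind it may not be clocked or powered yet.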

Performance Without the Risk

mmreg doesn’t sacrifice performance for safety. It simply adds a “Pre-Flight Check” that runs once at startup:

Search: It looks for a device in sysfs or iomem that “owns” the user’s target address.
Verify: If a match is found, the address is deemed Legit. mmreg proceeds to mmap and perform high-speed volatile access.
Warn: If no match is found, mmreg warns the user: “No device found at this address. The bus gate is likely closed. Accessing this may hang the system.”

This allows developers to still perform “blind” probes using a --force flag, but prevents 99% of accidental crashes caused by typos or by trying to access FPGA logic before a Device Tree Overlay (DTBO) has been loaded.
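The Search, Verify, Warn sequence and the force override reduce to a small decision function. The sketch below is one reading of that flow, not mmreg’s code. The name preflight, its return shape, and the choice to refuse when force is not set are assumptions made for illustration. The warning text is the one quoted above.

```python
WARNING = (
    "No device found at this address. The bus gate is likely closed. "
    "Accessing this may hang the system."
)


def preflight(addr: int, regions, force: bool = False):
    """Decide whether an access may proceed.

    regions is a list of (start, end, name) tuples gathered once at startup.
    Returns (allowed, message).
    """
    # Search: look for a region that owns the target address.
    owner = next(
        (name for start, end, name in regions if start <= addr <= end),
        None,
    )
    # Verify: a match means the kernel knows about this range.
    if owner is not None:
        return True, f"0x{addr:08X} belongs to {owner}"
    # Warn: no match. Only an explicit override lets the access through.
    if force:
        return True, "forced: " + WARNING
    return False, WARNING
```

Because the region list is built once and the check is a single pass over it, the cost is paid before the first access and does not touch the read or write path afterwards.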

In CLI mode, the performance impact is not observable. When mmreg is used as a library, it performs the pre-flight check once at initialization.

Conclusion

Direct memory access shouldn’t be Russian roulette. By verifying that a hardware “blueprint” exists in the kernel before touching the bus, mmreg transforms a dangerous operation into a professional, predictable workflow. mmreg doesn’t sacrifice power; it provides Informed Consent. It turns a silent hardware deadlock into a clean, readable error message.

If the kernel can’t see it, mmreg won’t touch it.
