A series of posts about QEMU internals:
In this post we’ll have a glance at the high-level memory organisation in QEMU: memory regions (MR).
We won’t cover address spaces, because we usually manipulate memory regions directly. However, have a look at docs/devel/memory and the code (include/exec/memory.h) for more details.
A high-level external presentation of the memory organisation is available there. You will also find very interesting internal documentation at docs/devel/loads-stores, which enumerates the available QEMU APIs for accessing memory.
When you want to play with memory regions in QEMU, you can use cpu_physical_memory_rw() to safely access the region. In the blog post dedicated to the TCG, we will see exactly how translated instructions access VM memory and how we can intercept accesses at that level.
Below is the tree of available memory regions once a PowerPC 40p board is ready. As you can see, memory regions can contain other memory regions (called subregions). This is a clean way to organize memory. Each memory region has its own properties and is attached to a kind of view called an address space.
$ qemu-system-ppc -M 40p -s -S -monitor stdio
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) info mtree
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    [...]
    0000000080000000-00000000807fffff (prio 1, i/o): pci-io-non-contiguous
    0000000080000000-00000000bf7fffff (prio 0, i/o): pci-io
      0000000080000000-0000000080000007 (prio 0, i/o): dma-chan
      0000000080000008-000000008000000f (prio 0, i/o): dma-cont
      0000000080000020-0000000080000021 (prio 0, i/o): pic
      0000000080000040-0000000080000043 (prio 0, i/o): pit
      0000000080000061-0000000080000061 (prio 0, i/o): pcspk
      0000000080000064-0000000080000064 (prio 0, i/o): i8042-cmd
      00000000800002f8-00000000800002ff (prio 0, i/o): serial
      [...]
      0000000080000cf8-0000000080000cfb (prio 0, i/o): pci-conf-idx
      0000000080000cfc-0000000080000cff (prio 0, i/o): pci-conf-data
      0000000080800000-0000000080bfffff (prio 0, i/o): pciio
    00000000c0000000-00000000feffffff (prio 0, i/o): pci-memory
      00000000c00a0000-00000000c00bffff (prio 1, i/o): vga-lowmem
    00000000fff00000-00000000ffffffff (prio 0, rom): bios

address-space: cpu-memory-0
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    [...]
    00000000fff00000-00000000ffffffff (prio 0, rom): bios

memory-region: pci-memory
  00000000c0000000-00000000feffffff (prio 0, i/o): pci-memory
    00000000c00a0000-00000000c00bffff (prio 1, i/o): vga-lowmem

memory-region: system
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    [...]
    00000000fff00000-00000000ffffffff (prio 0, rom): bios
Default memory regions and address spaces are created by QEMU. The most important is the system memory region, which is created by memory_map_init() from cpu_exec_init_all(). It can be seen as the top-level one, and subregions are usually added to the system memory region.
This might be one of the most desired things when creating a new machine: get RAM and load a firmware. The correct function to invoke is memory_region_allocate_system_memory().
If we look at some other board implementations, for instance the
MIPS
R4K
:
void mips_r4k_init(MachineState *machine)
{
    MemoryRegion *sys_mem = get_system_memory();
    MemoryRegion *ram = g_new(MemoryRegion, 1);

    memory_region_allocate_system_memory(ram, NULL, "mips_r4k.ram", ram_size);
    memory_region_add_subregion(sys_mem, 0, ram);
    ...
}
A new memory region for the RAM is created and directly added as a subregion of the system memory region. From that point, accessing physical addresses 0 to ram_size will hit the VM memory.

The memory_region_allocate_system_memory() function actually calls memory_region_init_ram_shared_nomigrate() from the QEMU memory API. It encapsulates RAM-specific memory region initialization.
The QEMU memory API allows you to create memory regions backed by file descriptors, already allocated host buffers, or callbacks, as we will see for IOs.
Getting back to our simple MIPS board example:
void mips_r4k_init(MachineState *machine)
{
    ...
    MemoryRegion *iomem = g_new(MemoryRegion, 1);

    memory_region_init_io(iomem, NULL, &mips_qemu_ops, NULL, "mips-qemu", 0x10000);
    memory_region_add_subregion(sys_mem, 0x1fbf0000, iomem);
}
A new memory region iomem is created with memory_region_init_io() and also added as a subregion of the system memory. This region is not of RAM but of IO type and takes a special MemoryRegionOps argument.
static const MemoryRegionOps mips_qemu_ops = {
    .read = mips_qemu_read,
    .write = mips_qemu_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};

static void mips_qemu_write(void *opaque, hwaddr addr,
                            uint64_t val, unsigned size)
{
    if ((addr & 0xffff) == 0 && val == 42)
        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
    ...
}

static uint64_t mips_qemu_read(void *opaque, hwaddr addr,
                               unsigned size)
{
    return 0;
}
IO memory regions expose device memory. They usually need special interpretation during read/write accesses to simulate the expected device behavior. The MemoryRegionOps callbacks help you implement device operations.

In the previous example, the iomem region is mapped at 0x1fbf0000 - 0x1fc00000. Whenever the VM accesses this memory range, the read/write callbacks are invoked. The addr argument is an offset from the beginning of the related memory region.

So doing something like *(u8*)0x1fbf0000 = 42 on this MIPS board triggers a system reset.
Sometimes you may like to combine both for a specific device. Part of the device memory has no special meaning and is used for transactions, while a small range might be interpreted as device registers.
static void cpiom_sdram_init(Object *obj)
{
    cpiom_sdram_state_t *s = CPIOM_SDRAM(obj);
    SysBusDevice *d = SYS_BUS_DEVICE(obj);
    MemoryRegion *sysmem = get_system_memory();

    memory_region_allocate_system_memory(&s->ram, obj,
                                         CPIOM_SDRAM_NAME,
                                         CPIOM_MMAP_SDRAM_SIZE);
    memory_region_init_io(&s->reg, obj, &sdram_reg_ops, s,
                          CPIOM_SDRAM_NAME"-reg",
                          CPIOM_MMAP_SDRAM_REG_SIZE);
    memory_region_init_io(&s->pcode.mmio, obj, &sdram_pcode_ops, s,
                          CPIOM_SDRAM_NAME"-pcode",
                          CPIOM_MMAP_SDRAM_PCODE_SIZE);
    memory_region_init_io(&s->pdata.mmio, obj, &sdram_pdata_ops, s,
                          CPIOM_SDRAM_NAME"-pdata",
                          CPIOM_MMAP_SDRAM_PDATA_SIZE);
    memory_region_init_io(&s->raf, obj, &sdram_raf_ops, s,
                          CPIOM_SDRAM_NAME"-raf", 1*4);

    memory_region_add_subregion(sysmem, 0, &s->ram);
    sysbus_init_mmio(d, &s->reg);
    memory_region_add_subregion(sysmem,
                                CPIOM_MMAP_SDRAM_PCODE, &s->pcode.mmio);
    memory_region_add_subregion(sysmem,
                                CPIOM_MMAP_SDRAM_PDATA, &s->pdata.mmio);

    sysbus_init_irq(d, &s->irq_viol);
    sysbus_init_irq(d, &s->irq_err);
}
In this more complex example, the CPIOM SDRAM is a fully fledged QEMU device object with several memory regions. Some memory regions (reg and raf) are not directly added as subregions to the system memory. We will explain why in the next post, dedicated to device creation.

The CPIOM SDRAM controller offers protection for code and data, as well as refresh-rate tuning. It is also able to raise interrupt requests.