PTLsim provides implementations for all uops in the uopimpl.cpp file. C++ templates are combined with gcc's inline assembler type-selection constraints to translate every permutation (operand sizes, condition codes, etc.) of each uop into highly optimized code. In many cases, a real x86 instruction forms the core of a uop's implementation. The code after that instruction simply captures the x86 condition code flags it generated, so we do not have to emulate those condition codes manually. The simulator calls the code implementing each uop whenever that uop must be executed. Loads and stores are implemented elsewhere, because they depend too heavily on the specific core model to be expressed in this generic manner.
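The following minimal sketch illustrates the technique, assuming an x86-64 host and gcc or clang. It is not PTLsim's actual uopimpl.cpp code: `uop_add`, `UopResult` and `Flags` are hypothetical names. A single template covers every operand size because the `r` constraint makes the compiler pick a register of the C++ type's width (for example `%al`, `%ax`, `%eax` or `%rax`). The `setcc` instructions then copy the flags that the real `add` produced.

```cpp
#include <cassert>
#include <cstdint>

#if !defined(__x86_64__)
#error "This sketch uses x86-64 inline assembly and requires an x86-64 host"
#endif

// Condition flags captured from the host CPU after the real x86 instruction.
struct Flags {
  bool cf, zf, sf, of;
};

template <typename T>
struct UopResult {
  T value;
  Flags flags;
};

// Hypothetical add uop for 8/16/32/64-bit operands. The operand type selects
// the register width, so the unsuffixed "add" is assembled at the right size.
template <typename T>
UopResult<T> uop_add(T a, T b) {
  uint8_t cf, zf, sf, of;
  __asm__("add %[b], %[a]\n\t"
          "setc %[cf]\n\t"  // carry
          "setz %[zf]\n\t"  // zero
          "sets %[sf]\n\t"  // sign
          "seto %[of]"      // signed overflow
          : [a] "+r"(a), [cf] "=q"(cf), [zf] "=q"(zf), [sf] "=q"(sf), [of] "=q"(of)
          : [b] "r"(b)
          : "cc");
  return {a, {cf != 0, zf != 0, sf != 0, of != 0}};
}
```

The same template body produces four specializations (`uop_add<uint8_t>` through `uop_add<uint64_t>`), and none of them computes a flag in C++.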
An additional optimization, called synthesis, is also applied whenever basic blocks are translated. Each uop in the basic block is mapped to the address of a native PTLsim function in uopimpl.cpp that implements that uop's semantics. This function pointer is stored in the synthops[] array of the BasicBlock structure. As a result, we avoid a large jump table at execution time, and each uop maps to a pre-compiled template that needs almost no further decoding when it executes.
PTLsim supports a wide array of command line or scriptable configuration options, described in Section 10.3. The configuration parser engine (used by both PTLsim itself and utilities like PTLstats) is in config.cpp and config.h. For PTLsim itself, each option is declared in three places:
PTLsim uses its own custom memory manager for all allocations, given its specialized constraints (particularly for PTLsim/X, which runs on the bare hardware). The PTLsim memory manager (in mm.cpp) is built from three key allocators: the page allocator, the general allocator and the slab allocator.
The page allocator allocates spans of one or more virtually contiguous pages. In userspace-only PTLsim, the page allocator doesn't really exist: it simply calls mmap() and munmap(), letting the host kernel do the actual allocation. In the full-system PTLsim/X, the page allocator works with physical pages and is based on the extent allocator (see below). Use the ptl_alloc_private_pages() and ptl_free_private_pages() functions to allocate page-aligned memory (or individual pages) directly from this pool.
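In userspace, the page allocator reduces to a thin wrapper over the kernel. The sketch below shows that idea, assuming a POSIX host. The `_sketch` functions are hypothetical stand-ins, so their signatures should not be taken as those of the real ptl_alloc_private_pages() and ptl_free_private_pages().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Round a byte count up to a whole number of host pages.
static size_t round_up_to_pages(size_t bytes) {
  size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  return (bytes + page - 1) / page * page;
}

// Hypothetical userspace page allocator: the host kernel does the real work.
// mmap() always returns page-aligned memory.
void* ptl_alloc_private_pages_sketch(size_t bytes) {
  void* p = mmap(nullptr, round_up_to_pages(bytes), PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return (p == MAP_FAILED) ? nullptr : p;
}

// The caller passes back the same size it requested.
void ptl_free_private_pages_sketch(void* p, size_t bytes) {
  if (p) munmap(p, round_up_to_pages(bytes));
}
```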
The general allocator uses the ExtentAllocator template class to allocate large objects (larger than a page) from a pool of free extents. This allocator automatically merges adjacent free extents and can find a matching free block in O(1) time for any allocation size. The general allocator obtains large chunks of memory (typically 64 KB at a time) from the page allocator, then subdivides these extents into individual allocations.
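The merging behaviour can be sketched with an ordered map of free extents. This is a simplified stand-in, not PTLsim's ExtentAllocator. `ExtentAllocatorSketch` is hypothetical and manages abstract offsets rather than real memory. Unlike the O(1) lookup described above, its `alloc()` uses a plain linear first-fit scan.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Free extents are kept ordered by start address so neighbours can be merged.
class ExtentAllocatorSketch {
  std::map<size_t, size_t> free_;  // start -> length of each free extent

 public:
  // Return [start, start + len) to the pool, coalescing with adjacent extents.
  void release(size_t start, size_t len) {
    assert(len > 0);
    auto next = free_.lower_bound(start);
    if (next != free_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == start) {  // merge with left neighbour
        start = prev->first;
        len += prev->second;
        free_.erase(prev);
      }
    }
    if (next != free_.end() && start + len == next->first) {  // merge right
      len += next->second;
      free_.erase(next);
    }
    free_[start] = len;
  }

  // First fit: carve len units from the front of the first large-enough extent.
  std::optional<size_t> alloc(size_t len) {
    for (auto it = free_.begin(); it != free_.end(); ++it) {
      if (it->second < len) continue;
      size_t start = it->first;
      size_t rest = it->second - len;
      free_.erase(it);
      if (rest > 0) free_[start + len] = rest;  // keep the unused tail
      return start;
    }
    return std::nullopt;
  }

  size_t free_extent_count() const { return free_.size(); }

  size_t largest_free_extent() const {
    size_t best = 0;
    for (const auto& [start, len] : free_)
      if (len > best) best = len;
    return best;
  }
};
```

Seeding the pool with `release(0, 65536)` models obtaining one 64 KB chunk from the page allocator. Freeing neighbouring allocations then collapses them back into a single extent.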
The slab allocator maintains a pool of page-sized "slabs" from which fixed-size objects are allocated. Each page only contains objects of one size. A separate slab allocator handles each size from 16 bytes up to 1024 bytes, in 16-byte increments. This design makes allocation extremely fast for object-oriented programs that allocate many objects of the same size. Like the general allocator, the slab allocator obtains memory one page at a time from the global page allocator. However, it also keeps a pool of empty pages so it can satisfy requests quickly. The Linux kernel uses the same architecture to satisfy memory requests.
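The sketch below shows a single size class of such an allocator, assuming 4 KB pages. `SlabCache` and `slab_size_class()` are hypothetical. The sketch does not model the pool of empty pages: pages come straight from `std::aligned_alloc` and are only returned when the cache is destroyed. Free objects are threaded into a singly linked list stored inside the objects themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed page size

// Map a request of 1..1024 bytes to one of 64 size classes (16, 32, ..., 1024).
constexpr std::size_t slab_size_class(std::size_t bytes) { return (bytes + 15) / 16 - 1; }

// One cache per size class; every page it owns holds objects of one size only.
class SlabCache {
  struct FreeObj { FreeObj* next; };  // free objects double as list nodes
  std::size_t objsize_;
  FreeObj* freelist_ = nullptr;
  std::vector<void*> pages_;

  // Grab one page and carve it into equally sized free objects.
  bool refill() {
    void* page = std::aligned_alloc(kPageSize, kPageSize);
    if (!page) return false;
    pages_.push_back(page);
    char* base = static_cast<char*>(page);
    for (std::size_t off = 0; off + objsize_ <= kPageSize; off += objsize_)
      freelist_ = new (base + off) FreeObj{freelist_};
    return true;
  }

 public:
  explicit SlabCache(std::size_t objsize) : objsize_(objsize) {
    assert(objsize >= sizeof(FreeObj) && objsize % 16 == 0);
  }
  ~SlabCache() {
    for (void* p : pages_) std::free(p);
  }
  SlabCache(const SlabCache&) = delete;
  SlabCache& operator=(const SlabCache&) = delete;

  // Pop the head of the free list, adding a fresh page only when it is empty.
  void* alloc() {
    if (!freelist_ && !refill()) return nullptr;
    FreeObj* obj = freelist_;
    freelist_ = obj->next;
    return obj;
  }

  // Freed objects go back on the head of the list (LIFO reuse).
  void release(void* p) { freelist_ = new (p) FreeObj{freelist_}; }

  std::size_t page_count() const { return pages_.size(); }
};
```

Both allocation and freeing are a couple of pointer operations. That is where the speed advantage for many same-sized objects comes from.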
The ptl_mm_alloc() function decides which of the two allocators (general or slab) should satisfy a request, based on the size in bytes, object type and caller. Both the standard new operator and malloc() use this function. Similarly, the ptl_mm_free() function frees memory. PTLsim uses a special bitmap to track which pages belong to the slab allocator. If a pointer falls within a slab page, the slab deallocator is used. Otherwise, the general allocator frees the extent.
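A minimal sketch of this routing follows, assuming 4 KB pages and a contiguous managed region starting at a known base address. `choose_allocator()`, `choose_deallocator()` and `SlabPageBitmap` are hypothetical. The sketch routes allocations by size alone, whereas ptl_mm_alloc() also considers object type and caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPageSize = 4096;     // assumed page size
constexpr std::size_t kSlabMaxSize = 1024;  // largest slab size class

enum class Allocator { Slab, General };

// Allocation side: small requests go to the slab allocator.
Allocator choose_allocator(std::size_t bytes) {
  return bytes <= kSlabMaxSize ? Allocator::Slab : Allocator::General;
}

// One bit per page of the managed region: set if the page is a slab page.
class SlabPageBitmap {
  std::uintptr_t base_;
  std::vector<bool> bits_;

 public:
  SlabPageBitmap(std::uintptr_t base, std::size_t npages) : base_(base), bits_(npages, false) {}

  void set_slab_page(std::uintptr_t page_addr, bool is_slab) {
    bits_.at((page_addr - base_) / kPageSize) = is_slab;
  }

  bool in_slab(const void* p) const {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    if (addr < base_) return false;
    std::size_t page = (addr - base_) / kPageSize;
    return page < bits_.size() && bits_[page];
  }
};

// Free side: the pointer's page, not the object size, selects the deallocator.
Allocator choose_deallocator(const SlabPageBitmap& bitmap, const void* p) {
  return bitmap.in_slab(p) ? Allocator::Slab : Allocator::General;
}
```

Because the free path only needs the pointer, callers of the free function do not have to remember the object's size or which allocator produced it.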
The memory manager implements a garbage collection mechanism. Other subsystems register reclaim functions, via the ptl_mm_register_reclaim_handler() function, and the memory manager calls these functions when an allocation fails. Whenever an allocation fails, the reclaim handlers are called in sequence, followed by an extent cleanup pass, before the allocation is retried. This process repeats until the allocation succeeds or an abort threshold is reached.
The reclaim function is passed two parameters: the size in bytes of the failed allocation, and an urgency parameter. If urgency is 0, the subsystem that registered the callback should do everything in its power to free all memory it owns. Otherwise, the subsystem should progressively trim more and more unused memory with each call (and each increase in urgency). Under no circumstances may a reclaim handler allocate any additional memory! Doing so would create an infinite loop. The memory manager detects any such attempt and shuts down PTLsim.
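The retry loop and the reentrancy check can be sketched as follows. `ReclaimingAllocator`, `ReclaimHandler` and the injected `RawAlloc` function are hypothetical, and the sketch differs from PTLsim in three ways:

- Handlers receive urgency 1, 2, 3, ... on successive retries. The special urgency-0 "free everything" request is not modeled.
- The extent cleanup pass is only marked by a comment.
- A handler that allocates triggers a `std::logic_error` rather than shutting the simulator down.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical handler signature: size of the failed request and an urgency level.
using ReclaimHandler = std::function<void(std::size_t bytes, int urgency)>;

class ReclaimingAllocator {
 public:
  using RawAlloc = std::function<void*(std::size_t)>;

  ReclaimingAllocator(RawAlloc raw, int max_retries)
      : raw_(std::move(raw)), max_retries_(max_retries) {}

  void register_reclaim_handler(ReclaimHandler h) { handlers_.push_back(std::move(h)); }

  void* alloc(std::size_t bytes) {
    // A handler that allocates could recurse forever; treat it as fatal.
    if (in_reclaim_) throw std::logic_error("reclaim handler tried to allocate");
    for (int attempt = 0;; ++attempt) {
      if (void* p = raw_(bytes)) return p;
      if (attempt == max_retries_) return nullptr;  // abort threshold reached
      run_handlers(bytes, attempt + 1);             // urgency rises each retry
      // A real memory manager would also run an extent cleanup pass here.
    }
  }

 private:
  // Call every handler in registration order with the reentrancy flag set.
  void run_handlers(std::size_t bytes, int urgency) {
    struct Guard {
      bool& flag;
      explicit Guard(bool& f) : flag(f) { flag = true; }
      ~Guard() { flag = false; }  // cleared even if a handler throws
    } guard(in_reclaim_);
    for (auto& h : handlers_) h(bytes, urgency);
  }

  RawAlloc raw_;
  int max_retries_;
  std::vector<ReclaimHandler> handlers_;
  bool in_reclaim_ = false;
};
```

Injecting the underlying allocator as `RawAlloc` keeps the sketch testable. A handler can "free" memory simply by raising the budget that the raw allocator checks.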