PTLsim Classic Internals

Low Level Startup and Injection

Note: This section deals with the internal operation of the PTLsim low level code, independent of the out of order simulation engine. If you are only interested in modifying the simulator itself, you can skip this section.

Note: This section does not apply to the full system PTLsim/X; please see the corresponding sections in Part III instead.

Startup on x86-64

PTLsim is a very unusual Linux program. It does its own internal memory management and threading without help from the standard libraries, injects itself into other processes to take control of them, and switches between 32-bit and 64-bit mode within a single process image. For these reasons, it is very closely tied to the Linux kernel and uses a number of undocumented system calls and features only available in late 2.6 series kernels.

PTLsim always starts and runs as a 64-bit process even when running 32-bit threads; it context switches between modes as needed. The statically linked ptlsim executable begins executing at ptlsim_preinit_entry in lowlevel-64bit.S. This code calls ptlsim_preinit() in kernel.cpp to set up our custom memory manager and threading environment before any standard C/C++ functions are used. After doing so, the normal main() function is invoked.

The ptlsim binary can run in two modes. If executed from the command line as a normal program, it starts up in inject mode. Specifically, main() in ptlsim.cpp checks if the inside_ptlsim variable has been set by ptlsim_preinit_entry, and if not, PTLsim enters inject mode. In this mode, ptlsim_inject() in kernel.cpp is called to effectively inject the ptlsim binary into another process and pass control to it before even the dynamic linker gets to load the program. In ptlsim_inject(), the PTLsim process is forked and the child is placed under the parent's control using ptrace(). The child process then uses exec() to start the user program to simulate (this can be either a 32-bit or 64-bit program).

However, the user program starts in the stopped state, allowing ptlsim_inject() to use ptrace() and related functions to inject either 32-bit or 64-bit boot loader code directly into the user program address space, overwriting the entry point of the dynamic linker. This code, derived from injectcode.cpp (specifically compiled as injectcode-32bit.o and injectcode-64bit.o) is completely position independent. Its sole function is to map the rest of ptlsim into the user process address space at virtual address 0x70000000 and set up a special LoaderInfo structure to allow the master PTLsim process and the user process to communicate. The boot code also restores the old code at the dynamic linker entry point after relocating itself. Finally, ptlsim_inject() adjusts the user process registers to start executing the boot code instead of the normal program entry point, and resumes the user process.

At this point, the PTLsim image injected into the user process exists in a bizarre environment: if the user program is 32-bit, the boot code will need to switch to 64-bit mode before calling the 64-bit PTLsim entry point. Fortunately, x86-64 and the Linux kernel make this process easy, even though normal programs never use it: a regular far jump switches the current code segment descriptor to 0x33, effectively switching the instruction set to x86-64. For the most part, the kernel cannot tell the difference between a 32-bit and 64-bit process: as long as the code uses 64-bit system calls (i.e. the syscall instruction instead of int 0x80 as with 32-bit system calls), Linux assumes the process is 64-bit. There are some subtle issues related to signal handling and memory allocation when performing this trick, but PTLsim implements workarounds for these issues.

After entering 64-bit mode if needed, the boot code passes control to PTLsim at ptlsim_preinit_entry. The ptlsim_preinit() function checks for the special LoaderInfo structure on the stack and in the ELF header of PTLsim as modified by the boot code; if these structures are found, PTLsim knows it is running inside the user program address space. After setting up memory management and threading, it captures any state the user process was initialized with. This state is used to fill in fields in the global ctx structure of class CoreContext: various floating point related fields and the user program entry point and original stack pointer are saved away at this point. If PTLsim is running inside a 32-bit process, the 32-bit arguments, environment and kernel auxiliary vector array (auxv) need to be converted to their 64-bit format for PTLsim to be able to parse them from normal C/C++ code. Finally, control is returned to main() to allow the simulator to start up normally.
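The auxiliary vector conversion mentioned above can be illustrated with a short Python sketch. This is not PTLsim's own code: the function name widen_auxv is hypothetical, and the sketch assumes a little-endian layout in which each 32-bit entry is a (type, value) pair of 32-bit words terminated by an AT_NULL entry, as in the ELF auxiliary vector format.

```python
import struct

AT_NULL = 0    # terminator entry type in the ELF auxiliary vector
AT_PAGESZ = 6  # system page size
AT_ENTRY = 9   # entry point of the user program

def widen_auxv(raw32: bytes) -> bytes:
    """Re-encode 32-bit (type, value) auxv pairs as 64-bit pairs.

    Hypothetical helper for illustration only: both fields are
    zero-extended, and conversion stops after the AT_NULL terminator.
    """
    out = bytearray()
    for offset in range(0, len(raw32) - 7, 8):
        a_type, a_val = struct.unpack_from("<II", raw32, offset)
        out += struct.pack("<QQ", a_type, a_val)
        if a_type == AT_NULL:
            break
    return bytes(out)

# A made-up three-entry vector: page size, entry point, terminator.
raw = struct.pack("<IIIIII", AT_PAGESZ, 4096, AT_ENTRY, 0x08048000, AT_NULL, 0)
wide = widen_auxv(raw)
```

Each 8-byte entry becomes a 16-byte entry, which is the layout a 64-bit program expects when it walks the vector. The argument and environment pointer arrays need the same kind of widening, from 4-byte to 8-byte pointers.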

Startup on 32-bit x86

The PTLsim startup process on a 32-bit x86 system is essentially a streamlined version of the process above (Section 11.1.1), since there is no need for the same PTLsim binary to support both 32-bit and 64-bit user programs. The injection process is very similar to the case where the user program is always a 32-bit program.

Simulator Startup

In kernel.cpp, the main() function calls init_config() to read in the user program specific configuration as described in Sections 13.2 and 10.3, then starts up the various other simulator subsystems. If either the -excludeld or the -startrip option was given, a breakpoint is inserted at the RIP address where the user process should switch from native mode to simulation mode (this may be at the dynamic linker entry point by default).

Finally, switch_to_native_restore_context() is called to restore the state that existed before PTLsim was injected into the process and return to the dynamic linker entry point. This may involve switching from 64-bit back to 32-bit mode to start executing the user process natively as discussed in Section 11.1.

After native execution reaches the inserted breakpoint thunk code, the code performs a 32-to-64-bit long jump back into PTLsim, which promptly restores the code underneath the inserted breakpoint thunk. At this point, the switch_to_sim() function in kernel.cpp is invoked to actually begin the simulation. This is done by calling simulate() in ptlsim.cpp.

At some point during simulation, the user program or the configuration file may request a switch back to native mode for the remainder of the program. In this case, the switch_to_native_restore_context() function gets called to save the statistics data store, map the PTLsim internal state back to the x86 compatible external state and return to the 32-bit or 64-bit user code, effectively removing PTLsim from the loop.

While the real PTLsim user process is running, the original PTLsim injector process simply waits in the background for the real user program with PTLsim inside it to terminate, then returns its exit code.

Address Space Simulation

PTLsim maintains an instance of the AddressSpace class as the global variable asp (see kernel.cpp) to track the attributes of each page within the virtual address space. When compiled for x86-64 systems, PTLsim uses Shadow Page Access Tables (SPATs), which are essentially large two-level bitmaps. Since pages are 4096 bytes in size and each page needs one bit, each 64 kilobyte chunk of the bitmap (524288 bits) can track 2 GB of virtual address space. In each SPAT, each top level array entry points to a chunk mapping 2 GB, such that with 131072 top level pointers, the full 48-bit virtual address space can typically be mapped with under a megabyte of SPAT chunks, assuming the address space is sparse.
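The arithmetic and the two-level lookup can be sketched in Python as follows. This is only a model of the idea, not the AddressSpace implementation: the class name SparseSPAT and its methods are hypothetical, and addresses are assumed to be non-negative and below 2^48.

```python
PAGE_SHIFT = 12                              # 4096-byte pages
CHUNK_BYTES = 64 * 1024                      # size of one second-level chunk
PAGES_PER_CHUNK = CHUNK_BYTES * 8            # one bit per page: 524288 pages
CHUNK_SPAN = PAGES_PER_CHUNK << PAGE_SHIFT   # bytes of address space per chunk
TOP_ENTRIES = (1 << 48) // CHUNK_SPAN        # top-level pointers needed

class SparseSPAT:
    """Hypothetical two-level bitmap with lazily allocated chunks."""

    def __init__(self):
        # None stands in for a null pointer to a chunk not yet allocated.
        self.top = [None] * TOP_ENTRIES

    @staticmethod
    def _locate(addr):
        page = addr >> PAGE_SHIFT
        return page // PAGES_PER_CHUNK, page % PAGES_PER_CHUNK

    def set(self, addr):
        index, bit = self._locate(addr)
        if self.top[index] is None:
            self.top[index] = bytearray(CHUNK_BYTES)
        self.top[index][bit >> 3] |= 1 << (bit & 7)

    def test(self, addr):
        index, bit = self._locate(addr)
        chunk = self.top[index]
        return chunk is not None and bool(chunk[bit >> 3] & (1 << (bit & 7)))

    def allocated_chunks(self):
        return sum(chunk is not None for chunk in self.top)

spat = SparseSPAT()
spat.set(0x70000000)        # mark one page as accessible
```

Only the chunks that cover mapped regions are ever allocated, which is why a sparse address space costs a handful of 64 KB chunks rather than a bitmap for all 2^36 pages.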

When compiled for 32-bit x86 systems, each SPAT is just a 128 KByte bitmap, with one bit for each of the 1048576 4 KB pages in the 4 GB address space.

In the AddressSpace structure, there are separate SPAT tables for readable pages (readmap field), writable pages (writemap field) and executable pages (execmap field). Two additional SPATs, dtlbmap and itlbmap, are used to track which pages are currently mapped by the simulated translation lookaside buffers (TLBs); this is discussed further in Section 25.4.

When running in native mode, PTLsim cannot track changes to the process memory map made by native calls to mmap(), munmap(), etc. Therefore, at every switch from native to simulation mode, the resync_with_process_maps() function is called. This function parses the /proc/self/maps metafile maintained by the kernel to build a list of all regions mapped by the current process. Using this list, the SPATs are rebuilt to reflect the current memory map. This is absolutely critical for correct operation, since during simulation, speculative loads and stores will only read and write memory if the appropriate SPAT indicates the address is accessible to user code. If the SPATs become out of sync with the real memory map, PTLsim itself may crash rather than simply marking the offending load or store as invalid. The resync_with_process_maps() function (or more specifically, the mqueryall() helper function) is fairly kernel version specific since the format of /proc/self/maps has changed between Linux 2.6.x kernels. New kernels may require updating this function.
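The resynchronization step can be illustrated with a Python sketch that parses text in the /proc/self/maps layout and derives per-permission page sets. The function names parse_maps and rebuild_spats are hypothetical and do not correspond to PTLsim's mqueryall(); the sketch assumes the common "start-end perms offset dev inode [path]" line format and uses Python sets in place of real bitmaps.

```python
def parse_maps(text):
    """Return (start, end, perms) for each mapping line in maps-style text."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_hex, end_hex = fields[0].split("-")
        regions.append((int(start_hex, 16), int(end_hex, 16), fields[1]))
    return regions

def rebuild_spats(regions, page_shift=12):
    """Build sets of page numbers standing in for readmap/writemap/execmap.

    Sets are only practical for small examples; a real SPAT is a bitmap.
    """
    maps = {"readmap": set(), "writemap": set(), "execmap": set()}
    flag_to_map = {"r": "readmap", "w": "writemap", "x": "execmap"}
    page_size = 1 << page_shift
    for start, end, perms in regions:
        pages = range(start >> page_shift, (end + page_size - 1) >> page_shift)
        for flag, name in flag_to_map.items():
            if flag in perms:
                maps[name].update(pages)
    return maps

# Made-up sample in the same layout as /proc/self/maps.
sample = (
    "00400000-00402000 r-xp 00000000 08:01 123 /bin/demo\n"
    "00601000-00602000 rw-p 00001000 08:01 123 /bin/demo\n"
)
spats = rebuild_spats(parse_maps(sample))
```

On a Linux system the same functions could be fed the contents of /proc/self/maps. Any change in the line format would have to be absorbed in parse_maps, which mirrors the kernel version sensitivity described above.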

Debugging Hints

When adding to or modifying PTLsim, bugs will invariably crop up. Fortunately, PTLsim provides a simple way to locate bugs which silently corrupt program execution. Since PTLsim can transparently switch between simulation and native mode, isolating the divergence point between the simulated behavior and what a real reference machine would do can be done through binary search. The -stopinsns configuration option can be set to stop simulation before the problem occurs, then raised step by step until the first x86 instruction to break the program is determined.
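The binary search itself can be sketched in Python. The names first_bad_insn and diverges are hypothetical: diverges(n) stands for "run the program with -stopinsns set to n, finish natively, and report whether the result is wrong". The sketch assumes the outcome is monotonic, i.e. every limit at or beyond the faulty instruction fails and every smaller limit succeeds.

```python
def first_bad_insn(diverges, upper):
    """Return the smallest n in [1, upper] for which diverges(n) is true.

    Assumes diverges(upper) is true and that diverges is monotonic.
    """
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if diverges(mid):
            hi = mid          # the culprit is at or before mid
        else:
            lo = mid + 1      # the first mid instructions are fine
    return lo

# Stand-in predicate for illustration: pretend instruction 1234 is the culprit.
calls = []
def fake_diverges(n):
    calls.append(n)
    return n >= 1234

culprit = first_bad_insn(fake_diverges, 1_000_000)
```

With a limit of one million instructions, the search needs at most 20 runs of the program, compared with up to a million runs when the limit is raised by one each time.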

The out of order simulator (ooocore.cpp) includes extensive debugging and integrity checking assertions. These may be turned off by default for improved performance, but they can be easily re-enabled by defining the ENABLE_CHECKS symbol at the top of ooocore.cpp, ooopipe.cpp and oooexec.cpp. Additional check functions are in the code but commented out; these may be used as well.

You can also debug PTLsim with gdb, although the process is non-standard due to PTLsim's co-simulation architecture:

Start PTLsim on the target program as usual. Note the "Thread N is running in XX-bit mode" message printed at startup: N is the PID you will be debugging, not that of the "ptlsim" process that may also be running.
Start gdb and type "attach 12345" if 12345 was the PID listed above.
Type "symbol-file ptlsim" to load the PTLsim internal symbols (otherwise gdb only knows about the benchmark code itself). You should specify the full path to the PTLsim executable here.
You are now debugging PTLsim. If you run the "bt" command to get a backtrace, it should show the PTLsim functions starting at address 0x70000000.

If the backtrace does not display enough information, go to the Makefile and enable the "no optimization" options (the "-O0" line instead of "-O99") since that will make more debugging information available to you.

The "-pause-at-startup seconds" configuration option may be useful here, to give you time to attach with a debugger before starting the simulation.

Timing Issues

PTLsim uses the CycleTimer class extensively to gather data about its own performance using the CPU's timestamp counter. At startup in superstl.cpp, the CPU's maximum frequency is queried from the appropriate Linux kernel sysfs node (if available) or from /proc/cpuinfo if not. Processors which dynamically scale their frequency and voltage in response to load (like all Athlon 64 and K8 based AMD processors) require special handling. It is assumed that the processor will be running at its maximum frequency (as reported by sysfs) or a fixed frequency (as reported by /proc/cpuinfo) throughout the majority of the simulation time; otherwise the timing results will be bogus.
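The frequency lookup and the conversion from timestamp counter ticks to seconds can be sketched in Python. This is not the CycleTimer code: the function names are hypothetical, and the sysfs path used below (cpuinfo_max_freq for cpu0, which reports kHz) is an assumption about which node is meant.

```python
import re

# Assumed sysfs node; the value it holds is in kHz.
SYSFS_MAX_FREQ = "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq"

def hz_from_sysfs(text):
    """Convert the kHz value read from cpuinfo_max_freq to Hz."""
    return int(text.strip()) * 1000

def hz_from_cpuinfo(text):
    """Extract the first 'cpu MHz' value from /proc/cpuinfo text, in Hz."""
    match = re.search(r"^cpu MHz\s*:\s*([0-9.]+)", text, re.MULTILINE)
    return float(match.group(1)) * 1e6 if match else None

def read_cpu_hz():
    """Prefer sysfs, fall back to /proc/cpuinfo, else return None."""
    try:
        with open(SYSFS_MAX_FREQ) as node:
            return hz_from_sysfs(node.read())
    except (OSError, ValueError):
        pass
    try:
        with open("/proc/cpuinfo") as info:
            return hz_from_cpuinfo(info.read())
    except OSError:
        return None

def cycles_to_seconds(cycles, hz):
    """Valid only if the CPU really ran at hz for the whole interval."""
    return cycles / hz
```

The last function shows why frequency scaling matters: if the processor spent part of the interval below the assumed frequency, dividing the cycle count by that frequency gives a wrong wall-clock time.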

External Signals and PTLsim

PTLsim can be forced to switch between native mode and simulation mode by sending it standard Linux-style signals from the command line. If your program is called "myprogram", start it under PTLsim and run this command from another terminal:

killall -XCPU myprogram

This will force PTLsim to switch between native mode and simulation mode, depending on its current mode. It will print a message to the console and the logfile when you do this. The initial mode (native or simulation) is determined by the presence of the -trigger option: with -trigger, the program starts in native mode until the trigger point (if any) is reached.
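The toggle behavior can be modeled with a small Python sketch. It is a toy illustration of a SIGXCPU handler that flips a mode flag, not PTLsim's signal handling code; the names state, toggle_mode and install_handler are hypothetical, and signal.SIGXCPU is only available on Unix-like systems.

```python
import signal

# Shared state for the toy model; the initial mode here is arbitrary.
state = {"mode": "native"}

def toggle_mode(signum=None, frame=None):
    """Flip between native and simulation mode and report the change."""
    state["mode"] = "simulation" if state["mode"] == "native" else "native"
    print("switching to", state["mode"], "mode")
    return state["mode"]

def install_handler():
    """Arrange for SIGXCPU (as sent by 'killall -XCPU') to toggle the mode."""
    signal.signal(signal.SIGXCPU, toggle_mode)
```

After install_handler() has been called in a running Python process, each SIGXCPU delivered to that process flips the mode once, so sending the signal twice returns to the original mode.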

Matt T Yourst 2007-09-26