[PTLsim-devel] Domain crashes when switching back and forth between native & simulation.

Stephan Diestelhorst
Sat Oct 13 13:13:05 EDT 2007


Hi,
  I have to bother you guys again...
I have a small test case in which a program creates a few pthreads that just 
spin and wait for a shared variable to change. The main thread changes that 
variable after a certain amount of time has elapsed and then waits for its 
'children' with pthread_join.

(If you're curious, it is a stripped-down version of the tinySTM intset 
benchmark, which is available online.)

To avoid the overhead of simulating thread creation, I switch to simulation 
mode just after the threads have been created, and I have experimented with 
switching back to native mode both before and after the join.
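
For readers who want to picture the test, it has roughly the following shape. 
This is only a minimal sketch, not the actual benchmark: the names 
run_spin_test, spin_worker, stop_flag and finished are made up for 
illustration, the main thread sleeps with nanosleep (an assumption about how 
the delay is implemented), and the places where the mode switches would 
happen are marked with comments only, since the real switch calls are not 
shown in this mail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

static atomic_int stop_flag;   /* shared variable the workers spin on */
static atomic_int finished;    /* number of workers that saw the change */

/* Worker: busy-wait until the main thread sets stop_flag. */
static void *spin_worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag))
        ;
    atomic_fetch_add(&finished, 1);
    return NULL;
}

/* Hypothetical driver: returns how many workers finished, or -1 on
 * allocation failure. */
int run_spin_test(int nthreads, long duration_ms)
{
    assert(nthreads > 0 && duration_ms >= 0);

    pthread_t *threads = malloc((size_t)nthreads * sizeof *threads);
    if (threads == NULL)
        return -1;

    atomic_store(&stop_flag, 0);
    atomic_store(&finished, 0);

    int created = 0;
    while (created < nthreads &&
           pthread_create(&threads[created], NULL, spin_worker, NULL) == 0)
        created++;

    /* The switch to simulation mode would go here (after creation). */

    struct timespec ts = {
        .tv_sec  = duration_ms / 1000,
        .tv_nsec = (duration_ms % 1000) * 1000000L
    };
    nanosleep(&ts, NULL);          /* let the workers spin for a while */
    atomic_store(&stop_flag, 1);   /* release the workers */

    /* The switch back to native could go here, or after the joins. */

    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    return atomic_load(&finished);
}
```

Calling run_spin_test(3, 1000) from main would correspond to a run with three 
threads and a duration of one second; the program has to be linked against 
the pthread library.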

Regardless of where I switch back, however, my domain crashes when the test 
tool is invoked again from the command line (a script calls it repeatedly and 
changes the command-line parameter that defines the number of threads to 
create).

This is with an almost unmodified PTLsim R219, with a few patches applied 
to increase the number of physical registers to allow 4 SMT threads, plus 
most of my patches regarding internal stores and prefetches (posted to the 
list previously!)

I use the SMT core, built with ENABLE_SMT defined. This is without any dirty 
tricks such as SMP or MOESI.

The first one or two executions of the test run fine, but usually the second 
or third one crashes the domain. I have tried inserting a sleep command 
between successive executions of the benchmark, since it seemed to work when 
I started the test tool by hand rather than from a script. Logs below...

All domains have been pinned with phys_core = vcpu_id.

Is this a known issue? If so, would upgrading to a newer version help?

I'm wondering why PTLsim keeps printing the warnings about the state it is in. 
Something doesn't seem to be quite right there...

I have tried to find out something about the addresses Xen is complaining 
about (i.e. last fixup and RIP): neither of them is found anywhere, i.e. 
neither in the guest's kernel nor inside my test tool (at least not with 
plain objdump; perhaps some relocation has taken place).
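
One way to narrow this down is to sort the addresses by range and to compare 
the cr3 in the crash dump with the page-table numbers in the logs. The helper 
below is only a sketch under stated assumptions: the function and enum names 
are invented for illustration, the Xen base 0xffff800000000000 is taken from 
the PTLsim log, and the other two boundaries (0xffff880000000000 for the 
guest's direct map and 0xffffffff80000000 for guest kernel text) are 
assumptions that merely match the ksp and rip values Xen prints below.

```c
#include <assert.h>
#include <stdint.h>

enum addr_region {
    ADDR_LOW,                /* lower canonical half: user space or similar */
    ADDR_NONCANONICAL,       /* not a valid x86-64 virtual address */
    ADDR_XEN,                /* hypervisor range, base taken from the log */
    ADDR_GUEST_DIRECT_MAP,   /* assumed guest kernel direct mapping */
    ADDR_GUEST_KERNEL_TEXT   /* assumed guest kernel text */
};

/* Classify a virtual address by range (boundaries are assumptions). */
enum addr_region classify_addr(uint64_t a)
{
    if (a >= 0xffffffff80000000ULL) return ADDR_GUEST_KERNEL_TEXT;
    if (a >= 0xffff880000000000ULL) return ADDR_GUEST_DIRECT_MAP;
    if (a >= 0xffff800000000000ULL) return ADDR_XEN;
    if (a >= 0x0000800000000000ULL) return ADDR_NONCANONICAL;
    return ADDR_LOW;
}

/* Convert a page-aligned cr3 value (as printed in the crash dump) to a
 * machine frame number, assuming 4 KB pages. */
uint64_t cr3_to_mfn(uint64_t cr3)
{
    assert((cr3 & 0xfffULL) == 0);
    return cr3 >> 12;
}
```

With the values from the Xen output below, classify_addr(0xffff8300001402f6) 
yields ADDR_XEN for the last fixup, and cr3_to_mfn(0x0f142000) gives 61762, 
the same number the PTLsim log prints as "PTLsim page table". If that reading 
is right, the faulting RIP 0x51cc8 may lie inside PTLsim's own image rather 
than the guest kernel or the test tool, which would explain why objdump on 
those two finds nothing.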

Any clues on this?

Many thanks!

Stephan

DomU (domain being simulated):
----------------------------------------------
Testing 1 cores with _empty.
Creating thread 0
Switching to core smt...
DONE
STARTING...
Test-thread finished!
STOPPING...
Waiting for thread 0
.. finished!
Switching to native...
DONE

Testing 2 cores with _empty.
Creating thread 0
Creating thread 1
Switching to core smt...
DONE
STARTING...
Test-thread finished!
STOPPING...
Test-thread finished!
Waiting for thread 0
.. finished!
Waiting for thread 1
.. finished!
Switching to native...
DONE

Testing 3 cores with _empty.
Set type     : linked list
Duration     : 1
Initial size : 256
Nb threads   : 3
Value range  : 65535
Seed         : 0
Update rate  : 20
Type sizes   : int=4/long=8/ptr=8/word=8
Initializing STM
Adding 256 entries to set
Set size     : 256
Creating thread 0
Creating thread 1
Creating thread 2
Switching to core smt...

PTLsim (logfile attached, loglevel 1):
-----------
ptlsim_bugfixes -domain ptlvm-smp -loglevel 1 -native
//
//  PTLsim: Cycle Accurate x86-64 Full System SMP/SMT Simulator
//  Copyright 1999-2007 Matt T. Yourst <yourst at yourst.com>
//
//  Revision 219 (05:35:58)
//  Built Oct 12 2007 19:10:02 on mautern.amd.com using gcc-4.1
//  Running on merill.
//

Waiting for request...
Processing -domain ptlvm-smp -loglevel 1 -native
Warning: invalid value 'ptlvm-smp' for option -domain; ignoring
System Information:
  Running on hypervisor version xen-3.0-x86_64 xen-3.0-x86_32p h
  Xen is mapped at virtual address 0xffff800000000000
  PTLsim is running across 4 VCPUs:
  Physical CPU type: AMD K8 (Opteron, Athlon 64, Turion)
  VCPU 0 core frequency: 1895 MHz
  Physical CPU affinity for all VCPUs: 0 1 2 3
Memory Layout:
  System:                4390912 pages,   17563648 KB
  Domain:                 262144 pages,    1048576 KB
  PTLsim reserved:         32768 pages,     131072 KB
  Page Tables:               356 pages,       1424 KB
  PTLsim image:              575 pages,       2300 KB
  Heap:                    31837 pages,     127348 KB
  Stack:                     256 pages,       1024 KB
Interfaces:
  PTLsim page table:       61762
  Shared info mfn:          4040
  Shadow shinfo mfn:       16401
  PTLsim hostcall:                event channel    3
  PTLsim upcall:                  event channel    4

  Switched to native mode
Breakout request received from native mode
  Switched to simulation mode
Returned from switch to native: now back in sim
Waiting for request...
Processing -native
  Switched to native mode
Breakout request received from native mode
  Switched to simulation mode
Returned from switch to native: now back in sim
Waiting for request...
Processing -core smt -run
Switching to simulation core 'smt'...
Stopping after 9223372036854775807 commits
  Completed      22416358 cycles,      21375022 commits:    196266 cycles/sec,    
176516, insns/sec: rip 0xffffffff8011918e 0xffffffff801063aa 
0xffffffff801063aa 0xffffffff80288fde
Stopped after 22419601 cycles and 21377523 instructions
  Completed      22419601 cycles,      21377523 commits:        10 cycles/sec,         
8, insns/sec: rip 0xffffffff801b7d45 0xffffffff801063aa 0xffffffff801063aa 
0x4016a7
Waiting for request...
Processing -native
  Switched to native mode
Breakout request received from native mode
  Switched to simulation mode
Returned from switch to native: now back in sim
Waiting for request...
Processing -native
  Switched to native mode
Breakout request received from native mode
  Switched to simulation mode
ptlmon: Warning: cannot switch to simulation mode: domain 1 was already in 
state 2
Returned from switch to native: now back in sim
Waiting for request...
Processing -core smt -run
Switching to simulation core 'smt'...
Stopping after 9223372036854775807 commits
  Completed       3762928 cycles,      26188714 commits:    143183 cycles/sec,    
168038, insns/sec: rip 0xffffffff80284ef7 0xffffffff801885fc 
0xffffffff801063aa 0xffffffff8016a5b9
Stopped after 3786551 cycles and 26213841 instructions
  Completed       3786551 cycles,      26213841 commits:        73 cycles/sec,        
78, insns/sec: rip 0xffffffff80284ecd 0xffffffff801063aa 0xffffffff801063aa 
0x4016a7
Waiting for request...
Processing -native
  Switched to native mode
Breakout request received from native mode
  Switched to simulation mode
Returned from switch to native: now back in sim
Waiting for request...
Processing -native
  Switched to native mode
Breakout request received from native mode
  Switched to simulation mode
ptlmon: Warning: cannot switch to simulation mode: domain 1 was already in 
state 2

Xen
-----
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 0, rip 
00000000004016a7, rsp 00007fffffffe1f0, ksp ffff88003fdfa000, cr3mfn 16955, 
eflags 0000000000010246
(XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip 
000000000004d3d0, rsp 0000000007fbcba0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000206
(XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000206
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000202
(XEN) printk: 4 messages suppressed.
(XEN) mm.c:1739:d1 Bad type (saw 00000000a8000001 != exp 00000000e0000000) for 
mfn 40d53c (pfn 8ad)
(XEN) mm.c:625:d1 Error getting mfn 40d53c (pfn 8ad) from L1 entry 
000000040d53c107 for dom1
(XEN) mm.c:1739:d1 Bad type (saw 00000000a8000001 != exp 00000000e0000000) for 
mfn 40d53f (pfn 8aa)
(XEN) mm.c:625:d1 Error getting mfn 40d53f (pfn 8aa) from L1 entry 
000000040d53f107 for dom1
(XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip 
ffffffff801063aa, rsp ffffffff803e9f50, ksp ffffffff803ea000, cr3mfn 4248815, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip 
ffffffff80143872, rsp ffff88003f5f7f08, ksp ffff88003f5f8000, cr3mfn 16956, 
eflags 0000000000000282
(XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip 
ffffffff801063aa, rsp ffff88000000df08, ksp ffff88000000e000, cr3mfn 4248815, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 0, rip 
00000000004017fe, rsp 00007fffffffe1f0, ksp ffff88003f2f8000, cr3mfn 16955, 
eflags 0000000000010246
(XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip 
000000000004d3d8, rsp 0000000007fbcba0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000202
(XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000202
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000206
(XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip 
ffffffff80284ecd, rsp ffffffff80424eb8, ksp ffffffff803ea000, cr3mfn 16956, 
eflags 0000000000000247
(XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip 
ffffffff801063aa, rsp ffff880000471f08, ksp ffff880000472000, cr3mfn 16956, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip 
ffffffff801063aa, rsp ffff88000000df08, ksp ffff88000000e000, cr3mfn 4248815, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 0, rip 
00000000004016a7, rsp 00007fffffffe1f0, ksp ffff88003f2f8000, cr3mfn 16955, 
eflags 0000000000010246
(XEN) printk: 118 messages suppressed.
(XEN) mm.c:1739:d0 Bad type (saw 00000000e8000001 != exp 0000000080000000) for 
mfn 423b (pfn 3fde9)
(XEN) Error installing user page table base mfn 16955 in domain 1 vcpu 0
(XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip 
000000000004d3d8, rsp 0000000007fbcba0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000206
(XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000202
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000202
(XEN) mm.c:1739:d1 Bad type (saw 00000000a8000001 != exp 00000000e0000000) for 
mfn 40d53e (pfn 8ab)
(XEN) mm.c:625:d1 Error getting mfn 40d53e (pfn 8ab) from L1 entry 
000000040d53e107 for dom1
(XEN) mm.c:1739:d1 Bad type (saw 0000000098000001 != exp 00000000e0000000) for 
mfn 4d2e (pfn 3f2f6)
(XEN) mm.c:625:d1 Error getting mfn 4d2e (pfn 3f2f6) from L1 entry 
0000000004d2e107 for dom1
(XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip 
ffffffff801063aa, rsp ffffffff803e9f50, ksp ffffffff803ea000, cr3mfn 19758, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip 
ffffffff801063aa, rsp ffff880000471f08, ksp ffff880000472000, cr3mfn 19758, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 0, rip 
00000000004017fe, rsp 00007fffffffe1f0, ksp ffff88003fdea000, cr3mfn 19757, 
eflags 0000000000010246
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip 
ffffffff801063aa, rsp ffff88000000ff08, ksp ffff880000010000, cr3mfn 4248815, 
eflags 0000000000000246
(XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip 
0000000000045000, rsp 0000000007fbd000, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000200
(XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000206
(XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000202
(XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip 
00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762, 
eflags 0000000000000206
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 1 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.0-unstable  x86_64  debug=n  Not tainted ]----
(XEN) Trap: invalid opcode (6)
(XEN) Error code 00000000
(XEN) Last fixup: rip ffff8300001402f6, cr2 ffff88000202f060
(XEN) Guest VCPU flags: 1
(XEN) CPU:    0
(XEN) RIP:    e033:[<0000000000051cc8>]
(XEN) RFLAGS: 0000000000000282   CONTEXT: guest
(XEN) rax: 0000000000000003   rbx: 0000000007fbcd10   rcx: 0000000047110ed3
(XEN) rdx: 0000000000000003   rsi: 0000000007fbcd10   rdi: 0000000000000000
(XEN) rbp: 0000000000000000   rsp: 0000000007fbccb8   r8:  0000000000003000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000007b0
(XEN) cr3: 000000000f142000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=0000000007fbccb8:
(XEN)    0000000047110ed3 0000000000000000 0000000000051cc8 000000010000e030
(XEN)    0000000000010082 0000000007fbccf0 000000000000e02b 0000000007fbcea0
(XEN)    000000000004e55b 0000000000000007 0000000000000040 0000002600000000
(XEN)    000000000004da91 000000000018e1e0 0000000000000000 0000000000000004
(XEN)    0000000000183c20 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000051ed6 0000000000000000 0000000000000000 0000000000000000
(XEN)    0001000000000000 0000000000000033 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000033 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000

-- 
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst at amd.com, Tel.   (AMD: 8-4903)

AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 Dresden, 
Germany
Register court Dresden: HRA 4896

General partner authorized to represent: AMD Saxony LLC (registered office 
Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ptlsim.log_dom_breaks_after_switch.zip
Type: application/x-zip
Size: 61520 bytes
Desc: not available
Url : https://ptlsim.org/pipermail/ptlsim-devel/attachments/20071013/5d2f53ee/attachment-0001.bin 

