[PTLsim-devel] Domain crashes when switching back and forth between native & simulation.
Ferad Zyulkyarov
Sat Oct 13 15:59:55 EDT 2007
Hi,
I have also seen the same problem but couldn't identify where the
problem is exactly.
Ferad
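
For readers without the original tool, the test case described in the quoted message can be sketched roughly as below. This is only an assumed reconstruction, not Stephan's actual benchmark: run_spin_test, stop_flag, finished and MAX_THREADS are made-up names, and switch_to_sim / switch_to_native are hypothetical placeholders that only print a message where the real tool would ask PTLsim to change modes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define MAX_THREADS 16  /* arbitrary limit chosen for this sketch */

/* Flag the workers spin on; the main thread sets it to end the test. */
static atomic_int stop_flag;
/* Number of workers that saw the flag change and returned. */
static atomic_int finished;

/* Hypothetical placeholders: the real tool would request a PTLsim
 * mode switch here; these stubs only print a message. */
static void switch_to_sim(void)    { puts("Switching to core smt..."); }
static void switch_to_native(void) { puts("Switching to native..."); }

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag))
        ;  /* spin until the main thread changes the shared variable */
    atomic_fetch_add(&finished, 1);
    return NULL;
}

/* Creates nthreads spinning workers, lets them run for duration_ms,
 * then stops and joins them. Returns the number of workers that
 * finished, or -1 if a thread could not be created. */
int run_spin_test(int nthreads, unsigned duration_ms)
{
    pthread_t tid[MAX_THREADS];
    struct timespec ts;
    int created = 0;
    int failed = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_store(&stop_flag, 0);
    atomic_store(&finished, 0);

    for (int i = 0; i < nthreads; i++) {
        printf("Creating thread %d\n", i);
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0) {
            failed = 1;
            break;
        }
        created++;
    }

    if (!failed) {
        /* Switch only after thread creation, as in the quoted report. */
        switch_to_sim();
        ts.tv_sec = duration_ms / 1000;
        ts.tv_nsec = (long)(duration_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
    }

    atomic_store(&stop_flag, 1);  /* release the spinning workers */
    for (int i = 0; i < created; i++) {
        printf("Waiting for thread %d\n", i);
        pthread_join(tid[i], NULL);
    }

    if (failed)
        return -1;
    switch_to_native();  /* the report also tried switching before the join */
    return atomic_load(&finished);
}
```

A driver script would then invoke a small main() around run_spin_test repeatedly with an increasing thread count, which is the pattern that triggers the crash in the report below.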
On 10/13/07, Stephan Diestelhorst <stephan.diestelhorst at amd.com> wrote:
> Hi,
> I have to bother you guys again...
> I have a small test case where a program creates a few pthreads which just
> spin and wait for some shared variable to change. That variable is changed
> (by the main thread) after a certain amount of time has elapsed, and then the
> main thread waits for its 'children' with pthread_join.
>
> (If you're curious, it is some stripped down tinySTM intset benchmark. It is
> available online)
>
> In order to avoid the overhead, I switch to simulation mode just after thread
> creation and have experimented with switching back before and after the join.
>
> Regardless of where I switch back, however, my domain crashes when I try to
> invoke the test tool again from the command line (a script calls it and
> changes a command-line parameter that defines the number of threads to create).
>
> This is with an almost unmodified PTLsim of R219, with a few patches applied
> to increase the number of physical registers to allow 4 SMTs and most of my
> patches regarding internal stores and prefetches (posted to the list
> previously!)
>
> I use SMTcore, with ENABLE_SMT defined. This is without any dirty tricks such
> as SMP or MOESI.
>
> The first few executions of the test run fine, but usually the second or third
> one crashes the domain. I have tried inserting a sleep command between
> subsequent executions of the benchmark, as it seemed to work when I started
> the test tool by hand rather than from a script. Logs below...
>
> All domains have been pinned with phys_core = vcpu_id.
>
> Is this a known issue? If so, would upgrading to a newer version help?
>
> I'm wondering why PTLsim keeps printing the warnings about the state it is in.
> Something doesn't seem to be quite right there...
>
> I have tried to find out something about the addresses Xen is complaining
> about (i.e. last fixup and RIP): neither of them is found anywhere, i.e. not
> in the guest's kernel, nor inside my test tool (at least not with plain
> objdump; perhaps some relocation has taken place).
>
> Any clues on this?
>
> Many thanks!
>
> Stephan
>
> DomU (domain being simulated):
> ----------------------------------------------
> Testing 1 cores with _empty.
> Creating thread 0
> Switching to core smt...
> DONE
> STARTING...
> Test-thread finished!
> STOPPING...
> Waiting for thread 0
> .. finished!
> Switching to native...
> DONE
>
> Testing 2 cores with _empty.
> Creating thread 0
> Creating thread 1
> Switching to core smt...
> DONE
> STARTING...
> Test-thread finished!
> STOPPING...
> Test-thread finished!
> Waiting for thread 0
> .. finished!
> Waiting for thread 1
> .. finished!
> Switching to native...
> DONE
>
> Testing 3 cores with _empty.
> Set type : linked list
> Duration : 1
> Initial size : 256
> Nb threads : 3
> Value range : 65535
> Seed : 0
> Update rate : 20
> Type sizes : int=4/long=8/ptr=8/word=8
> Initializing STM
> Adding 256 entries to set
> Set size : 256
> Creating thread 0
> Creating thread 1
> Creating thread 2
> Switching to core smt...
>
> PTLsim (logfile attached, loglevel 1):
> -----------
> ptlsim_bugfixes -domain ptlvm-smp -loglevel 1 -native
> //
> // PTLsim: Cycle Accurate x86-64 Full System SMP/SMT Simulator
> // Copyright 1999-2007 Matt T. Yourst <yourst at yourst.com>
> //
> // Revision 219 (05:35:58)
> // Built Oct 12 2007 19:10:02 on mautern.amd.com using gcc-4.1
> // Running on merill.
> //
>
> Waiting for request...
> Processing -domain ptlvm-smp -loglevel 1 -native
> Warning: invalid value 'ptlvm-smp' for option -domain; ignoring
> System Information:
> Running on hypervisor version xen-3.0-x86_64 xen-3.0-x86_32p h
> Xen is mapped at virtual address 0xffff800000000000
> PTLsim is running across 4 VCPUs:
> Physical CPU type: AMD K8 (Opteron, Athlon 64, Turion)
> VCPU 0 core frequency: 1895 MHz
> Physical CPU affinity for all VCPUs: 0 1 2 3
> Memory Layout:
> System: 4390912 pages, 17563648 KB
> Domain: 262144 pages, 1048576 KB
> PTLsim reserved: 32768 pages, 131072 KB
> Page Tables: 356 pages, 1424 KB
> PTLsim image: 575 pages, 2300 KB
> Heap: 31837 pages, 127348 KB
> Stack: 256 pages, 1024 KB
> Interfaces:
> PTLsim page table: 61762
> Shared info mfn: 4040
> Shadow shinfo mfn: 16401
> PTLsim hostcall: event channel 3
> PTLsim upcall: event channel 4
>
> Switched to native mode
> Breakout request received from native mode
> Switched to simulation mode
> Returned from switch to native: now back in sim
> Waiting for request...
> Processing -native
> Switched to native mode
> Breakout request received from native mode
> Switched to simulation mode
> Returned from switch to native: now back in sim
> Waiting for request...
> Processing -core smt -run
> Switching to simulation core 'smt'...
> Stopping after 9223372036854775807 commits
> Completed 22416358 cycles, 21375022 commits: 196266 cycles/sec,
> 176516, insns/sec: rip 0xffffffff8011918e 0xffffffff801063aa
> 0xffffffff801063aa 0xffffffff80288fde
> Stopped after 22419601 cycles and 21377523 instructions
> Completed 22419601 cycles, 21377523 commits: 10 cycles/sec,
> 8, insns/sec: rip 0xffffffff801b7d45 0xffffffff801063aa 0xffffffff801063aa
> 0x4016a7
> Waiting for request...
> Processing -native
> Switched to native mode
> Breakout request received from native mode
> Switched to simulation mode
> Returned from switch to native: now back in sim
> Waiting for request...
> Processing -native
> Switched to native mode
> Breakout request received from native mode
> Switched to simulation mode
> ptlmon: Warning: cannot switch to simulation mode: domain 1 was already in
> state 2
> Returned from switch to native: now back in sim
> Waiting for request...
> Processing -core smt -run
> Switching to simulation core 'smt'...
> Stopping after 9223372036854775807 commits
> Completed 3762928 cycles, 26188714 commits: 143183 cycles/sec,
> 168038, insns/sec: rip 0xffffffff80284ef7 0xffffffff801885fc
> 0xffffffff801063aa 0xffffffff8016a5b9
> Stopped after 3786551 cycles and 26213841 instructions
> Completed 3786551 cycles, 26213841 commits: 73 cycles/sec,
> 78, insns/sec: rip 0xffffffff80284ecd 0xffffffff801063aa 0xffffffff801063aa
> 0x4016a7
> Waiting for request...
> Processing -native
> Switched to native mode
> Breakout request received from native mode
> Switched to simulation mode
> Returned from switch to native: now back in sim
> Waiting for request...
> Processing -native
> Switched to native mode
> Breakout request received from native mode
> Switched to simulation mode
> ptlmon: Warning: cannot switch to simulation mode: domain 1 was already in
> state 2
>
> Xen
> -----
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 0, rip
> 00000000004016a7, rsp 00007fffffffe1f0, ksp ffff88003fdfa000, cr3mfn 16955,
> eflags 0000000000010246
> (XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip
> 000000000004d3d0, rsp 0000000007fbcba0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000206
> (XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000206
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000202
> (XEN) printk: 4 messages suppressed.
> (XEN) mm.c:1739:d1 Bad type (saw 00000000a8000001 != exp 00000000e0000000) for
> mfn 40d53c (pfn 8ad)
> (XEN) mm.c:625:d1 Error getting mfn 40d53c (pfn 8ad) from L1 entry
> 000000040d53c107 for dom1
> (XEN) mm.c:1739:d1 Bad type (saw 00000000a8000001 != exp 00000000e0000000) for
> mfn 40d53f (pfn 8aa)
> (XEN) mm.c:625:d1 Error getting mfn 40d53f (pfn 8aa) from L1 entry
> 000000040d53f107 for dom1
> (XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip
> ffffffff801063aa, rsp ffffffff803e9f50, ksp ffffffff803ea000, cr3mfn 4248815,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip
> ffffffff80143872, rsp ffff88003f5f7f08, ksp ffff88003f5f8000, cr3mfn 16956,
> eflags 0000000000000282
> (XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip
> ffffffff801063aa, rsp ffff88000000df08, ksp ffff88000000e000, cr3mfn 4248815,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 0, rip
> 00000000004017fe, rsp 00007fffffffe1f0, ksp ffff88003f2f8000, cr3mfn 16955,
> eflags 0000000000010246
> (XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip
> 000000000004d3d8, rsp 0000000007fbcba0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000202
> (XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000202
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000206
> (XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip
> ffffffff80284ecd, rsp ffffffff80424eb8, ksp ffffffff803ea000, cr3mfn 16956,
> eflags 0000000000000247
> (XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip
> ffffffff801063aa, rsp ffff880000471f08, ksp ffff880000472000, cr3mfn 16956,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip
> ffffffff801063aa, rsp ffff88000000df08, ksp ffff88000000e000, cr3mfn 4248815,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 0, rip
> 00000000004016a7, rsp 00007fffffffe1f0, ksp ffff88003f2f8000, cr3mfn 16955,
> eflags 0000000000010246
> (XEN) printk: 118 messages suppressed.
> (XEN) mm.c:1739:d0 Bad type (saw 00000000e8000001 != exp 0000000080000000) for
> mfn 423b (pfn 3fde9)
> (XEN) Error installing user page table base mfn 16955 in domain 1 vcpu 0
> (XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip
> 000000000004d3d8, rsp 0000000007fbcba0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000206
> (XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000202
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000202
> (XEN) mm.c:1739:d1 Bad type (saw 00000000a8000001 != exp 00000000e0000000) for
> mfn 40d53e (pfn 8ab)
> (XEN) mm.c:625:d1 Error getting mfn 40d53e (pfn 8ab) from L1 entry
> 000000040d53e107 for dom1
> (XEN) mm.c:1739:d1 Bad type (saw 0000000098000001 != exp 00000000e0000000) for
> mfn 4d2e (pfn 3f2f6)
> (XEN) mm.c:625:d1 Error getting mfn 4d2e (pfn 3f2f6) from L1 entry
> 0000000004d2e107 for dom1
> (XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip
> ffffffff801063aa, rsp ffffffff803e9f50, ksp ffffffff803ea000, cr3mfn 19758,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip
> ffffffff801063aa, rsp ffff880000471f08, ksp ffff880000472000, cr3mfn 19758,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 0, rip
> 00000000004017fe, rsp 00007fffffffe1f0, ksp ffff88003fdea000, cr3mfn 19757,
> eflags 0000000000010246
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip
> ffffffff801063aa, rsp ffff88000000ff08, ksp ffff880000010000, cr3mfn 4248815,
> eflags 0000000000000246
> (XEN) arch_finish_context_swap(vcpu 0, domain 1): kernel? 1, rip
> 0000000000045000, rsp 0000000007fbd000, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000200
> (XEN) arch_finish_context_swap(vcpu 1, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9efd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000206
> (XEN) arch_finish_context_swap(vcpu 2, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007e9ffd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000202
> (XEN) arch_finish_context_swap(vcpu 3, domain 1): kernel? 1, rip
> 00000000000113aa, rsp 0000000007ea0fd0, ksp 0000000000000000, cr3mfn 61762,
> eflags 0000000000000206
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.0-unstable x86_64 debug=n Not tainted ]----
> (XEN) Trap: invalid opcode (6)
> (XEN) Error code 00000000
> (XEN) Last fixup: rip ffff8300001402f6, cr2 ffff88000202f060
> (XEN) Guest VCPU flags: 1
> (XEN) CPU: 0
> (XEN) RIP: e033:[<0000000000051cc8>]
> (XEN) RFLAGS: 0000000000000282 CONTEXT: guest
> (XEN) rax: 0000000000000003 rbx: 0000000007fbcd10 rcx: 0000000047110ed3
> (XEN) rdx: 0000000000000003 rsi: 0000000007fbcd10 rdi: 0000000000000000
> (XEN) rbp: 0000000000000000 rsp: 0000000007fbccb8 r8: 0000000000003000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000007b0
> (XEN) cr3: 000000000f142000 cr2: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
> (XEN) Guest stack trace from rsp=0000000007fbccb8:
> (XEN) 0000000047110ed3 0000000000000000 0000000000051cc8 000000010000e030
> (XEN) 0000000000010082 0000000007fbccf0 000000000000e02b 0000000007fbcea0
> (XEN) 000000000004e55b 0000000000000007 0000000000000040 0000002600000000
> (XEN) 000000000004da91 000000000018e1e0 0000000000000000 0000000000000004
> (XEN) 0000000000183c20 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000051ed6 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0001000000000000 0000000000000033 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000033 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>
> --
> Stephan Diestelhorst, AMD Operating System Research Center
> stephan.diestelhorst at amd.com, Tel. (AMD: 8-4903)
>
> AMD Saxony Limited Liability Company & Co. KG
> Sitz (Geschäftsanschrift): Wilschdorfer Landstr. 101, 01109 Dresden,
> Deutschland
> Registergericht Dresden: HRA 4896
>
> vertretungsberechtigter Komplementär: AMD Saxony LLC (Sitz Wilmington,
> Delaware, USA)
> Geschäftsführer der AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
>
--
Ferad Zyulkyarov
Barcelona Supercomputing Center
c/ Gran Capita 2-4, Nexus I, 204
08034 Barcelona - SPAIN