[PTLsim-devel] Possible bug with CMPXCHG8B
Stephan Diestelhorst
Mon Jun 18 09:46:55 EDT 2007
> > Thanks for pointing this out. CMPXCHG8B/16B is not a very common
> > instruction, especially in 64-bit code, so I'm not surprised we missed
> > it. I'll implement it now - expect a patch later today.
>
> Patch for CMPXCHG8B and CMPXCHG16B is below - just put it in
> decode-complex.cpp. I'll add this to the next release (which will be coming
> soon, to add full SSE3 support and various other new instructions).
>
[..]
Matt,
after applying your patch (more specifically, it looks like this for me):
diff -r cda7cd663389 -r c0e8a122528d decode-complex.cpp
--- a/decode-complex.cpp Thu May 31 15:37:26 2007 +0200
+++ b/decode-complex.cpp Mon Jun 18 15:29:03 2007 +0200
@@ -2217,6 +2217,65 @@ bool TraceDecoder::decode_complex() {
break;
}
+ case 0x1c7: { // cmpxchg8b/cmpxchg16b
+ DECODE(eform, rd, (rex.mode64) ? q_mode : d_mode);
+ ra = rd;
+ if (modrm.reg != 1) MakeInvalid(); // only cmpxchg8b/cmpxchg16b are valid
+ if (rd.type != OPTYPE_MEM) MakeInvalid();
+
+ int sizeincr = (rex.mode64) ? 8 : 4;
+ int sizeshift = (rex.mode64) ? 3 : 2;
+ EndOfDecode();
+
+ // cmpxchg16b
+ prefixes |= PFX_LOCK;
+ if (memory_fence_if_locked(0)) break;
+
+ /*
+
+ Microcode:
+
+ ld t0 = [mem]
+ ld t1 = [mem+8]
+ sub t2 = t0,rax
+ sub t3 = t1,rdx
+ andcc t7,flags = t2,t3
+ sel.eq t2 = t0,rbx,(t7)
+ sel.eq t3 = t1,rcx,(t7)
+ sel.eq rax = t0,rax,(t7)
+ sel.eq rdx = t1,rdx,(t7)
+ st [mem],t2
+ st [mem+8],t3
+
+ */
+
+ operand_load(REG_temp0, ra, OP_ld);
+ ra.mem.offset += sizeincr;
+ operand_load(REG_temp1, ra, OP_ld);
+
+ TransOp sublo(OP_sub, REG_temp2, REG_temp0, REG_rax, REG_zero, sizeshift,
+ 0, 0, FLAGS_DEFAULT_ALU); sublo.nouserflags = 1; this << sublo;
+ TransOp subhi(OP_sub, REG_temp3, REG_temp1, REG_rdx, REG_zero, sizeshift,
+ 0, 0, FLAGS_DEFAULT_ALU); subhi.nouserflags = 1; this << subhi;
+ this << TransOp(OP_andcc, REG_temp7, REG_temp2, REG_temp3, REG_zero,
+ sizeshift, 0, 0, FLAGS_DEFAULT_ALU);
+ { TransOp sel(OP_sel, REG_temp2, REG_temp0, REG_rbx, REG_temp7, sizeshift);
+ sel.cond = COND_e; this << sel; }
+ { TransOp sel(OP_sel, REG_temp3, REG_temp1, REG_rcx, REG_temp7, sizeshift);
+ sel.cond = COND_e; this << sel; }
+ { TransOp sel(OP_sel, REG_rax, REG_temp0, REG_rax, REG_temp7, sizeshift);
+ sel.cond = COND_e; this << sel; }
+ { TransOp sel(OP_sel, REG_rdx, REG_temp1, REG_rdx, REG_temp7, sizeshift);
+ sel.cond = COND_e; this << sel; }
+ result_store(REG_temp2, REG_temp4, rd);
+ rd.mem.offset += sizeincr;
+ result_store(REG_temp3, REG_temp5, rd);
+
+ if (memory_fence_if_locked(1)) break;
+
+ break;
+ }
+
default: {
MakeInvalid();
break;
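
For reference, the architectural behavior that the microcode in the patch is meant to reproduce can be sketched as a small standalone C++ model. This is only an illustration and not PTLsim code: the names Regs and cmpxchg16b_model are made up here, the 16-byte operand is modeled as two 64-bit halves, and atomicity (the LOCK semantics) is not modeled at all. For CMPXCHG8B the same logic applies with 32-bit halves (EDX:EAX compared, ECX:EBX stored).

```cpp
#include <cstdint>

// Hypothetical stand-in for the registers and flag the instruction touches.
struct Regs {
    uint64_t rax, rbx, rcx, rdx;
    bool zf;
};

// Reference behavior of CMPXCHG16B on a 16-byte memory operand,
// with mem[0] as the low half and mem[1] as the high half.
// Atomicity is deliberately not modeled.
inline void cmpxchg16b_model(uint64_t mem[2], Regs& r) {
    const bool equal = (mem[0] == r.rax) && (mem[1] == r.rdx);
    if (equal) {
        // Match: store RCX:RBX to memory.
        mem[0] = r.rbx;
        mem[1] = r.rcx;
    } else {
        // Mismatch: load the current memory value into RDX:RAX.
        r.rax = mem[0];
        r.rdx = mem[1];
    }
    r.zf = equal;  // ZF reports whether the exchange happened
}
```

The microcode reaches the same result without a branch: the four selects pick either the new or the old value, and both stores are always issued, so on a mismatch the old value is simply written back.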
I get the following error:
Switching to simulation core 'ooo'...
Stopping after 9223372036854775807 commits
Trying to insert 16 transops, max is 15
Assert transbufcount < MAX_TRANSOPS_PER_USER_INSN failed in
decode-core.cpp:506 (void TraceDecoder::put(const TransOp&)) at 153675
cycles, 153675 iterations, 100957 user commits
I added the "Trying to insert..." statement myself. Apparently PTLsim tries
to insert 16 transops, but I count only 10 in the CMPXCHG8B code. Any
pointers? Or could I just increase MAX_TRANSOPS_PER_USER_INSN?
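
To make the failure mode concrete, here is a toy model of a fixed-capacity per-instruction buffer that reports the attempted count instead of asserting. It is a sketch under stated assumptions, not PTLsim code: UopBuffer and kMaxUopsPerInsn are hypothetical names, and the limit of 15 is taken from the log message above rather than from the PTLsim sources.

```cpp
#include <cstddef>
#include <cstdio>

// Assumed limit, taken from the "max is 15" log line (not from PTLsim sources).
constexpr std::size_t kMaxUopsPerInsn = 15;

// Hypothetical toy buffer: it only counts the uops emitted for one instruction.
struct UopBuffer {
    std::size_t count = 0;

    // Accepts one uop. Returns false and prints a diagnostic when the
    // buffer is already full, instead of tripping an assertion.
    bool put() {
        if (count >= kMaxUopsPerInsn) {
            std::fprintf(stderr, "Trying to insert %zu transops, max is %zu\n",
                         count + 1, kMaxUopsPerInsn);
            return false;
        }
        ++count;
        return true;
    }
};
```

With this model, fifteen calls to put() succeed and the sixteenth is rejected. If the decode helpers used in the patch each emit more than one transop (an untested guess on my part), the total could exceed the number of lines in the microcode comment.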
Thanks for any ideas and the quick patch!
Best,
Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst at amd.com, Tel. (AMD: 8-4903)