[PTLsim-devel] Possible bug with CMPXCHG8B

Mon Jun 18 09:46:55 EDT 2007

> > Thanks for pointing this out. CMPXCHG8B/16B is not a very common
> > instruction, especially in 64-bit code, so I'm not surprised we missed
> > it. I'll implement it now - expect a patch later today.
>
> Patch for CMPXCHG8B and CMPXCHG16B is below - just put it in
> decode-complex.cpp. I'll add this to the next release (which will be coming
> soon, to add full SSE3 support and various other new instructions).
>
[..]
Matt, 
  after applying your patch (this is exactly how it looks in my tree):

diff -r cda7cd663389 -r c0e8a122528d decode-complex.cpp

--- a/decode-complex.cpp        Thu May 31 15:37:26 2007 +0200
+++ b/decode-complex.cpp        Mon Jun 18 15:29:03 2007 +0200
@@ -2217,6 +2217,65 @@ bool TraceDecoder::decode_complex() {
     break;
   }

+  case 0x1c7: { // cmpxchg8b/cmpxchg16b
+    DECODE(eform, rd, (rex.mode64) ? q_mode : d_mode);
+    ra = rd;
+    if (modrm.reg != 1) MakeInvalid(); // only cmpxchg8b/cmpxchg16b are valid
+    if (rd.type != OPTYPE_MEM) MakeInvalid();
+
+    int sizeincr = (rex.mode64) ? 8 : 4;
+    int sizeshift = (rex.mode64) ? 3 : 2;
+    EndOfDecode();
+
+    // cmpxchg16b
+    prefixes |= PFX_LOCK;
+    if (memory_fence_if_locked(0)) break;
+
+    /*
+
+    Microcode:
+
+    ld   t0 = [mem]
+    ld   t1 = [mem+8]
+    sub  t2 = t0,rax
+    sub  t3 = t1,rdx
+    andcc t7,flags = t2,t3
+    sel.eq t2 = t0,rbx,(t7)
+    sel.eq t3 = t1,rcx,(t7)
+    sel.eq rax = t0,rax,(t7)
+    sel.eq rdx = t1,rdx,(t7)
+    st   [mem],t2
+    st   [mem+8],t3
+
+    */
+
+    operand_load(REG_temp0, ra, OP_ld);
+    ra.mem.offset += sizeincr;
+    operand_load(REG_temp1, ra, OP_ld);
+
+    TransOp sublo(OP_sub, REG_temp2, REG_temp0, REG_rax, REG_zero, sizeshift,
+      0, 0, FLAGS_DEFAULT_ALU); sublo.nouserflags = 1; this << sublo;
+    TransOp subhi(OP_sub, REG_temp3, REG_temp1, REG_rdx, REG_zero, sizeshift,
+      0, 0, FLAGS_DEFAULT_ALU); subhi.nouserflags = 1; this << subhi;
+    this << TransOp(OP_andcc, REG_temp7, REG_temp2, REG_temp3, REG_zero,
+      sizeshift, 0, 0, FLAGS_DEFAULT_ALU);
+    { TransOp sel(OP_sel, REG_temp2, REG_temp0, REG_rbx, REG_temp7, sizeshift);
+      sel.cond = COND_e; this << sel; }
+    { TransOp sel(OP_sel, REG_temp3, REG_temp1, REG_rcx, REG_temp7, sizeshift);
+      sel.cond = COND_e; this << sel; }
+    { TransOp sel(OP_sel, REG_rax, REG_temp0, REG_rax, REG_temp7, sizeshift);
+      sel.cond = COND_e; this << sel; }
+    { TransOp sel(OP_sel, REG_rdx, REG_temp1, REG_rdx, REG_temp7, sizeshift);
+      sel.cond = COND_e; this << sel; }
+    result_store(REG_temp2, REG_temp4, rd);
+    rd.mem.offset += sizeincr;
+    result_store(REG_temp3, REG_temp5, rd);
+
+    if (memory_fence_if_locked(1)) break;
+
+    break;
+  }
+
   default: {
     MakeInvalid();
     break;
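
For reference while debugging, here is a small C++ model of what CMPXCHG8B should compute, mirroring the microcode comment in the patch. It is only a sketch: `Regs` and `cmpxchg8b` are hypothetical names, it covers the 32-bit CMPXCHG8B form only, and LOCK/atomicity is not modeled. One detail it makes visible: the exchange should happen only when both 32-bit differences are zero. OR-ing the two differences tests exactly that, whereas AND-ing them (the `andcc` step in the microcode above) can also yield zero when neither half matches (for example 1 & 2 == 0), so that step may be worth double-checking.

```cpp
#include <cstdint>

// Hypothetical register state for this sketch: only the registers that
// CMPXCHG8B reads or writes (EDX:EAX, ECX:EBX) plus the zero flag.
struct Regs {
    std::uint32_t eax, edx, ebx, ecx;
    bool zf;
};

// Reference model of CMPXCHG8B m64 (LOCK/atomicity not modeled).
// mem[0] holds the low dword, mem[1] the high dword.
void cmpxchg8b(Regs& r, std::uint32_t mem[2]) {
    const std::uint32_t lo = mem[0];        // ld  t0 = [mem]
    const std::uint32_t hi = mem[1];        // ld  t1 = [mem+4]
    const std::uint32_t dlo = lo - r.eax;   // sub t2 = t0, eax
    const std::uint32_t dhi = hi - r.edx;   // sub t3 = t1, edx
    // Equal only if BOTH differences are zero; OR tests this exactly.
    // (AND would not: 1 & 2 == 0 even though neither half matches.)
    const bool equal = (dlo | dhi) == 0;
    r.zf = equal;
    // Like the microcode, always store: the new value on success,
    // the unchanged old value on failure.
    mem[0] = equal ? r.ebx : lo;
    mem[1] = equal ? r.ecx : hi;
    if (!equal) {                           // failure: EDX:EAX <- m64
        r.eax = lo;
        r.edx = hi;
    }
}
```

Calling it with EDX:EAX = 0:0 against a memory operand whose low and high halves are 1 and 2 should clear ZF and load 1 into EAX and 2 into EDX; a second call with those values then succeeds, sets ZF and stores EBX and ECX into memory.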

However, I get the following error:
Switching to simulation core 'ooo'...
Stopping after 9223372036854775807 commits
Trying to insert 16 transops, max is 15
Assert transbufcount < MAX_TRANSOPS_PER_USER_INSN failed in decode-core.cpp:506 (void TraceDecoder::put(const TransOp&)) at 153675 cycles, 153675 iterations, 100957 user commits

I added the "Trying to insert..." message myself. Apparently PTLsim tries
to insert 16 transops, yet I count only 10 in the CMPXCHG8B code. Any
pointers? Or could I simply increase MAX_TRANSOPS_PER_USER_INSN?

Thanks for any ideas and the quick patch!

Best,
  Stephan
-- 
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst at amd.com, Tel.   (AMD: 8-4903)