[PTLsim-devel] [PATCH] Re: Simulator livelocks while acquiring the lock
Stephan Diestelhorst
Fri Nov 9 08:59:03 EST 2007
Hi,
> > This is a bug that has been in PTLsim for quite a while now. I
> > have finally gotten around to finding and fixing it. Will post a
> > patch (after cleanup) soon!
>
> This is great! I've been looking for this bug for a very long time.
> Does this have something to do with total store ordering not being
> implemented correctly between threads?
No, that was one of the possibilities, but it turned out to be
slightly more subtle.
The bug occurs because concurrent atomic (i.e. locked) RMW operations
are not correctly protected from one another. In general, the
interlocks, which exist for that very purpose, achieve just that:
serialisation of atomic accesses to the same location.
More precisely: the bug occurs because one core gives up the locks on
the location halfway through committing an RMW instruction. The locks
are released by adding them to a buffer when the load instruction that
acquired them commits; that buffer of 'releasable' locks is then freed
once the mfence following the RMW instruction's store has committed.
As mentioned, this should ensure serial execution on different cores.
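To illustrate the mechanism, here is a rough, stand-alone C++ sketch of
that buffer of queued lock releases. The names (LockReleaseBuffer,
queue_release, flush_all) are made up for this example and do not exist
in PTLsim; the real state lives in ThreadContext and the interlocks table.

#include <cassert>
#include <cstddef>

// Rough model of the per-thread buffer of locks queued for release.
struct LockReleaseBuffer {
  static const size_t MAX_LOCKS = 16;
  unsigned long long lockaddr[MAX_LOCKS]; // physaddrs of held interlocks
  size_t count;

  LockReleaseBuffer() : count(0) { }

  // Called when the load (R part) of a locked RMW commits: the lock is
  // not released yet, only queued.
  void queue_release(unsigned long long addr) {
    assert(count < MAX_LOCKS);
    lockaddr[count++] = addr;
  }

  // Called when the mfence after the store (W part) commits: everything
  // queued so far is actually released (interlocks.invalidate() in the
  // real code).
  void flush_all() {
    count = 0;
  }
};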
However, there is a nasty interplay with annulment here. If the core
(let's call it A) that has committed half of the RMW instruction's uops
detects a later mispredicted jump, it annuls the wrongly speculated
uops. The jump and the annulled uops come much later in program order
than the RMW instruction... but the way annulment works is problematic:
it first asks the to-be-annulled ROB entry to put all of its acquired
locks (usually none) into the buffer of locks to release, and then
frees the entire buffer. This means that the lock held by the
halfway-committed RMW instruction is freed too, allowing a concurrently
executing RMW on another core B to grab the lock and proceed. Commit of
the store uop of core A's RMW instruction then stalls until core B has
finished its atomic RMW instruction.
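In terms of the sketch above, the current annulment path effectively
does the following (again just an illustration, not the real code):

// Buggy annulment: the annulled uop's own lock (if any) is queued, and
// then the WHOLE buffer is flushed -- including the lock that still
// guards an older, half-committed RMW of the same thread.
void annul_buggy(LockReleaseBuffer& buf,
                 bool annulled_uop_held_lock,
                 unsigned long long annulled_lockaddr) {
  if (annulled_uop_held_lock)
    buf.queue_release(annulled_lockaddr);
  buf.flush_all(); // drops the half-committed RMW's lock as well -> bug
}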
My patch does the following: it adds some explanatory comments (feel
free to remove / edit those) and changes the annulment behaviour:
before anything else happens during annulment, the number of locks in
the 'releasable' lock buffer is sampled, and locks are only freed
starting from that position. As a result, only those locks are freed
that were held and declared releasable by the uop that is actually
annulled, while the locks of halfway-committed RMW instructions stay
held.
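Sticking with the hypothetical sketch above, the fixed behaviour
corresponds to something like this (the real change is the new start
parameter of flush_mem_lock_release_list() in the patch below):

// Flush only the entries queued at or after 'from'; older entries
// (locks of half-committed RMWs) stay in the buffer and stay held.
void flush_from(LockReleaseBuffer& buf, size_t from) {
  // release/invalidate buf.lockaddr[i] for i in [from, buf.count)
  buf.count = from;
}

void annul_fixed(LockReleaseBuffer& buf,
                 bool annulled_uop_held_lock,
                 unsigned long long annulled_lockaddr) {
  size_t before = buf.count;   // sample before touching the buffer
  if (annulled_uop_held_lock) {
    buf.queue_release(annulled_lockaddr);
    flush_from(buf, before);   // frees only what was queued just now
  }
}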
I hope the above explanation and the patch make some sense. Please
note that I'm still on smtcore, but the bug is still present in
current svn; only the file names need to be adapted.
Hope that helps!
Cheers,
Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst at amd.com, Tel. (AMD: 8-4903)
-------------- next part --------------
diff -r 358c449dadc6 -r c89f694e099d smtcore.h
--- a/smtcore.h Wed Oct 17 23:24:13 2007 +0200
+++ b/smtcore.h Fri Nov 02 16:53:26 2007 +0100
@@ -1504,7 +1505,7 @@ namespace SMTModel {
void annul_fetchq();
BasicBlock* fetch_or_translate_basic_block(const RIPVirtPhys& rvp);
void redispatch_deadlock_recovery();
- void flush_mem_lock_release_list();
+ void flush_mem_lock_release_list(byte start = 0);
int get_priority() const;
void dump_smt_state(ostream& os);
diff -r 358c449dadc6 -r c89f694e099d smtexec.cpp
--- a/smtexec.cpp Wed Oct 17 23:24:13 2007 +0200
+++ b/smtexec.cpp Fri Nov 02 16:53:26 2007 +0100
@@ -1417,11 +1421,11 @@ int ReorderBufferEntry::issueload(LoadSt
}
// Issuing more than one ld.acq on the same block is not allowed:
- if (lock) {
+ if (lock) { //&& (lock->vcpuid == thread.ctx.vcpuid)
logfile << "ERROR: thread ", thread.ctx.vcpuid, " uuid ", uop.uuid, " over physaddr ", (void*)physaddr, ": lock was already acquired by vcpuid ", lock->vcpuid, " uuid ", lock->uuid, " rob ", lock->rob, endl;
assert(false);
}
-
+ // S.D. Location NOT locked
if unlikely (uop.locked) {
//
// Attempt to acquire an exclusive lock on the block via ld.acq,
@@ -2182,7 +2186,7 @@ W64 ReorderBufferEntry::annul(bool keep_
RegisterRenameTable& commitrrt = thread.commitrrt;
int& loads_in_flight = thread.loads_in_flight;
int& stores_in_flight = thread.stores_in_flight;
-
+ byte queued_locks_before = thread.queued_mem_lock_release_count;
SMTCoreEvent* event;
int idx;
@@ -2307,8 +2311,15 @@ W64 ReorderBufferEntry::annul(bool keep_
annulrob.physreg->free();
if unlikely (isclass(annulrob.uop.opcode, OPCLASS_LOAD|OPCLASS_STORE)) {
- annulrob.release_mem_lock(true);
- thread.flush_mem_lock_release_list();
+ /* SD: This was a nasty bug: we have to be careful not to flush away any
+ locks that are about to be freed by a committing locked RMW
+ instruction, which takes more than one cycle to commit but has
+ already declared the locks it wants to give up!
+ There are a few things we can do here: only flush if the annulrob
+ actually held locks, and then only flush those locks (actually only
+ a single one) added here! */
+ if(annulrob.release_mem_lock(true))
+ thread.flush_mem_lock_release_list(queued_locks_before);
loads_in_flight -= (annulrob.lsq->store == 0);
stores_in_flight -= (annulrob.lsq->store == 1);
annulrob.lsq->reset();
diff -r 358c449dadc6 -r c89f694e099d smtpipe.cpp
--- a/smtpipe.cpp Wed Oct 17 23:24:13 2007 +0200
+++ b/smtpipe.cpp Fri Nov 02 16:53:26 2007 +0100
@@ -102,6 +104,9 @@ void ThreadContext::flush_pipeline() {
foreach_forward(ROB, i) {
ReorderBufferEntry& rob = ROB[i];
rob.release_mem_lock(true);
+ // SD: Note that we might actually flush halfway through a locked RMW
+ // instruction. But this is not as bad as in the annul case, as the
+ // store (the W part) will be wiped, too!
flush_mem_lock_release_list();
rob.physreg->reset(threadid); // free all register allocated by rob:
}
@@ -1527,8 +1536,8 @@ int ThreadContext::commit() {
return rc;
}
-void ThreadContext::flush_mem_lock_release_list() {
- foreach (i, queued_mem_lock_release_count) {
+void ThreadContext::flush_mem_lock_release_list(byte from /*=0*/) {
+ for (size_t i = from; i < queued_mem_lock_release_count; i++) {
W64 lockaddr = queued_mem_lock_release_list[i];
MemoryInterlockEntry* lock = interlocks.probe(lockaddr);
@@ -1548,11 +1557,9 @@ void ThreadContext::flush_mem_lock_relea
event->threadid = ctx.vcpuid;
event->loadstore.sfr.physaddr = lockaddr >> 3;
}
-
interlocks.invalidate(lockaddr);
}
-
- queued_mem_lock_release_count = 0;
+ queued_mem_lock_release_count = from;
}
#ifdef PTLSIM_HYPERVISOR
@@ -1608,6 +1615,14 @@ int ReorderBufferEntry::commit() {
// its P (internal) bit must be set.
//
+ //
+ // SD: The above explanation is, at least IMHO, somewhat unsatisfactory
+ // as to why flushing the locks here actually makes sense. It does, once
+ // you realise that in order to have something to flush, this must be the
+ // fence exactly after the locked RMW instruction, as the lock is only
+ // added to the flush list at the commit of the load (the R part), which
+ // will definitely happen after the commit of the preceding fence.
+ //
if unlikely ((uop.opcode == OP_mf) && ready_to_commit() && (!load_store_second_phase)) {
fencewakeup();
thread.flush_mem_lock_release_list();