[PTLsim-devel] [PATCH] Re: Simulator livelocks while acquiring the lock
Stephan Diestelhorst
Fri Nov 9 08:59:03 EST 2007
Hi,
> > This is a bug that has been in PTLsim for quite a while now. I
> > have finally gotten around to finding and fixing it. Will post a
> > patch (after cleanup) soon!
>
> This is great! I've been looking for this bug for a very long time.
> Does this have something to do with total store ordering not being
> implemented correctly between threads?
No, that was one of the possibilities, but it turned out to be
slightly more subtle.
The bug occurs because concurrent atomic (i.e. locked) RMW operations
are not correctly protected from one another. In general, the
interlocks, which exist for that very purpose, achieve just that:
serialisation of atomic accesses to the same location.
More precisely: the bug occurs because one core gives up the locks on
the location halfway through committing an RMW instruction. The locks
are released by adding them to a buffer when the load instruction that
acquired them commits; that buffer of 'releasable' locks is then freed
once the mfence following the RMW instruction's store has committed.
As mentioned, this should ensure serial execution on different cores.
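To illustrate the mechanism, here is a rough, stand-alone C++ sketch of
that buffer of queued lock releases. The names (LockReleaseBuffer,
queue_release, flush_all) are made up for this example and do not exist
in PTLsim; the real state lives in ThreadContext and the interlocks table.

#include <cassert>
#include <cstddef>

// Rough model of the per-thread buffer of locks queued for release.
struct LockReleaseBuffer {
  static const size_t MAX_LOCKS = 16;
  unsigned long long lockaddr[MAX_LOCKS]; // physaddrs of held interlocks
  size_t count;

  LockReleaseBuffer() : count(0) { }

  // Called when the load (R part) of a locked RMW commits: the lock is
  // not released yet, only queued.
  void queue_release(unsigned long long addr) {
    assert(count < MAX_LOCKS);
    lockaddr[count++] = addr;
  }

  // Called when the mfence after the store (W part) commits: everything
  // queued so far is actually released (interlocks.invalidate() in the
  // real code).
  void flush_all() {
    count = 0;
  }
};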
However, there is a nasty interplay with annulment here. If the core
(let's call it A) that has committed half of the RMW instruction's uops
detects a later mispredicted jump, it annuls the wrongly speculated
uops. The jump and the annulled uops come much later in program order
than the RMW instruction... but the way annulment works is problematic:
it first asks the to-be-annulled ROB entry to put all of its acquired
locks (usually none) into the buffer of locks to release, and then
frees the entire buffer. This means that the lock held by the
halfway-committed RMW instruction is freed too, allowing a concurrently
executing RMW on another core B to grab the lock and proceed. Commit of
the store uop of core A's RMW instruction then stalls until core B has
finished its atomic RMW instruction.
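In terms of the sketch above, the current annulment path effectively
does the following (again just an illustration, not the real code):

// Buggy annulment: the annulled uop's own lock (if any) is queued, and
// then the WHOLE buffer is flushed -- including the lock that still
// guards an older, half-committed RMW of the same thread.
void annul_buggy(LockReleaseBuffer& buf,
                 bool annulled_uop_held_lock,
                 unsigned long long annulled_lockaddr) {
  if (annulled_uop_held_lock)
    buf.queue_release(annulled_lockaddr);
  buf.flush_all(); // drops the half-committed RMW's lock as well -> bug
}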
My patch does the following: it adds some explanatory comments (feel
free to remove / edit those) and changes the annulment behaviour:
before anything else happens during annulment, the number of locks in
the 'releasable' lock buffer is sampled, and locks are only freed
starting from that position. As a result, only those locks are freed
that were held and declared releasable by the uop that is actually
annulled, while the locks of halfway-committed RMW instructions stay
held.
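Sticking with the hypothetical sketch above, the fixed behaviour
corresponds to something like this (the real change is the new start
parameter of flush_mem_lock_release_list() in the patch below):

// Flush only the entries queued at or after 'from'; older entries
// (locks of half-committed RMWs) stay in the buffer and stay held.
void flush_from(LockReleaseBuffer& buf, size_t from) {
  // release/invalidate buf.lockaddr[i] for i in [from, buf.count)
  buf.count = from;
}

void annul_fixed(LockReleaseBuffer& buf,
                 bool annulled_uop_held_lock,
                 unsigned long long annulled_lockaddr) {
  size_t before = buf.count;   // sample before touching the buffer
  if (annulled_uop_held_lock) {
    buf.queue_release(annulled_lockaddr);
    flush_from(buf, before);   // frees only what was queued just now
  }
}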
I hope the above explanation and the patch make some sense. Please
note that I'm still on smtcore, but the bug is still present in
current svn; only the file names need to be adapted.
Hope that helps!
Cheers,
Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst at amd.com, Tel. (AMD: 8-4903)
-------------- next part --------------
diff -r 358c449dadc6 -r c89f694e099d smtcore.h
--- a/smtcore.h Wed Oct 17 23:24:13 2007 +0200
+++ b/smtcore.h Fri Nov 02 16:53:26 2007 +0100
@@ -1504,7 +1505,7 @@ namespace SMTModel {
void annul_fetchq();
BasicBlock* fetch_or_translate_basic_block(const RIPVirtPhys& rvp);
void redispatch_deadlock_recovery();
- void flush_mem_lock_release_list();
+ void flush_mem_lock_release_list(byte start = 0);
int get_priority() const;
void dump_smt_state(ostream& os);
diff -r 358c449dadc6 -r c89f694e099d smtexec.cpp
--- a/smtexec.cpp Wed Oct 17 23:24:13 2007 +0200
+++ b/smtexec.cpp Fri Nov 02 16:53:26 2007 +0100
@@ -1417,11 +1421,11 @@ int ReorderBufferEntry::issueload(LoadSt
}
// Issuing more than one ld.acq on the same block is not allowed:
- if (lock) {
+ if (lock) { //&& (lock->vcpuid == thread.ctx.vcpuid)
logfile << "ERROR: thread ", thread.ctx.vcpuid, " uuid ", uop.uuid, " over physaddr ", (void*)physaddr, ": lock was already acquired by vcpuid ", lock->vcpuid, " uuid ", lock->uuid, " rob ", lock->rob, endl;
assert(false);
}
-
+ // S.D. Location NOT locked
if unlikely (uop.locked) {
//
// Attempt to acquire an exclusive lock on the block via ld.acq,
@@ -2182,7 +2186,7 @@ W64 ReorderBufferEntry::annul(bool keep_
RegisterRenameTable& commitrrt = thread.commitrrt;
int& loads_in_flight = thread.loads_in_flight;
int& stores_in_flight = thread.stores_in_flight;
-
+ byte queued_locks_before = thread.queued_mem_lock_release_count;
SMTCoreEvent* event;
int idx;
@@ -2307,8 +2311,15 @@ W64 ReorderBufferEntry::annul(bool keep_
annulrob.physreg->free();
if unlikely (isclass(annulrob.uop.opcode, OPCLASS_LOAD|OPCLASS_STORE)) {
- annulrob.release_mem_lock(true);
- thread.flush_mem_lock_release_list();
+ /* SD: This was a nasty bug: we have to be careful not to flush away any
+ locks that are about to be freed by a committing locked RMW
+ instruction, which takes more than one cycle to commit but has
+ already declared the locks it wants to give up!
+ There are a few things we can do here: only flush if the annulrob
+ actually held locks, and then only flush those locks (actually only
+ a single one) added here! */
+ if(annulrob.release_mem_lock(true))
+ thread.flush_mem_lock_release_list(queued_locks_before);
loads_in_flight -= (annulrob.lsq->store == 0);
stores_in_flight -= (annulrob.lsq->store == 1);
annulrob.lsq->reset();
diff -r 358c449dadc6 -r c89f694e099d smtpipe.cpp
--- a/smtpipe.cpp Wed Oct 17 23:24:13 2007 +0200
+++ b/smtpipe.cpp Fri Nov 02 16:53:26 2007 +0100
@@ -102,6 +104,9 @@ void ThreadContext::flush_pipeline() {
foreach_forward(ROB, i) {
ReorderBufferEntry& rob = ROB[i];
rob.release_mem_lock(true);
+ // SD: Note that we might actually flush halfway through a locked RMW
+ // instruction. But this is not as bad as in the annul case, as the
+ // store (the W part) will be wiped, too!
flush_mem_lock_release_list();
rob.physreg->reset(threadid); // free all register allocated by rob:
}
@@ -1527,8 +1536,8 @@ int ThreadContext::commit() {
return rc;
}
-void ThreadContext::flush_mem_lock_release_list() {
- foreach (i, queued_mem_lock_release_count) {
+void ThreadContext::flush_mem_lock_release_list(byte from /*=0*/) {
+ for (size_t i = from; i < queued_mem_lock_release_count; i++) {
W64 lockaddr = queued_mem_lock_release_list[i];
MemoryInterlockEntry* lock = interlocks.probe(lockaddr);
@@ -1548,11 +1557,9 @@ void ThreadContext::flush_mem_lock_relea
event->threadid = ctx.vcpuid;
event->loadstore.sfr.physaddr = lockaddr >> 3;
}
-
interlocks.invalidate(lockaddr);
}
-
- queued_mem_lock_release_count = 0;
+ queued_mem_lock_release_count = from;
}
#ifdef PTLSIM_HYPERVISOR
@@ -1608,6 +1615,14 @@ int ReorderBufferEntry::commit() {
// its P (internal) bit must be set.
//
+ //
+ // SD: The above explanation is, at least IMHO, somewhat unsatisfactory
+ // as to why flushing the locks here actually makes sense. It does, once
+ // you realise that in order to have something to flush, this must be the
+ // fence exactly after the locked RMW instruction, as the lock is only
+ // added to the flush list at the commit of the load (the R part), which
+ // will definitely happen after the commit of the preceding fence.
+ //
if unlikely ((uop.opcode == OP_mf) && ready_to_commit() && (!load_store_second_phase)) {
fencewakeup();
thread.flush_mem_lock_release_list();