[PTLsim-devel] Issues with multiple OOO-cores

Thu Aug 23 04:32:03 EDT 2007

(this time with correct sender)
Hi Matt,
  thanks for the detailed reply!

> > Probably a small warning banner and a modification of the tutorial (which
> > uses ooocore in PTLsim/X for the initial walkthrough) would be helpful
> > here?
> >
>
> OK, I'll add that. I can just disable ooocore when running in full system mode
> with more than one VCPU.
>
Either way, some notice would be very nice, thanks!


> > This would mean to get rid of (disable) all SMT stuff, such as
> > fetch-priority calculation etc. and probably some care at the mapping of
> > Xen's VCPUs to cores and the transition between them, which I hoped to
> > avoid by doing the port from smt to ooo.
> >
>
> I think the thread priority logic already gets turned off when only one VCPU
> is enabled. If you comment out "#define ENABLE_SMT" in smtcore.h, it will do
> what you want, without actually removing any code.

I already started hacking on this yesterday and now have a working
patch that allows multiple single-threaded SMT cores; see the attachment.
I'd be very glad to hear any comments on its sanity and such.

Basic idea: add a core counter (corecount) to the model and loop over
all cores at the appropriate places, instead of just using cores[0].
This is controlled by ENABLE_SMT: if it is defined, all VCPUs share a
single SMT core, each as its own hardware thread; if it is undefined,
every VCPU gets its own core with a single hardware thread.
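
To illustrate the mapping outside of PTLsim, here is a minimal
standalone sketch. Core and build_cores are hypothetical stand-ins made
up for this example (they are not the SMTCore/ThreadContext classes),
and the smt flag plays the role of ENABLE_SMT at run time instead of
compile time:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SMTCore; not PTLsim's real class.
struct Core {
  int coreid;
  std::vector<int> vcpus; // VCPU ids of the hardware threads on this core
};

// Map VCPUs to cores: one shared core if smt is true (like ENABLE_SMT
// defined), otherwise one single-threaded core per VCPU.
std::vector<Core> build_cores(int vcpu_count, bool smt) {
  std::vector<Core> cores;
  if (smt) cores.push_back(Core{0, {}});
  for (int vcpu = 0; vcpu < vcpu_count; vcpu++) {
    if (!smt) cores.push_back(Core{(int)cores.size(), {}});
    // The thread index within the core is the core's current thread count.
    cores.back().vcpus.push_back(vcpu);
  }
  return cores;
}
```

With four VCPUs, build_cores(4, true) yields one core with four
threads, while build_cores(4, false) yields four cores with one thread
each.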

I've found and fixed some problems on my way through the code:

-with the default register file size, the SMT model (with multiple
hardware threads) doesn't support more than three cores, because the
register file is too small for the architectural registers ->
scale PHYS_REG_FILE_SIZE with the number of threads and adapt
MAX_PHYS_REG_FILE_SIZE accordingly.

-the assertion that should catch the above problem
(smtpipe.cpp:220) is ignored. Some consistent assertion handling
would be nice, as assertions are enabled in some areas and disabled
in others :(

-using the SMT model without defining ENABLE_SMT seems to be possible,
even with multiple VCPUs! I haven't tried it, but IMHO this is a bug.
-> Fixed, as the new patch creates multiple cores in that case.

-on large test machines (mine has 16 GB physmem, 8 cores) PTLsim's
reserved memory of 32 MB is insufficient for some internal data
(presumably page tables) -> the second patch increases it (rather
arbitrarily) to 128 MB; this one is quite trivial!

-the number of SMT threads is limited by several constants:
  4 by MAX_THREADS_PER_CORE
 16 by MAX_THREADS_BIT
 However, as far as I can see, neither of these values is actually
checked against the number of VCPUs in the domain. This is just an
observation and has not been fixed yet.
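
As a rough sketch of two of the points above (sizing the register file
at compile time and checking the thread limits), something like the
following could work. static_max and vcpu_count_fits are hypothetical
helpers that are not part of PTLsim, and the values of STQ_SIZE and
MAX_BRANCHES_IN_FLIGHT are assumed purely for illustration:

```cpp
#include <cassert>

// Compile-time maximum via a template, usable where a plain max()
// call is not accepted as a constant expression.
template <int A, int B>
struct static_max { static const int value = (A > B) ? A : B; };

static const int MAX_THREADS_PER_CORE = 4;
static const int MAX_THREADS_BIT = 4;          // up to 16 threads
static const int STQ_SIZE = 32;                // assumed value
static const int MAX_BRANCHES_IN_FLIGHT = 16;  // assumed value
static const int PHYS_REG_FILE_SIZE = 128 * MAX_THREADS_PER_CORE;

// Largest of the register file and the per-thread-scaled structures.
static const int MAX_PHYS_REG_FILE_SIZE =
  static_max<PHYS_REG_FILE_SIZE,
             static_max<STQ_SIZE * MAX_THREADS_PER_CORE,
                        MAX_BRANCHES_IN_FLIGHT * MAX_THREADS_PER_CORE>::value>::value;

// Returns true if vcpu_count hardware threads fit on a single SMT core
// under both limits.
bool vcpu_count_fits(int vcpu_count) {
  return vcpu_count >= 1
      && vcpu_count <= MAX_THREADS_PER_CORE
      && vcpu_count <= (1 << MAX_THREADS_BIT);
}
```

A check like vcpu_count_fits(contextcount) could be placed early in the
machine's init, but where exactly it belongs is a guess on my part.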

I hope that these patches make some sense; I'd be happy to hear
suggestions and comments!

Thanks,
  Stephan

P.S.: More on MESI later!



-- 
Stephan Diestelhorst
Student Informatik(Diplom)
Student Partner TU Dresden
Microsoft Student Program
Telefon: 0176 / 20075918
E-Mail: stephan.diestelhorst at studentprogram.de
-------------- next part --------------
diff -r e29648cf074f smtcore-amd-k8.h
--- a/smtcore-amd-k8.h  Mon Aug 20 12:37:38 2007 +0200
+++ b/smtcore-amd-k8.h  Tue Aug 21 16:55:55 2007 +0200
@@ -1590,7 +1590,7 @@ namespace SMTModel {
     void check_rob();
   };
 
-#define MAX_SMT_CORES 1
+#define MAX_SMT_CORES 32
 
   struct SMTMachine: public PTLsimMachine {
     SMTCore* cores[MAX_SMT_CORES];
diff -r e29648cf074f smtcore.cpp
--- a/smtcore.cpp       Mon Aug 20 12:37:38 2007 +0200
+++ b/smtcore.cpp       Wed Aug 22 23:22:44 2007 +0200
@@ -152,7 +152,7 @@ void SMTCore::reset() {
   caches.callback = &cache_callbacks;
   setzero(robs_on_fu);
   foreach_issueq(reset(coreid));
-  
+
   reserved_iq_entries = (int)sqrt(ISSUE_QUEUE_SIZE / MAX_THREADS_PER_CORE);
   assert(reserved_iq_entries && reserved_iq_entries < ISSUE_QUEUE_SIZE);
 
@@ -1646,14 +1646,26 @@ SMTMachine::SMTMachine(const char* name)
 //
 
 bool SMTMachine::init(PTLsimConfig& config) {
+#ifdef ENABLE_SMT
   // Note: we only create a single core for all contexts for now.
   cores[0] = new SMTCore(0, *this);
+  corecount = 1;
+#else
+  corecount = 0;
+#endif
 
   foreach (i, contextcount) {
+#ifdef ENABLE_SMT
     SMTCore& core = *cores[0];
+#else
+    cores[corecount] = new SMTCore(corecount, *this);
+    SMTCore& core    = *cores[corecount];
+    corecount++;
+#endif
+
+    ThreadContext* thread = new ThreadContext(core, core.threadcount, contextof(i));
+    core.threads[core.threadcount] = thread;
     core.threadcount++;
-    ThreadContext* thread = new ThreadContext(core, i, contextof(i));
-    core.threads[i] = thread;
     thread->init();
 
     //
@@ -1665,7 +1677,7 @@ bool SMTMachine::init(PTLsimConfig& conf
     //
   }
 
-  cores[0]->init();
+  foreach (i, corecount) cores[i]->init();
   init_luts();
   return true;
 }
@@ -1682,14 +1694,16 @@ int SMTMachine::run(PTLsimConfig& config
   // All VCPUs are running:
   stopped = 0;
 
-  cores[0]->reset();
-  cores[0]->flush_pipeline_all();
-
-  logfile << "IssueQueue states:", endl;
-
-  if unlikely (config.event_log_enabled && (!cores[0]->eventlog.start)) {
-    cores[0]->eventlog.init(config.event_log_ring_buffer_size);
-    cores[0]->eventlog.logfile = &logfile;
+  foreach (i, corecount) {
+    cores[i]->reset();
+    cores[i]->flush_pipeline_all();
+
+    logfile << "IssueQueue states:", endl;
+
+    if unlikely (config.event_log_enabled && (!cores[i]->eventlog.start)) {
+      cores[i]->eventlog.init(config.event_log_ring_buffer_size);
+      cores[i]->eventlog.logfile = &logfile;
+    }
   }
 
   bool exiting = false;
@@ -1704,30 +1718,34 @@ int SMTMachine::run(PTLsimConfig& config
     update_progress();
     inject_events();
 
-    SMTCore& core =* cores[0]; // only one core for now
-    foreach (i, core.threadcount) {
-      ThreadContext* thread = core.threads[i];
+    foreach (j, corecount) {
+      SMTCore& core =* cores[j];
+      foreach (i, core.threadcount) {
+        ThreadContext* thread = core.threads[i];
 #ifdef PTLSIM_HYPERVISOR
-      if unlikely (!thread->ctx.running) {
-        if unlikely (stopping) {
-          // Thread is already waiting for an event: stop it now
-          logfile << "[vcpu ", thread->ctx.vcpuid, "] Already stopped at cycle ", sim_cycle, endl;
-          stopped[thread->ctx.vcpuid] = 1;
-        } else {
-          if (thread->ctx.check_events()) thread->handle_interrupt();
+        if unlikely (!thread->ctx.running) {
+          if unlikely (stopping) {
+            // Thread is already waiting for an event: stop it now
+            logfile << "[vcpu ", thread->ctx.vcpuid, "] Already stopped at cycle ", sim_cycle, endl;
+            stopped[thread->ctx.vcpuid] = 1;
+          } else {
+            if (thread->ctx.check_events()) thread->handle_interrupt();
+          }
+          continue; /* NB, SD: Back to foreach (i, core.threadcount), that doesn't make much sense in the original impl, either! */
         }
-        continue;
+#endif
       }
-#endif
-    }
-
-    exiting |= core.runcycle();
+
+      exiting |= core.runcycle();
+    }
 
     if unlikely (check_for_async_sim_break() && (!stopping)) {
       logfile << "Waiting for all VCPUs to reach stopping point, starting at cycle ", sim_cycle, endl;
       // force_logging_enabled();
-      SMTCore& core =* cores[0];
-      foreach (i, core.threadcount) core.threads[i]->stop_at_next_eom = 1;
+      foreach (j, corecount) {
+        SMTCore& core =* cores[j];
+        foreach (i, core.threadcount) core.threads[i]->stop_at_next_eom = 1;
+      }
       if (config.abort_at_end) {
         config.abort_at_end = 0;
         logfile << "Abort immediately: do not wait for next x86 boundary nor flush pipelines", endl;
@@ -1752,19 +1770,20 @@ int SMTMachine::run(PTLsimConfig& config
 
   logfile << "Exiting SMT mode at ", total_user_insns_committed, " commits, ", total_uops_committed, " uops and ", iterations, " iterations (cycles)", endl;
 
-  SMTCore& core =* cores[0]; /// only one core for now.
-
-  foreach (i, core.threadcount) {
-    ThreadContext* thread = core.threads[i];
-
-    thread->core_to_external_state();
-
-    if (logable(6) | ((sim_cycle - thread->last_commit_at_cycle) > 1024) | config.dump_state_now) {
-      logfile << "Core State at end for thread ", thread->threadid, ": ", endl;
-      logfile << thread->ctx;
-    }
-  }
-
+  foreach (j, corecount) {
+    SMTCore& core =* cores[j]; 
+
+    foreach (i, core.threadcount) {
+      ThreadContext* thread = core.threads[i];
+
+      thread->core_to_external_state();
+
+      if (logable(6) | ((sim_cycle - thread->last_commit_at_cycle) > 1024) | config.dump_state_now) {
+        logfile << "Core State at end for thread ", thread->threadid, ": ", endl;
+        logfile << thread->ctx;
+      }
+    }
+  }
   config.dump_state_now = 0;
 
   dump_state(logfile);
diff -r e29648cf074f smtcore.h
--- a/smtcore.h Mon Aug 20 12:37:38 2007 +0200
+++ b/smtcore.h Wed Aug 22 23:22:44 2007 +0200
@@ -22,7 +22,7 @@
 // threaded mode.
 //
 
-//#define ENABLE_SMT
+#define ENABLE_SMT
 
 static const int MAX_THREADS_BIT = 4; // up to 16 threads
 static const int MAX_ROB_IDX_BIT = 12; // up to 4096 ROB entries
@@ -271,11 +271,14 @@ namespace SMTModel {
   
   const int MAX_ISSUE_WIDTH = 4;
   
+  const int PHYS_REG_FILE_SIZE = 128*MAX_THREADS_PER_CORE;
+  const int PHYS_REG_NULL = 0;
   // Largest size of any physical register file or the store queue:
-  const int MAX_PHYS_REG_FILE_SIZE = 256;
-  const int PHYS_REG_FILE_SIZE = 128;
-  const int PHYS_REG_NULL = 0;
-  
+  /* S.D.: getting this maximum of constants during compile-time doesn't work :(
+    static const int _tmp = max(STQ_SIZE * MAX_THREADS_PER_CORE, MAX_BRANCHES_IN_FLIGHT * MAX_THREADS_PER_CORE);
+    const int MAX_PHYS_REG_FILE_SIZE = max(PHYS_REG_FILE_SIZE, _tmp);
+   */
+  const int MAX_PHYS_REG_FILE_SIZE = 128*MAX_THREADS_PER_CORE;
   //
   // IMPORTANT! If you change this to be greater than 256, you MUST
   // #define BIG_ROB below to use the correct associative search logic
@@ -1618,10 +1621,15 @@ namespace SMTModel {
     void check_rob();
   };
 
-#define MAX_SMT_CORES 1
+#ifdef ENABLE_SMT
+  #define MAX_SMT_CORES 1
+#else
+  #define MAX_SMT_CORES 32
+#endif
 
   struct SMTMachine: public PTLsimMachine {
     SMTCore* cores[MAX_SMT_CORES];
+    int corecount;
     bitvec stopped;
     SMTMachine(const char* name);
     virtual bool init(PTLsimConfig& config);
diff -r e29648cf074f smtpipe.cpp
--- a/smtpipe.cpp       Mon Aug 20 12:37:38 2007 +0200
+++ b/smtpipe.cpp       Wed Aug 22 23:22:44 2007 +0200
@@ -127,7 +127,7 @@ void ThreadContext::flush_pipeline() {
       obj->reset(threadid);
     }     
   }
-  
+
   reset_fetch_unit(ctx.commitarf[REG_rip]);
   rob_states.reset();
 
-------------- next part --------------
diff -r 64ec1a6981c8 -r e29648cf074f ptlmon.cpp
--- a/ptlmon.cpp        Fri Jul 06 15:01:02 2007 +0200
+++ b/ptlmon.cpp        Mon Aug 20 12:37:38 2007 +0200
@@ -1836,7 +1836,7 @@ int main(int argc, char** argv) {
   assert(sizeof(Context) == PAGE_SIZE);
 
   // 32 MB default:
-  W64 ptlsim_reserved_mb = 32;
+  W64 ptlsim_reserved_mb = 128;
   const char* domain_name = null;
 
   foreach (i, argc) {