[PTLsim-devel] Issues with multiple OOO-cores

Mon Aug 20 22:12:09 EDT 2007

On Monday 20 August 2007 14:37, Stephan Diestelhorst wrote:
> Hi again,
>   During testing and code review I've found that it is possible to use
> the 'ooocore' with multiple cores in PTLsim/X.
>
> As far as I can see, ooocore treats fences and locked memory accesses (as
> in SWAP, for example) as no-ops and ignores any side effects caused by
> them. This is fine for single-core usage, but IMHO it can cause trouble in a
> multi-core scenario (which actually exists, if one has a multi-CPU domU and
> uses the ooo model): the 'instant visibility' consistency model still
> allows the loads on one core to be reordered, which could cause inconsistent
> reads of data written by another core.
>
> This might be one of the reasons for using the smtcore, where those
> instructions actually have the expected effects: fences cause ordering
> constraints on memops and locked memops actually take a lock for the
> duration of the x86 op on the modified address. But then again uops do
> contend for functional units and cache entries in the smtcore.
>
> Is my understanding correct so far?
>

Yes, that's correct - ooocore does not support more than one VCPU since it 
lacks locking support. Since the locking is so complicated (for deadlock
reasons), we only support this in the SMT core, rather than having to
maintain the same code in two places. SMT threads need this support anyway, 
independent of any cache coherence based line locking.
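To make the load-reordering problem concrete, here is a toy model of the classic message-passing litmus test, written in Python purely for illustration; it is not PTLsim code, and the operation encoding is made up. Core 0 stores data = 1 and then flag = 1; core 1 loads flag into r1 and then data into r2. Every store is visible to all cores immediately ("instant visibility"), and the model enumerates every interleaving of the two cores' operations. The outcome r1 == 1, r2 == 0 cannot occur while core 1's loads stay in program order, but it appears as soon as they are swapped. x86 forbids that outcome.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    n = len(a) + len(b)
    for a_positions in combinations(range(n), len(a)):
        a_slots = set(a_positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in a_slots else next(ib) for i in range(n)]


# Core 0 (writer): publish data, then set the flag.
WRITER = [("st", "data", 1), ("st", "flag", 1)]


def outcomes(reader):
    """Return every (r1, r2) pair core 1 can observe for the given load order."""
    results = set()
    for schedule in interleavings(WRITER, reader):
        memory = {"data": 0, "flag": 0}
        regs = {}
        for kind, addr, arg in schedule:
            if kind == "st":
                memory[addr] = arg  # instant visibility: all cores see it at once
            else:
                regs[arg] = memory[addr]  # load into the named register
        results.add((regs["r1"], regs["r2"]))
    return results


# Core 1 (reader): r1 = flag, r2 = data, in program order or with the loads swapped.
in_order = [("ld", "flag", "r1"), ("ld", "data", "r2")]
swapped = [("ld", "data", "r2"), ("ld", "flag", "r1")]

print(sorted(outcomes(in_order)))  # [(0, 0), (0, 1), (1, 1)]
print(sorted(outcomes(swapped)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

An out-of-order core may still issue the second load early, but it typically has to detect an intervening remote store to the loaded address and replay the load. Without such a check, a multi-VCPU run can observe the forbidden (1, 0) result.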

We had planned to remove ooocore from the code base at some point, but since
many people apparently still depend on it, we decided not to. For the same
reason, we are trying not to make major changes to the ooocore code.

The bottom line is this:

- always use smtcore for full system PTLsim, even with 1 VCPU
- always use ooocore for userspace PTLsim

This applies even if the virtual machine has only one VCPU: there may be bugs
in ooocore that cause problems in full-system mode even with a single VCPU,
and we do not test this combination.
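The lock that smtcore takes for a locked memop (held on the modified address for the duration of the x86 op, as described in the question) can be pictured as a small lock table. This is a hypothetical Python model, not smtcore's actual code: AddressLockTable, try_acquire and release are invented names. It rejects a conflicting request instead of waiting for it, on the assumption that the requesting uop is replayed later. Refusing rather than blocking is one simple way to sidestep the deadlocks mentioned above, but smtcore's real policy may differ.

```python
class AddressLockTable:
    """Hypothetical per-address lock table for locked x86 memops (illustrative only)."""

    def __init__(self):
        self.owner = {}  # locked address -> id of the thread holding it

    def try_acquire(self, addr, tid):
        """Lock addr for tid; return False on a conflict so the caller can replay."""
        holder = self.owner.get(addr)
        if holder is not None and holder != tid:
            return False  # another thread is part-way through a locked x86 op here
        self.owner[addr] = tid
        return True

    def release(self, addr, tid):
        """Drop the lock once tid's locked x86 op has fully committed."""
        if self.owner.get(addr) == tid:
            del self.owner[addr]


# Thread 0 runs a LOCK XCHG on 0x1000: take the lock, then the load and store uops.
locks = AddressLockTable()
memory = {0x1000: 5}
got_lock_t0 = locks.try_acquire(0x1000, tid=0)
old_value = memory[0x1000]            # load uop of the swap
got_lock_t1 = locks.try_acquire(0x1000, tid=1)  # thread 1 conflicts and must replay
memory[0x1000] = 7                    # store uop of the swap
locks.release(0x1000, tid=0)          # the x86 op has committed
retry_t1 = locks.try_acquire(0x1000, tid=1)     # thread 1's replay now succeeds
```

The lock spans both uops of the swap, so no other thread can slip a locked access to the same address between the load and the store.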

> Why are fences and locks left out of the ooocore?
> Is there a particular reason why a full SMP core (with 'instant visibility'
> still being acceptable) has not been implemented?
> Is anybody working on that and willing to share the effort?
>

I think Hui updated the SMT core code so N VCPUs can be configured as either N 
single-threaded cores or one N-threaded core. I'll ask him about this - I 
can't seem to find his code for it right now.
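Hui's change is not available here, so the following is only a hypothetical Python illustration of the two layouts described: each VCPU is assigned a (core, thread) slot. With threads_per_core = 1 you get N single-threaded cores, and with threads_per_core = N you get one N-threaded core. The function name, and the assumption that VCPUs fill the thread slots of one core before moving to the next, are invented for this sketch.

```python
def map_vcpus(vcpu_count, threads_per_core):
    """Return a list whose i-th entry is the (core, thread) slot VCPU i runs on."""
    if threads_per_core < 1 or vcpu_count % threads_per_core != 0:
        raise ValueError("vcpu_count must be a multiple of threads_per_core (>= 1)")
    # VCPUs fill the thread slots of core 0 first, then core 1, and so on.
    return [divmod(vcpu, threads_per_core) for vcpu in range(vcpu_count)]


print(map_vcpus(4, 1))  # four single-threaded cores: [(0, 0), (1, 0), (2, 0), (3, 0)]
print(map_vcpus(4, 4))  # one 4-threaded core: [(0, 0), (0, 1), (0, 2), (0, 3)]
```

This sketch also allows intermediate layouts such as two 2-threaded cores (map_vcpus(4, 2)); whether Hui's code supports those is unknown.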

> I will need a reasonable SMP model for the evaluation work I'm doing
> for my Master's thesis. If none exists yet, I'll have to build my
> own, so I'd be grateful for any pointers and hints regarding pitfalls and
> previous attempts, so that I can avoid repeating mistakes already made.

We have an ongoing project to add a full MESI cache coherence model to PTLsim.
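MESI tracks each cache line in each cache in one of four states: Modified, Exclusive, Shared or Invalid. The sketch below shows the generic textbook bus-snooping variant in Python. It is not the design from the PTLsim meeting notes. The function names and bus-message labels (BusRd, BusRdX, BusUpgr, Flush) are illustrative. Each function returns the line's next state in this cache together with the bus action, if any, that the transition needs.

```python
from enum import Enum


class State(Enum):
    M = "Modified"   # only copy, dirty
    E = "Exclusive"  # only copy, clean
    S = "Shared"     # other clean copies may exist
    I = "Invalid"


def local_read(state, others_have_copy):
    """This core reads the line; returns (next_state, bus_action)."""
    if state is State.I:
        # Miss: fetch the line; enter E only if no other cache holds it.
        return (State.S if others_have_copy else State.E), "BusRd"
    return state, None  # hit in M, E or S


def local_write(state):
    """This core writes the line; the result is always Modified."""
    if state is State.I:
        return State.M, "BusRdX"   # fetch with intent to modify
    if state is State.S:
        return State.M, "BusUpgr"  # invalidate the other sharers
    return State.M, None           # E -> M is silent, M stays M


def snoop_read(state):
    """Another core reads the line."""
    if state is State.M:
        return State.S, "Flush"    # supply / write back the dirty data
    if state is State.E:
        return State.S, None
    return state, None             # S stays S, I stays I


def snoop_write(state):
    """Another core writes the line (BusRdX or BusUpgr seen on the bus)."""
    return State.I, ("Flush" if state is State.M else None)
```

A full simulator model would also need transient states and interconnect timing, which this sketch omits.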

It's going very slowly, since unfortunately the grad students working on this
don't work full time during the summer, and in general are not as prompt as
we'd sometimes like w.r.t. PTLsim projects. We had a meeting to go over the
basic architecture of the full MESI model back in May 2007, but AFAIK
nothing ever came of it, and I don't have time to do it myself right now.

If you seriously want to develop a full MESI model for PTLsim, I can provide 
the notes and code templates we started with.

It's possible others have also done some cache coherence support for PTLsim, 
but I have not heard about it and/or it hasn't been released yet. If someone 
has already done this (and is reading this list), please contact me - we'd be 
very pleased if we could integrate it into the official PTLsim version!

- Matt

-------------------------------------------------------
 Matt T. Yourst                    yourst at peptidal.com
 Peptidal Research Inc., Co-Founder and Lead Architect
-------------------------------------------------------