[PTLsim-devel] Issues with multiple OOO-cores
Matt T. Yourst
Mon Aug 20 22:12:09 EDT 2007
On Monday 20 August 2007 14:37, Stephan Diestelhorst wrote:
> Hi again,
> during testing and code review I've found that it is possible to use
> the 'ooocore' with multiple cores in PTLsim/X.
>
> As far as I can see, ooocore treats fences and locked memory accesses (as
> in SWAP for example), as no-ops and ignores any side-effects caused by
> them. This is fine for single-core usage, but can IMHO create trouble in a
> multi-core scenario (which actually exists) if one has a multi-cpu domU and
> uses the ooo model, as the 'instant visibility' consistency model still
> allows shuffling of the loads on a core, which could cause inconsistent
> reads from data written by another core.
>
> This might be one of the reasons for using the smtcore, where those
> instructions actually have the expected effects: fences cause ordering
> constraints on memops and locked memops actually take a lock for the
> duration of the x86 op on the modified address. But then again uops do
> contend for functional units and cache entries in the smtcore.
>
> Is my understanding correct so far?
>
Yes, that's correct - ooocore does not support more than one VCPU since it
lacks locking support. Since the locking is so complicated (for deadlock
reasons), we only support this in the SMT core, rather than having to
maintain the same code in two places. SMT threads need this support anyway,
independent of any cache coherence based line locking.
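
To make the load-shuffling concern concrete, here is a toy litmus test, written as a minimal sketch and not taken from PTLsim. It assumes two cores sharing an instantly visible memory: a writer that stores `data` and then `flag`, and a reader that loads `flag` and then `data`. All names (`WRITER`, `READER`, `interleavings`, `outcomes`) are hypothetical. The sketch enumerates every interleaving and compares the results the reader can observe when its loads stay in program order against the results when the core is also allowed to issue them in the opposite order.

```python
from itertools import combinations

# Writer core: publish data, then set the flag (stores stay in program order).
WRITER = [("store", "data", 1), ("store", "flag", 1)]
# Reader core in program order: check the flag, then read the data.
READER = [("load", "flag"), ("load", "data")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]


def outcomes(reader_orders):
    """Return the set of (flag, data) pairs the reader can observe."""
    seen = set()
    for reader in reader_orders:
        for schedule in interleavings(WRITER, reader):
            memory = {"data": 0, "flag": 0}  # instantly visible to both cores
            loaded = {}
            for op in schedule:
                if op[0] == "store":
                    memory[op[1]] = op[2]
                else:
                    loaded[op[1]] = memory[op[1]]
            seen.add((loaded["flag"], loaded["data"]))
    return seen


in_order = outcomes([READER])                # loads kept in program order
shuffled = outcomes([READER, READER[::-1]])  # loads may also be swapped
STALE = (1, 0)  # flag observed as set, but data still the old value
```

In this model `STALE` never appears in `in_order` but does appear in `shuffled`, even though every store is visible the moment it executes. A fence between the two loads corresponds to restricting the reader to program order, which is why treating fences as no-ops is only safe with a single core.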
At some point we want to remove ooocore from the code base, but since
apparently many people still depend on it, we decided not to do this, and we
are trying not to make major changes to the ooocore code for this reason.

The bottom line is this:

- always use smtcore for full system PTLsim, even with 1 VCPU
- always use ooocore for userspace PTLsim

This applies even if you only have one VCPU in the virtual machine. There may
be some bugs in ooocore that cause problems in full system mode, even with
only one VCPU - we do not test this combination.
> Why are the fences and locks left out of the ooocore?
> Is there a certain reason, why a full SMP (with 'instant visibility' still
> being acceptable) core has not been employed?
> Is anybody working on that and willing to share code effort?
>
I think Hui updated the SMT core code so N VCPUs can be configured as either N
single-threaded cores or one N-threaded core. I'll ask him about this - I
can't seem to find his code for it right now.
> I will need a reasonable SMP model for my evaluation work, which I'm doing
> for my Masters thesis. If there ain't no such yet, I'll have to build my
> own. Hence I'd be grateful to any pointers and hints regarding pitfalls and
> previous attempts in order to avoid repeating encountered errors.
We have an ongoing project to add a full MESI cache coherence model to PTLsim.
It's going very slowly, since unfortunately the grad students working on this
don't work full time during the summer, and in general are not as prompt as
we'd sometimes like w.r.t. PTLsim projects. We had a meeting to go over the
basic architecture of the full MESI model way back in May 2007, but AFAIK
nothing ever became of this, and I don't have time to do it myself right now.
If you seriously want to develop a full MESI model for PTLsim, I can provide
the notes and code templates we started with.
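
For anyone new to the protocol, the following is a minimal sketch of the textbook MESI state transitions for a single cache line. It is not the PTLsim design, nor the notes or code templates mentioned above; the names `State`, `Line`, `read`, `write` and `invariant_holds` are hypothetical, and the sketch ignores timing, the interconnect, and the actual data transfer on writeback.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Line:
    """One cache line address tracked across a fixed number of cores."""

    def __init__(self, cores):
        self.state = [State.INVALID] * cores

    def read(self, core):
        if self.state[core] is not State.INVALID:
            return  # read hit: M, E and S all allow local reads
        others = [c for c, s in enumerate(self.state) if s is not State.INVALID]
        for c in others:
            # A remote M copy is written back; M and E both downgrade to S.
            self.state[c] = State.SHARED
        self.state[core] = State.SHARED if others else State.EXCLUSIVE

    def write(self, core):
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = State.INVALID  # invalidate all other copies
        self.state[core] = State.MODIFIED

    def invariant_holds(self):
        """At most one M/E owner, and an owner excludes any other valid copy."""
        valid = [s for s in self.state if s is not State.INVALID]
        owners = [s for s in valid if s in (State.MODIFIED, State.EXCLUSIVE)]
        return len(owners) <= 1 and (not owners or len(valid) == 1)
```

A short walk through the states: on a two-core `Line`, a first `read(0)` leaves core 0 in E, a following `read(1)` moves both cores to S, and `write(1)` invalidates core 0 and leaves core 1 in M. A real model would additionally have to account for latencies and for the interaction with locked memory operations.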
It's possible others have also done some cache coherence support for PTLsim,
but I have not heard about it and/or it hasn't been released yet. If someone
has already done this (and is reading this list), please contact me - we'd be
very pleased if we could integrate it into the official PTLsim version!

- Matt
-------------------------------------------------------
Matt T. Yourst yourst at peptidal.com
Peptidal Research Inc., Co-Founder and Lead Architect
-------------------------------------------------------