
PTLsim uop Reference
The following sections document the semantics and encoding of each micro-operation (uop) supported by the PTLsim processor core. The opinfo[] table in ptlhwdef.cpp and constants in ptlhwdef.h give actual numerical values for the opcodes and other fields described below.
Merging Rules
Mnemonic |
Syntax |
Operation
op |
Merging Rules:
The x86 compatible ALUs implement operations on 1, 2, 4 or 8 byte quantities. Unless otherwise indicated, all operations take a 2-bit size shift field (sz) used to determine the effective size in bytes of the operation as follows:
- sz = 0: Low byte of rd is set to the 8-bit result; high 7 bytes of rd are set to corresponding bytes of ra.
- sz = 1: Low two bytes of rd are set to the 16-bit result; high 6 bytes of rd are set to corresponding bytes of ra.
- sz = 2: Low four bytes of rd are set to the 32-bit result; high 4 bytes of rd are cleared to zero in accordance with x86-64 zero extension semantics. The ra operand is unused and should be REG_zero.
- sz = 3: All 8 bytes of rd are set to the 64-bit result. ra is unused and should be REG_zero.
Flags are calculated based on the sz-byte value produced by the ALU, not the final 64-bit result in rd.
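The merging rules above can be sketched in a few lines of Python. This is only an illustration, not code from PTLsim: the function name merge and the constant MASK64 are hypothetical, and flag generation is left out.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: all 64 bits set


def merge(ra: int, result: int, sz: int) -> int:
    """Merge an ALU result into ra according to the 2-bit size field sz."""
    if sz not in (0, 1, 2, 3):
        raise ValueError("sz must be in the range 0..3")
    nbytes = 1 << sz                        # 1, 2, 4 or 8 bytes
    low_mask = (1 << (8 * nbytes)) - 1
    low = result & low_mask
    if sz in (0, 1):
        # 8-bit and 16-bit results keep the high bytes of ra
        return ((ra & MASK64) & ~low_mask) | low
    # 32-bit results are zero extended; 64-bit results replace rd entirely.
    # ra is unused in both cases.
    return low
```

For example, merge(0x1122334455667788, 0xAB, 0) keeps the upper seven bytes of ra and replaces only the low byte.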
Other Pseudo-Operators
The descriptions in this reference use various pseudo-operators to describe the semantics of each uop. These operators are described below.
EvalFlags(ra)
The EvalFlags pseudo-operator evaluates the ZAPS, CF, OF flags attached to the source operand ra in accordance with the type of condition code evaluation specified by the uop. The operator returns 1 if the evaluation is true; otherwise 0 is returned.
SignExt(ra, N)
The SignExt operator sign extends the low N bits of the ra operand to 64 bits. Specifically, bit ra[N-1] is copied to all high order bits from bit 63 down to bit N. If N is not specified, it is assumed to mean the number of bits in the effective size of the uop's result (as described under Merging Rules).
MergeWithSFR(mem, sfr)
The MergeWithSFR pseudo-operator is described in the reference page for load uops.
MergeAlign(mem, sfr)
The MergeAlign pseudo-operator is described in the reference page for load uops.
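As an illustration of the SignExt pseudo-operator described above, the following sketch reads N as the width in bits of the value being extended, so that the top bit of that N-bit value is the sign bit. That reading, and the names sign_ext and MASK64, are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: all 64 bits set


def sign_ext(ra: int, n: int) -> int:
    """Sign extend the low n bits of ra to a 64-bit value."""
    if not 1 <= n <= 64:
        raise ValueError("n must be in the range 1..64")
    low_mask = (1 << n) - 1
    value = ra & low_mask
    if value & (1 << (n - 1)):
        # Sign bit is set: fill bits 63 down to n with ones
        value |= MASK64 & ~low_mask
    return value
```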
mov and or xor andnot ornot nand nor eqv
Logical Operations
Mnemonic |
Syntax |
Operation
mov |
Notes:
- All operations merge the ALU result with ra and generate flags in accordance with the standard x86 merging rules described previously.
add sub addadd addsub subadd subsub addm subm addc subc
Add and Subtract
Mnemonic |
Syntax |
Operation
add |
Notes:
- All operations merge the ALU result with ra and generate flags in accordance with the standard x86 merging rules described previously.
- The adda and adds uops are useful for small shifts and x86 three-operand LEA-style address generation.
- The addc and subc uops use only the carry flag field of their rc operand; the value is unused.
- The addm and subm uops mask the result by the immediate in rc. They are used in microcode for modular stack arithmetic.
sel
Conditional Select
Mnemonic |
Syntax |
Operation
sel.cc |
Notes:
- cc is any valid condition code flag evaluation
- The sel uop merges the selected operand with ra in accordance with the standard x86 merging rules described previously
- The 64-bit result and all flags are treated as a single value for selection purposes, i.e. the flags attached to the selected input are passed to the output
- If one of the (ra, rb) operands is not valid (has FLAG_INV set) but the selected operand is valid, the result is valid. This is an exception to the invalid bit propagation rule only when the selected input is valid. If the rc operand is invalid, the result is always invalid.
- If any of the inputs are waiting (FLAG_WAIT is set), the uop does not issue, even if the selected input was ready. This is a pipeline simplification.
- set rd = (a),b
- sel rd = b,0,1,c
set
Conditional Set
Mnemonic |
Syntax |
Operation
set.cc |
Notes:
- cc is any valid condition code flag evaluation
- The value 0 or 1 is zero extended to the operation size and merged with rb in accordance with the standard x86 merging rules described previously (except that set uses rb as the merge target instead of ra)
- Flags attached to ra (condition code) are passed through to the output
set.sub set.and
Conditional Compare and Set
Mnemonic |
Syntax |
Operation
set.sub.cc |
Notes:
- The set.sub and set.and uops take the place of a sub or and uop immediately consumed by a set uop; this is intended to shorten the critical path if uop merging is performed by the processor
- cc is any valid condition code flag evaluation
- The value 0 or 1 is zero extended to the operation size and then merged with rc in accordance with the standard x86 merging rules described previously (except that set.sub and set.and use rc as the merge target instead of ra)
- Flags generated as the result of the comparison are passed through with the result
br
Conditional Branch
Mnemonic |
Syntax |
Operation
br.cc |
Notes:
- cc is any valid condition code flag evaluation
- The rip (user-visible instruction pointer register) is reset to one of two immediates. If the flags evaluation is true, the riptaken immediate is selected; otherwise the ripseq immediate is selected.
- If the flag evaluation is false (i.e., ripseq is selected), the BranchMispredict internal exception is raised. The processor should annul all uops after the branch and restart fetching at the RIP specified by the result (in this case, ripseq).
- Branches are always assumed to be taken. If the branch is predicted as not taken (i.e. future uops come from the next sequential RIP after the branch), it is the responsibility of the decoder or frontend to swap the riptaken and ripseq immediates and invert the condition of the branch. All condition encodings can be inverted by inverting bit 0 of the 4-bit condition specifier.
- The destination register should always be REG_rip; otherwise this uop is undefined.
- If the target RIP falls within an unmapped page, not present page or a page marked as no-execute (NX), the PageFaultOnExec exception is taken.
- No flags are generated by this uop
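The branch resolution and condition inversion rules above can be sketched as follows. The names BranchResult, resolve_br and encode_predicted_not_taken are hypothetical, and the flag evaluation itself is assumed to have already been reduced to a boolean.

```python
from typing import NamedTuple, Tuple


class BranchResult(NamedTuple):
    rip: int            # new value of the user-visible rip
    mispredict: bool    # True if BranchMispredict would be raised


def resolve_br(cond_true: bool, riptaken: int, ripseq: int) -> BranchResult:
    """Select the next RIP; a false condition counts as a misprediction."""
    if cond_true:
        return BranchResult(riptaken, False)
    return BranchResult(ripseq, True)


def encode_predicted_not_taken(cc: int, riptaken: int, ripseq: int) -> Tuple[int, int, int]:
    """Re-encode a branch predicted as not taken.

    The condition is inverted by flipping bit 0 of the 4-bit specifier and
    the two immediates are swapped, so the uop is again "assumed taken".
    Returns (cc, riptaken, ripseq) for the re-encoded uop.
    """
    return (cc ^ 1) & 0xF, ripseq, riptaken
```

Applying encode_predicted_not_taken twice returns the original encoding, which is a quick sanity check that the inversion is symmetric.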
br.sub br.and
Compare and Conditional Branch
Mnemonic |
Syntax |
Operation
br.cc |
Notes:
- The br.sub and br.and uops take the place of a sub or and uop immediately consumed by a br uop; this is intended to shorten the critical path if uop merging is performed by the processor
- cc is any valid condition code flag evaluation
- The rip (user-visible instruction pointer register) is reset to one of two immediates. If the flags evaluation is true, the riptaken immediate is selected; otherwise the ripseq immediate is selected
- If the flag evaluation is false (i.e., ripseq is selected), the BranchMispredict internal exception is raised. The processor should annul all uops after the branch and restart fetching at the RIP specified by the result (in this case, ripseq)
- Branches are always assumed to be taken. If the branch is predicted as not taken (i.e. future uops come from the next sequential RIP after the branch), it is the responsibility of the decoder or frontend to swap the riptaken and ripseq immediates and invert the condition of the branch. All condition encodings can be inverted by inverting bit 0 of the 4-bit condition specifier.
- The destination register should always be REG_rip; otherwise this uop is undefined
- If the target RIP falls within an unmapped page, not present page or a page marked as no-execute (NX), the PageFaultOnExec exception is taken.
- Flags generated as the result of the comparison are passed through with the result
jmp
Indirect Jump
Mnemonic |
Syntax |
Operation
jmp |
Notes:
- The rip (user-visible instruction pointer register) is reset to the target address specified by ra
- If the ra operand does not match the riptaken immediate, the BranchMispredict internal exception is raised. The processor should annul all uops after the branch and restart fetching at the RIP specified by the result (in this case, ra)
- Indirect jumps are always assumed to match the predicted target in riptaken. If some other target is predicted, it is the responsibility of the decoder or frontend to set the riptaken immediate to that predicted target
- The destination register should always be REG_rip; otherwise this uop is undefined
- If the target RIP falls within an unmapped page, not present page or a page marked as no-execute (NX), the PageFaultOnExec exception is taken.
- No flags are generated by this uop
jmpp
Indirect Jump Within Microcode
Mnemonic |
Syntax |
Operation
jmpp |
Notes:
- The jmpp uop redirects uop fetching into microcode not accessible as x86 instructions. The target address (inside PTLsim, not x86 space) is specified by ra
- If the ra operand does not match the riptaken immediate, the BranchMispredict internal exception is raised. The processor should annul all uops after the branch and restart fetching at the RIP specified by the result (in this case, ra)
- Indirect jumps are always assumed to match the predicted target in riptaken. If some other target is predicted, it is the responsibility of the decoder or frontend to set the riptaken immediate to that predicted target
- The destination register should always be REG_rip; otherwise this uop is undefined
- The user visible rip register is not updated after this uop issues; otherwise it would point into PTLsim space not accessible to x86 code. Updating is resumed after a normal jmp issues to return to user code. It is the responsibility of the decoder to move the user address to return to into some temporary register (traditionally REG_sr2 but this is not required).
- No flags are generated by this uop
bru
Unconditional Branch
Mnemonic |
Syntax |
Operation
bru |
Notes:
- The rip (user-visible instruction pointer register) is reset to the specified immediate. The processor may redirect fetching from the new RIP
- No exceptions are possible with unconditional branches
- If the target RIP falls within an unmapped page, not present page or a page marked as no-execute (NX), the PageFaultOnExec exception is taken.
- No flags are generated by this uop
brp
Unconditional Branch Within Microcode
Mnemonic |
Syntax |
Operation
brp |
Notes:
- The brp uop redirects uop fetching into microcode not accessible as x86 instructions. The target address (inside PTLsim, not x86 space) is specified by the riptaken immediate
- The rip (user-visible instruction pointer register) is reset to the specified riptaken immediate. The processor may redirect fetching from the new RIP
- No exceptions are possible with unconditional branches
- The user visible rip register is not updated after this uop issues; otherwise it would point into PTLsim space not accessible to x86 code. Updating is resumed after a normal jmp uop issues to return to user code. It is the responsibility of the decoder to move the user address to return to into some temporary register (traditionally REG_sr2 but this is not required).
- No flags are generated by this uop
chk
Check Speculation
Mnemonic |
Syntax |
Operation
chk.cc |
Notes:
- The chk uop verifies certain properties about ra. If this verification check passes, no action is taken. If the check fails, chk signals an exception of the user specified type in the rc immediate. The result of the chk uop in this case is the user specified RIP to recover at after the check failure is handled in microcode. This recovery RIP is saved in the recoveryrip internal register.
- This mechanism is intended to allow simple inlined uop sequences to branch into microcode if certain conditions fail, since normally inlined uop sequences cannot contain embedded branches. One example use is in the REP series of instructions to ensure that the count is not zero on entry (a special corner case).
- Unlike most conditional uops, the chk uop directly checks the numerical value of ra against zero, and ignores any attached flags. Therefore, the cc condition code flag evaluation type is restricted to the subset (e, ne, be, nbe, l, nl, le, nle).
- No flags are generated by this uop
ld ld.lo ld.hi ldx ldx.lo ldx.hi
Load
Mnemonic |
Syntax |
Operation
ld |
Notes:
- The PTLsim load unit model is described in substantial detail in Section 21; this section only gives an overview of the load uop semantics.
- The ld family of uops loads values from the virtual address specified by the sum ra + rb. The ld form zero extends the loaded value, while the ldx form sign extends the loaded value to 64 bits.
- All values are zero or sign extended to 64 bits; no subword merging takes place as with ALU uops. The decoder is responsible for following the load with an explicit mov uop to merge 8-bit and 16-bit loads with their old destination register.
- The sfra operand specifies the store forwarding register (a.k.a. store buffer) to merge with data from the cache to form the final result. The inherited SFR may be determined dynamically by querying a store queue or can be predicted statically.
- If the load misses the cache, the FLAG_WAIT flag of the result is set.
- Load uops do not generate any other condition code flags
Unaligned Load Support:
- The processor supports unaligned loads via a pair of ld.lo and ld.hi uops; an overview can be found in Section 5.6. The alignment type of the load is stored in the uop's cond field (0 = ld, 1 = ld.lo, 2 = ld.hi).
- The ld.lo uop rounds down its effective address to the nearest 64-bit boundary and performs the load. The ld.hi uop rounds up to the next 64-bit boundary, performs a load at that address, then takes the result of the first (ld.lo) load as its third (rc) operand. The two loads are concatenated into a 128-bit word and the final unaligned data is extracted (and sign extended if the ldx form was used).
- A special corner case arises when the actual user address (ra + rb) does not require any bytes in the 8-byte range loaded by the ld.hi uop (i.e. the load is contained entirely within the low 64-bit aligned chunk). Since it is perfectly legal to do an unaligned load to the very end of a page such that the next 64-bit chunk is not mapped to a valid page, the ld.hi uop does not actually access memory in this case; the entire result is extracted from the prior ld.lo result in the rc operand.
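The ld.lo / ld.hi pairing described above can be modeled roughly as below. This is a sketch under stated assumptions, not the PTLsim load unit: memory is a flat little-endian bytes object, and the names load_aligned64 and unaligned_load are hypothetical.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: all 64 bits set


def load_aligned64(mem: bytes, addr: int) -> int:
    """Load the little-endian 8-byte word at an 8-byte aligned address."""
    assert addr % 8 == 0
    return int.from_bytes(mem[addr:addr + 8], "little")


def unaligned_load(mem: bytes, addr: int, size: int, signed: bool = False) -> int:
    """Model an unaligned load of `size` bytes as an ld.lo / ld.hi pair."""
    lo_addr = addr & ~7                      # ld.lo rounds the address down
    offset = addr - lo_addr
    lo = load_aligned64(mem, lo_addr)
    if offset + size <= 8:
        hi = 0                               # corner case: ld.hi touches no memory
    else:
        hi = load_aligned64(mem, lo_addr + 8)   # ld.hi uses the next 64-bit chunk
    combined = (hi << 64) | lo               # 128-bit concatenation
    value_mask = (1 << (8 * size)) - 1
    value = (combined >> (8 * offset)) & value_mask
    if signed and value & (1 << (8 * size - 1)):
        value |= MASK64 & ~value_mask        # ldx form: sign extend to 64 bits
    return value
```

Note that a 4-byte load at offset 4 of the last 8-byte chunk in the buffer never reads past the end, which mirrors the end-of-page corner case.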
Exceptions:
- UnalignedAccess if the address (ra + rb) is not aligned to an integral multiple of the size in bytes of the load. Unaligned loads (ld.lo and ld.hi) do not generate this exception. Since x86 automatically corrects alignment problems, microcode must handle this exception as described in Section 5.6.
- PageFaultOnRead if the virtual address (ra + rb) falls on a page not accessible to the caller in the current operating mode, or a page marked as not present.
- Various other exceptions and replay conditions may exist depending on the specific processor core model.
st
Store
Mnemonic |
Syntax |
Operation
st |
Notes:
- The PTLsim store unit model is described in substantial detail in Section 22.1; this section only gives an overview of the store uop semantics.
- The st family of uops prepares values to be stored to the virtual address specified by the sum ra + rb.
- The sfra operand specifies the store forwarding register (a.k.a. store buffer) to merge the data to be stored (the rc operand) into. The inherited SFR may be determined dynamically by querying a store queue or can be predicted statically, as described in 22.1.
- Store uops only generate the SFR for tracking purposes; the cache is only written when the SFR is committed.
- The store uop may issue as soon as the ra and rb operands are ready, even if the rc and sfra operands are not known. The store must be replayed once these operands become known, in accordance with Section 22.2.
- Store uops do not generate any other condition code flags
Unaligned Store Support:
- The processor supports unaligned stores via a pair of st.lo and st.hi uops; an overview can be found in Section 5.6. The alignment type of the store is stored in the uop's cond field (0 = st, 1 = st.lo, 2 = st.hi).
- Stores are handled in a similar manner, with st.lo and st.hi rounding down and up to store parts of the unaligned value in adjacent 64-bit blocks.
- The st.lo uop rounds down its effective address to the nearest 64-bit boundary and stores the appropriately aligned portion of the rc operand that actually falls within that range of 8 bytes. The st.hi uop rounds up to the next 64-bit boundary and similarly stores the appropriately aligned portion of the rc operand that actually falls within that high range of 8 bytes.
- A special corner case arises when the actual user address (ra + rb) does not touch any bytes in the 8-byte range normally written by the st.hi uop (i.e. the store is contained entirely within the low 64-bit aligned chunk). Since it is perfectly legal to do an unaligned store to the very end of a page such that the next 64-bit chunk is not mapped to a valid page, the st.hi uop does not actually do anything in this case (the bytemask of the generated SFR is set to zero and no exceptions are checked).
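The way an unaligned store is divided between st.lo and st.hi can be sketched as follows. The tuple layout (aligned address, bytemask, 64-bit data) is a simplified stand-in for an SFR chosen for this example, and the function name split_unaligned_store is hypothetical.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: all 64 bits set


def split_unaligned_store(addr: int, size: int, data: int):
    """Return the (aligned address, bytemask, data) pairs for st.lo and st.hi.

    Each bytemask has one bit per byte of the 8-byte aligned chunk; a zero
    bytemask means that half of the pair writes nothing.
    """
    lo_addr = addr & ~7                               # st.lo rounds down
    offset = addr - lo_addr
    byte_mask = ((1 << size) - 1) << offset           # up to 16 byte-enable bits
    value = (data & ((1 << (8 * size)) - 1)) << (8 * offset)   # 128-bit image
    st_lo = (lo_addr, byte_mask & 0xFF, value & MASK64)
    st_hi = (lo_addr + 8, (byte_mask >> 8) & 0xFF, (value >> 64) & MASK64)
    return st_lo, st_hi
```

A 4-byte store at address 0x1ffc yields a zero bytemask for the st.hi half, matching the corner case where the high chunk may lie on an unmapped page.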
Exceptions:
- UnalignedAccess if the address (ra + rb) is not aligned to an integral multiple of the size in bytes of the store. Unaligned stores (st.lo and st.hi) do not generate this exception. Since x86 automatically corrects alignment problems, microcode must handle this exception as described in Section 5.6.
- PageFaultOnWrite if the virtual address (ra + rb) falls on a write protected page, a page not accessible to the caller in the current operating mode, or a page marked as not present.
- LoadStoreAliasing if a prior load is found to alias the store (see Section 22.2.1).
- Various other exceptions and replay conditions may exist depending on the specific processor core model.
ldp ldxp
Load from Internal Microcode Space
Mnemonic |
Syntax |
Operation
ldp |
Notes:
- The ldp and ldxp uops load values from the internal PTLsim address space not accessible to x86 code. Typically this address space is mapped to internal machine state registers (MSRs) and microcode scratch space. The internal address to access is specified by the sum ra + rb. The ldp form zero extends the loaded value, while the ldxp form sign extends the loaded value to 64 bits.
- Load uops do not generate any other condition code flags
- Internal loads may not be unaligned, and never stall or generate exceptions.
stp
Store to Internal Microcode Space
Mnemonic |
Syntax |
Operation
stp |
Notes:
- The stp uop stores a value to the internal PTLsim address space not accessible to x86 code. Typically this address space is mapped to internal machine state registers (MSRs) and microcode scratch space. The internal address to store is specified by the sum ra + rb and the value to store is specified by rc.
- Store uops do not generate any other condition code flags
- Internal stores may not be unaligned, and never stall or generate exceptions.
shl shr sar rotl rotr rotcl rotcr
Shifts and Rotates
Mnemonic |
Syntax |
Operation
shl |
Notes:
- The shift and rotate instructions have some of the most bizarre semantics in the entire x86 instruction set: they may or may not modify flags depending on the rotation count operand, which we may not even know until the instruction issues. This is introduced in Section 5.9.
- The specific rules are as follows:
- If the count is zero, no flags are modified
- If the count is 1, both OF and CF are modified, but ZAPS is preserved
- If the count is greater than 1, only the CF is modified. (Technically the value in OF is undefined, but on K8 and P4, it retains the old value, so we try to be compatible).
- Shifts also alter the ZAPS flags while rotates do not.
- For constant counts (immediate rb values), the semantics are easy to determine in advance.
- For variable counts (rb comes from register), things are more complex. Since the shift needs to determine its output flags at runtime based on both the shift count and the input flags (CF, OF, ZAPS), we need to specify the latest versions in program order of all the existing flags. However, this would require three operands to the shift uop not even counting the value and count operands. Therefore, we use a collcc (collect condition code flags, see Section 5.4) uop to get all the most up to date flags into one result, using three operands for ZAPS, CF, OF. This forms a zero word with all the correct flags attached, which is then forwarded as the rc operand to the shift. This may add additional scheduling constraints in the case that one of the operands to the shift itself sets the flags, but this is fairly rare. Conveniently, this also lets us directly implement the 65-bit rotcl/rotcr uops in hardware with little additional complexity.
- All operations merge the ALU result with ra and generate flags in accordance with the standard x86 merging rules described previously.
- The specific flags attached to the result depend on the input conditions described above. The user should assume these uops always produce the latest version of each of the ZAPS, CF, OF flag sets.
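The flag update rules for shifts and rotates can be summarized as a small decision function. This sketch follows the x86 convention that OF is updated only for a count of exactly 1; the function name flags_modified and the use of string labels for the flag groups are choices made for this example.

```python
def flags_modified(count: int, is_shift: bool) -> set:
    """Return the flag groups a shift or rotate modifies for a given count."""
    if count == 0:
        return set()              # a zero count leaves all flags untouched
    modified = {"CF"}             # CF is updated for any nonzero count
    if count == 1:
        modified.add("OF")        # OF is only updated for single-bit operations
    if is_shift:
        modified.add("ZAPS")      # shifts update ZAPS, rotates preserve it
    return modified
```

Any flag group not in the returned set would be taken from the rc operand, i.e. the collcc result that carries the latest flags in program order.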
mask
Masking, Insertion and Extraction
Mnemonic |
Syntax |
Operation
mask.x|z |
Notes:
- The mask uop and its variants are used for generalized bit field extraction, insertion, sign and zero extension using the 18-bit control field in the immediate
- These uops are used extensively within PTLsim microcode, but are also useful if the processor supports dynamically merging a chain of shr, and, or uops.
- The condition code flags (ZAPS, CF, OF) are the flags logically generated by the final AND operation.
Control Field Format
The 18-bit rc immediate has the following three 6-bit fields:
- ms: the starting (lowest) bit position of the mask M
- mc: the number of bits covered by the mask M
- ds: the amount applied to rb by the >>> operator to align it with the mask
Operation:
    M = 1'[(ms+mc-1):ms]
    T = (ra & ~M) | ((rb >>> ds) & M)
    if (Z) {
      # Zero extend
      rd = ra merged with (T & 1'[(ms+mc-1):0])
    } else if (X) {
      # Sign extend
      rd = ra merged with ((T[ms+mc-1]) ? (T | 1'[63:(ms+mc)]) : (T & 1'[(ms+mc-1):0]))
    } else {
      rd = ra merged with T
    }
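The operation above can be tried out with the following sketch. It makes several assumptions that the text does not confirm: the notation 1'[hi:lo] is read as a run of one bits from bit lo through bit hi, the >>> operator is read as a 64-bit rotate right, ms + mc is assumed not to exceed 64, and the final size-based merge into ra is omitted. All function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: all 64 bits set


def ones(hi: int, lo: int) -> int:
    """Bits lo..hi set (inclusive); empty if hi < lo."""
    if hi < lo:
        return 0
    return ((1 << (hi - lo + 1)) - 1) << lo


def rotr64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (assumed meaning of >>>)."""
    n &= 63
    x &= MASK64
    return ((x >> n) | (x << (64 - n))) & MASK64


def mask_uop(ra: int, rb: int, ms: int, mc: int, ds: int, mode: str = "") -> int:
    """mode is "z" (zero extend), "x" (sign extend) or "" (plain insert)."""
    m = ones(ms + mc - 1, ms)
    t = (ra & ~m & MASK64) | (rotr64(rb, ds) & m)
    width = ms + mc
    if mode == "z":
        return t & ones(width - 1, 0)
    if mode == "x":
        if (t >> (width - 1)) & 1:
            return t | ones(63, width)
        return t & ones(width - 1, 0)
    return t
```

With ms = 0, mc = 8 and ds = 8, the "z" and "x" modes extract the second byte of rb with zero and sign extension respectively; with ms = 4, mc = 4 and ds = 60, the plain mode inserts the low nibble of rb into bits 7:4 of ra.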
bswap
Byte Swap
Mnemonic |
Syntax |
Operation
bswap |
Notes:
- The bswap uop reverses the endianness of the rb operand. The uop's effective result size determines the range of bytes which are reversed.
- This uop's semantics are identical to the x86 bswap instruction.
- This uop does not generate any condition code flags.
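A byte swap over the effective result size can be sketched as below. The function name bswap and the use of the sz encoding from the Merging Rules are assumptions for this example; note that the x86 bswap instruction itself is only defined for 32-bit and 64-bit operands.

```python
def bswap(rb: int, sz: int) -> int:
    """Reverse the byte order of the low (1 << sz) bytes of rb."""
    if sz not in (0, 1, 2, 3):
        raise ValueError("sz must be in the range 0..3")
    nbytes = 1 << sz
    low = rb & ((1 << (8 * nbytes)) - 1)
    # Serialize as little-endian and reinterpret as big-endian
    return int.from_bytes(low.to_bytes(nbytes, "little"), "big")
```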
collcc
Collect Condition Codes
Mnemonic |
Syntax |
Operation
collcc |
Notes:
- The collcc uop collects the condition code flags from three potentially distinct source operands into a single output with the combined condition code flags in both its appended flags and data.
- This uop is useful for collecting all flags before passing them as input to another uop which only supports one source of flags (for instance, the shift and rotate uops).
movccr movrcc
Move Condition Code Flags Between Register Value and Flag Parts
Mnemonic |
Syntax |
Operation
movccr |
Notes:
- The movccr uop takes the condition code flag bits attached to ra and copies them into the 64-bit register part of the result.
- The movrcc uop takes the low bits of the ra operand and moves those bits into the condition code flag bits attached to the result.
- The bits moved consist of the ZF, PF, SF, CF, OF flags
- The WAIT and INV flags of the result are always cleared since the uop would not even issue if these were set in ra.
andcc orcc ornotcc xorcc
Logical Operations on Condition Codes
Mnemonic |
Syntax |
Operation
andcc |
Notes:
- These uops are used to perform logical operations on the condition code flags attached to ra and rb.
- If the rb operand is an immediate, the immediate data is used instead of the flags normally attached to a register operand.
- The 64-bit value of the output is always set to zero.
mull mulh
Integer Multiplication
Mnemonic |
Syntax |
Operation
mull |
Notes:
- These uops multiply ra and rb, then retain only the low N bits or high N bits of the result (where N is the uop's effective result size in bits). This result is then merged into ra.
- The condition code flags generated by these uops correspond to the normal x86 semantics for integer multiplication (imul); the flags are calculated relative to the effective result size.
- The rb operand may be an immediate
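The low and high halves of the product can be illustrated as follows. This sketch assumes that mulh treats its operands as signed (in line with the imul semantics mentioned above); that choice, the helper names, and the omission of flag generation and of the final merge into ra are all simplifications made for this example.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mull(ra: int, rb: int, bits: int) -> int:
    """Low `bits` bits of the product (identical for signed and unsigned)."""
    return (ra * rb) & ((1 << bits) - 1)


def mulh(ra: int, rb: int, bits: int) -> int:
    """High `bits` bits of the signed 2*bits-wide product (assumed signed)."""
    product = to_signed(ra, bits) * to_signed(rb, bits)
    return (product >> bits) & ((1 << bits) - 1)
```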
bt bts btr btc
Bit Testing and Manipulation
Mnemonic |
Syntax |
Operation
bt |
Notes:
- These uops test a given bit in ra and then atomically modify (set, reset or complement) that bit in the result.
- The CF flag of the output is set to the original value in bit position rb of ra. Other condition code flag bits in the output are undefined.
- The bt (bit test) uop is special: it generates a value of -1 or +1 if the tested bit is 1 or 0, respectively. This is used in microcode for setting up an increment for the rep x86 instructions.
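The four bit uops can be modeled with a single helper that returns the result value together with the CF flag. The helper name bit_uop is hypothetical, and wrapping the bit index modulo the operand width (as the register form of the x86 instructions does) is an assumption here.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: all 64 bits set


def bit_uop(op: str, ra: int, rb: int, bits: int = 64):
    """Return (result, CF) for op in {"bt", "bts", "btr", "btc"}."""
    pos = rb % bits                      # assumed: index wraps at operand width
    bit = (ra >> pos) & 1                # CF receives the original bit value
    if op == "bt":
        value = MASK64 if bit else 1     # -1 if the bit was 1, +1 if it was 0
    elif op == "bts":
        value = ra | (1 << pos)
    elif op == "btr":
        value = ra & ~(1 << pos)
    elif op == "btc":
        value = ra ^ (1 << pos)
    else:
        raise ValueError("unknown op: " + op)
    return value & MASK64, bit
```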
ctz clz
Count Trailing or Leading Zeros
Mnemonic |
Syntax |
Operation
ctz |
Notes:
- These uops find the bit index of the first '1' bit in rb, starting from the lowest bit 0 (for ctz) or the highest bit of the data type (for clz).
- The result is zero (technically, undefined) if rb is zero.
- The ZF flag of the result is 1 if rb was zero, or 0 if rb was nonzero. Other condition code flags are undefined.
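A minimal sketch of both uops follows, taking the description above literally: each returns the bit index of the first '1' bit found, scanning from bit 0 for ctz and from the top bit of the data type for clz. Returning an index (rather than a count of leading zeros) for clz is an interpretation of that wording, and the function signatures are hypothetical. Each function returns the result together with the ZF flag.

```python
def ctz(rb: int, bits: int = 64):
    """Index of the lowest set bit of rb, plus the ZF flag."""
    rb &= (1 << bits) - 1
    if rb == 0:
        return 0, 1                      # result is nominally zero, ZF = 1
    return (rb & -rb).bit_length() - 1, 0


def clz(rb: int, bits: int = 64):
    """Index of the highest set bit of rb, plus the ZF flag."""
    rb &= (1 << bits) - 1
    if rb == 0:
        return 0, 1
    return rb.bit_length() - 1, 0
```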
ctpop
Count Population of '1' Bits
Mnemonic |
Syntax |
Operation
ctpop |
Notes:
- The ctpop uop counts the number of '1' bits in the ra operand.
- The ZF flag of the result is 1 if ra was zero, or 0 if ra was nonzero. Other condition code flags are undefined.
Floating Point Format and Merging
All floating point uops use the same encoding to specify the precision and vector format of the operands. The uop's size field is encoded as follows:
- 00: Single precision scalar floating point (opfp mnemonic). The operation is only performed on the low 32 bits (in IEEE single precision format) of the 64-bit inputs; the high 32 bits of the ra operand are copied to the high 32 bits of the output.
- 01: Single precision vector floating point (opfv mnemonic). The operation is performed on both 32 bit halves (in IEEE single precision format) of the 64-bit inputs in parallel
- 1x: Double precision scalar floating point (opfd mnemonic). The operation is performed on the full 64 bit inputs (in IEEE double precision format)
Most floating point operations merge the result with the ra operand to prepare the destination. Since a full 64-bit result is generated with the vector and double formats, the ra operand is not needed and may be specified as zero to reduce dependencies.
Exceptions to this encoding are listed where appropriate.
Unless otherwise noted, all operations update the internal floating point status register (FPSR, equivalent to the MXCSR register in x86 code) by ORing in any exceptions that occur. If the uop is encoded to generate an actual exception on excepting conditions, the FLAG_INV flag is attached to the output to cause an exception at commit time.
No condition code flags are generated by floating point uops unless otherwise noted.
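To make the single precision scalar format concrete, the sketch below adds the low 32-bit halves of two 64-bit operands and copies the high 32 bits of ra into the result. It is only an illustration: the function names are hypothetical, FPSR updates and exception handling are ignored, and results that overflow the single precision range are not handled.

```python
import struct


def bits_to_single(bits32: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single precision value."""
    return struct.unpack("<f", struct.pack("<I", bits32 & 0xFFFFFFFF))[0]


def single_to_bits(value: float) -> int:
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def addf_scalar_single(ra: int, rb: int) -> int:
    """Size field 00: operate on the low 32 bits, keep the high half of ra."""
    low = single_to_bits(bits_to_single(ra) + bits_to_single(rb))
    return (ra & 0xFFFFFFFF00000000) | low
```

For instance, adding 1.0 (0x3F800000) and 2.0 (0x40000000) produces 3.0 (0x40400000) in the low half while the upper half of ra passes through unchanged.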
addf subf mulf divf minf maxf
Floating Point Arithmetic
Mnemonic |
Syntax |
Operation
addf |
Notes:
- These uops do arithmetic on floating point numbers in various formats as specified in the Floating Point Format and Merging page.
maddf msubf
Fused Multiply Add and Subtract
Mnemonic |
Syntax |
Operation
maddf |
Notes:
- The maddf and msubf uops perform fused multiply and accumulate operations on three operands.
- The full internal precision is preserved between the multiply and add operations; rounding only occurs at the end.
- These uops are primarily used by microcode to calculate floating point division, square root and reciprocal.
sqrtf rcpf rsqrtf
Square Root, Reciprocal and Reciprocal Square Root
Mnemonic |
Syntax |
Operation
sqrtf |
Notes:
- These uops perform the specified unary operation on rb and merge the result into ra (for a single precision scalar mode only)
- The rcpf and rsqrtf uops are approximations: they do not provide full precision results. These approximations are in accordance with the standard x86 SSE/SSE2 semantics.
cmpf
Compare Floating Point
Mnemonic |
Syntax |
Operation
cmpf.type |
Notes:
- This uop performs the specified comparison of ra and rb. If the comparison is true, the result is set to all '1' bits; otherwise it is zero. The result is then merged into ra.
- The cond field in the uop encoding holds the comparison type. The set of compare types matches the x86 SSE/SSE2 CMPxx instructions.
cmpccf
Compare Floating Point and Generate Condition Codes
Mnemonic |
Syntax |
Operation
cmpccf.type |
Notes:
- This uop performs all comparisons of ra and rb and produces x86 condition code flags (ZF, PF, CF) to represent the result.
- The semantics of the generated condition code flags exactly matches the x86 SSE/SSE2 instructions COMISS/COMISD/UCOMISS/UCOMISD.
-
Unlike most encodings, the size field holds the comparison type of the two values as follows:
- 00: cmpccfp: single precision ordered compare (same semantics as x86 SSE COMISS)
- 01: cmpccfp.u: single precision unordered compare (same semantics as x86 SSE UCOMISS)
- 10: cmpccfd: double precision ordered compare (same semantics as x86 SSE2 COMISD)
- 11: cmpccfd.u: double precision unordered compare (same semantics as x86 SSE2 UCOMISD)
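The flag values produced by the x86 COMISS/COMISD family, which this uop is stated to match, can be sketched as follows. The function name cmpccf_flags is hypothetical, and the sketch does not model the difference between the ordered and unordered forms (which concerns exception signaling on NaN inputs, not the resulting flags).

```python
import math


def cmpccf_flags(a: float, b: float) -> dict:
    """Return the ZF, PF and CF values for a floating point compare."""
    if math.isnan(a) or math.isnan(b):
        return {"ZF": 1, "PF": 1, "CF": 1}   # unordered result
    if a > b:
        return {"ZF": 0, "PF": 0, "CF": 0}
    if a < b:
        return {"ZF": 0, "PF": 0, "CF": 1}
    return {"ZF": 1, "PF": 0, "CF": 0}       # equal
```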
cvtf.i2s.ins cvtf.i2s.p cvtf.i2d.lo cvtf.i2d.hi
Convert 32-bit Integer to Floating Point
Mnemonic |
Syntax |
Operation |
Used By
cvtf.i2s.ins |
Notes:
- These uops convert 32-bit integers to single or double precision floating point
- The semantics of these instructions are identical to the semantics of the x86 SSE/SSE2 instructions shown in the table
- The uop size field is not used by these uops
cvtf.q2s.ins cvtf.q2d
Convert 64-bit Integer to Floating Point
Mnemonic |
Syntax |
Operation |
Used By
cvtf.q2s.ins |
Notes:
- These uops convert 64-bit integers to single or double precision floating point
- The semantics of these instructions are identical to the semantics of the x86 SSE/SSE2 instructions shown in the table
- The uop size field is not used by these uops
cvtf.s2i cvtf.s2q cvtf.s2i.p
Convert Single Precision Floating Point to Integer
Mnemonic |
Syntax |
Operation |
Used By
cvtf.s2i |
Notes:
- These uops convert single precision floating point values to 32-bit or 64-bit integers
- The semantics of these instructions are identical to the semantics of the x86 SSE/SSE2 instructions shown in the table
-
Unlike most encodings, the size field holds the rounding type of the result as follows:
- x0: normal IEEE rounding (as determined by FPSR)
- x1: truncate to zero
cvtf.d2i cvtf.d2q cvtf.d2i.p
Convert Double Precision Floating Point to Integer
Mnemonic |
Syntax |
Operation |
Used By
cvtf.d2i |
Notes:
- These uops convert double precision floating point values to 32-bit or 64-bit integers
- The semantics of these instructions are identical to the semantics of the x86 SSE/SSE2 instructions shown in the table
-
Unlike most encodings, the size field holds the rounding type of the result as follows:
- x0: normal IEEE rounding (as determined by FPSR)
- x1: truncate to zero
cvtf.d2s.ins cvtf.d2s.p cvtf.s2d.lo cvtf.s2d.hi
Convert Between Double Precision and Single Precision Floating Point
Mnemonic |
Syntax |
Operation |
Used By
cvtf.d2s.ins |
Notes:
- These uops convert between single precision and double precision floating point values
- The semantics of these instructions are identical to the semantics of the x86 SSE/SSE2 instructions shown in the table
- The uop size field is not used by these uops

Matt T Yourst 2007-09-26