Debugging Tools for Windows
Only the instructions most likely to be encountered in user-mode code are detailed here. Instructions marked with an asterisk (*) are particularly important.
The general notation for instructions is:
(qp) opcode... dest = src
where (qp) is the optional qualifying predicate, the ellipses (...) after the opcode are optional completers, dest is the destination (or destinations, for comparison operators), and src is the source.
* | ldx | Ra = [Rb] | Load from memory (zero-extended). |
* | ldx | Ra = [Rb], Rc/imm9 | Load with postincrement. |
An optional parameter after the comma performs a postincrement of the Rb register. For example, ld8 r1 = [r2], 8 loads a 64-bit value from the address in r2 and then increments the r2 register by 8.
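As a rough illustration of the postincrement form, the following Python sketch models ld8 r1 = [r2], 8 over a flat byte buffer. The function name ld8_postinc, the byte-buffer memory model, and the little-endian byte order are assumptions made for this example only.

```python
import struct

def ld8_postinc(memory, addr, inc):
    """Model of `ld8 Ra = [Rb], inc`: return (loaded value, updated Rb)."""
    # Assumption: memory is a bytes object read as 64-bit little-endian.
    (value,) = struct.unpack_from("<Q", memory, addr)
    # The address register is incremented only after the load completes.
    return value, addr + inc

# Walk a three-element array of 64-bit integers, the way a loop built
# around `ld8 r1 = [r2], 8` would.
memory = struct.pack("<3Q", 10, 20, 30)
r2 = 0
values = []
for _ in range(3):
    r1, r2 = ld8_postinc(memory, r2, 8)
    values.append(r1)
```

After the loop, r2 points just past the last element, which is why this form is convenient for walking arrays.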
An optional .nt# completer specifies that the memory will not be accessed again for a while. Higher values of # indicate longer-term abandonment. (.nta is the most aggressive level.)
* | stx | [Ra] = Rb | Store to memory. |
* | stx | [Ra] = Rb, imm9 | Store with postincrement. |
Again, an optional parameter after the comma performs a postincrement of the address register.
* | ldff | Ra = [Rb] | Load from memory (zero-extended). |
* | ldff | Ra = [Rb], Rc/imm9 | Load with postincrement. |
* | stff | [Ra] = Rb | Store to memory. |
* | stff | [Ra] = Rb, imm9 | Store with postincrement. |
The same load and store forms, including the optional postincrement, apply to floating-point registers.
ldx.s | Ra = [Rb] | Load speculated. |
ldx.a | Ra = [Rb] | Load advanced. |
ldx.sa | Ra = [Rb] | Load speculated advanced. |
These were previously discussed.
st.spill... | [Rb] = Ra | Save a speculated value. |
ld.fill... | Ra = [Rb] | Restore a speculated value. |
Read and write a value that might have the NaT bit set. The numerical value is written as usual, and the NaT bit is saved/restored in the ar.unat special register. These allow you to speculate across procedure calls.
lfetch... | ... | Cache line prefetch. |
ldx.c.clr | Ra = [Rb] | Check (or reload) and clear. |
ldx.c.nc | Ra = [Rb] | Check (or reload), no clear. |
chk.a.clr | Ra = [Rb] | Check (or jump) and clear. |
chk.a.nc | Ra = [Rb] | Check (or jump), no clear. |
Clearing an advanced load means that you have no plans to check the load again.
mov | Ra = S | Read from special register. |
mov | S = Ra | Write to special register. |
Special registers, in general, can only be read from and written to. They do not take part in arithmetic computations and cannot be compared against directly.
mov | pr = Ra, mask | Write to predicate registers. |
The mask specifies which predicate registers should be loaded from register Ra. The bottom bit of the mask corresponds to predicate register p1, through bit 14 of the mask corresponding to predicate register p15. Bit 15 of the mask represents all the predicate registers p16 through p63. (Recall that predicate register zero is hard-wired to TRUE.)
Recall that the predicate register preservation rules are established by convention, so the only masks you are likely to see are -1 (restore all registers) and 0x801F (preserve all the usual registers).
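The mask layout described above can be sketched in Python as follows. This is a reading of the text, not a statement of the hardware encoding, and the helper name predicates_selected is hypothetical.

```python
def predicates_selected(mask):
    """Return the predicate register numbers selected by a mask.

    Follows the description in the text: mask bit 0 selects p1, and so on
    up to bit 14 for p15; bit 15 selects p16 through p63 as a group.
    """
    mask &= 0xFFFF  # only the low 16 bits are meaningful in this model
    selected = [bit + 1 for bit in range(15) if mask & (1 << bit)]
    if mask & 0x8000:
        selected.extend(range(16, 64))
    return selected

usual = predicates_selected(0x801F)   # p1-p5 plus p16-p63
everything = predicates_selected(-1)  # p1-p63
```

Under this reading, 0x801F selects p1 through p5 together with the p16 through p63 group, and -1 selects every writable predicate register.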
xchgx... | Ra = [Rb], Rc | Interlocked exchange |
Store Rc to [Rb] and return the original value in Ra.
cmpxchgx... | Ra = [Rb], Rc, ar.ccv | Conditional exchange |
Check if the value in [Rb] is equal to the special ar.ccv register. If so, store Rc to [Rb]; otherwise, leave it unchanged. In either case, return the original value of [Rb] in Ra.
fetchaddx... | Ra = [Rb], Rc/n | Interlocked add |
Atomically adds Rc/n to [Rb], returning the previous value in Ra.
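The three interlocked operations above can be modeled in Python as shown below. This is a minimal sketch of the data flow only: the dictionary standing in for memory, the 64-bit operand size, and the function names are assumptions, and the Python code is not itself atomic the way the hardware instructions are.

```python
MASK64 = (1 << 64) - 1

def xchg(mem, addr, rc):
    """Store rc to mem[addr] and return the original value."""
    old = mem[addr]
    mem[addr] = rc & MASK64
    return old

def cmpxchg(mem, addr, rc, ccv):
    """Store rc only if mem[addr] equals ccv; always return the original value."""
    old = mem[addr]
    if old == ccv:
        mem[addr] = rc & MASK64
    return old

def fetchadd(mem, addr, inc):
    """Add inc to mem[addr] and return the previous value."""
    old = mem[addr]
    mem[addr] = (old + inc) & MASK64
    return old

mem = {0x100: 5}
first = xchg(mem, 0x100, 7)            # returns 5, memory now holds 7
failed = cmpxchg(mem, 0x100, 9, ccv=5) # comparison fails, memory unchanged
ok = cmpxchg(mem, 0x100, 9, ccv=7)     # comparison succeeds, memory now 9
before = fetchadd(mem, 0x100, 1)       # returns 9, memory now 10
```

A caller can tell whether cmpxchg succeeded by comparing the returned value with the value it placed in ar.ccv.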
* | br.cond... | Ba/addr | Branch |
* | br.call... | Ba/addr | Call |
* | br.ret... | Ba/addr | Return |
See the Control Flow section in Itanium Architecture for a description of the various completers.
There are other types of branch instructions as well, but these are not used as much.
* | add | Ra = Rb,Rc | Ra = Rb + Rc |
* | add | Ra = Rb,Rc,1 | Ra = Rb + Rc + 1 |
* | adds | Ra = imm14,Rb | Ra = imm14 + Rb |
* | addl | Ra = imm22,Rb | Ra = imm22 + Rb |
* | subx | Ra = Rb/n,Rc | Ra = Rb/n - Rc |
* | subx | Ra = Rb,Rc,1 | Ra = Rb - Rc - 1 |
shladd | Ra = Rb,n,Rc | Ra = (Rb SHL n) + Rc |
This shifts the first addend left by up to four positions before adding.
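A common use of shladd is scaling an index by the element size when computing an array address. The Python sketch below models the operation with 64-bit wraparound; the function name mirrors the mnemonic, and the lower bound of 1 on the shift count is an assumption of this example.

```python
MASK64 = (1 << 64) - 1

def shladd(rb, n, rc):
    """Model of `shladd Ra = Rb, n, Rc`: (Rb SHL n) + Rc, truncated to 64 bits."""
    if not 1 <= n <= 4:
        raise ValueError("shift count must be between 1 and 4")
    return ((rb << n) + rc) & MASK64

# Address of element 5 in an array of 8-byte items starting at 0x1000.
element_addr = shladd(5, 3, 0x1000)
```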
Note: There is no integer division or multiplication. See the Multiplication subsection in this section for a multiplication workaround. For division, you will have to convert to floating point.
There is a special floating-point format for handling integer multiplication.
* | setf.sig | Fa = Rb | Fa = Rb (special form) |
* | getf.sig | Ra = Fb | Ra = Fb (special form) |
* | xma... | Fa = Fb, Fc, Fd | Fa = Fb * Fc + Fd (special form) |
The setf.sig and getf.sig instructions transfer between integer registers and floating-point registers (in the special form).
The xma instruction performs the operation on numbers in special form, and the result is also in special form.
There are four variations on the xma instruction, built from the following completers. The .l version saves the low 64 bits of the result and the .h version saves the high 64 bits of the result. Adding u (as in .lu or .hu) performs an unsigned multiplication, rather than a signed multiplication.
For example, xma.lu performs the multiplication as two unsigned integers and saves the low 64 bits of the result.
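To make the low/high and signed/unsigned distinctions concrete, here is a Python sketch of the integer arithmetic that xma performs on 64-bit values. The function and parameter names are hypothetical, and treating the addend Fd as an unsigned 64-bit value is an assumption of this sketch.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Reinterpret a 64-bit pattern as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def xma(fb, fc, fd, high=False, unsigned=False):
    """Model of xma: Fb * Fc + Fd on 64-bit integers, returning 64 bits."""
    if unsigned:
        product = (fb & MASK64) * (fc & MASK64)
    else:
        product = to_signed(fb) * to_signed(fc)
    total = product + (fd & MASK64)  # assumption: addend is zero-extended
    # Python's >> on negative integers is arithmetic, so masking afterwards
    # yields the two's-complement high half.
    return (total >> 64) & MASK64 if high else total & MASK64

low = xma(6, 7, 8)                                        # 6 * 7 + 8
high_unsigned = xma(1 << 63, 2, 0, high=True, unsigned=True)
high_signed = xma(1 << 63, 2, 0, high=True)
```

The low 64 bits are the same for signed and unsigned multiplication; only the high half differs, as the last two calls show.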
* | and | Ra = Rb/imm8,Rc | Ra = Rb/imm8 and Rc |
* | or | Ra = Rb/imm8,Rc | Ra = Rb/imm8 or Rc |
* | andcm | Ra = Rb/imm8,Rc | Ra = Rb/imm8 and not Rc |
* | xor | Ra = Rb/imm8,Rc | Ra = Rb/imm8 xor Rc |
The andcm instruction clears the bits specified by the last parameter.
* | shl | Ra = Rb,Rc/n | Ra = Rb SHL Rc/n |
* | shr | Ra = Rb,Rc/n | Ra = Rb SAR Rc/n |
* | shr.u | Ra = Rb,Rc/n | Ra = Rb SHR Rc/n |
The shx instructions do shifting. A more general form of shifting is performed by the extr and dep instructions.
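The difference between shr (arithmetic, SAR) and shr.u (logical, SHR) only shows up when the top bit is set. The following Python sketch models the three shifts on 64-bit values; it assumes shift counts from 0 to 63, and the function names are chosen for this example.

```python
MASK64 = (1 << 64) - 1

def shl(rb, n):
    """Shift left, discarding bits that leave the 64-bit register."""
    return (rb << n) & MASK64

def shr_u(rb, n):
    """Logical shift right: vacated high bits are filled with zeros."""
    return (rb & MASK64) >> n

def shr(rb, n):
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    rb &= MASK64
    signed = rb - (1 << 64) if rb >> 63 else rb
    return (signed >> n) & MASK64

x = 0x8000000000000000
arithmetic = shr(x, 4)   # 0xF800000000000000
logical = shr_u(x, 4)    # 0x0800000000000000
```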
* | extr | Ra = Rb, n1, n2 | Ra = Rb<n1, n2> |
* | extr.u | Ra = Rb, n1, n2 | Ra = Rb<n1, n2> |
The regular version of the extr (extract) instruction sign-extends the result, whereas the extr.u form zero-extends the result. The bit extraction instructions are also used to handle unaligned data.
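The sketch below models extr and extr.u in Python. It reads the <n1, n2> notation as a starting bit position followed by a field length, which matches the (Rb:Rc)<n, 64> notation used for shrp but is still an interpretation; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def extr_u(rb, pos, length):
    """Extract `length` bits starting at bit `pos`, zero-extended."""
    return ((rb & MASK64) >> pos) & ((1 << length) - 1)

def extr(rb, pos, length):
    """Extract `length` bits starting at bit `pos`, sign-extended to 64 bits."""
    field = extr_u(rb, pos, length)
    if field >> (length - 1):      # top bit of the field is set
        field -= 1 << length       # make it negative before masking
    return field & MASK64

zero_extended = extr_u(0xABCD, 8, 8)  # 0xAB
sign_extended = extr(0xABCD, 8, 8)    # 0xAB with the sign bit propagated
```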
* | dep | Ra = Rb, Rc, n1, n2 | Ra<n1, n2> = Rb; other bits come from Rc |
The dep (deposit) instruction builds its output by filling the <n1, n2> bit field from Rb and taking all the other bits from Rc. Think of it as a masked blt.
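A Python sketch of the deposit operation follows. It assumes that <n1, n2> means a starting bit position and a field length, and that the bits placed into the field are the low-order bits of Rb; both are assumptions of this example, as is the function name.

```python
MASK64 = (1 << 64) - 1

def dep(rb, rc, pos, length):
    """Model of `dep Ra = Rb, Rc, pos, length`.

    The low `length` bits of rb replace the field at bit `pos`;
    every other bit of the result comes from rc.
    """
    field_mask = ((1 << length) - 1) << pos
    kept = rc & ~field_mask              # bits of rc outside the field
    inserted = (rb << pos) & field_mask  # low bits of rb moved into place
    return (kept | inserted) & MASK64

result = dep(0xAB, 0x1111111111111111, 8, 8)
```

Here result is 0x111111111111AB11: the second-lowest byte came from Rb and everything else from Rc.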
* | shrp | Ra = Rb, Rc, n | Ra = (Rb:Rc)<n, 64> |
The shrp (shift right pair) instruction treats Rb and Rc as a huge 128-bit value and extracts 64 bits of it into the Ra register.
* | movl | Ra = n | Load 64-bit number. |
Small numbers (up to 22 bits) can be loaded using the add Ra = n, r0 instruction. Larger numbers require the movl instruction. This is one of the few instructions that takes up two slots.
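The choice between the two forms can be sketched as a simple range check. The Python example below assumes the 22-bit immediate is signed; the helper names are hypothetical.

```python
def fits_imm22(n):
    """True if n fits in a 22-bit signed immediate (assumed to be signed)."""
    return -(1 << 21) <= n < (1 << 21)

def constant_load_opcode(n):
    """Pick the instruction a constant load would need in this model."""
    return "add" if fits_imm22(n) else "movl"

small = constant_load_opcode(100)          # fits in the add immediate
large = constant_load_opcode(0x123456789)  # needs the two-slot movl
```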
* | cmp.cc | p1, p2 = Ra, Rb | Compare 64-bit values. |
* | cmp4.cc | p1, p2 = Ra, Rb | Compare 32-bit values. |
See the Comparisons section in Itanium Architecture for a detailed explanation.
* | tbit | p1, p2 = Ra, n | Test bit |
The tbit instruction tests bit n in register Ra, setting both p1 and p2 accordingly.
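As a small illustration, the Python sketch below models a bit test that writes a pair of complementary predicates. It assumes that p1 receives the value of the tested bit and p2 its complement; the completer on the real instruction determines the sense, so treat this as one possible reading.

```python
def tbit(ra, n):
    """Return (p1, p2) for a test of bit n in ra.

    Assumption: p1 is true when the bit is set, and p2 is its complement.
    """
    bit_set = bool((ra >> n) & 1)
    return bit_set, not bit_set

p1, p2 = tbit(0b1010, 1)  # bit 1 is set
```

Because the two predicates are always complementary, one can guard the "then" path of a branchless sequence and the other the "else" path.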
popcnt | Ra = Rb | Ra = number of set bits in Rb |
czx1.l | Ra = Rb | Ra = position of lowest zero byte |
czx2.l | Ra = Rb | Ra = position of lowest zero word |
czx1.r | Ra = Rb | Ra = position of highest zero byte |
czx2.r | Ra = Rb | Ra = position of highest zero word |
If there is no zero byte or word, the czx instruction sets the Ra register to 8 (czx1) or 4 (czx2).
* | sxtx | Ra = Rb | sign-extend Rb to Ra |
* | zxtx | Ra = Rb | zero-extend Rb to Ra |
* | add | Ra = r0, n | mov Ra = n |
* | add | Ra = r0, Rb | mov Ra = Rb |
* | add | Ra = Rb, r0, 1 | inc Ra = Rb |
* | sub | Ra = Rb, r0, 1 | dec Ra = Rb |
* | subx | Ra = r0, Rb | negx Ra = Rb |
* | xor | Ra = -1, Rb | not Ra = Rb |
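shrp | Ra = Rb, Rb, n | rotr Ra = Rb, n |
You can rotate by doing a paired shift where the two input registers are the same. Because shrp shifts to the right, this rotates Rb right by n bits; to rotate left by n bits, use a shift count of 64 - n.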
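The paired shift and the rotation trick can be checked with a short Python model. It follows the (Rb:Rc)<n, 64> reading given for shrp, with Rb as the high half of the 128-bit pair; under that reading, identical inputs produce a right rotation by n, which is the same as a left rotation by 64 - n. The function names are chosen for this example.

```python
MASK64 = (1 << 64) - 1

def shrp(rb, rc, n):
    """Model of `shrp Ra = Rb, Rc, n`: 64 bits of Rb:Rc starting at bit n."""
    pair = ((rb & MASK64) << 64) | (rc & MASK64)
    return (pair >> n) & MASK64

def rotate_right(x, n):
    """Rotate a 64-bit value right by n using a paired shift of x with itself."""
    return shrp(x, x, n)

def rotate_left(x, n):
    """Rotate left by n, expressed as a right rotation by 64 - n."""
    return rotate_right(x, (64 - n) % 64)

spanning = shrp(0x1, 0x0, 4)         # low bit of Rb lands at bit 60
rotated_right = rotate_right(0x1, 4) # 0x1000000000000000
rotated_left = rotate_left(0x1, 4)   # 0x10
```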