Debugging Tools for Windows

Itanium Instructions

Only the instructions most likely to be encountered in user-mode code are detailed here. Instructions marked with an asterisk (*) are particularly important.

The general notation for instructions is:

(qp) op... dest = src1, src2, ...

where (qp) is the optional qualifying predicate, the ellipses (...) after the opcode are optional completers, dest is the destination (or destinations, for comparison operators), and src1, src2, ... are the sources.

Memory Access (Aligned)

* ldx Ra = [Rb] Load from memory (zero-extended).
* ldx Ra = [Rb], Rc/imm9 Load with postincrement.

In these instructions, x is the access size in bytes (1, 2, 4, or 8). An optional parameter after the comma performs a postincrement of the Rb register. For example, ld8 r1 = [r2], 8 loads a 64-bit value from the address in r2 and then increments the r2 register by 8.

An optional .nt# completer specifies that the memory will not be accessed again for a while. Higher values of # indicate longer-term abandonment. (.nta is the most aggressive level.)

* stx [Ra] = Rb Store to memory.
* stx [Ra] = Rb, imm9 Store with postincrement.

Again, an optional parameter after the comma performs a postincrement of the address register.
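The postincrement forms can be modeled in Python to show the order of operations. The access uses the old address, and only after that is the address register updated. This is a sketch, not real Itanium code. The helpers ld8 and st8, the dictionary of named registers, and the flat little-endian bytearray that stands in for memory are all assumptions of the example.

```python
MASK64 = (1 << 64) - 1

def ld8(regs, mem, dest, addr_reg, inc=0):
    """Model of 'ld8 dest = [addr_reg], inc': load 8 bytes, then bump addr_reg."""
    addr = regs[addr_reg]
    regs[dest] = int.from_bytes(mem[addr:addr + 8], "little")
    regs[addr_reg] = (addr + inc) & MASK64  # postincrement happens after the load

def st8(regs, mem, addr_reg, src, inc=0):
    """Model of 'st8 [addr_reg] = src, inc': store 8 bytes, then bump addr_reg."""
    addr = regs[addr_reg]
    mem[addr:addr + 8] = (regs[src] & MASK64).to_bytes(8, "little")
    regs[addr_reg] = (addr + inc) & MASK64  # postincrement happens after the store

# Copy three 64-bit values from offset 0 to offset 64, using one
# postincrementing load and one postincrementing store per element.
mem = bytearray(128)
for i, value in enumerate([11, 22, 33]):
    mem[i * 8:i * 8 + 8] = value.to_bytes(8, "little")

regs = {"r2": 0, "r4": 64}
for _ in range(3):
    ld8(regs, mem, "r3", "r2", 8)   # ld8 r3 = [r2], 8
    st8(regs, mem, "r4", "r3", 8)   # st8 [r4] = r3, 8
```

The loop mirrors a simple copy routine. Register r2 walks the source and r4 walks the destination, so no separate add instructions are needed to advance the pointers.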

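* ldff Fa = [Rb] Load floating-point value from memory.
* ldff Fa = [Rb], Rc/imm9 Load with postincrement.
* stff [Ra] = Fb Store floating-point value to memory.
* stff [Ra] = Fb, imm9 Store with postincrement.

These are the corresponding forms for floating-point registers. The second f is the format completer: s (single precision), d (double precision), 8 (a 64-bit integer loaded into the significand), or e (extended precision).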

Memory Access (Advanced/Speculated)

ldx.s Ra = [Rb] Load speculated.
ldx.a Ra = [Rb] Load advanced.
ldx.sa Ra = [Rb] Load speculated advanced.

These were previously discussed.

st.spill... [Rb] = Ra Save a speculated value.
ld.fill... Ra = [Rb] Restore a speculated value.

Read and write a value that might have the NaT bit set. The numerical value is written as usual, and the NaT bit is saved/restored in the ar.unat special register. These allow you to speculate across procedure calls.

lfetch... ... Cache line prefetch.

Verifying Speculated/Advanced Instructions

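ldx.c.clr Ra = [Rb] Check (or reload) and clear.
ldx.c.nc Ra = [Rb] Check (or reload), no clear.
chk.a.clr Ra, target Check (or branch to target) and clear.
chk.a.nc Ra, target Check (or branch to target), no clear.

The ld.c forms reload the value if the advanced load was invalidated. The chk.a forms instead branch to recovery code at target. Clearing an advanced load removes its ALAT entry, which means that you have no plans to check the load again.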
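The following greatly simplified Python model of the ALAT (the table that tracks advanced loads) makes the check-and-clear behavior concrete. It assumes one entry per target register and exact-address matching, and it covers only the ld.c forms. The class and method names are hypothetical. Real hardware matches on address ranges, has limited capacity, and can lose entries for other reasons as well. An advanced load is therefore only a hint that the later check will succeed.

```python
class ToyALAT:
    """Toy ALAT: one entry per target register, exact-address matching only."""

    def __init__(self, memory):
        self.memory = memory    # address -> 64-bit value
        self.entries = {}       # target register -> address of its advanced load
        self.regs = {}          # register name -> value

    def ld8_a(self, reg, addr):
        # ld8.a reg = [addr]: load early and remember the address.
        self.regs[reg] = self.memory.get(addr, 0)
        self.entries[reg] = addr

    def st8(self, addr, value):
        # A store to a tracked address invalidates the matching entries.
        self.memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def ld8_c(self, reg, addr, clear=True):
        # ld8.c.clr / ld8.c.nc reg = [addr]: keep the early value if the entry
        # survived, otherwise reload. Returns True when a reload was needed.
        reloaded = self.entries.get(reg) != addr
        if reloaded:
            self.regs[reg] = self.memory.get(addr, 0)
        if clear:
            self.entries.pop(reg, None)   # .clr: no further checks are planned
        return reloaded
```

If no store touches the address between ld8.a and ld8.c, the early value is kept. An intervening store forces a reload.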

Special Registers

mov Ra = S Read from special register.
mov S = Ra Write to special register.

Special registers, in general, can only be read from and written to. They do not take part in arithmetic computations and cannot be compared against directly.

mov pr = Ra, mask Write to predicate registers.

The mask specifies which predicate registers should be loaded from register Ra. Bit 0 of the mask corresponds to predicate register p1, bit 1 to p2, and so on through bit 14, which corresponds to p15. Bit 15 of the mask represents all of the predicate registers p16 through p63 together. (Recall that predicate register p0 is hard-wired to TRUE.)

Because the predicate register preservation rules are established by convention, the only masks you are likely to see are -1 (restore all predicate registers) and 0x801F (restore only the preserved registers p1 through p5 and p16 through p63).
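The mask layout described in this section can be written as a small Python function that lists the predicate registers a given mask selects. The function name is hypothetical. The function follows the 16-bit layout given here and does not model the raw instruction encoding.

```python
def predicates_selected(mask):
    """Return the predicate register numbers selected by a mov pr mask:
    bit 0 -> p1, ..., bit 14 -> p15, bit 15 -> p16 through p63."""
    mask &= 0xFFFF                     # -1 becomes 0xFFFF, i.e. every bit set
    selected = {i + 1 for i in range(15) if (mask >> i) & 1}
    if (mask >> 15) & 1:
        selected.update(range(16, 64))  # one bit covers all rotating predicates
    return selected
```

For example, predicates_selected(0x801F) yields p1 through p5 plus p16 through p63.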

Interlocked Instructions

xchgx... Ra = [Rb], Rc Interlocked exchange

Store Rc to [Rb] and return the original value in Ra.

cmpxchgx... Ra = [Rb], Rc, ar.ccv Conditional exchange

Check if the value in [Rb] is equal to the special ar.ccv register. If so, store Rc to [Rb]; otherwise, leave it unchanged. In either case, return the original value of [Rb] in Ra.
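A common use of cmpxchg is a retry loop. The loop reads the current value, computes a new one, and attempts the exchange with the old value in ar.ccv. If another processor changed the location in the meantime, it retries. The Python sketch below models only the data flow. The function names and the dictionary that stands in for memory are assumptions, and real atomicity comes from the hardware, not from this code.

```python
MASK64 = (1 << 64) - 1

def cmpxchg8(mem, addr, new, ccv):
    """Model of 'cmpxchg8 Ra = [addr], new, ar.ccv' (atomicity is not modeled)."""
    old = mem[addr]
    if old == ccv:
        mem[addr] = new & MASK64
    return old  # Ra always receives the original value

def atomic_or(mem, addr, bits):
    """Set bits in mem[addr] with a compare-exchange retry loop; return the old value."""
    while True:
        expected = mem[addr]                              # ld8, then mov ar.ccv = expected
        old = cmpxchg8(mem, addr, expected | bits, expected)
        if old == expected:                               # exchange succeeded
            return old
```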

fetchaddx... Ra = [Rb], n Interlocked add

Atomically adds n to [Rb], returning the previous value in Ra. Here x is 4 or 8, and the increment n must be an immediate: -16, -8, -4, -1, 1, 4, 8, or 16.
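The following Python sketch models fetchadd8 on a dictionary that stands in for memory. It enforces the architectural restriction that the increment is an immediate drawn from -16, -8, -4, -1, 1, 4, 8, or 16. The function and constant names are hypothetical, and atomicity is not modeled.

```python
MASK64 = (1 << 64) - 1
FETCHADD_INCREMENTS = {-16, -8, -4, -1, 1, 4, 8, 16}

def fetchadd8(mem, addr, inc):
    """Model of 'fetchadd8 Ra = [addr], inc'; returns the previous value."""
    if inc not in FETCHADD_INCREMENTS:
        raise ValueError("fetchadd only encodes -16, -8, -4, -1, 1, 4, 8, 16")
    old = mem[addr]
    mem[addr] = (old + inc) & MASK64   # wraps like a 64-bit register
    return old
```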

Control Flow

* br.cond... Ba/addr Branch
* br.call... Ba/addr Call
* br.ret... Ba Return

See the Control Flow section in Itanium Architecture for a description of the various completers.

There are other types of branch instructions as well, but these are not used as much.

Arithmetic

* add Ra = Rb,Rc Ra = Rb + Rc
* add Ra = Rb,Rc,1 Ra = Rb + Rc + 1
* adds Ra = imm14,Rb Ra = imm14 + Rb
* addl Ra = imm22,Rb Ra = imm22 + Rb
* sub Ra = Rb/imm8,Rc Ra = Rb/imm8 - Rc
* sub Ra = Rb,Rc,1 Ra = Rb - Rc - 1

shladd Ra = Rb,n,Rc Ra = (Rb SHL n) + Rc

This shifts the first addend left by 1 to 4 bit positions before adding.
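A typical use of shladd is computing the address of an array element in a single instruction. Shifting the index left by 3 multiplies it by 8, the size of a 64-bit element. The Python sketch below assumes 64-bit wraparound. The function name mirrors the mnemonic but is otherwise hypothetical.

```python
MASK64 = (1 << 64) - 1

def shladd(rb, n, rc):
    """Model of 'shladd Ra = Rb, n, Rc': (Rb << n) + Rc, truncated to 64 bits."""
    if not 1 <= n <= 4:
        raise ValueError("shladd shift count must be 1 to 4")
    return ((rb << n) + rc) & MASK64

# Address of element 5 of an array of 8-byte values based at 0x1000:
# shladd r8 = r_index, 3, r_base
base, index = 0x1000, 5
addr = shladd(index, 3, base)
```

The same instruction is handy for multiplying by small constants. For example, shladd Ra = Rb, 1, Rb computes Rb * 3.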

Note: There is no integer multiplication or division instruction. See the Multiplication subsection in this section for a multiplication workaround. For division, you will have to convert to floating point.

Multiplication

There is a special floating-point format for handling integer multiplication.

* setf.sig Fa = Rb Fa = Rb (special form)
* getf.sig Ra = Fb Ra = Fb (special form)
* xma... Fa = Fb, Fc, Fd Fa = Fb * Fc + Fd (special form)

The setf.sig and getf.sig instructions transfer between integer registers and floating-point registers (in the special form).

The xma instruction performs the operation on numbers in special form, and the result is also in special form.

There are four variations on the xma instruction, formed from two completers. The .l completer saves the low 64 bits of the result and the .h completer saves the high 64 bits. Adding .u (giving xma.lu or xma.hu) performs an unsigned multiplication rather than a signed one.

For example, xma.lu performs the multiplication as two unsigned integers and saves the low 64 bits of the result.
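The following Python sketch models the four xma variations on 64-bit integers. Python integers stand in for floating-point registers in the special form, and the setf.sig and getf.sig transfers are implied. The sketch assumes that the addend is interpreted with the same signedness as the factors. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def to_signed64(x):
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def xma(b, c, d, high=False, unsigned=False):
    """Model of xma.l / xma.lu / xma.h / xma.hu.
    Returns the low (or high) 64 bits of the 128-bit value b * c + d."""
    if unsigned:
        b, c, d = b & MASK64, c & MASK64, d & MASK64
    else:
        b, c, d = to_signed64(b), to_signed64(c), to_signed64(d)
    full = (b * c + d) & MASK128          # two's-complement 128-bit result
    return full >> 64 if high else full & MASK64
```

Pairing xma.lu and xma.hu with a zero addend yields the full 128-bit unsigned product of two 64-bit integers. The low 64 bits are the same for the signed and unsigned forms.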

Bits

* and Ra = Rb/imm8,Rc Ra = Rb/imm8 and Rc
* or Ra = Rb/imm8,Rc Ra = Rb/imm8 or Rc
* andcm Ra = Rb/imm8,Rc Ra = Rb/imm8 and not Rc
* xor Ra = Rb/imm8,Rc Ra = Rb/imm8 xor Rc

The andcm instruction clears the bits specified by the last parameter.

* shl Ra = Rb,Rc/n Ra = Rb SHL Rc/n
* shr Ra = Rb,Rc/n Ra = Rb SAR Rc/n
* shr.u Ra = Rb,Rc/n Ra = Rb SHR Rc/n

The shl, shr, and shr.u instructions perform shifts. A more general form of shifting is performed by the extr and dep instructions.
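The two right shifts differ only when bit 63 is set. This Python sketch models the three shift instructions for counts 0 through 63 only; larger counts are not modeled. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def shl(rb, n):
    # shl Ra = Rb, n: vacated low bits become 0; bits shifted past bit 63 are lost.
    return (rb << n) & MASK64

def shr_u(rb, n):
    # shr.u Ra = Rb, n: logical right shift, vacated high bits become 0.
    return (rb & MASK64) >> n

def shr(rb, n):
    # shr Ra = Rb, n: arithmetic right shift, vacated high bits copy bit 63.
    rb &= MASK64
    signed = rb - (1 << 64) if rb >> 63 else rb
    return (signed >> n) & MASK64
```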

* extr Ra = Rb, n1, n2 Ra = Rb<n1, n2>
* extr.u Ra = Rb, n1, n2 Ra = Rb<n1, n2>

The n1 operand is the starting bit position of the field, and n2 is its length in bits. The regular version of the extr (extract) instruction sign-extends the result, whereas the extr.u form zero-extends it. The bit extraction instructions are also used to handle unaligned data.

* dep Ra = Rb, Rc, n1, n2 Ra<n1, n2> = Rb; other bits come from Rc

The dep (deposit) instruction builds its output by taking the low n2 bits of Rb, placing them at bit position n1, and taking all other bits from Rc. Think of it as a masked blt.
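The following Python sketch models extr, extr.u, and dep. It assumes that n1 is the starting bit position of the field and n2 is its length in bits, which is the operand order these instructions use. Results are truncated to 64 bits, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def extr_u(rb, pos, length):
    """extr.u Ra = Rb, pos, len: zero-extended field of `length` bits at `pos`."""
    return ((rb & MASK64) >> pos) & ((1 << length) - 1)

def extr(rb, pos, length):
    """extr Ra = Rb, pos, len: the same field, sign-extended to 64 bits."""
    field = extr_u(rb, pos, length)
    if (field >> (length - 1)) & 1:        # top bit of the field is the sign
        field -= 1 << length
    return field & MASK64

def dep(rb, rc, pos, length):
    """dep Ra = Rb, Rc, pos, len: low `length` bits of Rb placed at `pos` in Rc."""
    mask = ((1 << length) - 1) << pos
    return ((rc & ~mask) | ((rb << pos) & mask)) & MASK64
```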

* shrp Ra = Rb, Rc, n Ra = (Rb:Rc)<n, 64>

The shrp (shift right pair) instruction treats Rb and Rc as a single 128-bit value, with Rb as the high half. It shifts that value right by n bits and places the low 64 bits of the result in the Ra register.
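One use of shrp is assembling an unaligned 64-bit value from the two aligned quadwords that contain it. The Python sketch below assumes little-endian memory, represented as a bytes object, and a hypothetical load8 helper that performs an aligned 8-byte load. The shift count is the misalignment in bits.

```python
MASK64 = (1 << 64) - 1

def shrp(rb, rc, n):
    """shrp Ra = Rb, Rc, n: low 64 bits of the 128-bit value Rb:Rc shifted right n."""
    return ((((rb & MASK64) << 64) | (rc & MASK64)) >> n) & MASK64

def load8(mem, addr):
    # Hypothetical aligned 8-byte little-endian load.
    return int.from_bytes(mem[addr:addr + 8], "little")

def load8_unaligned(mem, addr):
    # Two aligned loads bracket the unaligned quadword; shrp stitches them together.
    base = addr & ~7
    lo = load8(mem, base)
    hi = load8(mem, base + 8)
    return shrp(hi, lo, (addr - base) * 8)
```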

Constants

* movl Ra = n Load 64-bit number.

Small numbers (up to 22 bits, sign-extended) can be loaded using the addl Ra = n, r0 instruction. Larger numbers require the movl instruction, which is one of the few instructions that take up two slots of an instruction bundle.
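The choice between the two approaches depends only on whether the constant fits in a sign-extended 22-bit immediate. The Python sketch below picks an instruction for a given value. The destination register r8 and the function name are hypothetical, and the output is only text for illustration.

```python
MASK64 = (1 << 64) - 1

def choose_constant_load(n, dest="r8"):
    """Return the instruction text that would load the constant n into dest."""
    # The imm22 field of addl is sign-extended: range -2**21 .. 2**21 - 1.
    if -(1 << 21) <= n < (1 << 21):
        return f"addl {dest} = {n}, r0"         # fits in one slot
    return f"movl {dest} = {n & MASK64:#x}"     # needs two slots
```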

Comparisons

* cmp.cc p1, p2 = Ra, Rb Compare 64-bit values.
* cmp4.cc p1, p2 = Ra, Rb Compare 32-bit values.

See the Comparisons section in Itanium Architecture for a detailed explanation.

* tbit p1, p2 = Ra, n Test bit

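The tbit instruction tests bit n of register Ra. With the .nz completer, p1 is set to TRUE if the bit is 1; with the .z completer, p1 is set to TRUE if the bit is 0. In the normal form, p2 is set to the complement of p1.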

Bits and Bytes

popcnt Ra = Rb Ra = number of set bits in Rb
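czx1.l Ra = Rb Ra = index of first zero byte, scanning from the most significant end
czx2.l Ra = Rb Ra = index of first zero word, scanning from the most significant end
czx1.r Ra = Rb Ra = index of first zero byte, scanning from the least significant end
czx2.r Ra = Rb Ra = index of first zero word, scanning from the least significant end

The index counts elements from the end where the scan starts. As a result, czx1.r returns the byte offset of the first zero byte of a little-endian string. If there is no zero byte or word, the czx instruction sets the Ra register to 8 (czx1) or 4 (czx2).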
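The scan direction and the meaning of the result can be modeled in Python as shown below. The function name and its parameters are hypothetical. The elem_bytes argument is 1 for czx1 and 2 for czx2, and from_right selects the .r form.

```python
def czx(value, elem_bytes, from_right):
    """Model of czx1.l / czx2.l / czx1.r / czx2.r.
    Returns the index of the first all-zero element, or the element count if none."""
    count = 8 // elem_bytes
    bits = elem_bytes * 8
    for i in range(count):
        # .r counts elements from the least significant end, .l from the most significant end.
        shift = i * bits if from_right else (count - 1 - i) * bits
        if (value >> shift) & ((1 << bits) - 1) == 0:
            return i
    return count
```

Memory is little-endian on Windows. Applying czx1.r to eight string bytes loaded as one 64-bit value therefore gives the offset of the terminating null within that quadword, or 8 if there is none.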

Conversion

* sxtx Ra = Rb sign-extend the low x bytes of Rb into Ra (x = 1, 2, or 4)
* zxtx Ra = Rb zero-extend the low x bytes of Rb into Ra (x = 1, 2, or 4)
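Sign extension and zero extension can be sketched in Python as follows, with x passed as nbytes. The function names are hypothetical. The results are 64-bit unsigned values, as they would appear in a register.

```python
MASK64 = (1 << 64) - 1

def zxt(rb, nbytes):
    """zxt1/zxt2/zxt4: keep the low nbytes bytes and clear the rest."""
    return rb & ((1 << (8 * nbytes)) - 1)

def sxt(rb, nbytes):
    """sxt1/sxt2/sxt4: copy bit (8*nbytes - 1) into all higher bits."""
    bits = 8 * nbytes
    low = zxt(rb, nbytes)
    if low >> (bits - 1):        # sign bit of the narrow value is set
        low -= 1 << bits
    return low & MASK64
```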

Idioms

* add Ra = n, r0 mov Ra = n
* add Ra = r0, Rb mov Ra = Rb
* add Ra = Rb, r0, 1 inc Ra = Rb
* sub Ra = Rb, r0, 1 dec Ra = Rb
* sub Ra = r0, Rb neg Ra = Rb
* xor Ra = -1, Rb not Ra = Rb
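shrp Ra = Rb, Rb, n rotr Ra = Rb, n

You can rotate by doing a paired shift where the two input registers are the same. Because shrp shifts right, this rotates right by n bits. To rotate left by n bits, use a count of 64 - n.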
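The idioms can be checked with a short Python model. Each helper reproduces one underlying instruction with 64-bit wraparound, and R0 stands in for the hard-wired zero register. The helper names are hypothetical, and the value chosen for x is arbitrary. The last two lines show that a paired shift with identical inputs rotates right by n, and that a count of 64 - n rotates left.

```python
MASK64 = (1 << 64) - 1

def add(rb, rc, carry=0):   # add Ra = Rb, Rc[, 1]
    return (rb + rc + carry) & MASK64

def sub(rb, rc, borrow=0):  # sub Ra = Rb, Rc[, 1]
    return (rb - rc - borrow) & MASK64

def xor(rb, rc):            # xor Ra = Rb, Rc
    return (rb ^ rc) & MASK64

def shrp(rb, rc, n):        # shrp Ra = Rb, Rc, n
    return ((((rb & MASK64) << 64) | (rc & MASK64)) >> n) & MASK64

R0 = 0                      # r0 always reads as zero
x = 0x0123456789ABCDEF

inc_x = add(x, R0, 1)       # inc: x + 1
dec_x = sub(x, R0, 1)       # dec: x - 1
neg_x = sub(R0, x)          # neg: -x modulo 2**64
not_x = xor(-1, x)          # not: every bit flipped
rotr8 = shrp(x, x, 8)       # 0xEF0123456789ABCD (rotate right by 8)
rotl8 = shrp(x, x, 64 - 8)  # 0x23456789ABCDEF01 (rotate left by 8)
```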
