Debugging Tools for Windows

Itanium Instructions

Only the instructions most likely to be encountered in user-mode code are detailed here. Instructions marked with an asterisk (*) are particularly important.

The general notation for instructions is:

(qp) op... dest = src1, src2, ...

where (qp) is the optional qualifying predicate, the ellipsis (...) after the opcode stands for optional completers, dest is the destination (or destinations, for comparison operators), and src1, src2, ... are the sources.

Memory Access (Aligned)

*	ldx	Ra = [Rb]	Load from memory (zero-extended).
*	ldx	Ra = [Rb], Rc/imm₉	Load with postincrement.

An optional parameter after the comma performs a postincrement of the Rb register. For example, ld8 r1 = [r2], 8 loads a 64-bit value from the address in r2 and then increments the r2 register by 8.
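As a rough model of the postincrement form, the following Python sketch treats memory as a bytearray and the registers as a dictionary. The helper name ld, the register dictionary, and the little-endian byte order are assumptions made for illustration; they are not part of the instruction set description.

```python
MASK64 = (1 << 64) - 1

def ld(size, regs, mem, dest, base, postinc=0):
    """Model ldx dest = [base], postinc, where x is the size in bytes."""
    addr = regs[base]
    # The loaded value is zero-extended to 64 bits (little-endian assumed).
    regs[dest] = int.from_bytes(mem[addr:addr + size], "little")
    # The address register is updated only after the access.
    regs[base] = (addr + postinc) & MASK64

mem = bytearray(32)
mem[8:16] = (0x1122334455667788).to_bytes(8, "little")
regs = {"r1": 0, "r2": 8}
ld(8, regs, mem, "r1", "r2", 8)   # models: ld8 r1 = [r2], 8
```

After the call, r1 holds the 64-bit value that was stored at address 8, and r2 has advanced from 8 to 16.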

An optional .nt# completer specifies that the memory will not be accessed again for a while. Higher values of # indicate longer-term abandonment. (.nta is the most aggressive level.)

*	stx	[Ra] = Rb	Store to memory.
*	stx	[Ra] = Rb, imm₉	Store with postincrement.

Again, an optional parameter after the comma performs a postincrement of the address register.

*	ldff	Ra = [Rb]	Load from memory (zero-extended).
*	ldff	Ra = [Rb], Rc/imm₉	Load with postincrement.
*	stff	[Ra] = Rb	Store to memory.
*	stff	[Ra] = Rb, imm₉	Store with postincrement.

These are the same load and store forms, applied to floating-point registers.

Memory Access (Advanced/Speculated)

ldx.s	Ra = [Rb]	Load speculated.
ldx.a	Ra = [Rb]	Load advanced.
ldx.sa	Ra = [Rb]	Load speculated advanced.

These were previously discussed.

st.spill...	[Rb] = Ra	Save a speculated value.
ld.fill...	Ra = [Rb]	Restore a speculated value.

Read and write a value that might have the NaT bit set. The numerical value is written as usual, and the NaT bit is saved/restored in the ar.unat special register. These allow you to speculate across procedure calls.

lfetch...	...	Cache line prefetch.

Verifying Speculated/Advanced Instructions

ldx.c.clr	Ra = [Rb]	Check (or reload) and clear.
ldx.c.nc	Ra = [Rb]	Check (or reload), no clear.
chk.a.clr	Ra = [Rb]	Check (or jump) and clear.
chk.a.nc	Ra = [Rb]	Check (or jump), no clear.

Clearing an advanced load means that you have no plans to check the load again.

Special Registers

mov	Ra = S	Read from special register.
mov	S = Ra	Write to special register.

Special registers, in general, can only be read from and written to. They do not take part in arithmetic computations and cannot be compared against directly.

mov	pr = Ra, mask	Write to predicate registers.

The mask specifies which predicate registers should be loaded from register Ra. The bottom bit of the mask corresponds to predicate register p1, through bit 14 of the mask corresponding to predicate register p15. Bit 15 of the mask represents all the predicate registers p16 through p63. (Recall that predicate register zero is hard-wired to TRUE.)

Recall that the predicate register preservation rules are established by convention, so the only masks you are likely to see are -1 (restore all registers) and 0x801F (preserve all the usual registers).
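To make the mask layout concrete, the following Python sketch decodes a mask into the predicate registers it selects. It follows the bit layout exactly as described above; the function name predicates_selected is hypothetical.

```python
def predicates_selected(mask):
    """List the predicate registers selected by the mask operand of
    mov pr = Ra, mask, using the bit layout described in the text."""
    # Bits 0 through 14 select p1 through p15 individually.
    selected = [f"p{bit + 1}" for bit in range(15) if mask & (1 << bit)]
    # Bit 15 selects p16 through p63 as a group.
    if mask & (1 << 15):
        selected += [f"p{n}" for n in range(16, 64)]
    return selected

usual = predicates_selected(0x801F)
everything = predicates_selected(-1)
```

Under this reading, 0x801F selects p1 through p5 together with p16 through p63, and -1 selects every writable predicate register.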

Interlocked Instructions

xchgx...	Ra = [Rb], Rc	Interlocked exchange.

Store Rc to [Rb] and return the original value in Ra.

cmpxchgx...	Ra = [Rb], Rc, ar.ccv	Conditional exchange.

Check if the value in [Rb] is equal to the special ar.ccv register. If so, store Rc to [Rb]; otherwise, leave it unchanged. In either case, return the original value of [Rb] in Ra.

fetchaddx...	Ra = [Rb], Rc/n	Interlocked add.

Atomically adds Rc/n to [Rb], returning the previous value in Ra.
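The behavior of the three interlocked instructions can be sketched in Python as follows. This is only a behavioral model: the Memory class is hypothetical, a lock stands in for the hardware's atomicity, and operand sizes and completers are ignored.

```python
import threading

MASK64 = (1 << 64) - 1

class Memory:
    """Hypothetical word-addressed memory; a lock stands in for atomicity."""
    def __init__(self):
        self.cells = {}
        self._lock = threading.Lock()

    def xchg(self, addr, rc):
        with self._lock:
            old = self.cells.get(addr, 0)
            self.cells[addr] = rc
            return old                      # original value goes to Ra

    def cmpxchg(self, addr, rc, ccv):
        with self._lock:
            old = self.cells.get(addr, 0)
            if old == ccv:                  # store only if [Rb] equals ar.ccv
                self.cells[addr] = rc
            return old                      # returned in either case

    def fetchadd(self, addr, n):
        with self._lock:
            old = self.cells.get(addr, 0)
            self.cells[addr] = (old + n) & MASK64
            return old                      # previous value goes to Ra

mem = Memory()
mem.cells[0x100] = 5
first = mem.cmpxchg(0x100, 9, ccv=5)    # matches, so 9 is stored
second = mem.cmpxchg(0x100, 7, ccv=5)   # no match, memory unchanged
third = mem.fetchadd(0x100, 1)          # memory becomes 10
fourth = mem.xchg(0x100, 0)             # memory becomes 0
```

A caller detects whether cmpxchg succeeded by comparing the returned value with the value it placed in ar.ccv.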

Control Flow

*	br.cond...	Ba/addr	Branch
*	br.call...	Ba/addr	Call
*	br.ret...	Ba/addr	Return

See the Control Flow section in Itanium Architecture for a description of the various completers.

There are other types of branch instructions as well, but these are not used as much.

Arithmetic

*	add	Ra = Rb,Rc	Ra = Rb + Rc
*	add	Ra = Rb,Rc,1	Ra = Rb + Rc + 1
*	adds	Ra = imm₁₄,Rb	Ra = imm₁₄ + Rb
*	addl	Ra = imm₂₂,Rb	Ra = imm₂₂ + Rb
*	subx	Ra = Rb/n,Rc	Ra = Rb/n - Rc
*	subx	Ra = Rb,Rc,1	Ra = Rb - Rc - 1

shladd	Ra = Rb,n,Rc	Ra = (Rb SHL n) + Rc

This shifts the first addend left by up to four positions before adding.
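A common use of this form is scaling an index before adding a base address. The following Python sketch models the operation; the lower bound of 1 on the shift count and the array address are assumptions chosen for illustration.

```python
MASK64 = (1 << 64) - 1

def shladd(rb, n, rc):
    """Model shladd Ra = Rb, n, Rc as (Rb SHL n) + Rc, wrapped to 64 bits."""
    if not 1 <= n <= 4:
        raise ValueError("shift count is assumed to be 1 through 4")
    return ((rb << n) + rc) & MASK64

# Address of element 5 in a hypothetical array of 8-byte items at 0x1000.
element_addr = shladd(5, 3, 0x1000)
```

Here the index 5 is multiplied by 8 (a shift of 3) and added to the base, giving 0x1028.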

Note There are no integer multiplication or division instructions. For multiplication, see the workaround in the Multiplication subsection of this section. For division, you must convert to floating point.

Multiplication

There is a special floating-point format for handling integer multiplication.

*	setf.sig	Fa = Rb	Fa = Rb (special form)
*	getf.sig	Ra = Fb	Ra = Fb (special form)
*	xma...	Fa = Fb, Fc, Fd	Fa = Fb * Fc + Fd (special form)

The setf.sig and getf.sig instructions transfer between integer registers and floating-point registers (in the special form).

The xma instruction performs the operation on numbers in special form, and the result is also in special form.

There are four variations of the xma instruction, formed by combining three completers. The .l completer saves the low 64 bits of the result, the .h completer saves the high 64 bits of the result, and the .u completer performs an unsigned multiplication rather than a signed multiplication.

For example, xma.lu performs the multiplication as two unsigned integers and saves the low 64 bits of the result.
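The effect of the completers can be sketched in Python, treating each special-form register as a plain 64-bit integer. The function names are hypothetical, and the sketch assumes the addend is zero-extended before it is added to the 128-bit product.

```python
MASK64 = (1 << 64) - 1

def to_signed(value):
    """Reinterpret a 64-bit pattern as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def xma(fb, fc, fd, high=False, unsigned=False):
    """Model xma Fa = Fb, Fc, Fd: high selects .h, unsigned selects .u."""
    if unsigned:
        product = (fb & MASK64) * (fc & MASK64)
    else:
        product = to_signed(fb) * to_signed(fc)
    total = product + (fd & MASK64)      # addend assumed zero-extended
    # Python's >> is arithmetic, so this yields the two's-complement high half.
    return (total >> 64) & MASK64 if high else total & MASK64

low = xma(6, 7, 0, unsigned=True)                      # like xma.lu
high_u = xma(MASK64, 2, 0, high=True, unsigned=True)   # like xma.hu
high_s = xma(MASK64, 2, 0, high=True)                  # like xma.h
```

The low 64 bits are the same for signed and unsigned multiplication; only the high half differs, as the last two calls show.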

Bits

*	and	Ra = Rb/imm₈,Rc	Ra = Rb/imm₈ and Rc
*	or	Ra = Rb/imm₈,Rc	Ra = Rb/imm₈ or Rc
*	andcm	Ra = Rb/imm₈,Rc	Ra = Rb/imm₈ and not Rc
*	xor	Ra = Rb/imm₈,Rc	Ra = Rb/imm₈ xor Rc

The andcm instruction clears the bits specified by the last parameter.

*	shl	Ra = Rb,Rc/n	Ra = Rb SHL Rc/n
*	shr	Ra = Rb,Rc/n	Ra = Rb SAR Rc/n
*	shr.u	Ra = Rb,Rc/n	Ra = Rb SHR Rc/n

The shx instructions do shifting. A more general form of shifting is performed by the extr and dep instructions.

*	extr	Ra = Rb, n1, n2	Ra = Rb<n1, n2>
*	extr.u	Ra = Rb, n1, n2	Ra = Rb<n1, n2>

The regular version of the extr (extract) instruction sign-extends the result, whereas the extr.u form zero-extends the result. The bit extraction instructions are also used to handle unaligned data.
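The difference between the two forms can be sketched in Python. The sketch assumes that <n1, n2> denotes a field that starts at bit n1 and is n2 bits long; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def extr_u(rb, pos, length):
    """Model extr.u: extract the field and zero-extend it."""
    return ((rb & MASK64) >> pos) & ((1 << length) - 1)

def extr(rb, pos, length):
    """Model extr: extract the field and sign-extend it to 64 bits."""
    field = extr_u(rb, pos, length)
    if field >> (length - 1):        # top bit of the field is set
        field -= 1 << length
    return field & MASK64

unsigned_field = extr_u(0xF0, 4, 4)
signed_field = extr(0xF0, 4, 4)
```

Both calls extract the same four bits, but the signed form propagates the field's top bit through the upper bits of the result.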

*	dep	Ra = Rb, Rc, n1, n2	Ra<n1, n2> = Rb; other bits come from Rc

The dep (deposit) instruction builds its output by filling the <n1, n2> bit field from Rb and taking the remaining bits from Rc. Think of it as a masked blt.
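A minimal Python sketch of the deposit operation follows. It assumes that the field is taken from the low-order bits of Rb and placed at bit n1 for a length of n2 bits; the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1

def dep(rb, rc, pos, length):
    """Model dep Ra = Rb, Rc, pos, length."""
    field_mask = ((1 << length) - 1) << pos
    # Keep Rc outside the field; insert the low bits of Rb inside it.
    return ((rc & ~field_mask) | ((rb << pos) & field_mask)) & MASK64

merged = dep(0xAB, 0xFFFFFFFF, 8, 8)
```

In this example, bits 8 through 15 of the result come from Rb and every other bit comes from Rc.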

shrp	Ra = Rb, Rc, n	Ra = (Rb:Rc)<n, 64>

The shrp (shift right pair) instruction treats Rb and Rc as a huge 128-bit value and extracts 64 bits of it into the Ra register.
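The extraction can be sketched in Python as follows. The sketch assumes that Rb supplies the upper 64 bits of the pair, as the Rb:Rc notation suggests; the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1

def shrp(rb, rc, n):
    """Model shrp Ra = Rb, Rc, n: take 64 bits of Rb:Rc starting at bit n."""
    pair = ((rb & MASK64) << 64) | (rc & MASK64)
    return (pair >> n) & MASK64

window = shrp(0x0123456789ABCDEF, 0xFEDCBA9876543210, 16)
```

With n = 16, the low 16 bits of Rb become the top 16 bits of the result, and the upper 48 bits of Rc fill the rest.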

Constants

movl	Ra = n	Load 64-bit number.

Small numbers (up to 22 bits) can be loaded using the add Ra = n, r0 instruction. Larger numbers require the movl instruction. This is one of the few instructions that takes up two slots.

Comparisons

*	cmp.cc	p1, p2 = Ra, Rb	Compare 64-bit values.
*	cmp4.cc	p1, p2 = Ra, Rb	Compare 32-bit values.

See the Comparisons section in Itanium Architecture for a detailed explanation.

tbit	p1, p2 = Ra, n	Test bit.

The tbit instruction tests bit n in register Ra, setting both p1 and p2 accordingly.

Bits and Bytes

popcnt	Ra = Rb	Ra = number of set bits in Rb
czx1.l	Ra = Rb	Ra = position of lowest zero byte
czx2.l	Ra = Rb	Ra = position of lowest zero word
czx1.r	Ra = Rb	Ra = position of highest zero byte
czx2.r	Ra = Rb	Ra = position of highest zero word

If there is no zero byte or word, the czx instruction sets the Ra register to 8 (czx1) or 4 (czx2).

Conversion

*	sxtx	Ra = Rb	sign-extend Rb to Ra
*	zxtx	Ra = Rb	zero-extend Rb to Ra
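The two conversions can be sketched in Python, assuming that x is the source width in bytes (1, 2, or 4). The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def zxt(size, rb):
    """Model zxtx: keep the low size bytes and clear the rest."""
    return rb & ((1 << (8 * size)) - 1)

def sxt(size, rb):
    """Model sxtx: keep the low size bytes and copy their top bit upward."""
    bits = 8 * size
    value = zxt(size, rb)
    if value >> (bits - 1):          # sign bit of the narrow value is set
        value -= 1 << bits
    return value & MASK64

negative_byte = sxt(1, 0x80)
truncated = zxt(1, 0x1FF)
```

Sign-extending the byte 0x80 fills the upper 56 bits with ones, while zero-extending 0x1FF simply discards everything above the low byte.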

Idioms

*	add	Ra = r0, n	mov Ra = n
*	add	Ra = r0, Rb	mov Ra = Rb
*	add	Ra = Rb, r0, 1	inc Ra = Rb
*	sub	Ra = Rb, r0, 1	dec Ra = Rb
*	subx	Ra = r0, Rb	negx Ra = Rb
*	xor	Ra = -1, Rb	not Ra = Rb
	shrp	Ra = Rb, Rb, n	rotr Ra = Rb, n

You can rotate by doing a paired shift where the two input registers are the same.
