The Sideband Stack Optimizer is one of the many evolutionary “CPU Core IPC improvement” features of the AMD “Barcelona” processor. It is special circuitry in the core that tracks the value the stack pointer (RSP) takes on, allowing more than one PUSH or POP instruction to execute in parallel. Software typically takes advantage of it by writing function prologue and epilogue code with PUSH/POP instead of explicit RSP-relative references.
Motivations:
- Consecutive PUSH and POP instructions form a serial dependency chain through RSP, because each one both reads and updates RSP
- The optimizer breaks this chain by tracking RSP changes in a sideband register, the Stack-Pointer Delta (“SPd”)
- RSP adjustments then consume no functional-unit bandwidth (no uops are issued for them)
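To make the dependency argument concrete, here is a minimal sketch using a toy model (an assumption for illustration, not AMD's actual scheduler): every op takes one step, and an op can start only after the latest producers of the registers it reads. The helper names `critical_path`, `pushes_without_spd` and `pushes_with_spd` are hypothetical.

```python
# Toy dependency model (hypothetical, for illustration only): every op takes
# one step and can start only after the latest producers of what it reads.

def critical_path(ops):
    """Return the length of the longest dependency chain in a list of ops."""
    ready = {}  # register name -> step at which its latest value exists
    longest = 0
    for _name, reads, writes in ops:
        step = 1 + max((ready.get(r, 0) for r in reads), default=0)
        for w in writes:
            ready[w] = step
        longest = max(longest, step)
    return longest


def pushes_without_spd(regs):
    # Each PUSH reads RSP to form its store address and then writes RSP back,
    # so every PUSH depends on the one before it.
    return [("push " + r, ["RSP", r], ["RSP"]) for r in regs]


def pushes_with_spd(regs):
    # With a sideband delta, the -8 adjustment becomes a constant offset known
    # at decode time; the stores read the unchanged RSP and never write it.
    return [("store " + r, ["RSP", r], []) for r in regs]


saved = ["RBX", "RBP", "R12", "R13"]
print(critical_path(pushes_without_spd(saved)))  # 4
print(critical_path(pushes_with_spd(saved)))     # 1
```

In this model, four pushes form a chain four steps deep, while the SPd-style stores are mutually independent.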
Basic operation:
- Converts PUSH ops into pure stores (saving a pass through the functional unit for the RSP update)
- Converts POP ops into pure loads (saving an op)
- Also optimizes performance of CALL and RET instructions
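The basic operation can be sketched as a decoder-side translation. This is a hypothetical model, not AMD's implementation: `Op` and `translate` are invented names, each instruction is a `(mnemonic, register)` pair, CALL and RET are treated as a push and pop of the return address (labelled `RIP` here), and the step in which hardware must reconcile SPd with RSP when an instruction references RSP directly is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str    # "store" or "load"
    reg: str     # register stored to / loaded from the stack ("RIP" for CALL/RET)
    offset: int  # byte offset relative to RSP before the sequence started


def translate(instrs):
    """Hypothetical sketch: track RSP changes in a sideband delta (SPd) and
    emit pure loads/stores addressed as [RSP_base + offset]."""
    spd = 0
    ops = []
    for mnemonic, reg in instrs:
        if mnemonic in ("PUSH", "CALL"):
            spd -= 8  # stack grows down; 8-byte slots in 64-bit mode
            ops.append(Op("store", reg, spd))
        elif mnemonic in ("POP", "RET"):
            ops.append(Op("load", reg, spd))
            spd += 8
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return ops, spd


ops, spd = translate([("CALL", "RIP"), ("PUSH", "RBX"), ("POP", "RBX"), ("RET", "RIP")])
for op in ops:
    print(op.kind, op.reg, op.offset)
print(spd)  # 0
```

Every emitted op carries a constant offset from the same base value, so none of them has to wait for an earlier RSP update.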
For software developers, this makes the small PUSH/POP instructions preferable to larger explicit RSP-relative stores and loads when saving and restoring registers in prologue and epilogue code, which also improves code density.
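As a rough illustration of the code-density side, the sketch below compares two hand-assembled x86-64 prologues that save RBX, RBP, R12 and R13: one reserves the frame with SUB and stores with MOV, the other uses PUSH. The encodings are worked out by hand and should be checked with an assembler. The two sequences lower RSP by the same amount but place the registers in different slots, so this compares size only.

```python
# Hand-assembled x86-64 encodings (verify with your own assembler).
mov_prologue = [
    "48 83 EC 20",     # sub rsp, 32
    "48 89 1C 24",     # mov [rsp], rbx
    "48 89 6C 24 08",  # mov [rsp+8], rbp
    "4C 89 64 24 10",  # mov [rsp+16], r12
    "4C 89 6C 24 18",  # mov [rsp+24], r13
]
push_prologue = [
    "53",     # push rbx
    "55",     # push rbp
    "41 54",  # push r12 (REX.B prefix needed for R8-R15)
    "41 55",  # push r13
]


def size(encodings):
    """Total byte count of a list of space-separated hex encodings."""
    return sum(len(bytes.fromhex(e)) for e in encodings)


print(size(mov_prologue), size(push_prologue))  # 23 6
```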
Examples:
Replace this: MOV reg, [RSP+disp8] (5 bytes)
Or this: MOV reg, [RSP+disp32] (8 bytes)
With this: POP reg (1 byte; 2 bytes for R8–R15)
Replace this: MOV [RSP+disp8], reg (5 bytes)
Or this: MOV [RSP+disp32], reg (8 bytes)
With this: PUSH reg (1 byte; 2 bytes for R8–R15)
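The byte counts can be checked against concrete encodings. The sketch below uses RAX and R12 as example registers (an arbitrary choice) with hand-assembled x86-64 encodings. Because PUSH and POP operate on 64-bit values in 64-bit mode, the equivalent MOV needs a REX.W prefix, which makes the disp8 form 5 bytes (it would be 4 bytes with a 32-bit register). An RSP base always requires a SIB byte, which is why these MOVs are longer than most register-relative accesses.

```python
# Hand-assembled x86-64 encodings (verify with your own assembler).
EXAMPLES = {
    "MOV RAX, [RSP+8]":   "48 8B 44 24 08",           # REX.W, opcode, ModRM, SIB, disp8
    "MOV RAX, [RSP+256]": "48 8B 84 24 00 01 00 00",  # same, with disp32
    "POP RAX":            "58",
    "MOV [RSP+8], RAX":   "48 89 44 24 08",
    "MOV [RSP+256], RAX": "48 89 84 24 00 01 00 00",
    "PUSH RAX":           "50",
    "POP R12":            "41 5C",  # REX.B prefix for R8-R15
    "PUSH R12":           "41 54",
}

for asm, hexbytes in EXAMPLES.items():
    print(f"{asm:<20} {len(bytes.fromhex(hexbytes))} byte(s)")
```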
For more details and examples of this feature, please refer to Chapter 4, section 4.7 of the Software Optimization Guide for AMD Family 10h Processors.
This post is the opinion of the author and may not represent AMD’s positions, strategies or opinions. Links to third party sites and references to third party trademarks are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.