“Barcelona” Processor Feature: 128-bit FPU

The new AMD “Barcelona” processors deliver dramatically improved numerical performance with the standard SSE, SSE2 and SSE3 instruction extensions. Previous AMD processors could typically execute a vectorized SSE instruction (for example, MULPS, which performs four single-precision multiplies) once every two clock cycles. In the AMD “Barcelona” processor this rate is doubled, so a new vectorized SSE instruction like MULPS can typically be issued every cycle. This feature is called SSE128 because an entire 128-bit SSE register is processed on each clock tick. A detailed discussion of SSE128 can be found in the article “SSE128: AMD’s New Floating-Point Enhancements.”
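To make the MULPS example concrete, here is a minimal sketch in C using the standard SSE intrinsics from `xmmintrin.h`. The helper name `mul4` is hypothetical and chosen only for illustration; `_mm_mul_ps` is the intrinsic that compilers typically translate to a single MULPS instruction operating on one 128-bit register.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical helper (not from the article): out[i] = a[i] * b[i] for i = 0..3. */
static void mul4(const float *a, const float *b, float *out)
{
    __m128 va, vb, vp;

    assert(a != 0 && b != 0 && out != 0);

    va = _mm_loadu_ps(a);     /* load four floats, no alignment requirement */
    vb = _mm_loadu_ps(b);
    vp = _mm_mul_ps(va, vb);  /* four multiplies at once; typically emitted as MULPS */
    _mm_storeu_ps(out, vp);   /* store the four products */
}
```

Unaligned loads and stores are used here only to keep the sketch free of alignment assumptions; code that guarantees 16-byte alignment could use `_mm_load_ps` and `_mm_store_ps` instead.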

Furthermore, because add-class and multiply-class instructions have separate pipelines, the processor can issue one four-wide add and one four-wide multiply in the same cycle. This gives the new processor a theoretical peak throughput of 8 single-precision floating-point calculations per clock cycle (4 adds plus 4 multiplies). Integer SSE instructions get a similar boost. For complete timing details on all the instructions, see Appendix C of the Software Optimization Guide for AMD Family 10h Processors.

The easiest way to realize the benefits of SSE128 in real applications is to use existing library code that has already been optimized with vectorized SSE instructions. The AMD Performance Library (APL) is one such library: it provides a collection of popular software routines designed to accelerate application development, debugging, and optimization on x86-class processors. In addition, the new release of the AMD Core Math Library (ACML), version 4.0, features new kernels tuned for the new processors. Specifically, DGEMM, SGEMM and CFFT have all been optimized to take advantage of the improved floating-point performance. Another new feature of ACML 4.0 is the upgrade of the LAPACK routines to LAPACK 3.1. Many of these LAPACK routines have been optimized with OpenMP to take advantage of AMD’s new quad-core processors. ACML will continue to improve, with more optimized functions in future releases.

Intermediate or advanced programmers can write custom vectorized SSE code to improve performance. Using Microsoft’s compiler intrinsic functions for SSE, developers can write a single version of SSE code that compiles for both 32-bit and 64-bit native platforms, which is not possible with pure assembly code. See the article “Performance Optimization of 64-bit Windows Applications for AMD Athlon™ 64 and AMD Opteron™ Processors using Microsoft Visual Studio 2005” for an easy-to-follow tutorial with demo code showing some examples using Microsoft Visual Studio 2005.
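As a rough illustration of what such intrinsic-based code can look like, the following sketch computes `y[i] = alpha * x[i] + y[i]` four elements at a time. It is not taken from the tutorial mentioned above; the function name `saxpy_sse` is hypothetical. The kernel uses one multiply-class and one add-class operation per vector, and because it contains no inline assembly and indexes with `size_t`, the same source should build for 32-bit and 64-bit targets with any compiler that provides `xmmintrin.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical kernel (illustration only): y[i] = alpha * x[i] + y[i] for i in [0, n). */
static void saxpy_sse(size_t n, float alpha, const float *x, float *y)
{
    __m128 valpha;
    size_t i = 0;

    assert(x != 0 && y != 0);

    valpha = _mm_set1_ps(alpha);  /* broadcast alpha into all four lanes */

    /* Main loop: process four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(valpha, vx), vy);  /* multiply-class, then add-class */
        _mm_storeu_ps(y + i, vy);
    }

    /* Scalar tail: handle the remaining 0 to 3 elements. */
    for (; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Whether a loop like this approaches the theoretical peak depends on memory bandwidth and instruction scheduling, so it is best treated as a starting point for measurement rather than a tuned routine.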

This post is the opinion of the author and may not represent AMD’s positions, strategies or opinions. Links to third party sites and references to third party trademarks are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.
