Flipping sign on packed SSE floats.

I'm looking for the most efficient method of flipping the sign on all four floats packed in an SSE register. I have not found an intrinsic for doing this in the Intel Architecture software dev manual. Below are the things I've already tried. For each case I looped over the code 10 billion times and got the wall-time indicated. I'm trying to at least match 4 seconds it takes my non-SIMD approach, which is using just the unary minus operator.
[48 sec]
_mm_sub_ps( _mm_setzero_ps(), vec );
[32 sec]
_mm_mul_ps( _mm_set1_ps( -1.0f ), vec );
[9 sec]
union NegativeMask {
    int   intRep;
    float fltRep;
} negMask;
negMask.intRep = 0x80000000;

_mm_xor_ps( _mm_set1_ps( negMask.fltRep ), vec );

The compiler is gcc 4.2 with -O3. The CPU is an Intel Core 2 Duo.

以上就是Flipping sign on packed SSE floats.的详细内容,更多请关注web前端其它相关文章!

赞(0) 打赏
未经允许不得转载:web前端首页 » CSS3 答疑

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址

前端开发相关广告投放 更专业 更精准