Which xarch for SHA extensions on Solaris?

Posted 2019-07-12 19:00

Question:

Oracle recently released Oracle Developer Studio 12.6 (formerly Sun Studio). We have intrinsic-based SHA-1 and SHA-256 implementations for ARM and Intel, and we want to enable the SHA extensions on Solaris x86 machines.

The 12.6 manual documents the -xarch options in section A.2.115.3, "-xarch Flags for x86", but it does not discuss SHA.

Which -xarch option do we use for SHA?

Answer 1:

If Studio 12.6 doesn't support the SHA instruction set (and I strongly suspect it doesn't since I can't find "SHA" mentioned at all, in any form, in the What's New in the Oracle Developer Studio 12.6 Release documentation), you're out of luck.

Almost.

What you can do is create your own inline assembler functions. See man inline:

inline(4)

Name

inline, filename.il - Assembly language inline template files

Description

Assembly language call instructions are replaced by a copy of their corresponding function body obtained from the inline template (*.il) file.

Inline template files have a suffix of .il, for example:

% CC foo.il hello.c

Inlining is done by the compiler's code generator.

...

Examples

Please review libm.il or vis.il for examples. You can find a version of these libraries that is specific to each supported architecture under the compiler's lib/ directory.

...

An example can be found here (emphasis mine):

Performance Tuning With Sun Studio Compilers and Inline Assembly Code

...

This paper provides a demonstration of how to measure the performance of a critical piece of code. An example using a compiler flag and another example using inline assembly code are provided. The results are compared to show the benefits and differences of each approach.

...

Example 8: Inline Assembly Code for the Iterative Mandelbrot Calculation

Knowing all these facts, the inline code can be written, as shown in Example 8.

.inline mandel_il,0
// x is stored in %xmm0
// y is stored in %xmm1
// 4.0 is stored in %xmm2
// max_int is stored in %rdi

// set registers to zero
  xorps %xmm3, %xmm3
  xorps %xmm4, %xmm4
  xorps %xmm5, %xmm5
  xorps %xmm6, %xmm6
  xorps %xmm7, %xmm7
  xorq %rax, %rax

.loop:
// exit if u2 + v2 >= 4.0 (or if the compare is unordered)
  movss %xmm5, %xmm7
  addss %xmm6, %xmm7
  ucomiss %xmm2, %xmm7
  jp     .exit
  jae    .exit

// v = 2 * v * u + y
  mulss %xmm3, %xmm4
  addss %xmm4, %xmm4
  addss %xmm1, %xmm4
// u = u2 - v2 + x
  movss %xmm5, %xmm3
  subss %xmm6, %xmm3
  addss %xmm0, %xmm3
// u2 = u * u
  movss %xmm3, %xmm5
  mulss %xmm3, %xmm5
// v2 = v * v
  movss %xmm4, %xmm6
  mulss %xmm4, %xmm6

  incl %eax
  cmpl %edi, %eax
  jl .loop

.exit:
// end of mandel_il
.end
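
For readers who find the template hard to follow, here is a rough C equivalent of the same loop. It is only a sketch: the name mandel_ref and its parameter list are assumptions made for illustration (the template itself only states which registers hold x, y, 4.0 and max_int), and it is not taken from the paper.

```c
/* Plain C sketch of the iteration performed by the mandel_il template.
 * mandel_ref and its parameter names are hypothetical; register names in
 * the comments refer to the template above. */
int mandel_ref(float x, float y, float limit, int max_iter)
{
    float u = 0.0f, v = 0.0f;     /* %xmm3, %xmm4 */
    float u2 = 0.0f, v2 = 0.0f;   /* %xmm5, %xmm6 */
    int count = 0;                /* %eax */

    do {
        /* ucomiss + jp/jae: leave when u2 + v2 >= limit, or on NaN */
        if (!(u2 + v2 < limit))
            break;

        /* v = 2 * v * u + y, computed in the same order as the template */
        v = v * u;
        v = v + v;
        v = v + y;

        /* u = u2 - v2 + x, using the squares from the previous pass */
        u = u2 - v2 + x;

        u2 = u * u;
        v2 = v * v;

        count++;                  /* incl %eax */
    } while (count < max_iter);   /* cmpl %edi, %eax ; jl .loop */

    return count;
}
```

Like the template, this is a bottom-tested loop, so the body runs at least once even when max_iter is zero or negative. A reference like this is handy for checking an inline template's results before swapping it in; the template itself is passed to the compiler as a .il file on the command line, as in the man page excerpt.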

It's not hard at all. Back in the Solaris 8 days I wrote a lot of SPARC inline assembler functions for a customer I was consulting for. Some of them were pretty basic: effectively one-liners that wrap a single instruction. I swear some of them wound up in later versions of the Studio compiler suite. Since we were subcontracted by Sun itself, that's not surprising, never mind that some of them were blatantly obvious (floor() and ceil(), IIRC, were two of them) and should have been there in the first place.