pveclib.git
2 days ago Merge pull request #95 from munroesj52/clang-fixes master
Steven Munroe [Tue, 26 May 2020 15:34:19 +0000 (10:34 -0500)]
Merge pull request #95 from munroesj52/clang-fixes

Changes required to compile/build pveclib with clang.

2 days ago Changes required to compile/build pveclib with clang, cleanup. 95/head
Steven Munroe [Tue, 26 May 2020 15:28:34 +0000 (10:28 -0500)]
Changes required to compile/build pveclib with clang, cleanup.

Updates after review. Improve comments and conditions for clang
specific work-arounds.

* src/pveclib/vec_f32_ppc.h:
Replace "wf" constraints with "wa".
* src/pveclib/vec_int128_ppc.h [(__clang_major__ >= 11)]:
Reenable vmsumudm for clang 11.
* src/pveclib/vec_int64_ppc.h [(__clang_major__ >= 11)]:
Reenable vmsumudm for clang 11.
[(__clang_major__ < 7)]: Fixup for xxpermdi.

* src/tipowof10.c: Improve comments.
* src/vec_runtime_PWR7.c: Remove #pragma.
* src/vec_runtime_PWR8.c: Ditto.
* src/vec_runtime_PWR9.c: Ditto.

* src/testsuite/arith128_print.c: Improve comments.
* src/testsuite/pveclib_test.c: Improve comments.
Remove <math.h>.

* src/testsuite/vec_bcd_dummy.c: Improve comments.
* src/testsuite/vec_f128_dummy.c: Ditto.
* src/testsuite/vec_f64_dummy.c: Ditto.
* src/testsuite/vec_int128_dummy.c: Ditto.
* src/testsuite/vec_int32_dummy.c: Ditto.
* src/testsuite/vec_pwr9_dummy.c: Ditto.

* src/testsuite/vec_perf_f32.c: Improve comments.
* src/testsuite/vec_perf_f64.c: Ditto.
* src/testsuite/vec_perf_i128.c: Ditto.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 days ago Changes required to compile/build pveclib with clang.
Steven Munroe [Fri, 15 May 2020 21:05:30 +0000 (16:05 -0500)]
Changes required to compile/build pveclib with clang.

Some differences/issues had to be addressed.
1) Still using __ibm128 for long double but does not provide the __ibm128
type. Also missing __builtin_unpack_longdouble and
__builtin_pack_longdouble.
2) Does not support Decimal FP at all. Without the _Decimal128 type it
cannot handle Floating Point Register pairs (FPRp) as required by the
DFP ISA. It seems inline asm is missing the constraints required to access
the 1st/2nd FPR for either DFP FPRps or even IBM long double.
3) Float128 requires explicit -mcpu=power9 -mfloat128 to get the
compiler to define __FLOAT128__ and enable the __float128 type.
4) No constant folding for __int128 for example:
  - (__int128 ) 10ll * (__int128 ) 1000000000000000000ll, /* 10**19 */
  - works on GCC for both scalar and vector initializers.
5) Does not support #pragma GCC target. For now this disables multi-cpu
dynamic libraries.
6) Clang vec_ld () does not handle vector __int128 (as specified in the
Power Vector Intrinsic Programming Reference). Clang only supports the
original VMX types.
7) Clang8 for -mcpu=power9 does not support inline asm for vmsumudm, which
is a valid instruction and mnemonic. This cripples the power9
implementation of vec_int512_ppc.h.
8) Code gen bug in clang8/9 for vec_sra vector long long.
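
Item 4 can be illustrated with a minimal sketch, assuming only that the
compiler provides unsigned __int128. The names u128_t, make_u128,
ten_e19_folded and ten_e19_hex are hypothetical and are not part of
pveclib. The hex form spells the constant as two 64-bit halves, so it
does not depend on the compiler folding __int128 arithmetic.

```c
#include <stdint.h>

typedef unsigned __int128 u128_t;   /* hypothetical local typedef */

/* Build a 128-bit value from explicit high and low 64-bit halves.  */
static inline u128_t
make_u128 (uint64_t hi, uint64_t lo)
{
  return ((u128_t) hi << 64) | lo;
}

/* 10**19 written as a product. Used in a static initializer this
   form needs the compiler to fold __int128 arithmetic.  */
static inline u128_t
ten_e19_folded (void)
{
  return (u128_t) 10ull * (u128_t) 1000000000000000000ull;
}

/* The same value spelled in hex, which needs no folding.  */
static inline u128_t
ten_e19_hex (void)
{
  return make_u128 (0x0ull, 0x8ac7230489e80000ull);
}
```

Both functions return the same value; only the hex form is safe to use
where the compiler will not fold the product.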

* src/decpowof2.c [!defined(__clang__)]: Clang missing
_Decimal128 support so null file.
* src/tipowof10.c [!defined(__clang__)]: Clang does not do
constant folding for __int128. Have to do constants in hex.

* src/pveclib/vec_bcd_ppc.h [!defined(__clang__)]: Clang
missing _Decimal128 support so null file.
* src/pveclib/vec_char_ppc.h (vec_clzb, vec_popcntb)
[defined(__clang__)]: Use generic built-ins for clang,
use specific builtins for old gcc versions.
* src/pveclib/vec_common_ppc.h [!defined(__clang__)]:
special case vector __int128 for clang otherwise typedef
to vector int.
(union __VEC_U_128)[!defined(__clang__)]: Clang missing
_Decimal128 support so skip __Decimal128 member.
* src/pveclib/vec_f128_ppc.h [defined(__clang__)]: Typedef
__IBM128 but don't re-typedef __float128.
* src/pveclib/vec_f64_ppc.h (vec_all_isfinitef64,
vec_all_isinff64, vec_all_isnanf64, vec_all_isnormalf64,
vec_all_issubnormalf64, vec_all_iszerof64, vec_any_isfinitef64,
vec_any_isinff64, vec_any_isnanf64, vec_any_isnormalf64,
vec_any_issubnormalf64, vec_any_iszerof64, vec_isfinitef64,
vec_isinff64, vec_isnanf64, vec_isnormalf64,
vec_issubnormalf64, vec_iszerof64, vec_iszerof64):
Use "wa" Constraint in asm.
(vec_pack_longdouble, vec_unpack_longdouble
[defined(__clang__)]): Use union for _float128/vector transfer.
* src/pveclib/vec_int128_ppc.h [CONST_VUINT128_QxW]: Can't use
constant folding for __int128 constants.
(vec_addcuq, vec_vaddecuq, vec_vaddeuqm, vec_vadduqm,
vec_addcq, vec_addeq): Use generic built-ins for clang,
use specific builtins for older gcc versions.
(vec_clzq): Use pveclib vec_clzd.
(vec_cmpneuq): Clang is pedantic about long long int.
(vec_msumudm): Use pveclib vec_addudm.
(vec_popcntq): Use pveclib vec_popcntd.
(vec_revbq[defined(__clang__)]): Use generic vec_revb.
(vec_subcuq, vec_subecuq, vec_subecuq, vec_subeuqm,
vec_subuqm): Use generic built-ins for clang,
use specific builtins for older gcc versions.
(vec_vmuleud, vec_vmuloud)
[defined (_ARCH_PWR9) && (__GNUC__ > 5)]:
Exclude old gcc and all Clang from using vmsumudm asm.
* src/pveclib/vec_int16_ppc.h (vec_clzh, vec_popcnth)
[defined(__clang__)]: Use generic built-ins for clang,
use specific builtins for old gcc versions.
(vec_revbh[defined(__clang__)]): Use generic vec_revb.
* src/pveclib/vec_int32_ppc.h (vec_clzw, vec_popcntw)
[defined(__clang__)]: Use generic built-ins for clang,
use specific builtins for old gcc versions.
(vec_revbw[defined(__clang__)]): Use generic vec_revb.
* src/pveclib/vec_int64_ppc.h (vec_vaddudm, vec_clzd,
vec_maxsd, vec_maxud, vec_minsd, vec_minud, vec_popcntd)
[defined(__clang__)]: Use generic built-ins for clang,
use specific builtins for old gcc versions.
(vec_revbd[defined(__clang__)]): Use generic vec_revb.
(vec_rldi, vec_sldi, vec_srdi, vec_sradi):
Clang is pedantic about long long int.
(vec_vsubudm): Use generic built-ins for clang,
use specific builtins for older gcc versions.
(vec_vpkudum) [!defined(__clang__)]: Skip, already defined
for clang.
(vec_vrld, vec_vsld, vec_vsrad, vec_vsrd):
Use generic built-ins for clang, Inline asm for gcc.

* src/testsuite/arith128.h (vec2FRp)[!defined(__clang__)]:
Clang missing _Decimal128 support so skip function.
* src/testsuite/arith128_print.c
(print_dfp128)[!defined(__clang__)]:
Clang missing _Decimal128 support so skip function.
(print_ibm128, print_ibm128x):
Use union to avoid __builtin_unpack_longdouble.
(print_v2b64x, print_v2int64, print_v2xint64):
Clang is pedantic about long long int.
(print_vfloat128x, check_isf128_priv, check_f128bool_priv):
Use __binary128 as the cross-compiler type.
* src/testsuite/arith128_print.h
(print_dfp128)[!defined(__clang__)]:
Clang missing _Decimal128 support so skip function.
(print_vfloat128x, check_isf128, check_f128bool_priv,
check_f128bool, check_f128):
Use __binary128 as the cross-compiler type.
(check_frexptftd_priv, print_dfp128p2)[!defined(__clang__)]:
Clang missing _Decimal128 support so skip function.

* src/testsuite/arith128_test_bcd.c [!defined(__clang__)]:
Clang missing _Decimal128 support so skip test functions.
(test_vec_bcd) [!defined(__clang__)]:
Skip unit test call for clang.
* src/testsuite/arith128_test_char.c: Remove includes
<fenv.h>, <float.h>, <math.h>, not needed.
* src/testsuite/arith128_test_f128.c (test_isinf_signf128,
test_setb_qp, test_signbitf128, test_isinff128, test_isnanf128,
test_isfinitef128, test_isnormalf128, test_issubnormalf128,
test_iszerof128, test_absf128, test_copysignf128,
test_const_f128, test_all_isfinitef128, test_all_isnanf128,
test_all_isinff128, test_all_isnormalf128,
test_all_issubnormalf128, test_all_iszerof128):
Use __binary128 as the cross-compiler type.
* src/testsuite/arith128_test_i128.c: Remove includes
<fenv.h>, <float.h>, <math.h>, not needed.
(test_time_i128) [!defined(__clang__)]:
Clang missing _Decimal128 support so skip timed functions.
* src/testsuite/arith128_test_i16.c: Remove includes
<fenv.h>, <float.h>, <math.h>, not needed.
* src/testsuite/arith128_test_i32.c: Remove includes
<fenv.h>, <float.h>, <math.h>, not needed.
* src/testsuite/arith128_test_i64.c: Remove includes
<fenv.h>, <float.h>, <math.h>, not needed.
* src/testsuite/pveclib_test.c [!defined(__clang__)]:
Enabling clang -mfloat128 causes an error condition in
stdlib.h and math.h. Skip include and #define EXIT_SUCCESS.

* src/testsuite/vec_bcd_dummy.c [!defined(__clang__)]:
Skip all unit tests.
* src/testsuite/vec_dummy_main.c [!defined(__clang__)]:
Enabling clang -mfloat128 causes an error condition in
stdlib.h and math.h. Skip include and #define EXIT_SUCCESS.
* src/testsuite/vec_f128_dummy.c [!defined(__clang__)]:
Enabling clang -mfloat128 causes an error condition in
stdlib.h and math.h. Skip include math.h.
(test_gcc_f128_signbit) [!defined(__clang__)]: Skip function
because math functions are skipped above.
* src/testsuite/vec_f32_dummy.c [!defined(__clang__)]:
Enabling clang -mfloat128 causes an error condition in
stdlib.h and math.h. Skip include math.h.
(test_fpclassify_f32) [!defined(__clang__)]: Skip function
because math functions are skipped above.
* src/testsuite/vec_f64_dummy.c [!defined(__clang__)]:
Enabling clang -mfloat128 causes an error condition in
stdlib.h and math.h. Skip include math.h.
(test_fpclassify_f64) [!defined(__clang__)]: Skip function
because math functions are skipped above.
(test_ibm128_vf64_asm, test_vf64_ibm128_asm)
[!defined(__clang__)]: Skip because the constraints
associated with access to the 1st/2nd FPR of a pair are not working.
* src/testsuite/vec_int128_dummy.c (test_vec_load_store)
[!defined(__clang__)]: load of vector long long generates
bad code. For clang cast to vui32_t* which generates good code.
(example_print_vint128, __test_clzq):
Clang is pedantic about long long int.
* src/testsuite/vec_int32_dummy.c (__test_mrgew)
[!defined(__clang__)]: Skip compile test of specific built-ins.
* src/testsuite/vec_pwr9_dummy.c[!defined(__clang__)]:
Enabling clang -mfloat128 causes an error condition in
stdlib.h and math.h. Skip include math.h.
(test_fpclassify_f32_PWR9, test_vec_bcdmulh_PWR9,
example_longbcdct_10e32_PWR9) [!defined(__clang__)]:
Skip compile test due to missing support in clang.
(__test_madd512x512a512_PWR9) [!defined(__clang__)]:
Skip register variables.

* src/testsuite/vec_perf_f32.c: Remove includes <math.h>,
not needed.
(test_fpclassify_f32)[!defined(__clang__)]: Disable extern
and timed for clang due to the math.h bug.
* src/testsuite/vec_perf_f64.c: Remove includes <math.h>,
not needed. Clang is pedantic about long long int.
(test_fpclassify_f64)[!defined(__clang__)]: Disable extern
and timed tests for clang due to the math.h bug.
* src/testsuite/vec_perf_i128.c:  Remove includes
<fenv.h>, <float.h>, <math.h>, not needed.
[!defined(__clang__)]:
Enabling clang -mfloat128 causes an error condition in
stdlib.h and math.h. Skip include stdlib.h.
(example_longbcdcf_10e32, timed_ctmaxdouble_10e32)
[!defined(__clang__)]: Disable extern
and timed tests for missing _Decimal128 support.
* src/testsuite/vec_perf_i128.h
(example_longbcdcf_10e32, timed_ctmaxdouble_10e32)
[!defined(__clang__)]:Disable extern
due to missing _Decimal128 support.
* src/testsuite/vec_perf_i512.c:  Remove includes <stdlib.h>,
<fenv.h>, <float.h>, <math.h> not needed.

* src/vec_runtime_DYN.c [VEC_DYN_RESOLVER]: Correct macro.
* src/vec_runtime_PWR7.c [!defined(__clang__)]: Clang does not
support #pragma GCC target so null build for clang.
* src/vec_runtime_PWR9.c [!defined(__clang__)]: Clang does not
support #pragma GCC target so null build for clang.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 weeks ago Merge pull request #93 from munroesj52/int512-s2p4
Steven Munroe [Sat, 18 Apr 2020 23:53:03 +0000 (18:53 -0500)]
Merge pull request #93 from munroesj52/int512-s2p4

Optimizations for multiple quadword precision integer support (multi-quadword).

5 weeks ago Optimizations for multiple quadword precision integer support 93/head
Steven Munroe [Sat, 18 Apr 2020 23:51:00 +0000 (18:51 -0500)]
Optimizations for multiple quadword precision integer support
(multi-quadword).

Cleanup per review comments.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 weeks ago Merge pull request #92 from munroesj52/int512-s2p3
Steven Munroe [Sat, 18 Apr 2020 18:27:02 +0000 (13:27 -0500)]
Merge pull request #92 from munroesj52/int512-s2p3

Optimizations for multiple quadword precision integer support (quadwo…

5 weeks ago Optimizations for multiple quadword precision integer support (quadword). 92/head
Steven Munroe [Sat, 18 Apr 2020 18:23:04 +0000 (13:23 -0500)]
Optimizations for multiple quadword precision integer support (quadword).

Cleanup per review comments.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
2 months ago Optimizations for multiple quadword precision integer support
Steven Munroe [Fri, 20 Mar 2020 22:14:22 +0000 (17:14 -0500)]
Optimizations for multiple quadword precision integer support
(multi-quadword).

Added multi-quadword multiply-add operations optimized for POWER7/8/9.
Used them to optimize larger multi-quadword multiplies while carefully
using COMPILE_FENCE to avoid register pressure and spill code.
Added appropriate unit, compile, and performance tests.
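
The idea of building an M x N multiply from multiply-add steps, with a
compiler fence between rows, can be sketched in portable scalar C. This
is only an illustration under stated assumptions: it uses 64-bit limbs
in least-significant-first order instead of quadword vectors, and the
names DEMO_COMPILE_FENCE and demo_mul_MxN are hypothetical, not the
pveclib COMPILE_FENCE macros or vec_mul128_byMN. The fence here is an
empty GCC/Clang asm statement that the compiler will not move memory
accesses across.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128_t;   /* hypothetical local typedef */

/* Assumed stand-in for a compile fence: an empty asm statement with a
   memory clobber (GCC/Clang extension).  */
#define DEMO_COMPILE_FENCE __asm__ __volatile__ ("" ::: "memory")

/* Schoolbook multiply of an m-limb by an n-limb unsigned integer.
   p must have room for m + n limbs. Limb 0 is least significant.  */
static inline void
demo_mul_MxN (uint64_t *p, const uint64_t *a, size_t m,
              const uint64_t *b, size_t n)
{
  for (size_t i = 0; i < m + n; i++)
    p[i] = 0;

  for (size_t j = 0; j < n; j++)
    {
      uint64_t carry = 0;
      for (size_t i = 0; i < m; i++)
        {
          /* Multiply-add: product + partial sum + carry always
             fits in 128 bits.  */
          u128_t t = (u128_t) a[i] * b[j] + p[i + j] + carry;
          p[i + j] = (uint64_t) t;
          carry = (uint64_t) (t >> 64);
        }
      p[m + j] = carry;
      /* Keep the rows apart so the compiler does not interleave them.  */
      DEMO_COMPILE_FENCE;
    }
}
```

Each row is one multiply-add pass over the multiplicand, which is the
same shape as using a 512x128 multiply-add to build a 512x512 product.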

* src/pveclib/vec_int512_ppc.h: General doxygen text cleanup.
[i512_security_issues_0_0_2]: Remove todos from subsection.
[i512_Endian_issues_0_0]: Remove todos from section.
(__VEC_U_512x1): Add x2 field to union in support of madd.
(__VEC_U_4096x512): Add x4 field to union in support of add512.
[__VEC_EXPLICITE_FENCE_NOPS__]: Enable explicit NOPs to make
compiler fences visible.
(vec_mul256x256_inline): Add doxygen note.
Leverage madduq/madd2uq.
(vec_mul512x128_inline): Add doxygen note. Leverage madduq.
(vec_madd512x128a128_inline, vec_madd512x128a512_inline,
vec_madd512x128a128a512_inline): New operation.
(vec_mul512x512_inline): Add doxygen note.
Leverage vec_madd512x128a512_inline.
(vec_madd512x512a512_inline): New operation.
(vec_mul128x128, vec_mul256x256): Update latency table.
(vec_mul1024x1024, vec_mul2048x2048):
Add note and update latency table.
(vec_mul128_byMN, vec_mul512_byMN): New library functions.
(vec_madd512x128a128, vec_madd512x128a512,
vec_madd512x128a128a512, vec_madd512x512a512,
vec_mul128_byMN, vec_mul512_byMN):
New platform qualified externs.

* src/vec_int512_runtime.c [__VEC_EXPLICITE_FENCE_NOPS__]:
Define COMPILE_FENCE, COMPILE_FENCE1, COMPILE_FENCE2,
COMPILE_FENCE3, COMPILE_FENCE10, COMPILE_FENCE11,
COMPILE_FENCE12, COMPILE_FENCE13, COMPILE_FENCE14,
COMPILE_FENCE15, COMPILE_FENCE16, COMPILE_FENCE17,
COMPILE_FENCE20, COMPILE_FENCE21, COMPILE_FENCE22,
COMPILE_FENCE23, COMPILE_FENCE24, COMPILE_FENCE25,
COMPILE_FENCE26, COMPILE_FENCE27, COMPILE_FENCE28,
COMPILE_FENCE29.
(vec_mul256x256, vec_mul512x128): Optimized for static
implementation for PWR8/9.
(vec_madd512x128a128, vec_madd512x128a512,
vec_madd512x128a128a512): New function.
(vec_mul512x512): Optimized for static
implementation for PWR8/9.
(vec_madd512x512a512):  New function.
(vec_mul1024x1024 [__VEC_USE_RESTRICT__]):
Disable restrict for now.
(vec_mul1024x1024):
Inline optimized PWR8/9 512-bit mul/madd implementations.
(vec_mul2048x2048 [__VEC_USE_RESTRICT__]):
Disable restrict for now.
(vec_mul2048x2048):
Inline optimized PWR8/9 512-bit mul/madd implementations.
(vec_mul128_byMN, vec_mul512_byMN): New library functions.

* src/vec_runtime_DYN.c ([__ORDER_BIG_ENDIAN__]
vec_mul128_byMN_PWR7, vec_mul512_byMN_PWR7): New externs.
([__ORDER_LITTLE_ENDIAN__] vec_mul128_byMN_PWR9,
vec_mul512_byMN_PWR9): New externs.
(vec_mul128_byMN_PWR8, vec_mul512_byMN_PWR8): New externs.
(resolve_vec_mul128_byMN, resolve_vec_mul512_byMN):
New resolver.
(vec_mul128_byMN, vec_mul512_byMN): New IFUNC symbol.

* src/testsuite/arith128_test_i512.c
(vec512_ten2048_13, vec512_ten2048_12, vec512_ten2048_11,
vec512_ten2048_10, vec512_ten2048_9, vec512_ten2048_8,
vec512_ten2048_7, vec512_ten2048_6, vec512_ten2048_5,
vec512_ten2048_4, vec512_ten2048_3, vec512_ten2048_2,
vec512_ten2048_1, vec512_ten2048_0): Static const for 10^2048.
(test_time_i512): Add timed loops for timed_mul128x128,
timed_mul256x256, timed_mul512x512by8, timed_mul1024x1024by8,
timed_mul2048x2048by8, timed_mul2048x2048_MN, and
timed_mul4096x4096_MN.
(test_mul512x128_MN, test_madd512x128, test_mul512x512_MN,
test_mul2048x2048_MN): New unit tests.
(test_vec_i512): Add calls to unit tests test_madd512x128,
test_mul512x128_MN, test_mul512x512_MN, and
test_mul2048x2048_MN.
* src/testsuite/arith128_test_i512.h
(vec512_ten2048_13, vec512_ten2048_12, vec512_ten2048_11,
vec512_ten2048_10, vec512_ten2048_9, vec512_ten2048_8,
vec512_ten2048_7, vec512_ten2048_6, vec512_ten2048_5,
vec512_ten2048_4, vec512_ten2048_3, vec512_ten2048_2,
vec512_ten2048_1, vec512_ten2048_0): Add const externs.
(test_mul1024x1024): Add function extern.

* src/testsuite/vec_perf_i512.c (c_zero, c_one, c_ten,
c_hundred, c_10k, c_100m, ten_64h,  ten_64l):
New static consts.
(timed_mul128x128, timed_mul256x256, timed_mul512x512by8,
timed_mul1024x1024by8, timed_mul2048x2048by8,
timed_mul2048x2048_MN, timed_mul4096x4096_MN):
New timed functions.
* src/testsuite/vec_perf_i512.h (timed_mul128x128,
timed_mul256x256, timed_mul512x512by8, timed_mul1024x1024by8,
timed_mul2048x2048by8, timed_mul2048x2048_MN,
timed_mul4096x4096_MN): Externs for timed functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
2 months ago Optimizations for multiple quadword precision integer support (quadword).
Steven Munroe [Fri, 13 Mar 2020 00:37:02 +0000 (19:37 -0500)]
Optimizations for multiple quadword precision integer support (quadword).

Added multiply-add operations to quadword (POWER7/8/9)
in support of multiply quadword.
Added appropriate unit and compile tests.
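
Multiply-add forms are useful here because a double-width product has
room for two extra addends without overflow. A minimal scalar sketch of
that fact, assuming unsigned __int128 is available; the names u128_t and
demo_madd2_u64 are hypothetical and only model the idea at 64-bit width
rather than on vector doublewords or quadwords:

```c
#include <stdint.h>

typedef unsigned __int128 u128_t;   /* hypothetical local typedef */

/* Compute a * b + c + d as a 128-bit result.
   The maximum value is (2**64 - 1)**2 + 2 * (2**64 - 1) = 2**128 - 1,
   so the sum can never carry out of 128 bits.
   Returns the low 64 bits and stores the high 64 bits in *hi.  */
static inline uint64_t
demo_madd2_u64 (uint64_t a, uint64_t b, uint64_t c, uint64_t d,
                uint64_t *hi)
{
  u128_t t = (u128_t) a * b + c + d;
  *hi = (uint64_t) (t >> 64);
  return (uint64_t) t;
}
```

With all four inputs at their maximum, both halves of the result are all
ones, which is the largest value the double-width result can hold.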

* src/pveclib/vec_int128_ppc.h [int128_arith_facts_0]:
Doxygen section "Some facts about fixed precision integers".
(vec_vmaddeud, vec_vmaddoud, vec_vmsumeud,vec_vmsumoud):
Add forward declares.
(vec_msumudm): Improve doxygen description.
(vec_muludm [_ARCH_PWR8]): Use endian agnostic vec_vmulouw.
(vec_mulhuq [_ARCH_PWR8]): Use endian agnostic operations and
leverage multiply-add forms.
(vec_mulhuq [_ARCH_PWR7]): Leverage multiply-add forms.
(vec_mulluq [_ARCH_PWR8]): Use endian agnostic operations and
leverage multiply-add forms.
(vec_mulluq [_ARCH_PWR7]): Leverage multiply-add forms.
(vec_muludq): Update doxygen with improved cycle count.
(vec_muludq [_ARCH_PWR9]): Improved sequence that allows more
instruction overlap.
(vec_muludq [_ARCH_PWR8]): Use endian agnostic operations and
leverage multiply-add forms. Improved sequence instruction
overlap.
(vec_muludq [_ARCH_PWR7]): Leverage multiply-add forms.
Improved sequence instruction overlap.
(vec_madduq, vec_madd2uq): New Operations.
(vec_vmuleud): Update doxygen POWER9 cycles count.
(vec_vmuleud [_ARCH_PWR8]): Use endian agnostic vec_vmulouw.
(vec_vmaddeud, vec_vmadd2eud): New Operations.
(vec_vmuloud [_ARCH_PWR8]): Use endian agnostic vec_vmulouw.
(vec_vmaddoud, vec_vmadd2oud, vec_vmsumeud, vec_vmsumeud,
vec_vmsumoud): New Operations.

* src/testsuite/arith128_test_i128.c (test_madduq):
New unit test.
(test_vec_i128): Add test_madduq to test driver.

* src/testsuite/arith128_test_i64.c (test_vmaddoud): Fix pasto
in title printf.

* src/testsuite/vec_int128_dummy.c:
(__test_vmsumoud, __test_vmsumeud, __test_vmaddoud,
__test_vmadd2oud, __test_vmaddeud, __test_vmadd2eud):
New compile tests.
(__test_madduq, __test_madd2uq, __test_madduq_x):
New compile tests.
(test_mul128_MN): New compile tests.

* src/testsuite/vec_pwr9_dummy.c:
(__test_vmsumoud_PWR9, __test_vmsumeud_PWR9,
__test_vmaddoud_PWR9, __test_vmaddeud_PWR9,
__test_vmadd2oud_PWR9, __test_vmadd2oud_x_PWR9,
__test_vmadd2eud_PWR9, __test_vmadd2eud_x_PWR9,
__test_vmadd2ud_PWR9, __test_vmadd2ud_x_PWR9):
New compile tests.
(__test_muludq_x_PWR9, __test_muludq_y_PWR9,
__test_muludq_z_PWR9, __test_madduq_PWR9, __test_madduq_x_PWR9,
__test_madduq_y_PWR9, __test_madd2uq_PWR9,
__test_madduq2_PWR9): New compile tests.
(test_mul128_MN_PWR9): New compile test.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
2 months ago Merge pull request #91 from munroesj52/int512-s2p2
Steven Munroe [Wed, 11 Mar 2020 15:45:43 +0000 (10:45 -0500)]
Merge pull request #91 from munroesj52/int512-s2p2

Optimizations for multiple quadword precision integer support (doubleword).

2 months ago Merge pull request #90 from munroesj52/int512-s2p1
Steven Munroe [Wed, 11 Mar 2020 15:40:54 +0000 (10:40 -0500)]
Merge pull request #90 from munroesj52/int512-s2p1

Optimizations for multiple quadword precision integer support.

2 months ago Optimizations for multiple quadword precision integer support. 90/head
Steven Munroe [Wed, 11 Mar 2020 15:37:14 +0000 (10:37 -0500)]
Optimizations for multiple quadword precision integer support.
Doxygen text changes to resolve comments.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
2 months ago Optimizations for multiple quadword precision integer support. 91/head
Steven Munroe [Mon, 9 Mar 2020 14:32:08 +0000 (09:32 -0500)]
Optimizations for multiple quadword precision integer support.

Added multiply-add operations to doubleword (POWER8/9)
in support of quadword multiply.
Used copybrief so that doubleword operations that are
implemented in other headers appear in alphabetical order with other
doubleword operations.
Added appropriate unit and compile tests.

* src/pveclib/vec_int64_ppc.h (vec_msumudm, vec_muleud,
vec_mulhud, vec_muloud, vec_muludm): Doxygen copybrief from
vec_int128_ppc.h.
(vec_vmadd2eud, vec_vmaddeud, vec_vmadd2oud, vec_vmaddoud,
vec_vmuleud, vec_vmuloud, vec_vmsumeud, vec_vmsumoud):
Doxygen copybrief from vec_int128_ppc.h.
(vec_vmaddeuw, vec_vmadd2euw, vec_vmaddouw, vec_vmadd2ouw):
New operations.
(vec_vmsumuwm): Doxygen text edits.

* src/testsuite/arith128_test_i64.c (test_vmaddeud,
test_vmaddoud): New Unit test.
(test_vmuleud): Add tests to test driver.

* src/testsuite/vec_int64_dummy.c (__test_vmaddeuw,
__test_vmaddouw, __test_vmadd2euw, __test_vmadd2ouw,
__test_vmadduw): New compile tests.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
2 months ago Optimizations for multiple quadword precision integer support.
Steven Munroe [Fri, 6 Mar 2020 02:06:00 +0000 (20:06 -0600)]
Optimizations for multiple quadword precision integer support.

Added multiply-add operations to halfword (POWER7) and word (POWER8)
in support of quadword multiply.
Used copybrief so that halfword and word operations that are
implemented in other headers appear in alphabetical order with other
halfword and word operations.
Added appropriate unit and compile tests.
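
As a rough scalar model of an even/odd word multiply-add, the sketch
below treats a vector as an array of four 32-bit words and assumes
big-endian element numbering, so the even words are elements 0 and 2.
The names demo_vmaddeuw and demo_vmaddouw are hypothetical and only
approximate the intent of the new operations; they are not the pveclib
implementations.

```c
#include <stdint.h>

/* Even words: r[k] = a[2k] * b[2k] + c[2k], widened to 64 bits.
   (2**32 - 1)**2 + (2**32 - 1) fits in 64 bits, so no overflow.  */
static inline void
demo_vmaddeuw (uint64_t r[2], const uint32_t a[4],
               const uint32_t b[4], const uint32_t c[4])
{
  r[0] = (uint64_t) a[0] * b[0] + c[0];
  r[1] = (uint64_t) a[2] * b[2] + c[2];
}

/* Odd words: r[k] = a[2k+1] * b[2k+1] + c[2k+1], widened to 64 bits.  */
static inline void
demo_vmaddouw (uint64_t r[2], const uint32_t a[4],
               const uint32_t b[4], const uint32_t c[4])
{
  r[0] = (uint64_t) a[1] * b[1] + c[1];
  r[1] = (uint64_t) a[3] * b[3] + c[3];
}
```

Pairing the even and odd results covers all four word products, each
with its addend folded in, which is the building block for the wider
multiplies.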

* doc/pveclib-doxygen-pveclib.doxy [PAPER_TYPE]:
Change to letter.
* doc/pveclibmaindox.h [mainpage_ref_docs]: Rework references
to avoid line overflows in generated PDF.

* src/pveclib/vec_bcd_ppc.h (vec_bcdcfsq): Doxygen edits.
(vec_unpack_Decimal128):  Doxygen edits.
* src/pveclib/vec_int16_ppc.h: General Doxygen edits.
(i16_recent_additions): new doxygen section.
(vec_vmaddeuh, vec_vmaddouh): New operation.
* src/pveclib/vec_int32_ppc.h
(i16_recent_additions): new doxygen section.
(vec_vmuleuw, vec_vmulouw): Add forward declare.
(vec_mrgahw, vec_mrgalw): Add doxygen note.
(vec_muleuw, vec_mulouw): Simplify implementation using
vec_vmuleuw, vec_vmulouw.
(vec_vmadd2euw, vec_vmadd2ouw, vec_vmaddeuw, vec_vmaddouw,
vec_vmsumuwm): Doxygen copybrief from vec_int64_ppc.h.
(vec_vmuleuw, vec_vmulouw): New operations.

* src/testsuite/arith128_test_i16.c (test_vmadduh):
New Unit test.
(test_vec_i16): Add test_vmadduh to test driver.

* src/testsuite/vec_int16_dummy.c
(__test_vmaddeuh, __test_vmaddouh, __test_vmaddouh_alt,
__test_vadduqm_alt7, __test_vaddcuq_alt7, __test_vmadduh_alt7):
New compile tests.
* src/testsuite/vec_int32_dummy.c
(__test_vmuleuw, __test_vmulouw): New compile tests.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
4 months ago s/TIME_10_ITERATION/TIMING_ITERATIONS/
Paul A. Clarke [Wed, 18 Dec 2019 16:57:58 +0000 (10:57 -0600)]
s/TIME_10_ITERATION/TIMING_ITERATIONS/

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
5 months ago Add vec_int512 files for multiple precision quadword integer support.
Steven Munroe [Sun, 15 Dec 2019 23:10:03 +0000 (17:10 -0600)]
Add vec_int512 files for multiple precision quadword integer support.

* Makefile.in: Regenerate.
* configure: Ditto.
* src/Makefile.in: Ditto.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 months ago Add vec_int512 files for multiple precision quadword integer support.
Steven Munroe [Sun, 15 Dec 2019 23:00:35 +0000 (17:00 -0600)]
Add vec_int512 files for multiple precision quadword integer support.

Somehow these changes went missing from the previous commits.
Add compile and unit tests required for int512.
Add new print/check functions to support int512 unit test.
Correct some compile only examples that turned out to be incorrect.

* src/testsuite/arith128_test_i512.c: New File.
* src/testsuite/arith128_test_i512.h: Ditto.
* src/testsuite/vec_int512_dummy.c: Ditto.
* src/testsuite/vec_perf_i512.c: Ditto.
* src/testsuite/vec_perf_i512.h: Ditto.

* src/testsuite/vec_int128_dummy.c
(test_vec_mul10uq): Attribute target power9.
(test_mul4uq): Correct implementation.
(example_qw_convert_decimal [CONST_VUINT128_Qx19d]):
Replace use of CONST_VINT128_DW.
[CONST_VUINT128_Qx16d]: replace use of CONST_VINT128_DW.
* src/testsuite/vec_pwr9_dummy.c:
Include <pveclib/vec_int512_ppc.h>
(test_mul4uq_PWR9): Correct implementation.
(test_vec_mul1024x1024_PWR9): New function.

* src/testsuite/arith128_print.h: Include
<pveclib/vec_int512_ppc.h>
(print_vint512x, print_vint640x, print_vint512x_prod,
print_vint256x_prod, print_vint256_prod, check_vint512_priv):
Define extern.
(check_vint512): New inline function.
* src/testsuite/arith128_print.c: Include <string.h>.
(print_vuint512x, print_vint512x, print_vint640x,
print_vint512x_prod, print_vint256x_prod, print_vint256_prod,
check_vint512_priv): New functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 months ago Merge pull request #87 from munroesj52/int512-p5
Steven Munroe [Sun, 15 Dec 2019 20:01:19 +0000 (14:01 -0600)]
Merge pull request #87 from munroesj52/int512-p5

Add vec_int512 files for multiple precision quadword integer support.

5 months ago Merge pull request #85 from munroesj52/int512-p3
Steven Munroe [Sun, 15 Dec 2019 20:00:19 +0000 (14:00 -0600)]
Merge pull request #85 from munroesj52/int512-p3

Fix errors associated with trying to use vector multiply word on P7.

5 months ago Merge pull request #84 from munroesj52/int512-p2
Steven Munroe [Sun, 15 Dec 2019 19:59:50 +0000 (13:59 -0600)]
Merge pull request #84 from munroesj52/int512-p2

Add vec_int512 files for multiple precision quadword integer support.

5 months ago Add vec_int512 files for multiple precision quadword integer support. 84/head
Steven Munroe [Sun, 15 Dec 2019 03:48:34 +0000 (21:48 -0600)]
Add vec_int512 files for multiple precision quadword integer support.
Additional doxygen text updates after review.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 months ago Fix errors associated with trying to use vector multiply word on P7. 85/head
Steven Munroe [Sat, 14 Dec 2019 18:46:35 +0000 (12:46 -0600)]
Fix errors associated with trying to use vector multiply word on P7.
Doxygen text update after review.
Fixes #82.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 months ago Add vec_int512 files for multiple precision quadword integer support.
Steven Munroe [Sat, 14 Dec 2019 18:16:14 +0000 (12:16 -0600)]
Add vec_int512 files for multiple precision quadword integer support.
Add the runtime library source supporting static and dynamic linkage.
Updated in response to review comments from Paul Clarke.

* src/pveclib/vec_int512_ppc.h: Doxygen text updates after
review.

* src/vec_int512_runtime.c: Make restrict comments consistent.
* src/vec_runtime_DYN.c: Doxygen text updates after
review. Improve comments for which platforms to enable.
* src/vec_runtime_PWR7.c: Improve comments for condition PWR7
support.
* src/vec_runtime_PWR9.c: Improve comments for condition PWR9
support.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 months ago Add vec_int512 files for multiple precision quadword integer support. 87/head
Steven Munroe [Fri, 13 Dec 2019 15:18:41 +0000 (09:18 -0600)]
Add vec_int512 files for multiple precision quadword integer support.

Enhance configure example in README.md to include CFLAGS="-O3 -mcpu=power8".
Update build Makefile for int512, headers, runtime, unit tests and
performance tests. Update Unit test driver to call int512 unit and
performance tests.

* README.md: Update example to
'./configure  CFLAGS="-O3 -mcpu=power8"'

* src/Makefile.am (libpvec_la_SOURCES): Add vec_runtime_PWR9.c,
vec_runtime_PWR8.c, vec_runtime_PWR7.c, vec_runtime_DYN.c.
(libvecdummy_la_SOURCES): Add testsuite/vec_int512_dummy.c.
(pveclibinclude_HEADERS): Add pveclib/vec_int512_ppc.h.
(pveclib_test_SOURCES): Add testsuite/vec_perf_i512.c,
testsuite/vec_perf_i512.h, testsuite/arith128_test_i512.c,
testsuite/arith128_test_i512.h.
* src/Makefile.in: Regenerate.

* src/testsuite/pveclib_test.c: Include
<testsuite/arith128_test_i512.h>.
(main): Add call to test_vec_i512.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
5 months ago Fix errors associated with trying to use vector multiply word on P7.
Steven Munroe [Tue, 10 Dec 2019 22:46:06 +0000 (16:46 -0600)]
Fix errors associated with trying to use vector multiply word on P7.

Initially this was from missing multiply signed word implementation
for P7. But I also hit a compiler bug where some built-ins were disabled
when I mixed -mcpu= and #pragma GCC target with different targets,
specifically -mcpu=power7 and #pragma GCC target ("cpu=power8").
This seems to be endian specific, as P7 is BE only while P8 can be
either. Anyway, compiler folks are busy and will not spend cycles on a
bug that only impacts P7. So I added conditionals to avoid this.
This should resolve issue #82.
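
The kind of conditional used to avoid the problem can be sketched as
follows. This is a portable illustration, not the pveclib code: the
macro DEMO_HAVE_GENERIC_ALL_EQ and the function demo_all_eq_u64x2 are
hypothetical, and both branches are scalar stand-ins. The point is only
that the path relying on the generic built-in is selected for little
endian builds, while other builds get an alternative sequence with the
same result.

```c
#include <stdint.h>

/* Select the "generic built-in" path only for little endian builds.  */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define DEMO_HAVE_GENERIC_ALL_EQ 1
#else
#define DEMO_HAVE_GENERIC_ALL_EQ 0
#endif

/* Return 1 if both 64-bit elements of a and b are equal, else 0.  */
static inline int
demo_all_eq_u64x2 (const uint64_t a[2], const uint64_t b[2])
{
#if DEMO_HAVE_GENERIC_ALL_EQ
  /* Stand-in for the generic compare path.  */
  return (a[0] == b[0]) & (a[1] == b[1]);
#else
  /* Stand-in for an alternative sequence: XOR, OR, test for zero.  */
  return ((a[0] ^ b[0]) | (a[1] ^ b[1])) == 0;
#endif
}
```

Callers see one function with one result; only the implementation
chosen at compile time differs.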

* src/pveclib/vec_f64_ppc.h (vec_all_isfinitef64,
vec_all_isinff64, vec_all_isnanf64, vec_all_isnormalf64,
vec_all_issubnormalf64, vec_all_iszerof64
[__ORDER_LITTLE_ENDIAN__]): Test data built-ins only available
for LE.

* src/pveclib/vec_int128_ppc.h (vec_cmpsq_all_eq,
vec_cmpsq_all_ne, vec_cmpuq_all_eq, vec_cmpuq_all_ne
[__ORDER_LITTLE_ENDIAN__]):
Generic vec_all_eq() for vector long only available for LE.

* src/pveclib/vec_int32_ppc.h
(vec_muleuw, vec_mulouw, vec_srawi): Forward reference.
(vec_mulesw, vec_mulosw):
Additional doxygen text.
(vec_mulesw, vec_mulosw[! _ARCH_PWR8]):
New implementation for POWER7.
(vec_muleuw, vec_mulouw):
Additional doxygen text.

* src/pveclib/vec_int64_ppc.h
(vec_cmpsd_all_eq, vec_cmpsd_all_gt, vec_cmpsd_all_ne,
vec_cmpsd_any_eq, vec_cmpsd_any_ge, vec_cmpsd_any_gt,
vec_cmpsd_any_ne, vec_cmpud_all_eq, vec_cmpud_all_ge,
vec_cmpud_all_gt, vec_cmpud_all_ne, vec_cmpud_any_eq,
vec_cmpud_any_ge, vec_cmpud_any_gt, vec_cmpud_any_ne
[__ORDER_LITTLE_ENDIAN__]):
Generic vec_[all|any]_[eq|gt|ge|ne]() for vector long only
available for LE.

* src/testsuite/arith128_print.h (check_v2b64x):
Use vec_cmpud_any_ne.
* src/testsuite/arith128_test_i32.c (test_mulesw, test_mulosw):
New functions.
(test_muleuw): Add calls to test_mulesw and test_mulosw.

* src/testsuite/vec_f32_dummy.c [defined vec_float2  &&
(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)]:
Restrict float* built-in compile tests to LE.

* src/testsuite/vec_int32_dummy.c (__test_mulesw,
__test_mulosw, __test_abssw, __test_mullsw): New functions.

* src/testsuite/vec_int64_dummy.c
[(__GNUC__ > 7)  && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)]:
Restrict merge even/odd built-in compile tests to LE.

* src/testsuite/vec_pwr9_dummy.c (vec_all_isfinitef64,
vec_all_isinff64, vec_all_isnanf64, vec_all_isnormalf64,
vec_all_issubnormalf64, vec_all_iszerof64
[__ORDER_LITTLE_ENDIAN__]): Test data built-ins only available
for LE.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
6 months ago Add vec_int512 files for multiple precision quadword integer support.
Steven Munroe [Sat, 23 Nov 2019 16:11:27 +0000 (10:11 -0600)]
Add vec_int512 files for multiple precision quadword integer support.
Add the runtime library source supporting static and dynamic linkage.

* src/pveclib/vec_int512_ppc.h: Doxygen text updates after
review.
* README.md: Add -O3 and -mcpu to configure example.

* src/vec_int512_runtime.c: New file.
* vec_runtime_PWR7.c: New file.
* src/vec_runtime_PWR8.c: New file.
* src/vec_runtime_PWR9.c: New file.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
6 months agoMerge pull request #83 from munroesj52/int512-p1
Steven Munroe [Wed, 20 Nov 2019 15:01:49 +0000 (09:01 -0600)]
Merge pull request #83 from munroesj52/int512-p1

Add vec_int512 files for multiple precision integer support.

6 months agoAdd vec_int512 files for multiple precision integer support. 83/head
Steven Munroe [Wed, 20 Nov 2019 14:58:38 +0000 (08:58 -0600)]
Add vec_int512 files for multiple precision integer support.
Minor doxygen text update.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
6 months agoAdd vec_int512 files for multiple precision integer support.
Steven Munroe [Tue, 19 Nov 2019 22:15:33 +0000 (16:15 -0600)]
Add vec_int512 files for multiple precision integer support.
Resolve the CONST_VUINT128_QxW controversy.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
6 months agoAdd vec_int512 files for multiple precision integer support.
Steven Munroe [Tue, 19 Nov 2019 01:13:07 +0000 (19:13 -0600)]
Add vec_int512 files for multiple precision integer support.
Changes to resolve comments from Paul Clark and Alex Scheel.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
6 months agoAdd vec_int512 files for multiple precision integer support.
Steven Munroe [Thu, 14 Nov 2019 19:01:39 +0000 (13:01 -0600)]
Add vec_int512 files for multiple precision integer support.
Header files and doxygen includes for new text.

* doc/pveclib-doxygen-pveclib.doxy (INPUT): Add
$(SRCDIR)/src/pveclib/vec_int512_ppc.h
* doc/pveclibmaindox.h: Mention vec_int512_ppc.h in intro.

* src/pveclib/vec_int128_ppc.h (int128_const_0_0_1):
Add doxygen subsection Quadword Integer Constants.
(int128_const_0_0_2): Add doxygen subsection for Support for
Quadword Integer Constants.
(int128_examples_0_1): Update test_mul4uq example to match
current implementation.
(CONST_VUINT128_QxW,  CONST_VUINT128_QxD, CONST_VUINT128_Qx19d,
CONST_VUINT128_Qx18d, CONST_VUINT128_Qx16d): Defined quadword
helper macros.

* src/pveclib/vec_int512_ppc.h: New File.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
10 months agoMerge pull request #81 from munroesj52/fedora-one0twox v1.0.2y v1.0.3
Steven Munroe [Thu, 25 Jul 2019 14:38:30 +0000 (09:38 -0500)]
Merge pull request #81 from munroesj52/fedora-one0twox

Cleanup rpmlint errors. Remove pveclib.spec.

10 months agoCleanup rpmlint errors. Remove pveclib.spec. 81/head
Steven Munroe [Thu, 25 Jul 2019 14:31:03 +0000 (09:31 -0500)]
Cleanup rpmlint errors. Remove pveclib.spec.
Getting "failed, can't parse" because the source must match the build.
Also the Change log data was misformatted.
We will use the pveclib.spec from open-power-sdk/fedora from now on.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
10 months agoMerge pull request #80 from munroesj52/fedora-one0three v1.0.2x
Steven Munroe [Wed, 24 Jul 2019 17:35:03 +0000 (12:35 -0500)]
Merge pull request #80 from munroesj52/fedora-one0three

Additional work to clean up fedora rpmlint.

10 months agoAdditional work to clean up fedora rpmlint. 80/head
Steven Munroe [Mon, 22 Jul 2019 19:37:13 +0000 (14:37 -0500)]
Additional work to clean up fedora rpmlint.
Clean up spec file.
Fix breakage in doxygen from adding ./pveclib to include path.
Eliminate deprecated macros from configure.ac

* pveclib.spec [BuildRequires]: Add gcc-c++.
[%configure]: Remove duplicate.
[%post, %postun]: Deprecated.
[%license]: Add COPYING.

* pveclib-doxygen-pveclib.doxy [INPUT]: The full path is now
(SRCDIR)/src/pveclib/

* configure.ac [AC_INIT] Bump version to 1.0.3
AC_CANONICAL_SYSTEM is deprecated,
Replace with AC_CANONICAL_TARGET.
AM_PROG_CC_C_O is deprecated, remove.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
10 months agoMerge pull request #79 from munroesj52/fedora-one0two
Steven Munroe [Fri, 12 Jul 2019 14:31:16 +0000 (09:31 -0500)]
Merge pull request #79 from munroesj52/fedora-one0two

Additional work to clean up fedora rpmlint.

10 months agoMerge branch 'fedora-one0two' of https://github.com/munroesj52/pveclib into fedora... 79/head
Steven Munroe [Fri, 12 Jul 2019 14:18:11 +0000 (09:18 -0500)]
Merge branch 'fedora-one0two' of https://github.com/munroesj52/pveclib into fedora-one0two

merge updates from devel laptop system

10 months agoAdditional work to clean up fedora rpmlint.
Steven Munroe [Fri, 12 Jul 2019 14:14:07 +0000 (09:14 -0500)]
Additional work to clean up fedora rpmlint.
Separate libpvec.a from devel by creating static rpm.
Other minor updates.

* pveclib.spec: Move global description above package devel.
Spelling change s/intrinsics/built-ins/.
(%package static): Add.
(%files devel): Add %doc README.md. Remove libpvec.a.
(%files static): Add.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
10 months agoResolved the License, Release and ChangeLog issues.
Steven Munroe [Wed, 10 Jul 2019 17:45:00 +0000 (12:45 -0500)]
Resolved the License, Release and ChangeLog issues.

* pveclib.spec (%description): s/intrinsics/built-ins/.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
10 months agoAdditional work to clean up fedora rpmlint.
Steven Munroe [Tue, 9 Jul 2019 22:20:48 +0000 (17:20 -0500)]
Additional work to clean up fedora rpmlint.
Resolved the License, Release and ChangeLog issues.
Copied the LICENSE file over COPYING to eliminate the symlink.
Updated Makefile.am and regenerated.
Forced a dependency on libc to eliminate the
shared-lib-without-dependency-information Error.
Updated src/Makefile.am and regenerated.
Include most of Antonio's <anto.trande@gmail.com> suggestions.

* COPYING: Copy LICENSE over COPYING, Remove symlink.
* Makefile.am (dist_license_DATA): Change to LICENSE.
(dist_doc_DATA): Restore COPYING to list.
* Makefile.in: Regenerate from automake.
* aclocal.m4: Regenerate from aclocal.
* configure: Regenerate from autoconf.

* src/Makefile.am: Add libpvec_la_LIBADD = -lc
* src/Makefile.in: Regenerate from automake.

* pveclib.spec (Release): 1.
(License): ASL 2.0.
(%description): s/intrinsics/intrinsic/.
(%prep): Use %autosetup.
(%build): Use %make_build.
(%install): Use %make_install.
  Move %check after %make_install.
(%license): Change to LICENSE.
(%doc): Insert COPYING.
(%files devel): Simplify %{_includedir}/pveclib.

Signed-off-by: Steven Munroe <sjmunroe@homer53.localdomain>
10 months agoMerge pull request #78 from munroesj52/fedora-spec
Steven Munroe [Mon, 1 Jul 2019 17:58:55 +0000 (12:58 -0500)]
Merge pull request #78 from munroesj52/fedora-spec

Add fedora RPM spec file to master.

10 months agoAdd fedora RPM spec file to master. 78/head
Steven Munroe [Mon, 1 Jul 2019 17:54:51 +0000 (12:54 -0500)]
Add fedora RPM spec file to master.
This is required for the Fedora project submission process.

* pveclib.spec: New file.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
11 months agoMerge pull request #77 from munroesj52/fedora-one0one v1.0.2
Steven Munroe [Thu, 27 Jun 2019 23:31:55 +0000 (18:31 -0500)]
Merge pull request #77 from munroesj52/fedora-one0one

Fedora one0one

11 months agoFix the issue with @INC_AMINCLUDE@ failing, still. 77/head
Steven Munroe [Wed, 26 Jun 2019 17:50:43 +0000 (12:50 -0500)]
Fix the issue with @INC_AMINCLUDE@ failing, still.
Clean up rpmbuild failures:
Add support for licensedir and install COPYING there.
Cleanup aminclude.am from builddir for distclean.
Run automake again.

* Makefile.am (licensedir): Define.
(dist_license_DATA): Add COPYING to list.
(dist_doc_DATA): Remove COPYING from list.
[distclean-local]: distclean removes aminclude.am from
builddir.

* Makefile.in: Regenerated from automake.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
11 months agoFix the issue with @INC_AMINCLUDE@ failing, still.
Steven Munroe [Mon, 24 Jun 2019 15:54:12 +0000 (10:54 -0500)]
Fix the issue with @INC_AMINCLUDE@ failing, still.
Run aclocal, automake, and autoconf, again.

* aclocal.m4: Regenerated by aclocal.
* Makefile.in: Regenerated by automake.
* ./src/Makefile.in: Regenerated by automake.
* configure: Regenerated by autoconf.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
11 months agoFix the issue with @INC_AMINCLUDE@ failing and miscellaneous changes.
Steven Munroe [Mon, 24 Jun 2019 14:32:22 +0000 (09:32 -0500)]
Fix the issue with @INC_AMINCLUDE@ failing and miscellaneous changes.
Bump version to 1.0.2 to provide a clean tag for Fedora.
The line @INC_AMINCLUDE@ should be expanded at configure time to:
"include $(top_builddir)/aminclude.am"
The special aminclude.am file that Doxygen needs and includes from
the srcdir confuses configure if Makefile.am uses @INC_AMINCLUDE@.
Especially if the builddir is separate from srcdir.
So renamed the local aminclude.am to doxygen.am as it is specific
to Doxygen.

* configure.ac [AC_INIT]: Version 1.0.2.
* Makefile.am: Include local doxygen.am from srcdir.
Enable use of @INC_AMINCLUDE@.
(ChangeLog.md): Add rule to use generate-changelog.sh to
generate ChangeLog.md from tags and git commit logs.
* README.md: Updated to list all supported include files and
URL of link to documentation on pveclib git wiki.

* doxygen.am: Renamed from local file aminclude.am
* ChangeLog.md: New file.
* Makefile.in: Regenerated from automake.
* configure: Regenerated from autoconf.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
11 months agoMerge pull request #76 from munroesj52/version-one0one v1.0.1
Steven Munroe [Mon, 17 Jun 2019 14:53:33 +0000 (09:53 -0500)]
Merge pull request #76 from munroesj52/version-one0one

Version one0one

11 months agoAdd script to generate ChangeLog.md file from git change log. 76/head
Steven Munroe [Fri, 14 Jun 2019 19:52:13 +0000 (14:52 -0500)]
Add script to generate ChangeLog.md file from git change log.

* generate-changelog.sh: New file.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
11 months agoMark pveclib as version 1.0.1 for release.
Steven Munroe [Fri, 14 Jun 2019 18:22:08 +0000 (13:22 -0500)]
Mark pveclib as version 1.0.1 for release.
Revert attempt to use .@INC_AMINCLUDE@

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
11 months agoMark pveclib as version 1.0.1 for release.
Steven Munroe [Fri, 14 Jun 2019 15:42:49 +0000 (10:42 -0500)]
Mark pveclib as version 1.0.1 for release.
Update and regenerate autoconf and automake files.

* configure.ac [AC_INIT]: V1.0.1 update.
* configure: Regenerated from autoconf.

* Makefile.am: Reenable @INC_AMINCLUDE@
* Makefile.in: Regenerate from automake.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
11 months agoMerge pull request #75 from munroesj52/gcc9-bcd
Steven Munroe [Fri, 14 Jun 2019 13:56:03 +0000 (08:56 -0500)]
Merge pull request #75 from munroesj52/gcc9-bcd

Test failures while testing Fedora 30 and GCC 9 compiler.

11 months agoTest failures while testing Fedora 30 and GCC 9 compiler. 75/head
Steven Munroe [Fri, 14 Jun 2019 01:42:12 +0000 (20:42 -0500)]
Test failures while testing Fedora 30 and GCC 9 compiler.
Traced back to bad code generation for if (__builtin_bcdsub_gt)
tests in vec_bcdaddcsq, vec_bcdsubcsq.
Found a workaround by replacing the if test (which is setting
the carry sign code) with vec_bcdcpsgn()
Also found check_vint384 was not printing 128-bit values in
the correct high to low order. Corrected this.

* src/pveclib/vec_bcd_ppc.h (vec_bcdaddcsq, vec_bcdsubcsq):
Simplify logic to avoid GCC9 code gen bug.

* src/testsuite/arith128_print.c
(print_vint384 [__BYTE_ORDER__]):
Print the 128-bit values in high to low order.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
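
The print-order fix described above (128-bit values emitted high to low) can be sketched in portable C. This is an assumption-laden illustration, not the pveclib `print_vint384` code: the `u128_parts` type and `format_u384_hex` function are hypothetical, and each quadword is modeled as two 64-bit halves so that the output order does not depend on the host byte order.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 128-bit quadword: explicit high/low halves. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Format a 384-bit value stored as three quadwords, where q[0] is the
   least significant. The text is written most significant quadword
   first. Returns the number of characters snprintf would write. */
int format_u384_hex(char *buf, size_t len, const u128_parts q[3])
{
  return snprintf(buf, len,
                  "%016" PRIx64 "%016" PRIx64 " "
                  "%016" PRIx64 "%016" PRIx64 " "
                  "%016" PRIx64 "%016" PRIx64,
                  q[2].hi, q[2].lo,
                  q[1].hi, q[1].lo,
                  q[0].hi, q[0].lo);
}
```

Naming the halves explicitly avoids relying on the in-memory element order, which is where big-endian and little-endian targets differ.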
11 months agoMerge pull request #74 from munroesj52/f128-p9fix
Steven Munroe [Mon, 10 Jun 2019 15:03:30 +0000 (10:03 -0500)]
Merge pull request #74 from munroesj52/f128-p9fix

Fix vec_f128_ppc.h and dummy for older compilers & without -mfloat128:

11 months agoFix vec_f128_ppc.h and dummy for older compilers & without -mfloat128: 74/head
Steven Munroe [Fri, 7 Jun 2019 22:23:58 +0000 (17:23 -0500)]
Fix vec_f128_ppc.h and dummy for older compilers & without -mfloat128:
Float128 support is work in progress and Distro toolchains do not
include all Float128 backports provided by the Advance Toolchain.
So PVECLIB f128 operations that compile with AT may not compile with
the distro toolchain of the same GCC version.
This would be disappointing for distros that claim power9 support.
This patch set ensures that the Ubuntu 18.04 LTS toolchain can compile
and execute all pveclib vec_f128_ppc.h operations for -mcpu=power8/9.

* src/vec_f128_ppc.h (vec_all_isfinitef128, vec_all_isinff128,
vec_all_isnanf128, vec_all_isnormalf128,
vec_all_issubnormalf128, vec_all_iszerof128, vec_isfinitef128,
vec_isinf_signf128, vec_isinff128, vec_isnanf128,
vec_isnormalf128, vec_issubnormalf128, vec_iszerof128):
Add constraints "&& defined (__FLOAT128__) && (__GNUC__ > 7)"
before using scalar builtins.

* src/testsuite/vec_f128_dummy.c (test_sinf128 [__FLOAT128__]):
Avoid 'Q' suffix floating-point constants if __FLOAT128__ is
not defined. The 'Q' suffix requires the compiler option -mfloat128.
(test_cosf128 [__FLOAT128__]):
Avoid 'Q' suffix floating-point constants if __FLOAT128__ is
not defined. The 'Q' suffix requires the compiler option -mfloat128.

* src/testsuite/vec_pwr9_dummy.c
(test_sinf128_PWR9 [__FLOAT128__]):
Avoid 'Q' suffix floating-point constants if __FLOAT128__ is
not defined. The 'Q' suffix requires the compiler option -mfloat128.
(test_cosf128_PWR9 [__FLOAT128__]):
Avoid 'Q' suffix floating-point constants if __FLOAT128__ is
not defined. The 'Q' suffix requires the compiler option -mfloat128.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
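
The guard quoted in this commit, `defined (__FLOAT128__) && (__GNUC__ > 7)`, might be applied along the following lines. This is a minimal sketch under stated assumptions rather than the pveclib source: the function `half_value` is hypothetical, and the fallback branch simply uses `double` so that no 'Q' suffix constant is seen by a compiler that lacks `__float128` support.

```c
/* Hypothetical example: use a 'Q' suffix constant only when the
   compiler advertises __float128 support and is GCC 8 or later. */
double half_value(void)
{
#if defined(__FLOAT128__) && (__GNUC__ > 7)
  /* 'Q' suffix constants are a GNU extension tied to __float128. */
  __float128 h = 0.5Q;
  return (double) h;
#else
  /* Fallback: no 'Q' suffix constant appears on this path. */
  return 0.5;
#endif
}
```

Both branches return the same value, so callers do not need to know which path the compiler selected.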
11 months agoMerge pull request #73 from gftg85/merge-prs
Steven Munroe [Thu, 6 Jun 2019 15:13:07 +0000 (10:13 -0500)]
Merge pull request #73 from gftg85/merge-prs

Fixes for distribution

11 months agoAdd pveclib subdir prefix to include directives in installed headers 73/head
Gabriel F. T. Gomes [Wed, 29 May 2019 01:31:54 +0000 (22:31 -0300)]
Add pveclib subdir prefix to include directives in installed headers

Since 'make install' installs the header files in /usr/include/pveclib
(or /usr/local/include/pveclib), the '#include' directives in them
should explicitly mention this subdirectory.  This removes the need for
users to add pveclib header directory with '-I/path/to/pveclib'.

In order to achieve this, this patch moves some headers from 'src' to
'src/pveclib' in the repository, however, that doesn't change their
installation paths.

Signed-off-by: Gabriel F. T. Gomes <gabriel@inconstante.net.br>
11 months agoFix make dist check
Gabriel F. T. Gomes [Fri, 31 May 2019 22:08:25 +0000 (19:08 -0300)]
Fix make dist check

Many files are not listed as required in the tarball distribution,
leading to failures in 'make distcheck'.  This patch adds them to the
relevant '_SOURCES' variables so that they get distributed in the
tarball, but not installed with 'make install' (see libtool docs [1]).
Additionally, this patch removes the inclusion of 'arith128.h' from
'src/tipowof10.c', because it is not needed.

Signed-off-by: Gabriel F. T. Gomes <gabriel@inconstante.net.br>
[1] https://www.gnu.org/software/automake/manual/automake.html#Libtool-Libraries

12 months agoMerge pull request #70 from munroesj52/version-one v1.0.0
Steven Munroe [Wed, 22 May 2019 17:16:35 +0000 (12:16 -0500)]
Merge pull request #70 from munroesj52/version-one

Mark pveclib as version 1.0.0 for release

12 months agoMark pveclib as version 1.0.0 for release 70/head
Steven Munroe [Tue, 21 May 2019 19:09:33 +0000 (14:09 -0500)]
Mark pveclib as version 1.0.0 for release

* configure.ac [AC_INIT]: V1.0.0 update.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
12 months agoMerge pull request #69 from munroesj52/bcd-ldiv-p3
Steven Munroe [Fri, 17 May 2019 21:05:15 +0000 (16:05 -0500)]
Merge pull request #69 from munroesj52/bcd-ldiv-p3

Add extended quadword BCD <> binary conversion to vec_bcd_ppc.h:

12 months agoMerge pull request #68 from munroesj52/bcd-ldiv-p2
Steven Munroe [Fri, 17 May 2019 21:04:54 +0000 (16:04 -0500)]
Merge pull request #68 from munroesj52/bcd-ldiv-p2

Add extended quadword BCD <> binary conversion to vec_bcd_ppc.h:

12 months agoMerge branch 'master' into bcd-ldiv-p2 68/head
Steven Munroe [Fri, 17 May 2019 21:04:17 +0000 (16:04 -0500)]
Merge branch 'master' into bcd-ldiv-p2

12 months agoMerge pull request #67 from munroesj52/bcd-ldiv-p1
Steven Munroe [Fri, 17 May 2019 20:53:58 +0000 (15:53 -0500)]
Merge pull request #67 from munroesj52/bcd-ldiv-p1

Add extended quadword BCD <> binary conversion to vec_bcd_ppc.h:

12 months agoAdd extended quadword BCD <> binary conversion to vec_bcd_ppc.h: 67/head
Steven Munroe [Fri, 17 May 2019 20:51:31 +0000 (15:51 -0500)]
Add extended quadword BCD <> binary conversion to vec_bcd_ppc.h:
Doxygen text updates after review.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
12 months agoMerge pull request #66 from munroesj52/i128-ldiv-p4
Steven Munroe [Fri, 17 May 2019 20:39:31 +0000 (15:39 -0500)]
Merge pull request #66 from munroesj52/i128-ldiv-p4

Add long division factoring support to vec_int128_ppc.h:

12 months agoAdd long division factoring support to vec_int128_ppc.h: 66/head
Steven Munroe [Fri, 17 May 2019 20:35:09 +0000 (15:35 -0500)]
Add long division factoring support to vec_int128_ppc.h:
Minor updates after review.

* src/testsuite/arith128_test_i128.c (db_vec_divudq_10e31):
Remove unnecessary comment.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
12 months agoMerge pull request #65 from munroesj52/i128-ldiv-p3
Steven Munroe [Fri, 17 May 2019 20:15:40 +0000 (15:15 -0500)]
Merge pull request #65 from munroesj52/i128-ldiv-p3

Add long division factoring support to vec_int128_ppc.h:

12 months agoMerge pull request #64 from munroesj52/i128-ldiv-p2
Steven Munroe [Fri, 17 May 2019 20:13:34 +0000 (15:13 -0500)]
Merge pull request #64 from munroesj52/i128-ldiv-p2

Add long division factoring support to vec_int128_ppc.h:

12 months agoMerge pull request #63 from munroesj52/i128-ldiv-p1
Steven Munroe [Fri, 17 May 2019 19:42:26 +0000 (14:42 -0500)]
Merge pull request #63 from munroesj52/i128-ldiv-p1

Add long division factoring support to vec_int128_ppc.h:

12 months agoAdd long division factoring support to vec_int128_ppc.h: 63/head
Steven Munroe [Fri, 17 May 2019 19:38:55 +0000 (14:38 -0500)]
Add long division factoring support to vec_int128_ppc.h:
Doxygen text changes after review.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd extended quadword multiply support to vec_int128_ppc.h:
Steven Munroe [Thu, 25 Apr 2019 19:12:40 +0000 (14:12 -0500)]
Add extended quadword multiply support to vec_int128_ppc.h:
Enable extended quadword BCD conversion. Additional timed tests.

* src/testsuite/arith128_test_i128.c (test_time_i128):
Add timing calls for timed_longbcdct_10e32 and
timed_cfmaxdouble_10e32.  Rename timed_maxdouble_10e32
to timed_ctmaxdouble_10e32.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd extended BCD convert support to vec_int128_ppc.h: 65/head
Steven Munroe [Thu, 25 Apr 2019 18:59:48 +0000 (13:59 -0500)]
Add extended BCD convert support to vec_int128_ppc.h:
Enable extended quadword multiply by constant powers of 10.
Additional timed performance tests. The functions
timed_ctmaxdouble_10e32 (sprintf) and timed_cfmaxdouble_10e32 (strtod)
for the double __DBL_MAX__ value provide a comparison with
timed_longbcdcf_10e32 and timed_longbcdct_10e32 using pveclib.

* src/testsuite/vec_perf_i128.h (timed_maxdouble_10e32):
Rename to timed_cfmaxdouble_10e32.
(timed_longbcdct_10e32, timed_ctmaxdouble_10e32):
New extern functions.

* src/testsuite/vec_perf_i128.c: Include <stdlib.h>.
(timed_longdiv_e32, timed_longbcdcf_10e32):
Remove debug printf and cleanup.
(example_longbcdct_10e32): Add extern.
(timed_longbcdct_10e32): New timed function.
(timed_maxdouble_10e32): Rename to timed_cfmaxdouble_10e32.
(timed_ctmaxdouble_10e32): New timed functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
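
The sprintf/strtod baseline mentioned above converts the double `__DBL_MAX__` value to and from decimal text. The sketch below shows one way such a baseline could be written. It is not the pveclib `timed_ctmaxdouble_10e32` or `timed_cfmaxdouble_10e32` code: all three function names are hypothetical, and `DBL_MAX` from `<float.h>` is used in place of `__DBL_MAX__`.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Convert DBL_MAX to decimal text with no fraction digits.
   Returns the number of characters snprintf would write. */
int maxdouble_to_decimal(char *buf, size_t len)
{
  return snprintf(buf, len, "%.0f", DBL_MAX);
}

/* Convert decimal text back to a double. */
double maxdouble_from_decimal(const char *text)
{
  return strtod(text, NULL);
}

/* Time a number of round trips with clock(); returns CPU seconds. */
double time_roundtrips(int iterations)
{
  char buf[512];
  volatile double sink = 0.0;   /* keeps the loop from being optimized away */
  clock_t start = clock();
  for (int i = 0; i < iterations; i++)
    {
      maxdouble_to_decimal(buf, sizeof buf);
      sink = maxdouble_from_decimal(buf);
    }
  (void) sink;
  return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

A timing loop like this gives a libc reference point against which a vector BCD conversion can be compared, which is the stated purpose of the timed functions in this commit.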
13 months agoAdd long division factoring support to vec_int128_ppc.h: 64/head
Steven Munroe [Thu, 25 Apr 2019 18:35:13 +0000 (13:35 -0500)]
Add long division factoring support to vec_int128_ppc.h:
Enable extended quadword long division by constant powers of 10.
Additional compile tests and examples. Some examples are used as
performance tests.

* src/testsuite/vec_pwr9_dummy.c (test_vec_bcdctuq_PWR9):
 New function.
(example_longdiv_10e31_PWR9): New example functions.

* src/testsuite/vec_int16_dummy.c: Code cleanup.
(__test_mod10, __test_div10000): Remove unused variables.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd extended quadword BCD <> binary conversion to vec_bcd_ppc.h: 69/head
Steven Munroe [Thu, 25 Apr 2019 15:23:03 +0000 (10:23 -0500)]
Add extended quadword BCD <> binary conversion to vec_bcd_ppc.h:
Add unit tests for extended quadword BCD <-> Binary conversion.

* src/testsuite/arith128_test_bcd.c
(db_example_longbcdct_10e32): New debug function.
(test_bcd_p2, test_bcd_p2B, test_bcdct_p2, test_bcdct_p3):
New unit test functions.
(test_longbcdcf_10e32, test_longbcdct_10e32):
New unit test functions.
(test_vec_bcd): Update driver to call new unit tests above.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd extended quadword BCD <> binary conversion to vec_bcd_ppc.h:
Steven Munroe [Thu, 25 Apr 2019 15:05:26 +0000 (10:05 -0500)]
Add extended quadword BCD <> binary conversion to vec_bcd_ppc.h:
Add examples for extended quadword BCD <-> Binary conversion.
These are called from unit and timed performance tests as well as
used as examples in the overview documentation.

* src/testsuite/vec_bcd_dummy.c
(example_longbcdcf_10e32): New Example function.
(example_longbcdct_10e32): New Example function.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd extended quadword BCD <> binary conversion to vec_bcd_ppc.h:
Steven Munroe [Thu, 25 Apr 2019 14:35:29 +0000 (09:35 -0500)]
Add extended quadword BCD <> binary conversion to vec_bcd_ppc.h:
Document extended BCD <-> binary conversions with examples.
Add a POWER9 specific optimization to vec_bcdctuq().

* src/vec_bcd_ppc.h: Improve Doxygen overview.
Cross reference BCD conversion and quadword long division
from vec_int128_ppc.h
[subsubsection bcd128_convert_0_2_3]: Add section
"Multiple precision BCD to/from Binary conversion"
[\paragraph bcd128_convert_0_2_3_0]: Add section
"Multiple precision BCD from Binary conversion"
[\paragraph bcd128_convert_0_2_3_1]: Add section
"Multiple precision BCD to Binary conversion"
(vec_bcdctuq [_ARCH_PWR9]): Optimization.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd long division factoring support to vec_int128_ppc.h:
Steven Munroe [Tue, 16 Apr 2019 18:46:36 +0000 (13:46 -0500)]
Add long division factoring support to vec_int128_ppc.h:
Enable extended quadword long division by constant powers of 10.
Additional unit tests.

* src/testsuite/arith128_test_i128.c (db_vec_divudq_10e31,
db_vec_modudq_10e31, db_example_longdiv_10e31):
New debug functions.
(test_time_i128): Add timed tests: timed_longdiv_e32,
timed_longbcdcf_10e32, timed_maxdouble_10e32.
(test_div_moduq_e32, test_div_modsq_e31): Remove unused
variables.
(test_div_modudq_e31, test_longdiv_e31, test_longdiv_e32):
New unit test functions.
(test_vec_i128): Add calls to new unit tests above.

* src/testsuite/pveclib_test.c: Minor update.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd long division factoring support to vec_int128_ppc.h:
Steven Munroe [Tue, 16 Apr 2019 18:28:32 +0000 (13:28 -0500)]
Add long division factoring support to vec_int128_ppc.h:
Enable extended quadword long division by constant powers of 10.
Additional timed performance tests.

* src/testsuite/vec_perf_i128.h (timed_longdiv_e32,
timed_longbcdcf_10e32, timed_maxdouble_10e32):
New extern functions.

* src/testsuite/vec_perf_i128.c: Include <string.h>.
Add extern for example_longdiv_10e32 and
example_longbcdcf_10e32.
(timed_longdiv_e32, timed_longbcdcf_10e32,
timed_maxdouble_10e32): New timed functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd long division factoring support to vec_int128_ppc.h:
Steven Munroe [Tue, 16 Apr 2019 17:44:20 +0000 (12:44 -0500)]
Add long division factoring support to vec_int128_ppc.h:
Enable extended quadword long division by constant powers of 10.
Additional compile tests and examples. Some examples are used as
performance tests.

* src/testsuite/vec_bcd_dummy.c (example_longbcdcf_10e32):
New function.

* src/testsuite/vec_int128_dummy.c (__test_divudq_10e31,
__test_divudq_10e32, __test_remudq_10e31, __test_remudq_10e32):
New function.
(example_longdiv_10e31, example_longdiv_10e32):
New example functions.

* src/testsuite/vec_pwr9_dummy.c (__test_divudq_10e31_PWR9,
__test_divudq_10e32_PWR9, __test_remudq_10e31_PWR9,
__test_remudq_10e32_PWR9): New functions.
(example_longdiv_10e31_PWR9): New example functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoAdd long division factoring support to vec_int128_ppc.h:
Steven Munroe [Tue, 16 Apr 2019 17:23:48 +0000 (12:23 -0500)]
Add long division factoring support to vec_int128_ppc.h:
Enable extended quadword long division by constant powers of 10.
This supports conversion of multi-quadword binary integers to decimal.

* src/vec_int128_ppc.h (\subsection int128_examples_0_1_3_0):
Add subsection header for "Extended Quadword multiply".
(\subsubsection int128_examples_0_1_3_1): New header and
description for "Quadword Long Division".
(vec_divuq_10e31, vec_divuq_10e32, vec_moduq_10e31,
vec_moduq_10e32, vec_subeuqm): New forward refs.
(vec_divudq_10e31, vec_divudq_10e32): New functions.
(vec_divuq_10e31): Correct comment.
(vec_modudq_10e31, vec_modudq_10e32): New functions.
(vec_moduq_10e31): Correct comment.
(vec_vmuleud, vec_vmuloud): Correct comment.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
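
Long division of a multi-word integer by a constant power of 10, as described in this commit, can be illustrated with a scaled-down scalar analog. The sketch below is not the pveclib implementation: it uses 32-bit limbs and 10^9 where the library works on 128-bit quadwords with 10^31 or 10^32, and the function name `divmod_1e9` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Divide a multi-limb unsigned integer in place by 10^9.
   limbs[0] is the least significant 32-bit limb. The division walks
   from the most significant limb down, carrying each remainder into
   the next step. Returns the final remainder (0 .. 10^9 - 1). */
uint32_t divmod_1e9(uint32_t *limbs, size_t n)
{
  const uint64_t ten9 = 1000000000u;
  uint64_t rem = 0;
  for (size_t i = n; i-- > 0; )
    {
      /* rem < 10^9 < 2^30, so the shifted value fits in 64 bits. */
      uint64_t cur = (rem << 32) | limbs[i];
      limbs[i] = (uint32_t) (cur / ten9);
      rem = cur % ten9;
    }
  return (uint32_t) rem;
}
```

Calling the function repeatedly until all limbs are zero yields the decimal digits nine at a time, least significant group first, which is the binary to decimal conversion pattern the commit message refers to.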
13 months agoMerge pull request #62 from munroesj52/bcd-bin2bcd-p11
Steven Munroe [Wed, 3 Apr 2019 16:05:57 +0000 (11:05 -0500)]
Merge pull request #62 from munroesj52/bcd-bin2bcd-p11

More vec_bcd_ppc.h add binary to BCD conversion:

13 months agoMore vec_bcd_ppc.h add binary to BCD conversion, after review: 62/head
Steven Munroe [Wed, 3 Apr 2019 16:04:01 +0000 (11:04 -0500)]
More vec_bcd_ppc.h add binary to BCD conversion, after review:
Add compile tests for new operations. Some clean up.

* src/testsuite/vec_bcd_dummy.c (example_vec_cbcdecsq_loop):
Initialize variable cn to eliminate compiler warning.
(test_vec_rdxct100b_0): Fix comment. Use preferred vec_splats.
(test_vec_rdxct100b_1): Use preferred vec_splats.
(test_vec_bcdcfsq): Rename to __test_vec_bcdcfsq.
(test_vec_bcdcfud): Rename to __test_vec_bcdcfud.
(test_vec_bcdcfuq): Rename to __test_vec_bcdcfuq.

* src/testsuite/vec_pwr9_dummy.c
(example_vec_cbcdecsq_loop_PWR9):
Initialize variable cn to eliminate compiler warning.

* src/vec_char_ppc.h (vec_mulhub [__GNUC__ >= 7]):
Use preferred generic vec_mul for newer compilers.
Generates smaller code and allows strength reduction.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoMerge pull request #61 from munroesj52/bcd-bin2bcd-p10
Steven Munroe [Wed, 3 Apr 2019 15:38:15 +0000 (10:38 -0500)]
Merge pull request #61 from munroesj52/bcd-bin2bcd-p10

More vec_bcd_ppc.h add binary to BCD conversion:

13 months agoMore vec_bcd_ppc.h add binary to BCD conversion, after review: 61/head
Steven Munroe [Wed, 3 Apr 2019 15:35:58 +0000 (10:35 -0500)]
More vec_bcd_ppc.h add binary to BCD conversion, after review:
Add unit tests for new operations.

* src/testsuite/arith128_test_bcd.c (db_vec_cbcdaddcsq):
Eliminate unused variable warning.
(db_vec_cbcdsubcsq): Eliminate unused variable warning.
(db_vec_rdxcf10kh): Eliminate unused code.
(db_vec_rdxcf100mw): Eliminate unused code.
(test_bcd_adde256): Eliminate unused variable warning.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
13 months agoMerge pull request #60 from munroesj52/bcd-bin2bcd-p9
Steven Munroe [Wed, 3 Apr 2019 14:46:39 +0000 (09:46 -0500)]
Merge pull request #60 from munroesj52/bcd-bin2bcd-p9

More vec_bcd_ppc.h add binary to BCD conversion:

13 months agoMore vec_bcd_ppc.h add binary to BCD conversion. After review: 60/head
Steven Munroe [Wed, 3 Apr 2019 14:43:39 +0000 (09:43 -0500)]
More vec_bcd_ppc.h add binary to BCD conversion. After review:
Cleaned up doxygen documentation for syntax and clarity.
General code cleanup and compile warning removal.

* src/vec_bcd_ppc.h: Updated Doxygen comments.
(vec_bcdaddecsq): Simplified code with bias to POWER9.
Eliminated compile warnings.
(vec_bcdcfz):
Eliminated unused variable bcd_s.
(vec_bcddive): Updated Doxygen comments.
(vec_bcddive [_ARCH_PWR9]): Eliminated unused variable warning.
(vec_bcdmul): Updated Doxygen comments.
(vec_bcdmulh [_ARCH_PWR9]): Eliminated unused variable warning.
(vec_bcdsubecsq): Simplified code with bias to POWER9.
Eliminated compile warnings.
(vec_bcdtrunc): Code clean up.
Eliminated unused variable warning.
(vec_cbcdaddcsq): Eliminated unused variable warning.
(vec_cbcdaddecsq): Eliminated unused variable warning.
(vec_cbcdmul): Updated Doxygen comments.
(vec_cbcdmul [_ARCH_PWR9]): Eliminated unused variable warning.
(vec_cbcdsubcsq): Eliminated unused variable warning.
(vec_cbcdsubcsq [(__GNUC__ > 6)]): Use vector extensions.
(vec_rdxcf100mw [!_ARCH_PWR8]):
Eliminated unused variable warning.
(vec_rdxcf10e32q [!_ARCH_PWR9]):
Eliminated unused variable warning.
(vec_rdxct100b): Use preferred generic vec_splats.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months agoMore vec_bcd_ppc.h add binary to BCD conversion:
Steven Munroe [Fri, 29 Mar 2019 19:40:32 +0000 (14:40 -0500)]
More vec_bcd_ppc.h add binary to BCD conversion:
Add compile tests for new operations.

* src/testsuite/vec_bcd_dummy.c (test_vec_BCD2BIN):
New function.
(test_vec_rdxcf100b, test_vec_rdxcf10kh, test_vec_rdxcf100mw,
test_vec_rdxcf10E16d, test_vec_rdxcf10e32q): New functions.
(test_vec_rdxct100b_0, test_vec_rdxct100b_1): New functions.
(test_vec_rdxct100b, test_vec_rdxct10kh, test_vec_rdxct10E16d,
test_vec_rdxct10e32q, test_vec_bcdcfsq, test_vec_bcdcfud,
test_vec_bcdcfuq): New functions.

* src/testsuite/vec_char_dummy.c
(__test_mulubm_gcc [__GNUC__ >= 7]): Compile vec_mul generic.

* src/testsuite/vec_pwr9_dummy.c (test_vec_rdxct10E16d_PWR9,
test_vec_rdxct10e32q_PWR9, test_vec_rdxcf10E16d_PWR9,
test_vec_rdxcf10e32q_PWR9, test_vec_bcdcfsq_PWR9,
test_vec_bcdcfuq_PWR9): New functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months agoMore vec_bcd_ppc.h add binary to BCD conversion:
Steven Munroe [Fri, 29 Mar 2019 19:22:22 +0000 (14:22 -0500)]
More vec_bcd_ppc.h add binary to BCD conversion:
Add unit tests for new operations.
Clean up debug prints from last round.

* src/testsuite/arith128_test_bcd.c (db_vec_rdxcf100b,
db_vec_rdxcf10kh, db_vec_rdxcf100mw, db_vec_rdxcf10E16d,
db_vec_rdxcf10e32q): New Functions.
(test_rdxcf100b, test_rdxcf10kh, test_rdxcf100mw,
test_rdxcf10E16d, test_rdxcf10e32q, test_vec_bcdcfuq,
test_vec_bcdcfsq, test_bcddive): New Functions.
(test_vec_bcd): Add new function calls to driver.

* src/testsuite/arith128_test_i128.c
(test_div_modsq_e31 [__DEBUG_PRINT__]): Disable.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months agoMore vec_bcd_ppc.h add binary to BCD conversion:
Steven Munroe [Fri, 29 Mar 2019 18:55:50 +0000 (13:55 -0500)]
More vec_bcd_ppc.h add binary to BCD conversion:
More doxygen documentation. New functions for binary to BCD conversion.
Selective optimizations to leverage POWER9, DFU, and compiler strength
reduction.

* src/vec_bcd_ppc.h (\subsection bcd128_extended_0_2):
Extended Precision computation with BCD. Replace todo with
overview text.
(\subsubsection bcd128_extended_0_2_0 Vector): Add/Subtract
with Carry/Extend example. Expand \todo with remaining issues
affecting extended BCD add/sub.
(\subsubsection bcd128_muldiv_0_2_1): Vector BCD
Multiply/Divide Quadword example.
Replaces bcd128_extended_0_2_1
(\subsubsection bcd128_convert_0_2_2): New subsubsection.
Vector BCD to/from Binary conversion.
(\paragraph bcd128_convert_0_2_2_1): New paragraph.
Vector Parallel conversion.
(\paragraph bcd128_convert_0_2_2_2): New paragraph.
Vector Parallel BCD to quadword conversion.
(\paragraph bcd128_convert_0_2_2_3): New paragraph.
Vector Parallel quadword to BCD conversion.
(vec_bcdcfuq, vec_rdxcf100b, vec_rdxcf10kh, vec_rdxcf100mw,
vec_rdxcf10E16d, vec_rdxcf10e32q): New function forward refs.
(vec_BCD2BIN, vec_BIN2BCD, vec_bcdcfsq, vec_bcdcfud,
vec_bcdcfuq): New Functions.
(vec_bcdctsq): Update documented throughput.
(vec_bcdctud [_ARCH_PWR7]): Add vec_BCD2BIN optimization.
(vec_bcdctuq [_ARCH_PWR7]): Add vec_BCD2BIN optimization.
(vec_bcddive): New Function.
(vec_rdxcf100b, vec_rdxcf10kh, vec_rdxcf100mw, vec_rdxcf10E16d,
vec_rdxcf10e32q): New functions.
(vec_rdxct100b [__GNUC__ > 7]): If compiler supports vec_mul
generic, use it.
(vec_rdxct10e32q): Update documented throughput.
(vec_rdxct10e32q [_ARCH_PWR9]): Use POWER9 msumudm instruction.

* src/vec_char_ppc.h (vec_mulubm [__GNUC__ >= 7]): If compiler
supports vec_mul generic, use it.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago Merge pull request #59 from munroesj52/i128-ctbcd-p1
Steven Munroe [Fri, 29 Mar 2019 15:35:22 +0000 (10:35 -0500)]
Merge pull request #59 from munroesj52/i128-ctbcd-p1

More vec_bcd_ppc.h add factoring support to vec_int128_ppc.h:

14 months ago More vec_bcd_ppc.h add factoring support to vec_int128_ppc.h: 59/head
Steven Munroe [Fri, 29 Mar 2019 15:31:48 +0000 (10:31 -0500)]
More vec_bcd_ppc.h add factoring support to vec_int128_ppc.h:
After review: Update Doxygen text. Clarifying code changes.
Add new compile tests to vec_int128_dummy.c and vec_pwr9_dummy.c

* src/vec_int128_ppc.h (\subsection int128_examples_0_1_2):
Update Doxygen text.
(vec_divsq_10e31, vec_divuq_10e31, vec_divuq_10e32):
Use int const for zero initializer. Clarify corrective add
comment. Remove redundant parentheses.
(vec_modsq_10e31): Use int const for zero initializer. Clarify
corrective add comment.

* src/testsuite/vec_int128_dummy.c (test_vec_mul10uq_c):
Correct mul10 plus carry example.
(__test_modsq_10e31, __test_remsq_10e31, __test_remuq_10e31,
__test_remuq_10e32): New Functions.

* src/testsuite/vec_pwr9_dummy.c (__test_remsq_10e31,
__test_remuq_10e31, __test_remuq_10e32): New Functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago Merge pull request #58 from munroesj52/bcd-p9isa-p8
Steven Munroe [Thu, 28 Mar 2019 19:03:55 +0000 (14:03 -0500)]
Merge pull request #58 from munroesj52/bcd-p9isa-p8

More vec_bcd_ppc.h compile and unit tests:

14 months ago More vec_bcd_ppc.h compile and unit tests: 58/head
Steven Munroe [Thu, 28 Mar 2019 18:59:47 +0000 (13:59 -0500)]
More vec_bcd_ppc.h compile and unit tests:
One more spelling error.

* src/testsuite/arith128_test_bcd.c (db_vec_cbcdaddcsq):
Correct spelling of borrow.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago More vec_bcd_ppc.h compile and unit tests:
Steven Munroe [Thu, 28 Mar 2019 18:48:04 +0000 (13:48 -0500)]
More vec_bcd_ppc.h compile and unit tests:
Updates after review

* src/testsuite/arith128_test_bcd.c (test_bcd_cadde256):
Add comment to explain deferred test.
(test_bcdutrunc [__DEBUG_PRINT__]): Change debug text to match
operation.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago More vec_bcd_ppc.h add factoring support to vec_int128_ppc.h:
Steven Munroe [Wed, 20 Mar 2019 15:24:04 +0000 (10:24 -0500)]
More vec_bcd_ppc.h add factoring support to vec_int128_ppc.h:
Added divide/modulo operations to factor signed/unsigned vector
__int128 values for conversion to BCD. The __int128 type can represent
integers up to 39 digits while vector BCD can represent 31 (signed)
or 32 (unsigned) digits. The operations use the multiplicative inverse
to divide by 10**31 (or 10**32) for the quotient, then multiply and
subtract to isolate the remainder.
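
The same divide-by-constant technique can be sketched at a much smaller
scale. The following is a scalar 32-bit analog that divides by 10, not the
pveclib quadword code. The helper names div10_by_inverse and
mod10_by_inverse are hypothetical, and the constant 0xCCCCCCCD with shift 35
applies to this 32-bit case only.

```c
#include <stdint.h>

/* Hypothetical helper: quotient of n / 10 without a divide instruction.
   0xCCCCCCCD is ceil (2**35 / 10), so the 64-bit product shifted right
   by 35 is the exact quotient for every 32-bit n. */
static uint32_t
div10_by_inverse (uint32_t n)
{
  return (uint32_t) (((uint64_t) n * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical helper: remainder of n / 10. Multiply the quotient
   back by the divisor and subtract from the original value. */
static uint32_t
mod10_by_inverse (uint32_t n)
{
  return n - div10_by_inverse (n) * 10u;
}
```

For example, 4294967295 gives quotient 429496729 and remainder 5. The
quadword case follows the same pattern with a wider product and a
reciprocal of 10**31 or 10**32.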

* src/vec_int128_ppc.h (\subsection int128_examples_0_1_2):
New Doxygen for "Converting Vector __int128 values to BCD".
(\subsection int128_examples_0_1_3): Renumbered from
\subsection int128_examples_0_1_2.
(vec_divsq_10e31, vec_divuq_10e31, vec_divuq_10e32,
vec_modsq_10e31, vec_moduq_10e31, vec_moduq_10e32):
New Functions.

* src/testsuite/arith128_test_i128.c (test_div_moduq_e32,
test_div_moduq_e31, test_div_modsq_e31): New Functions.
(test_vec_i128): Add calls to the above new functions.

* src/testsuite/arith128_print.h (print_vint128s): New extern.
* src/testsuite/arith128_print.c (print_vint128s): New Function
to print vi128_t.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago More vec_bcd_ppc.h compile and unit tests:
Steven Munroe [Fri, 15 Mar 2019 18:12:15 +0000 (13:12 -0500)]
More vec_bcd_ppc.h compile and unit tests:
Update unit tests after change to bcdsub and related functions.
Added unit tests for new operations.
Added compile tests for new operations.
Added power9 target tests for new operations.

* src/testsuite/arith128_test_bcd.c (vec_copysign_Decimal128):
Prototype, disable for now.
(db_vec_cbcdaddcsq): Add debug printf. Update "borrow" test to
check for != BCD 0. Ditto for _ARCH_PWR7.
(db_vec_cbcdsubcsq): New function.
(test_bcd_subesqm): Correct unit tests results after changes.
(test_bcd_subecsq): Ditto.
(test_bcd_cadde256): Add unit tests.
(test_bcdcmpsq_p2, test_bcdcmpsq, test_bcdcmp_p2, test_bcdcmp,
test_bcdsetsgn, test_bcdcfz, test_bcdctz, test_bcdsr,
test_bcdtrunc, test_bcdutrunc, test_bcdtruncqi,
test_bcdutruncqi, test_bcdsrrqi, test_bcd_csube256):
New function.
(test_vec_bcd): Update to call new unit test functions.

* src/testsuite/vec_bcd_dummy.c (test_vec_cbcdmul_2,
test_vec_bcdsetsgn, test_vec_bcdcfz, test_vec_bcdctz,
test_vec_bcdsr, test_vec_bcdtrunc, test_vec_bcdtruncqi,
test_vec_bcdutrunc, test_vec_bcdutruncqi): New functions.
(test_vec_bcdcmp_eqsq, test_vec_bcdcmp_nesq,
test_vec_bcdcmp_gtsq, test_vec_bcdcmp_gesq,
test_vec_bcdcmp_ltsq, test_vec_bcdcmp_lesq,
test_vec_bcdcmpeq, test_vec_bcdcmpne, test_vec_bcdcmpgt,
test_vec_bcdcmpge, test_vec_bcdcmplt, test_vec_bcdcmple):
New functions.
(example_bcdmul_2x2): New function.
(test__builtin_bcdsub_lt): New function.

* src/testsuite/vec_pwr9_dummy.c (test_vec_bcdsr_PWR9,
test_vec_bcdsrrqi_PWR9, test_vec_bcdtrunc_PWR9,
test_vec_bcdtruncqi_PWR9, test_vec_bcdutrunc_PWR9,
test_vec_bcdutruncq): New functions.
(test__vec_bcdaddcsq_PWR9): New function.
(test__vec_bcdaddcsq2_PWR9): Renamed from
test__vec_bcdaddcsq_PWR9.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago Merge pull request #57 from munroesj52/bcd-p9isa-p7
Steven Munroe [Fri, 15 Mar 2019 14:16:23 +0000 (09:16 -0500)]
Merge pull request #57 from munroesj52/bcd-p9isa-p7

More vec_bcd_ppc.h:

14 months ago More vec_bcd_ppc.h: Clean up after review. 57/head
Steven Munroe [Fri, 15 Mar 2019 14:09:52 +0000 (09:09 -0500)]
More vec_bcd_ppc.h: Clean up after review.
Update Doxygen comments for spelling & syntax.
Adjust indent and whitespace in some functions.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago More vec_bcd_ppc.h:
Steven Munroe [Thu, 14 Mar 2019 22:19:56 +0000 (17:19 -0500)]
More vec_bcd_ppc.h:
Clean up of Extend/carry support for BCD add/sub.
Fix a bug with borrow.
Added BCD compare for predicate and bool.
New functions: vec_bcdcfz, vec_bcdctz.
Add shift and round, shift unsigned, truncate, and
unsigned truncate.
Latency estimates for new functions.
Add Doxygen example of extended multiply.
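
The carry/extend idea behind multi-word BCD addition can be sketched in
scalar C. This is an illustration under simplifying assumptions, not the
pveclib implementation: it handles 8-digit unsigned packed BCD with no sign
nibble, and the helper name bcd_add_carry is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: add two 8-digit unsigned packed BCD values plus
   a carry-in of 0 or 1. Returns the low 8 digits of the sum and stores
   the decimal carry-out (0 or 1) through carry_out. */
static uint32_t
bcd_add_carry (uint32_t a, uint32_t b, unsigned carry_in,
               unsigned *carry_out)
{
  uint32_t sum = 0;
  unsigned carry = carry_in;
  for (int i = 0; i < 8; i++)
    {
      /* Add one decimal digit from each operand plus the running carry. */
      unsigned d = ((a >> (4 * i)) & 0xF) + ((b >> (4 * i)) & 0xF) + carry;
      carry = (d > 9);
      if (carry)
        d -= 10;
      sum |= (uint32_t) d << (4 * i);
    }
  *carry_out = carry;
  return sum;
}
```

An extended add chains the calls: the carry-out of the low-order word is
passed as the carry-in of the next higher word.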

* src/vec_bcd_ppc.h: Update Doxygen description.
[\section bcd128_details_0_0]: Added Some details of BCD
computation.
[\subsection bcd128_extended_0_1]: Added Preferred sign, zone,
and zero.
[\subsection bcd128_extended_0_2]: Was a section.
Corrected some of the code examples.
[\subsubsection bcd128_extended_0_2_1]: Added Vector BCD
Multiply Quadword example.
(vbBCD_): New typedef for vector bool for BCD.
(vec_bcdsub): Forward ref prototype.
(vec_bcdaddcsq): Code improvements, reduced latency.
(vec_bcdaddecsq): Code improvements, reduced latency.
(vec_bcdcfz): New function.
(vec_bcdcmp_eqsq, vec_bcdcmp_gesq, vec_bcdcmp_gtsq,
vec_bcdcmp_lesq, vec_bcdcmp_ltsq, vec_bcdcmp_nesq):
New function.
(vec_bcdcmpeq, vec_bcdcmpge, vec_bcdcmpgt, vec_bcdcmple,
vec_bcdcmplt, vec_bcdcmpne): New function.
(vec_bcdctz, vec_bcdsetsgn, vec_bcdslqi, vec_bcdsluqi,
vec_bcdsr, vec_bcdsrrqi): New function.
(vec_bcdsubcsq): Code improvements, reduced latency.
(vec_bcdsubecsq): Code improvements, reduced latency.
(vec_bcdsubesqm): Code improvements.
(vec_bcdtrunc, vec_bcdtruncqi, vec_bcdutrunc, vec_bcdutruncqi):
New function.
(vec_cbcdaddcsq, vec_cbcdaddecsq): Code improvements.
(vec_cbcdsubcsq): New function.

* src/vec_common_ppc.h (VEC_HW_L_DWH): define for truncate
digit count hword.

Signed-off-by: Steven Munroe <munroesj52@gmail.com>
14 months ago Merge pull request #56 from munroesj52/bcd-addextend-p6
Steven Munroe [Thu, 7 Mar 2019 18:40:03 +0000 (12:40 -0600)]
Merge pull request #56 from munroesj52/bcd-addextend-p6

More vec_bcd_ppc.h compile and unit tests.