This commit adds support for runtime detection of SIMD instructions. The idea is to build once with all supported SIMD functions so that the same binaries can run on different machines with varying SIMD support. At runtime, gf-complete selects the right functions based on the processor.
gf_cpu.c has the logic to detect SIMD instructions. On Intel processors this is done through cpuid. For ARM on Linux we use getauxval.
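As a rough illustration of the two detection paths named above, the sketch below queries cpuid on x86 and getauxval on ARM Linux. It is not the code in gf_cpu.c: the function name example_has_simd128 is hypothetical, and the choice to test only for SSE2 or NEON is an assumption made to keep the example short.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>       /* GCC/Clang wrapper around the cpuid instruction */
#elif defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h>    /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>   /* HWCAP_NEON / HWCAP_ASIMD bits */
#endif

/* Hypothetical helper (not part of gf-complete): returns 1 when a
 * 128-bit SIMD unit (SSE2 on x86, NEON on ARM) is reported by the
 * processor at runtime, and 0 otherwise. */
int example_has_simd128(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1 holds the basic feature flags; __get_cpuid returns 0
     * if the leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 26) & 1u);   /* CPUID.1:EDX bit 26 = SSE2 */
#elif defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#elif defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;                         /* unknown platform: assume no SIMD */
#endif
}
```

On platforms not covered by the preprocessor branches the helper conservatively reports no SIMD, which matches the "fall back to generic code" behaviour described in this commit.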
The logic in gf_w*.c has been changed to check for runtime SIMD support and fall back to generic code.
A new test has also been added. It compares the functions selected by gf_init when SIMD support is enabled/disabled through build flags with those selected when it is enabled/disabled at runtime, and checks that the results are identical.
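The selection-with-fallback pattern can be sketched as follows. This is only an illustration under assumptions: region_xor_fn, example_select_region_xor and the cpu_has_sse2 flag are hypothetical names, and a plain XOR over a region stands in for the real gf-complete kernels. The SIMD variant is compiled only when the build enables SSE2, and it is chosen only when the runtime flag also allows it.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical function-pointer type for a region operation. */
typedef void (*region_xor_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Generic fallback: works on every processor. */
static void region_xor_generic(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

#if defined(__SSE2__)
/* SSE2 variant: 16 bytes per iteration, scalar loop for the tail. */
static void region_xor_sse2(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
    }
    for (; i < len; i++)
        dst[i] ^= src[i];
}
#endif

/* Pick the SIMD version only if it was built AND the CPU supports it. */
region_xor_fn example_select_region_xor(int cpu_has_sse2)
{
#if defined(__SSE2__)
    if (cpu_has_sse2)
        return region_xor_sse2;
#else
    (void)cpu_has_sse2;   /* no SIMD variant compiled in */
#endif
    return region_xor_generic;
}
```

A test in the spirit of the one described above would run both selections over the same input and require byte-identical output.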
* (1 << w) is changed to ((uint32_t)1 << w)
* int is changed to uint32_t
gf.c: gf_composite_get_default_poly:
values of a larger unsigned type were assigned to unsigned integers;
the type of the assigned variable is changed to match that of the
value assigned to it.
gf_w16.c: GF_MULTBY_TWO
setting the parameter to a variable instead of passing the expression
resolves the warning for some reason.
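To show why the cast in the first bullet matters: with a 32-bit int, 1 << 31 shifts into the sign bit of a signed value, which is undefined behaviour in C and a common source of compiler warnings, while the unsigned form is well defined. The helper name below is hypothetical and the sketch only covers w < 32.

```c
#include <stdint.h>

/* Hypothetical helper: 2^w as an unsigned 32-bit value, valid for
 * 0 <= w <= 31. The cast makes the shift operate on uint32_t, so
 * w == 31 yields 0x80000000 instead of overflowing a signed int. */
uint32_t example_pow2(unsigned w)
{
    return (uint32_t)1 << w;
}
```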
Signed-off-by: Loic Dachary <loic@dachary.org>
When a fatal error (unaligned memory etc.) is detected, gf-complete should
assert(3) instead of exit(3) to give the calling program a chance to
catch the failure (assert raises SIGABRT) and display a stack trace.
Although it is possible for gdb to display the stack trace and break on
exit, libraries are not usually expected to terminate the calling program
in this way.
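A minimal sketch of the idea, assuming a 16-byte alignment requirement and a hypothetical helper name (neither is taken from the gf-complete sources): the check uses assert(3), so a violation aborts with SIGABRT, which a caller can trap or inspect in a debugger, instead of silently calling exit(3).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: returns 1 when p is 16-byte aligned. On a
 * violation, assert() prints the failed expression with file and line
 * and calls abort(). Note that the check disappears entirely when the
 * code is compiled with NDEBUG defined. */
int example_check_alignment(const void *p)
{
    assert(((uintptr_t)p & 15u) == 0 && "region must be 16-byte aligned");
    return 1;
}
```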
Signed-off-by: Loic Dachary <loic@dachary.org>
Optimisations for the 4,4 split table region multiplication and carry-less
multiplication using NEON's polynomial long multiplication.
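For readers unfamiliar with carry-less multiplication, the scalar sketch below shows what one 8-bit lane of a polynomial long multiply computes, followed by a reduction back into GF(2^8). It is a reference illustration, not the NEON code from this commit; the function names are hypothetical, and the reduction polynomial is passed in by the caller (0x11d is used in the usage note as an assumed example).

```c
#include <stdint.h>

/* 8x8 -> 16 bit carry-less (polynomial) multiply over GF(2): partial
 * products are combined with XOR instead of addition, so no carries
 * propagate between bit positions. */
uint16_t example_clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r ^= (uint16_t)((uint16_t)a << i);
    return r;
}

/* Multiply in GF(2^8): carry-less multiply, then reduce the (at most
 * degree 14) product modulo the degree-8 polynomial poly. */
uint8_t example_gf8_mul(uint8_t a, uint8_t b, uint16_t poly)
{
    uint16_t r = example_clmul8(a, b);
    for (int bit = 14; bit >= 8; bit--)
        if (r & (1u << bit))
            r ^= (uint16_t)(poly << (bit - 8));
    return (uint8_t)r;
}
```

With poly = 0x11d, example_gf8_mul(0x02, 0x80, 0x11d) reduces the intermediate product 0x100 to 0x1d.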
arm: w8: NEON carry-less multiplication
Selected time_tool.sh results for a 1.7GHz cortex-a9:
Region Best (MB/s): 375.86 W-Method: 8 -m CARRY_FREE -
Region Best (MB/s): 142.94 W-Method: 8 -m TABLE -
Region Best (MB/s): 225.01 W-Method: 8 -m TABLE -r DOUBLE -
Region Best (MB/s): 211.23 W-Method: 8 -m TABLE -r DOUBLE -r LAZY -
Region Best (MB/s): 160.09 W-Method: 8 -m LOG -
Region Best (MB/s): 123.61 W-Method: 8 -m LOG_ZERO -
Region Best (MB/s): 123.85 W-Method: 8 -m LOG_ZERO_EXT -
Region Best (MB/s): 1183.79 W-Method: 8 -m SPLIT 8 4 -r SIMD -
Region Best (MB/s): 177.68 W-Method: 8 -m SPLIT 8 4 -r NOSIMD -
Region Best (MB/s): 87.85 W-Method: 8 -m COMPOSITE 2 - -
Region Best (MB/s): 428.59 W-Method: 8 -m COMPOSITE 2 - -r ALTMAP -
Remove the identical expression and reorganize the code in gf_error_check()
so that it is handled identically through all checks. Remove the
(raltmap && arg1 != 4) check - this is dead code (arg1 is always 4 in
this code path).
Fix for Coverity issue from the Ceph project:
CID 1193071 (#1 of 1): Same on both sides (CONSTANT_EXPRESSION_RESULT)
pointless_expression: The expression (arg1 == 4 && arg2 == 32) ||
(arg1 == 4 && arg2 == 32) does not accomplish anything because it
evaluates to either of its identical operands, arg1 == 4 && arg2 == 32.
Did you intend the operands to be different?
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>