Commit Graph

11 Commits (master)

Author SHA1 Message Date
Bassam Tabbara 4339569f14 Support for runtime SIMD detection
This commit adds support for runtime detection of SIMD instructions. The idea is that you build once with all supported SIMD functions, and the same binaries can run on different machines with varying SIMD support. At runtime, gf-complete selects the right functions based on the processor.

gf_cpu.c contains the logic to detect SIMD instructions. On Intel processors this is done through cpuid; for ARM on Linux we use getauxval.

The logic in gf_w*.c has been changed to check for runtime SIMD support and fall back to generic code.

A new test has also been added. It compares the functions selected by gf_init when SIMD support is enabled or disabled through build flags with those selected when it is enabled or disabled at runtime, and checks that the results are identical.
2016-09-13 12:24:25 -07:00
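A minimal sketch of the detect-then-dispatch pattern described in this commit, assuming GCC or Clang on x86 (`__get_cpuid` and `bit_SSSE3` from `<cpuid.h>`) or glibc on AArch64 Linux (`getauxval(AT_HWCAP)` from `<sys/auxv.h>`). It is not gf_cpu.c: the names `cpu_has_simd`, `select_region_xor` and `selections_agree` are hypothetical, requiring SSSE3 or ASIMD is an assumption, and the "SIMD" kernel is a portable word-at-a-time stand-in for real intrinsics. The last function follows the spirit of the new test by checking that both selections produce identical bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#elif defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#endif

/* Hypothetical detector: 1 if the CPU reports the SIMD extension this sketch
 * assumes the fast path needs (SSSE3 on x86, ASIMD on AArch64 Linux). */
static int cpu_has_simd(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                     /* CPUID leaf 1 unavailable */
    return (ecx & bit_SSSE3) != 0;    /* CPUID.1:ECX bit 9 */
#elif defined(__aarch64__) && defined(__linux__) && defined(HWCAP_ASIMD)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
    return 0;                         /* unknown platform: generic code only */
#endif
}

typedef void (*region_xor_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Generic fallback: one byte per iteration. */
static void region_xor_generic(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Placeholder for a vector kernel: 8 bytes per iteration plus a scalar tail. */
static void region_xor_wide(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t d, s;
        memcpy(&d, dst + i, 8);
        memcpy(&s, src + i, 8);
        d ^= s;
        memcpy(dst + i, &d, 8);
    }
    for (; i < n; i++)
        dst[i] ^= src[i];
}

/* Choose once at init time; allow_simd = 0 forces the generic path, like
 * disabling SIMD through a build flag or a runtime switch. */
static region_xor_fn select_region_xor(int allow_simd)
{
    return (allow_simd && cpu_has_simd()) ? region_xor_wide : region_xor_generic;
}

/* Both selections must give byte-identical results. */
static int selections_agree(size_t n)
{
    uint8_t a[256], b[256], src[256];
    assert(n <= sizeof a);
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] = (uint8_t)(i * 7u);
        src[i] = (uint8_t)(255u - i);
    }
    select_region_xor(1)(a, src, n);
    select_region_xor(0)(b, src, n);
    return memcmp(a, b, n) == 0;
}
```

Selecting once and keeping the function pointer limits the per-call cost to an indirect call, and the same binary runs correctly whether or not the extension is present.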
Loic Dachary d1b6bbf706 add -Wsign-compare and address the warnings
* (1 << w) is changed to ((uint32_t)1 << w)
* int is changed to uint32_t

gf.c: gf_composite_get_default_poly:

   a larger unsigned value was assigned to an unsigned integer; in those
   cases the type of the assigned variable is changed to match the type
   of the value assigned to it.

gf_w16.c: GF_MULTBY_TWO

   passing a variable as the parameter instead of the expression itself
   resolves the warning, for some reason.

Signed-off-by: Loic Dachary <loic@dachary.org>
2015-09-02 19:20:33 +02:00
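A small sketch of why the shift and loop types matter, assuming a 32-bit `int`: `1 << 31` shifts into the sign bit of a signed `int`, which is undefined behavior in C, and a signed `int` loop index compared against a `uint32_t` bound is exactly what `-Wsign-compare` reports. The function names are hypothetical, not gf-complete code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of elements in GF(2^w). The shifted operand is unsigned, so w = 31
 * is well defined; w = 32 would still be undefined for a 32-bit type (shift
 * count equal to the width), so this sketch rejects it. */
static uint32_t field_size(uint32_t w)
{
    assert(w >= 1 && w < 32);
    return (uint32_t)1 << w;
}

/* The index has the same unsigned type as the bound, so x < n compares
 * unsigned with unsigned and draws no -Wsign-compare warning; an `int x`
 * here would. Returns the XOR of all elements just so the loop computes
 * something. */
static uint32_t xor_of_all_elements(uint32_t w)
{
    uint32_t n = field_size(w), acc = 0;
    for (uint32_t x = 0; x < n; x++)
        acc ^= x;
    return acc;
}
```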
Janne Grunau 6fdd8bc3d3 arm: NEON optimisations for gf_w64
Optimisations for 4,64 split table region multiplications. Only used on
ARMv8-A since it is not faster on ARMv7-A.
2014-10-24 14:54:55 +02:00
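The split-table idea behind these commits can be written as a short derivation (the notation is this sketch's, not gf-complete's). Let p(x) be the field's reduction polynomial and split the operand b into sixteen 4-bit digits b_i. Because multiplication by a fixed a is linear over GF(2), where addition is XOR, the product becomes sixteen table lookups:

```latex
\begin{aligned}
b &= \bigoplus_{i=0}^{15} b_i\,x^{4i}, \qquad b_i \in \{0,\dots,15\},\\
a \cdot b &= \bigoplus_{i=0}^{15} a \cdot \bigl(b_i\,x^{4i}\bigr)
          = \bigoplus_{i=0}^{15} T_i[b_i],
\qquad T_i[n] = \bigl(a \cdot n\,x^{4i}\bigr) \bmod p(x).
\end{aligned}
```

For w = 64 that is 16 tables × 16 entries × 8 bytes = 2048 bytes of tables per multiplier a, and each product costs 16 lookups plus 15 XORs.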
Janne Grunau 370c88b901 arm: NEON optimisations for gf_w32
Optimisations for 4,32 split table multiplications.

Selected time_tool.sh results on a 1.7 GHz Cortex-A9:
Region Best (MB/s):   346.67   W-Method: 32 -m SPLIT 32 4 -r SIMD -
Region Best (MB/s):    92.89   W-Method: 32 -m SPLIT 32 4 -r NOSIMD -
Region Best (MB/s):   258.17   W-Method: 32 -m SPLIT 32 4 -r SIMD -r ALTMAP -
Region Best (MB/s):   162.00   W-Method: 32 -m SPLIT 32 8 -
Region Best (MB/s):   160.53   W-Method: 32 -m SPLIT 8 8 -
Region Best (MB/s):    32.74   W-Method: 32 -m COMPOSITE 2 - -
Region Best (MB/s):   199.79   W-Method: 32 -m COMPOSITE 2 - -r ALTMAP -
2014-10-24 14:54:27 +02:00
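Vector split-4 multiplication is generally built around a 16-entry byte table lookup applied to 16 lanes at once. The sketch below emulates that structure in portable C for w = 32: each of the 8 nibble positions gets 4 byte tables (one per output byte), and 16 words are processed together. Assumptions: the reduction polynomial `0x400007` (x^32 + x^22 + x^2 + x + 1), the byte-sliced layout, and all names are this sketch's choices, not necessarily gf-complete's; `lookup16` is a scalar stand-in for a NEON `vtbl`/`vqtbl1q_u8` or SSSE3 `pshufb` lookup.

```c
#include <assert.h>
#include <stdint.h>

#define GF32_POLY 0x00400007u  /* assumed: low 32 bits of x^32+x^22+x^2+x+1 */

/* Reference shift-and-add multiply in GF(2^32). */
static uint32_t gf32_mul_ref(uint32_t a, uint32_t b)
{
    uint32_t p = 0;
    while (b) {
        if (b & 1u)
            p ^= a;
        b >>= 1;
        a = (a & 0x80000000u) ? (a << 1) ^ GF32_POLY : a << 1;
    }
    return p;
}

/* tbl[i][j][n] = byte j of a * (n << 4i): 8 nibble positions x 4 output
 * bytes, so every lookup uses a 16-byte table. */
typedef struct { uint8_t tbl[8][4][16]; } split32_4;

static void split32_4_init(split32_4 *s, uint32_t a)
{
    for (int i = 0; i < 8; i++)
        for (uint32_t n = 0; n < 16; n++) {
            uint32_t p = gf32_mul_ref(a, n << (4 * i));
            for (int j = 0; j < 4; j++)
                s->tbl[i][j][n] = (uint8_t)(p >> (8 * j));
        }
}

/* Scalar stand-in for one 16-lane byte table lookup; this sketch only
 * models in-range indices. */
static void lookup16(const uint8_t table[16], const uint8_t idx[16], uint8_t out[16])
{
    for (int k = 0; k < 16; k++) {
        assert(idx[k] < 16);
        out[k] = table[idx[k]];
    }
}

/* Multiply 16 words by a at once; lane k of each byte vector is word k. */
static void split32_4_mul16(const split32_4 *s, const uint32_t in[16], uint32_t out[16])
{
    uint8_t acc[4][16] = {{0}}, idx[16], tmp[16];
    for (int i = 0; i < 8; i++) {
        for (int k = 0; k < 16; k++)
            idx[k] = (uint8_t)((in[k] >> (4 * i)) & 0x0f);
        for (int j = 0; j < 4; j++) {
            lookup16(s->tbl[i][j], idx, tmp);
            for (int k = 0; k < 16; k++)
                acc[j][k] ^= tmp[k];
        }
    }
    /* Recombine the four byte-sliced partial results into words. */
    for (int k = 0; k < 16; k++)
        out[k] = (uint32_t)acc[0][k] | (uint32_t)acc[1][k] << 8 |
                 (uint32_t)acc[2][k] << 16 | (uint32_t)acc[3][k] << 24;
}
```

Checking every lane against a plain shift-and-add multiply is a cheap way to validate a table layout like this before replacing `lookup16` with real intrinsics.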
Janne Grunau 474010a91d arm: NEON optimisations for gf_w16
Optimisations for the 4,16 split table region multiplications.

Selected time_tool.sh 16 -A -B results for a 1.7 GHz Cortex-A9:
Region Best (MB/s):   532.14   W-Method: 16 -m SPLIT 16 4 -r SIMD -
Region Best (MB/s):   212.34   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -
Region Best (MB/s):   801.36   W-Method: 16 -m SPLIT 16 4 -r SIMD -r ALTMAP -
Region Best (MB/s):    93.20   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -r ALTMAP -
Region Best (MB/s):   273.99   W-Method: 16 -m SPLIT 16 8 -
Region Best (MB/s):   270.81   W-Method: 16 -m SPLIT 8 8 -
Region Best (MB/s):    70.42   W-Method: 16 -m COMPOSITE 2 - -
Region Best (MB/s):   393.54   W-Method: 16 -m COMPOSITE 2 - -r ALTMAP -
2014-10-24 14:53:57 +02:00
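For comparison, the plain scalar form of the 4,16 split table method needs only four 16-entry tables of 16-bit products. This sketch assumes the polynomial `0x1100B` (x^16 + x^12 + x^3 + x + 1) and uses hypothetical names; the `add` flag imitates the usual choice between overwriting the destination and XOR-accumulating into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GF16_POLY 0x1100Bu  /* assumed: x^16 + x^12 + x^3 + x + 1 */

/* Reference shift-and-add multiply in GF(2^16). */
static uint16_t gf16_mul_ref(uint16_t a, uint16_t b)
{
    uint32_t aa = a, p = 0;
    while (b) {
        if (b & 1u)
            p ^= aa;
        b >>= 1;
        aa <<= 1;
        if (aa & 0x10000u)
            aa ^= GF16_POLY;      /* clears bit 16 and reduces */
    }
    return (uint16_t)p;
}

/* t[i][n] = a * (n << 4i) for the four nibble positions of a 16-bit word. */
typedef struct { uint16_t t[4][16]; } split16_4;

static void split16_4_init(split16_4 *s, uint16_t a)
{
    for (int i = 0; i < 4; i++)
        for (unsigned n = 0; n < 16; n++)
            s->t[i][n] = gf16_mul_ref(a, (uint16_t)(n << (4 * i)));
}

/* dst[k] = a * src[k], or dst[k] ^= a * src[k] when add is nonzero. */
static void split16_4_region(const split16_4 *s, const uint16_t *src,
                             uint16_t *dst, size_t count, int add)
{
    assert(count == 0 || (src != NULL && dst != NULL));
    for (size_t k = 0; k < count; k++) {
        uint16_t v = src[k];
        uint16_t p = s->t[0][v & 0xf] ^ s->t[1][(v >> 4) & 0xf] ^
                     s->t[2][(v >> 8) & 0xf] ^ s->t[3][v >> 12];
        dst[k] = add ? (uint16_t)(dst[k] ^ p) : p;
    }
}
```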
Janne Grunau bec15359de arm: NEON optimisations for gf_w8
Optimisations for the 4,4 split table region multiplication and carry-less
multiplication using NEON's polynomial long multiplication.
arm: w8: NEON carry less multiplication

Selected time_tool.sh results for a 1.7 GHz Cortex-A9:
Region Best (MB/s):   375.86   W-Method: 8 -m CARRY_FREE -
Region Best (MB/s):   142.94   W-Method: 8 -m TABLE -
Region Best (MB/s):   225.01   W-Method: 8 -m TABLE -r DOUBLE -
Region Best (MB/s):   211.23   W-Method: 8 -m TABLE -r DOUBLE -r LAZY -
Region Best (MB/s):   160.09   W-Method: 8 -m LOG -
Region Best (MB/s):   123.61   W-Method: 8 -m LOG_ZERO -
Region Best (MB/s):   123.85   W-Method: 8 -m LOG_ZERO_EXT -
Region Best (MB/s):  1183.79   W-Method: 8 -m SPLIT 8 4 -r SIMD -
Region Best (MB/s):   177.68   W-Method: 8 -m SPLIT 8 4 -r NOSIMD -
Region Best (MB/s):    87.85   W-Method: 8 -m COMPOSITE 2 - -
Region Best (MB/s):   428.59   W-Method: 8 -m COMPOSITE 2 - -r ALTMAP -
2014-10-24 14:53:35 +02:00
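Carry-less multiplication forms the polynomial product with XOR instead of addition (no carries), then reduces it modulo the field polynomial. One lane of NEON's `vmull_p8` computes the 8×8 → 16-bit polynomial product; the sketch below does the same in scalar C and reduces bit by bit for clarity. The polynomial `0x11D` (x^8 + x^4 + x^3 + x^2 + 1) and all names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define GF8_POLY 0x11Du  /* assumed: x^8 + x^4 + x^3 + x^2 + 1 */

/* Carry-less 8x8 -> 15-bit product (what one lane of vmull_p8 yields). */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t p = 0;
    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            p ^= (uint16_t)((uint16_t)a << i);
    return p;
}

/* Reduce a product of degree <= 14 modulo the field polynomial, clearing
 * the highest set bit first. */
static uint8_t gf8_reduce(uint16_t p)
{
    assert(p < 0x8000u);
    for (int i = 14; i >= 8; i--)
        if (p & (1u << i))
            p ^= (uint16_t)(GF8_POLY << (i - 8));
    return (uint8_t)p;
}

static uint8_t gf8_mul_clmul(uint8_t a, uint8_t b)
{
    return gf8_reduce(clmul8(a, b));
}

/* Shift-and-add reference for comparison. */
static uint8_t gf8_mul_ref(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1u)
            p ^= a;
        b >>= 1;
        a = (uint8_t)((a & 0x80u) ? (a << 1) ^ (GF8_POLY & 0xFFu) : a << 1);
    }
    return p;
}
```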
Janne Grunau 1311a44f7a arm: NEON optimisations for gf_w4
Optimisations for the single table region multiplication and carry-less
multiplication using NEON's polynomial multiplication of 8-bit values.

The single polynomial multiplication is not that useful, but the vector
version is useful for region multiplication.

Selected time_tool.sh results for a 1.7 GHz Cortex-A9:
Region Best (MB/s):   672.72   W-Method: 4 -m CARRY_FREE -
Region Best (MB/s):   265.84   W-Method: 4 -m BYTWO_p -
Region Best (MB/s):   329.41   W-Method: 4 -m TABLE -r DOUBLE -
Region Best (MB/s):   278.63   W-Method: 4 -m TABLE -r QUAD -
Region Best (MB/s):   329.81   W-Method: 4 -m TABLE -r QUAD -r LAZY -
Region Best (MB/s):  1318.03   W-Method: 4 -m TABLE -r SIMD -
Region Best (MB/s):   165.15   W-Method: 4 -m TABLE -r NOSIMD -
Region Best (MB/s):    99.73   W-Method: 4 -m LOG -
2014-10-24 14:53:12 +02:00
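For w = 4 a single 16-entry table holds every product `a * n`, and each byte carries two elements, so a region multiply is two nibble lookups per byte; a 16-byte vector table lookup does the same for 16 bytes at a time. A scalar sketch, assuming the polynomial `0x13` (x^4 + x + 1) and hypothetical names. Nibble order does not matter here, since both nibbles go through the same table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GF4_POLY 0x13u  /* assumed: x^4 + x + 1 */

/* Shift-and-add multiply in GF(2^4); operands are 4-bit values. */
static uint8_t gf4_mul_ref(uint8_t a, uint8_t b)
{
    assert(a < 16 && b < 16);
    uint8_t p = 0;
    for (int i = 0; i < 4; i++) {
        if (b & (1u << i))
            p ^= a;
        a = (uint8_t)(a << 1);
        if (a & 0x10u)
            a ^= GF4_POLY;        /* clears bit 4 and reduces */
    }
    return p;
}

/* One table per multiplier: t[n] = a * n. */
static void gf4_table(uint8_t a, uint8_t t[16])
{
    for (uint8_t n = 0; n < 16; n++)
        t[n] = gf4_mul_ref(a, n);
}

/* Multiply every packed 4-bit element of src by a: low and high nibbles
 * are looked up in the same table and repacked. */
static void gf4_region_mul(uint8_t a, const uint8_t *src, uint8_t *dst, size_t bytes)
{
    uint8_t t[16];
    gf4_table(a, t);
    for (size_t i = 0; i < bytes; i++)
        dst[i] = (uint8_t)(t[src[i] & 0x0f] | (t[src[i] >> 4] << 4));
}
```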
Janne Grunau f6828cfbc1 build: fix out of source tree build 2014-10-09 23:22:28 +02:00
Adam Disney 5be1fecbcb Fixed a few minor warnings when running autogen.sh. 2014-06-16 12:27:19 -04:00
Kevin Greenan e1c76b4dd4 Added exhaustive test support (Ethan's changes to gf_unit and gf_methods) and overrode autoconf's defaults for CFLAGS. 2013-12-07 16:05:31 -08:00
Kevin Greenan 153dd20988 Setting up autoconf/automake for GF-Complete
Also re-organized the directory structure.

Signed-off-by: Kevin Greenan <kmgreen2@gmail.com>
2013-12-04 21:24:29 -08:00