26 Commits (master)

Author SHA1 Message Date
Brad Hubbard 2700e1b9ae Resolve cppcheck Signed integer overflow errors
The type of expression '1<<31' is signed int and this causes cppcheck to
issue the following warning.

[src/gf_w32.c:681]: (error) Signed integer overflow for expression
'1<<31'.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2017-04-10 17:21:30 +10:00
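
The warning described in 2700e1b9ae can be illustrated with a minimal sketch. The helper name `bit_mask32` is hypothetical and is not taken from the commit; the exact form of the fix in src/gf_w32.c is not shown in this log. The sketch only demonstrates the general remedy, which is to make the shifted constant unsigned so that shifting into bit 31 is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from gf-complete: return a 32-bit mask with
 * only bit `bit` set. The constant is unsigned, so shifting into bit 31
 * is well defined. With a plain int constant, 1 << 31 does not fit in a
 * 32-bit signed int, which is what cppcheck reports. */
static uint32_t bit_mask32(unsigned bit)
{
    assert(bit < 32);
    return UINT32_C(1) << bit;
}
```

Calling `bit_mask32(31)` yields the top-bit mask without relying on signed overflow.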
Bassam Tabbara 4339569f14 Support for runtime SIMD detection
This commit adds support for runtime detection of SIMD instructions. The idea is that you build once with all supported SIMD functions, and the same binaries can then run on different machines with varying SIMD support. At runtime, gf-complete selects the right functions based on the processor.

gf_cpu.c has the logic to detect SIMD instructions. On Intel processors this is done through cpuid. For ARM on Linux we use getauxval.

The logic in gf_w*.c has been changed to check for runtime SIMD support and fall back to generic code.

Also a new test has been added. It compares the functions selected by gf_init when we enable/disable SIMD support through build flags, with runtime enabling/disabling. The test checks if the results are identical.
2016-09-13 12:24:25 -07:00
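
The build-once, select-at-runtime idea from 4339569f14 can be sketched roughly as follows. This is not the code in gf_cpu.c: every name here (`xor_region_fn`, `xor_region_generic`, `xor_region_wide`, `cpu_has_simd`, `select_xor_region`) is hypothetical, the "wide" routine is a plain C stand-in for a real SIMD implementation, and the GCC/Clang builtin `__builtin_cpu_supports` is used as an assumed substitute for the hand-written cpuid logic the commit describes. On other platforms the sketch simply reports no SIMD support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical region operation: XOR src into dst. */
typedef void (*xor_region_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Generic byte-at-a-time fallback. */
static void xor_region_generic(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Stand-in for a SIMD variant: works on 8 bytes at a time in plain C.
 * A real implementation would use SSE or NEON intrinsics here. */
static void xor_region_wide(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t a, b;
        memcpy(&a, dst + i, 8);
        memcpy(&b, src + i, 8);
        a ^= b;
        memcpy(dst + i, &a, 8);
    }
    for (; i < n; i++)          /* leftover tail bytes */
        dst[i] ^= src[i];
}

/* Assumed detection: compiler builtin on x86, "no SIMD" elsewhere. */
static int cpu_has_simd(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return 0;
#endif
}

/* Pick the implementation once, based on what the processor reports. */
static xor_region_fn select_xor_region(void)
{
    return cpu_has_simd() ? xor_region_wide : xor_region_generic;
}
```

Whichever function is selected, the result must match the generic one, which is the same property the new test in the commit checks for the real functions.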
Bassam Tabbara 87f0d4395d Add support for printing functions selected in gf_init
There is currently no way to figure out which functions were selected
during gf_init and as a result of SIMD options. This is not even possible
in gdb since most functions are static.

This commit adds a new macro SET_FUNCTION that records the name of the
function selected during init inside the gf_internal structure. This macro
only works when DEBUG_FUNCTIONS is defined during compile. Otherwise the
code works exactly as it did before this change.

The names of selected functions will be used during testing of SIMD
runtime detection.

All calls such as:

gf->multiply.w32 = gf_w16_shift_multiply;

need to be replaced with the following:

SET_FUNCTION(gf,multiply,w32,gf_w16_shift_multiply)

Also added a new flag to tools/gf_methods that will print the names of
functions selected during gf_init.
2016-09-13 12:24:25 -07:00
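
One way a macro like the SET_FUNCTION of 87f0d4395d could record the selected function's name is sketched below. The macro body, the cut-down `demo_gf_t` structure, the `multiply_name` field, and `demo_shift_multiply` are all assumptions made for illustration; the log does not show the real definition or the layout of gf_internal. Unlike the call shown in the commit message, this sketch expects a trailing semicolon.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Normally supplied by the build (e.g. -DDEBUG_FUNCTIONS); defined here
 * only so the sketch exercises the name-recording branch. */
#define DEBUG_FUNCTIONS 1

typedef uint32_t (*mult32_fn)(uint32_t a, uint32_t b);

/* Hypothetical, cut-down stand-in for the library's internal structure. */
typedef struct {
    struct { mult32_fn w32; } multiply;
#ifdef DEBUG_FUNCTIONS
    const char *multiply_name;   /* name of the selected function */
#endif
} demo_gf_t;

#ifdef DEBUG_FUNCTIONS
/* Assign the pointer and store the stringified function name. */
#define SET_FUNCTION(gf, method, size, func) \
    do { (gf)->method.size = (func); (gf)->method##_name = #func; } while (0)
#else
/* Without DEBUG_FUNCTIONS this is a plain assignment, as before. */
#define SET_FUNCTION(gf, method, size, func) \
    do { (gf)->method.size = (func); } while (0)
#endif

/* Placeholder multiply: shift-and-XOR polynomial product, no reduction. */
static uint32_t demo_shift_multiply(uint32_t a, uint32_t b)
{
    uint32_t p = 0;
    for (; b != 0; b >>= 1, a <<= 1)
        if (b & 1u)
            p ^= a;
    return p;
}

static void demo_init(demo_gf_t *gf)
{
    assert(gf != NULL);
    SET_FUNCTION(gf, multiply, w32, demo_shift_multiply);
}
```

After `demo_init`, `multiply_name` holds the text of the function name passed to the macro, which a tool could print to show what was selected.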
Janne Grunau 6fdd8bc3d3 arm: NEON optimisations for gf_w64
Optimisations for 4,64 split table region multiplications. Only used on
ARMv8-A since it is not faster on ARMv7-A.
2014-10-24 14:54:55 +02:00
Janne Grunau 370c88b901 arm: NEON optimisations for gf_w32
Optimisations for 4,32 split table multiplications.

Selected time_tool.sh results on a 1.7 GHz cortex-a9:
Region Best (MB/s):   346.67   W-Method: 32 -m SPLIT 32 4 -r SIMD -
Region Best (MB/s):    92.89   W-Method: 32 -m SPLIT 32 4 -r NOSIMD -
Region Best (MB/s):   258.17   W-Method: 32 -m SPLIT 32 4 -r SIMD -r ALTMAP -
Region Best (MB/s):   162.00   W-Method: 32 -m SPLIT 32 8 -
Region Best (MB/s):   160.53   W-Method: 32 -m SPLIT 8 8 -
Region Best (MB/s):    32.74   W-Method: 32 -m COMPOSITE 2 - -
Region Best (MB/s):   199.79   W-Method: 32 -m COMPOSITE 2 - -r ALTMAP -
2014-10-24 14:54:27 +02:00
Janne Grunau 474010a91d arm: NEON optimisations for gf_w16
Optimisations for the 4,16 split table region multiplications.

Selected time_tool.sh 16 -A -B results for a 1.7 GHz cortex-a9:
Region Best (MB/s):   532.14   W-Method: 16 -m SPLIT 16 4 -r SIMD -
Region Best (MB/s):   212.34   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -
Region Best (MB/s):   801.36   W-Method: 16 -m SPLIT 16 4 -r SIMD -r ALTMAP -
Region Best (MB/s):    93.20   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -r ALTMAP -
Region Best (MB/s):   273.99   W-Method: 16 -m SPLIT 16 8 -
Region Best (MB/s):   270.81   W-Method: 16 -m SPLIT 8 8 -
Region Best (MB/s):    70.42   W-Method: 16 -m COMPOSITE 2 - -
Region Best (MB/s):   393.54   W-Method: 16 -m COMPOSITE 2 - -r ALTMAP -
2014-10-24 14:53:57 +02:00
Janne Grunau bec15359de arm: NEON optimisations for gf_w8
Optimisations for the 4,4 split table region multiplication and carry
less multiplication using NEON's polynomial long multiplication.
arm: w8: NEON carry less multiplication

Selected time_tool.sh results for a 1.7GHz cortex-a9:
Region Best (MB/s):   375.86   W-Method: 8 -m CARRY_FREE -
Region Best (MB/s):   142.94   W-Method: 8 -m TABLE -
Region Best (MB/s):   225.01   W-Method: 8 -m TABLE -r DOUBLE -
Region Best (MB/s):   211.23   W-Method: 8 -m TABLE -r DOUBLE -r LAZY -
Region Best (MB/s):   160.09   W-Method: 8 -m LOG -
Region Best (MB/s):   123.61   W-Method: 8 -m LOG_ZERO -
Region Best (MB/s):   123.85   W-Method: 8 -m LOG_ZERO_EXT -
Region Best (MB/s):  1183.79   W-Method: 8 -m SPLIT 8 4 -r SIMD -
Region Best (MB/s):   177.68   W-Method: 8 -m SPLIT 8 4 -r NOSIMD -
Region Best (MB/s):    87.85   W-Method: 8 -m COMPOSITE 2 - -
Region Best (MB/s):   428.59   W-Method: 8 -m COMPOSITE 2 - -r ALTMAP -
2014-10-24 14:53:35 +02:00
Janne Grunau 1311a44f7a arm: NEON optimisations for gf_w4
Optimisations for the single table region multiplication and carry less
multiplication using NEON's polynomial multiplication of 8-bit values.

A single polynomial multiplication is not that useful, but the vector
version is useful for region multiplication.

Selected time_tool.sh results for a 1.7GHz cortex-a9:
Region Best (MB/s):   672.72   W-Method: 4 -m CARRY_FREE -
Region Best (MB/s):   265.84   W-Method: 4 -m BYTWO_p -
Region Best (MB/s):   329.41   W-Method: 4 -m TABLE -r DOUBLE -
Region Best (MB/s):   278.63   W-Method: 4 -m TABLE -r QUAD -
Region Best (MB/s):   329.81   W-Method: 4 -m TABLE -r QUAD -r LAZY -
Region Best (MB/s):  1318.03   W-Method: 4 -m TABLE -r SIMD -
Region Best (MB/s):   165.15   W-Method: 4 -m TABLE -r NOSIMD -
Region Best (MB/s):    99.73   W-Method: 4 -m LOG -
2014-10-24 14:53:12 +02:00
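
The carry-less multiplication mentioned in 1311a44f7a can be shown in scalar form. This is a portable sketch, not the NEON code from the commit: the function names are hypothetical, and the reducing polynomial x^4 + x + 1 (0x13) is an assumption about the field in use. NEON's polynomial multiply would replace the loop in `clmul4` and process many bytes at once.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less (polynomial) product of two 4-bit values; the result has
 * at most 7 significant bits. Partial products are XORed, not added. */
static uint8_t clmul4(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 4; i++)
        if (b & (1u << i))
            p ^= (uint8_t)(a << i);
    return p;
}

/* Multiply in GF(2^4): carry-less product, then reduce modulo the
 * assumed polynomial x^4 + x + 1 (0x13), clearing bits 6 down to 4. */
static uint8_t gf4_multiply(uint8_t a, uint8_t b)
{
    assert(a < 16 && b < 16);
    uint8_t p = clmul4(a, b);
    for (int bit = 6; bit >= 4; bit--)
        if (p & (1u << bit))
            p ^= (uint8_t)(0x13u << (bit - 4));
    return p;
}
```

For example, `gf4_multiply(2, 8)` computes x * x^3 = x^4, which reduces to x + 1 under the assumed polynomial.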
Janne Grunau eb5ce0ca42 configure: add ARM/AArch64 NEON support
Checks for arm_neon.h header.
2014-10-09 23:22:33 +02:00
Janne Grunau 568df90edc simd: rename the region flags from SSE to SIMD
SSE is not the only supported SIMD instruction set. Keep the old names
for backward compatibility.
2014-10-09 23:22:32 +02:00
Leo Laksmana 6f160921dc On a CPU that doesn't support the SSE4.2 instruction set, this will fail
because the incorrect header is included.

smmintrin.h => SSE4.1
nmmintrin.h => SSE4.2
2014-08-23 18:08:31 +08:00
Adam Disney c25310f215 Removed comments marking CARRY_FREE_GK additions. 2014-06-16 13:04:15 -04:00
Adam Disney d08de3bdcb Merge remote-tracking branch 'jayrde/wip-autoconf-cleanup'
Conflicts:
	.gitignore
	INSTALL
	Makefile.in
	aclocal.m4
	config.guess
	config.sub
	configure
	examples/Makefile.in
	include/config.h.in
	include/config.h.in~
	install-sh
	ltmain.sh
	m4/libtool.m4
	m4/ltversion.m4
	missing
	src/Makefile.in
	test/Makefile.in
	tools/Makefile.in
2014-06-16 12:24:06 -04:00
Kevin Greenan 259d91ad43 autoreconf'd to reflect addition of --disable-sse 2014-06-09 12:36:05 -07:00
Adam Disney 6bb1ebb9f4 Implemented CARRY_FREE_GK. Sections added are tagged with a comment //ADAM
for easy navigation.
2014-06-06 13:09:04 -04:00
Danny Al-Gaaf 13f0e8888f fix comment/message on GF_E_SP128_A/GF_E_SP128_S
Swap comments/messages on GF_E_SP128_A/GF_E_SP128_S.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-04-22 20:08:26 +02:00
Kevin Greenan 8a96cbb371 Ran autogen to pick-up the changes needed to run 'make check' 2014-04-02 10:35:21 -07:00
Jens Rosenboom 2758e242fe remove autogenerated files from repository 2014-03-18 21:53:24 +01:00
Kevin Greenan 02bc991f68 Added more header files to the distribution, which will allow
clients of the lib to take advantage of even more stuff.
2014-01-02 10:03:06 -08:00
Jim Plank f0c32c94bc Removed GROUP/128/SSE. It wasn't compiling, and it needed an overhaul.
I'll do it someday when I'm bored.
2014-01-01 11:00:40 -05:00
Jim Plank fb0bbdcf62 Fixed the problem with PCLMUL and gf_complete.h. Removed
ARCH_64 from everything but 128/GROUP/SSE.  Fortunately, no
one ever uses that.
2013-12-31 20:08:18 -05:00
Kevin Greenan 5687b9c2cc Third.1 time's a charm (autoconf non-sense for PCLMUL). 2013-12-30 22:50:04 -08:00
Kevin Greenan a98f6c1115 Added entry to configure.ac to avoid running autotools during normal build. 2013-12-30 16:31:54 -08:00
Kevin Greenan a97563f0e4 Added PCLMUL to the autoconf macro... 2013-12-30 14:14:08 -08:00
Kevin Greenan 639c106d23 Build failed... It was because some headers were in the wrong place.
It was working for me because the headers were installed in /usr/local/include
on my Linux box.
2013-12-04 21:58:41 -08:00
Kevin Greenan 153dd20988 Setting up autoconf/automake for GF-Complete
Also re-organized the directory structure.

Signed-off-by: Kevin Greenan <kmgreen2@gmail.com>
2013-12-04 21:24:29 -08:00