gf-complete


Author	SHA1	Message	Date
Bassam Tabbara	87f0d4395d	Add support for printing functions selected in gf_init. There is currently no way to figure out which functions were selected during gf_init as a result of SIMD options. This is not even possible in gdb, since most functions are static. This commit adds a new macro, SET_FUNCTION, that records the name of the function selected during init inside the gf_internal structure. The macro only takes effect when DEBUG_FUNCTIONS is defined at compile time; otherwise the code works exactly as it did before this change. The names of the selected functions will be used when testing SIMD runtime detection. All assignments such as: gf->multiply.w32 = gf_w16_shift_multiply; need to be replaced with the following: SET_FUNCTION(gf,multiply,w32,gf_w16_shift_multiply) Also adds a new flag to tools/gf_methods that prints the names of the functions selected during gf_init.	2016-09-13 12:24:25 -07:00
animetosho	643743d048	Move conditional outside loop for NEON SPLIT4 implementation. Seems to improve performance a fair bit.	2015-11-14 16:32:25 +10:00
animetosho	05057e5635	Eliminate unnecessary VTRNs in SPLIT(16,4) NEON implementation. Also makes the ARMv8 version consistent with the older one in terms of processing width.	2015-11-12 22:17:53 +10:00
animetosho	438283c12d	Use similar strategy for SPLIT(16,4) ALTMAP NEON implementation as SPLIT(32,4)	2015-11-12 21:17:13 +10:00
animetosho	f373b138aa	Initial fix for SPLIT(16,4) ALTMAP NEON (non ARMv8)	2015-11-12 21:09:44 +10:00
Janne Grunau	6fdd8bc3d3	arm: NEON optimisations for gf_w64. Optimisations for 4,64 split table region multiplications. Only used on ARMv8-A since it is not faster on ARMv7-A.	2014-10-24 14:54:55 +02:00
Janne Grunau	370c88b901	arm: NEON optimisations for gf_w32. Optimisations for 4,32 split table multiplications. Selected time_tool.sh results on a 1.7 GHz Cortex-A9: Region Best (MB/s): 346.67, W-Method: 32 -m SPLIT 32 4 -r SIMD -; Region Best (MB/s): 92.89, W-Method: 32 -m SPLIT 32 4 -r NOSIMD -; Region Best (MB/s): 258.17, W-Method: 32 -m SPLIT 32 4 -r SIMD -r ALTMAP -; Region Best (MB/s): 162.00, W-Method: 32 -m SPLIT 32 8 -; Region Best (MB/s): 160.53, W-Method: 32 -m SPLIT 8 8 -; Region Best (MB/s): 32.74, W-Method: 32 -m COMPOSITE 2 - -; Region Best (MB/s): 199.79, W-Method: 32 -m COMPOSITE 2 - -r ALTMAP -	2014-10-24 14:54:27 +02:00
Janne Grunau	474010a91d	arm: NEON optimisations for gf_w16. Optimisations for the 4,16 split table region multiplications. Selected time_tool.sh 16 -A -B results for a 1.7 GHz Cortex-A9: Region Best (MB/s): 532.14, W-Method: 16 -m SPLIT 16 4 -r SIMD -; Region Best (MB/s): 212.34, W-Method: 16 -m SPLIT 16 4 -r NOSIMD -; Region Best (MB/s): 801.36, W-Method: 16 -m SPLIT 16 4 -r SIMD -r ALTMAP -; Region Best (MB/s): 93.20, W-Method: 16 -m SPLIT 16 4 -r NOSIMD -r ALTMAP -; Region Best (MB/s): 273.99, W-Method: 16 -m SPLIT 16 8 -; Region Best (MB/s): 270.81, W-Method: 16 -m SPLIT 8 8 -; Region Best (MB/s): 70.42, W-Method: 16 -m COMPOSITE 2 - -; Region Best (MB/s): 393.54, W-Method: 16 -m COMPOSITE 2 - -r ALTMAP -	2014-10-24 14:53:57 +02:00
Janne Grunau	bec15359de	arm: NEON optimisations for gf_w8. Optimisations for the 4,4 split table region multiplication and carry-less multiplication using NEON's polynomial long multiplication. arm: w8: NEON carry-less multiplication. Selected time_tool.sh results for a 1.7 GHz Cortex-A9: Region Best (MB/s): 375.86, W-Method: 8 -m CARRY_FREE -; Region Best (MB/s): 142.94, W-Method: 8 -m TABLE -; Region Best (MB/s): 225.01, W-Method: 8 -m TABLE -r DOUBLE -; Region Best (MB/s): 211.23, W-Method: 8 -m TABLE -r DOUBLE -r LAZY -; Region Best (MB/s): 160.09, W-Method: 8 -m LOG -; Region Best (MB/s): 123.61, W-Method: 8 -m LOG_ZERO -; Region Best (MB/s): 123.85, W-Method: 8 -m LOG_ZERO_EXT -; Region Best (MB/s): 1183.79, W-Method: 8 -m SPLIT 8 4 -r SIMD -; Region Best (MB/s): 177.68, W-Method: 8 -m SPLIT 8 4 -r NOSIMD -; Region Best (MB/s): 87.85, W-Method: 8 -m COMPOSITE 2 - -; Region Best (MB/s): 428.59, W-Method: 8 -m COMPOSITE 2 - -r ALTMAP -	2014-10-24 14:53:35 +02:00
Janne Grunau	1311a44f7a	arm: NEON optimisations for gf_w4. Optimisations for the single table region multiplication and carry-less multiplication using NEON's polynomial multiplication of 8-bit values. A single polynomial multiplication is not that useful, but the vector version is for region multiplication. Selected time_tool.sh results for a 1.7 GHz Cortex-A9: Region Best (MB/s): 672.72, W-Method: 4 -m CARRY_FREE -; Region Best (MB/s): 265.84, W-Method: 4 -m BYTWO_p -; Region Best (MB/s): 329.41, W-Method: 4 -m TABLE -r DOUBLE -; Region Best (MB/s): 278.63, W-Method: 4 -m TABLE -r QUAD -; Region Best (MB/s): 329.81, W-Method: 4 -m TABLE -r QUAD -r LAZY -; Region Best (MB/s): 1318.03, W-Method: 4 -m TABLE -r SIMD -; Region Best (MB/s): 165.15, W-Method: 4 -m TABLE -r NOSIMD -; Region Best (MB/s): 99.73, W-Method: 4 -m LOG -	2014-10-24 14:53:12 +02:00

10 Commits (master)
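
Commit 87f0d4395d describes a SET_FUNCTION macro that records the name of each function chosen during gf_init when DEBUG_FUNCTIONS is defined. The sketch below shows one way such a macro could work, using the preprocessor's stringizing operator. It is not the library's actual code: the types demo_gf_t and demo_internal_t and the functions demo_shift_multiply and demo_init are hypothetical stand-ins, and the layout of the real gf_internal structure is not shown in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the library's structures. */
typedef uint32_t (*demo_mult_fn)(uint32_t a, uint32_t b);

typedef struct {
    const char *multiply;            /* name of the selected function, or NULL */
} demo_internal_t;

typedef struct {
    struct { demo_mult_fn w32; } multiply;
    demo_internal_t *scratch;
} demo_gf_t;

/* Defined here so the sketch records names; normally a compiler flag. */
#define DEBUG_FUNCTIONS 1

#ifdef DEBUG_FUNCTIONS
/* Assign the pointer and remember the function's name via #func. */
#define SET_FUNCTION(gf, method, size, func)      \
    do {                                          \
        (gf)->method.size = (func);               \
        (gf)->scratch->method = #func;            \
    } while (0)
#else
/* Without DEBUG_FUNCTIONS this is a plain assignment. */
#define SET_FUNCTION(gf, method, size, func)      \
    do { (gf)->method.size = (func); } while (0)
#endif

/* Illustrative shift-and-xor multiply in GF(2^4), polynomial x^4 + x + 1. */
static uint32_t demo_shift_multiply(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    for (int i = 0; i < 4; i++) {
        if (b & 1u) product ^= a;     /* add a if the low bit of b is set */
        b >>= 1;
        a <<= 1;
        if (a & 0x10u) a ^= 0x13u;    /* reduce when degree reaches 4 */
    }
    return product & 0xFu;
}

/* Hypothetical init routine: selects a function through the macro. */
static void demo_init(demo_gf_t *gf, demo_internal_t *scratch)
{
    scratch->multiply = NULL;
    gf->scratch = scratch;
    SET_FUNCTION(gf, multiply, w32, demo_shift_multiply);
}
```

After demo_init(&gf, &scratch), the call gf.multiply.w32(2, 3) goes through the selected pointer, and scratch.multiply holds the string "demo_shift_multiply", which a tool could print to show what was selected even though the function itself is static.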