reedsolomon-go

Commit Graph

Author	SHA1	Message	Date
Frank Wessels	2f8e50e65c	Better test coverage for AVX512 (parallel version) (#134)	2020-05-07 09:28:23 +02:00
Frank Wessels	1b9e129671	Avx512 parallel81 (#131) * AVX512 routine for 8x1 parallel processing (WIP) * Testing and integration of Parallel81 assembly routine	2020-05-06 12:32:31 +02:00
Klaus Post	de70cc155f	AVX512 parallel processing (#120) Do concurrent processing in AVX512 mode and split jobs by cache size.	2020-05-04 09:17:40 +02:00
Klaus Post	65df535980	Make single goroutine encodes more efficient (#122) Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines. ``` λ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkParallel_8x8x05M-16 675225 321053 -52.45% BenchmarkParallel_20x10x05M-16 3471988 600740 -82.70% BenchmarkParallel_8x8x1M-16 3948606 728093 -81.56% BenchmarkParallel_8x8x8M-16 47361588 5976467 -87.38% BenchmarkParallel_8x8x32M-16 195044200 24365474 -87.51% benchmark old MB/s new MB/s speedup BenchmarkParallel_8x8x05M-16 6211.71 13064.22 2.10x BenchmarkParallel_20x10x05M-16 3020.10 17454.73 5.78x BenchmarkParallel_8x8x1M-16 2124.45 11521.34 5.42x BenchmarkParallel_8x8x8M-16 1416.95 11228.85 7.92x BenchmarkParallel_8x8x32M-16 1376.28 11017.04 8.00x ```	2020-05-03 19:37:22 +02:00
Frank Wessels	0b98f5350a	Refactor AVX512 code to use Go assembly instructions. (#121) Additionally there is a small performance improvement using VPTERNLOGD (instead of two VPXORD instructions).	2020-05-03 13:43:52 +02:00
Klaus Post	d2cfcb8065	Add commandline arg to disable asm for tests. (#116) * Add commandline test args	2020-04-22 15:38:21 +02:00
Klaus Post	101092fa3b	Make AVX512 short tests (#114) Tests are timing out. Use shorter tests for -short.	2020-01-18 14:50:31 +01:00
Frank Wessels	79aee05119	AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91) The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table: ``` $ benchcmp avx2.txt avx512.txt benchmark AVX2 MB/s AVX512 MB/s speedup BenchmarkEncode8x8x1M-72 1681.35 4125.64 2.45x BenchmarkEncode8x4x8M-72 1529.36 5507.97 3.60x BenchmarkEncode8x8x8M-72 791.16 2952.29 3.73x BenchmarkEncode8x8x32M-72 573.26 2168.61 3.78x BenchmarkEncode12x4x12M-72 1234.41 4912.37 3.98x BenchmarkEncode16x4x16M-72 1189.59 5138.01 4.32x BenchmarkEncode24x8x24M-72 690.68 2583.70 3.74x BenchmarkEncode24x8x48M-72 674.20 2643.31 3.92x ```	2019-02-10 11:17:23 +01:00

8 Commits (2df03bd4d177fc6b5bb09379e8b44f1106923ca4)