Commit Graph

16 Commits (jerasure-matrix)

Author SHA1 Message Date
Klaus Post 3a82d28edb
Add GF16 AVX2, AVX512 and SSSE3 (#193)
* Add GF16 AVX2
* Add SSSE3 fallback.
* Fix reconstruction being skipped if the first shard was empty.
* Combine lookups in pure Go
* Faster xor on pure Go.
* Add 4way butterfly AVX2.
* Add fftDIT4 avx2. Add avx512 version. Add noescape.
* Remove +build space. Do size-varied 800x200 bench.
* Use VPTERNLOGD for avx512.
* Remove refMulAdd inner loop bounds checks. ~10-20% faster
2022-07-26 12:37:28 +02:00
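The reconstruction fix in #193 is easiest to see through the package's public API. The sketch below uses the documented New/Encode/Reconstruct calls to drop the first shard and rebuild it from the rest; it illustrates the flow the fix concerns, not the internals of the change.

```
package main

import (
	"bytes"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	enc, err := reedsolomon.New(10, 3) // 10 data + 3 parity shards
	if err != nil {
		log.Fatal(err)
	}

	shards := make([][]byte, 13)
	for i := range shards {
		shards[i] = make([]byte, 1024)
	}
	shards[0][0] = 42 // put something in the first data shard

	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Drop the first shard and rebuild it from the remaining shards.
	want := append([]byte(nil), shards[0]...)
	shards[0] = nil
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	if !bytes.Equal(shards[0], want) {
		log.Fatal("reconstruction mismatch")
	}
}
```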
Klaus Post 5593e2b2dd
Unroll pure go xor loop (#172)
* Unroll pure go xor loop

Testing with `go test -bench=x1x -tags=noasm -short`

```
before:
BenchmarkEncode2x1x1M-32           13658             87980 ns/op        35754.96 MB/s

after:
BenchmarkEncode2x1x1M-32           21633             55498 ns/op        56682.24 MB/s
```
2021-12-02 16:39:56 +01:00
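What the unrolling amounts to in the pure-Go path: process a few 64-bit words per iteration and mop up the tail byte by byte. The sketch below only illustrates the technique; the actual loop in the package differs.

```
package xorsketch

import "encoding/binary"

// xorSlice XORs src into dst, two 8-byte words per iteration, with a
// scalar loop for the remaining bytes.
func xorSlice(dst, src []byte) {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	i := 0
	for ; i+16 <= n; i += 16 {
		binary.LittleEndian.PutUint64(dst[i:],
			binary.LittleEndian.Uint64(dst[i:])^binary.LittleEndian.Uint64(src[i:]))
		binary.LittleEndian.PutUint64(dst[i+8:],
			binary.LittleEndian.Uint64(dst[i+8:])^binary.LittleEndian.Uint64(src[i+8:]))
	}
	for ; i < n; i++ {
		dst[i] ^= src[i]
	}
}
```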
Klaus Post 7761c8f7cd
Use Workflows (#169)
* Use Workflows
* Go 1.17 build tags
* Do races separately.
2021-09-01 18:55:02 +02:00
Klaus Post cb7a0b5aef
Do fast by-one multiplication (#130)
When multiplying by one we can use faster math.
2020-05-06 11:14:25 +02:00
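In GF(2^8), multiplication by one is the identity, so the per-byte table lookup can be skipped: a multiply becomes a copy, and a multiply-and-add becomes a plain XOR. A sketch of the idea with a hypothetical helper; the library's real code path is its own.

```
// galMulSliceSketch multiplies every byte of in by c into out.
// For c == 1 the table lookup is pointless: the result is just a copy.
// (For the multiply-and-add variant, c == 1 reduces to out[i] ^= in[i].)
func galMulSliceSketch(c byte, in, out []byte, mulTable *[256][256]byte) {
	if c == 1 {
		copy(out, in)
		return
	}
	mt := mulTable[c][:]
	for i, v := range in {
		out[i] = mt[v]
	}
}
```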
Klaus Post f525ef0450
Clean up build tags (#126)
Move non-amd64 code to a separate file and remove references in other files.

Fixes #125
2020-05-04 20:06:47 +02:00
Klaus Post de70cc155f
AVX512 parallel processing (#120)
Do concurrent processing in AVX512 mode and split jobs by cache size.
2020-05-04 09:17:40 +02:00
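A rough sketch of the splitting idea: break the shard length into chunks small enough to stay cache-resident and hand each chunk to its own goroutine. The chunk size and the worker callback below are illustrative, not the library's actual values.

```
package chunksketch

import "sync"

// processConcurrent encodes byte columns [start:stop) of every shard on
// separate goroutines. encodeChunk is a hypothetical worker; chunkSize is
// an illustrative cache-friendly value.
func processConcurrent(shardLen int, encodeChunk func(start, stop int)) {
	const chunkSize = 512 << 10
	var wg sync.WaitGroup
	for start := 0; start < shardLen; start += chunkSize {
		stop := start + chunkSize
		if stop > shardLen {
			stop = shardLen
		}
		wg.Add(1)
		go func(start, stop int) {
			defer wg.Done()
			encodeChunk(start, stop)
		}(start, stop)
	}
	wg.Wait()
}
```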
Klaus Post d2cfcb8065
Add commandline arg to disable asm for tests. (#116)
* Add commandline test args
2020-04-22 15:38:21 +02:00
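A sketch of how such a test flag can look: a package-level flag in a _test.go file that routes constructors through options disabling the assembler paths. The flag and option names here are assumptions; the repo's actual test flags may differ.

```
package reedsolomon_test

import (
	"flag"

	"github.com/klauspost/reedsolomon"
)

// Illustrative test flag; the actual flag and option names in the repo differ.
var noAsm = flag.Bool("no-asm", false, "disable assembler code paths in tests")

func testOptions() []reedsolomon.Option {
	if *noAsm {
		return []reedsolomon.Option{
			reedsolomon.WithSSSE3(false), // assumption: testing options like these exist
			reedsolomon.WithAVX2(false),
		}
	}
	return nil
}
```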
Klaus Post a9588190c0
Optimize pure Go version. (#96)
* Optimize pure Go version.
* Update docs. Add Go 1.12 CI

* Avoid dst bounds check when using noasm ~ 40-50% faster.
* Convert multiply table to a slice whenever used.
* Split on 32 byte boundaries instead of 16 byte.
2019-03-08 10:49:27 +01:00
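The "avoid dst bounds check" item is the standard Go bounds-check-elimination trick: reslice the destination to the loop bound before the loop so the compiler can prove every index is in range. A minimal sketch with hypothetical names; the real loop in the package is more involved.

```
// galMulSliceXorRef multiplies each byte of in by c and XORs it into out.
// Reslicing out to len(in) before the loop lets the compiler prove out[n]
// is always in range and drop the per-iteration bounds check.
func galMulSliceXorRef(c byte, in, out []byte, mulTable *[256][256]byte) {
	mt := mulTable[c][:]
	out = out[:len(in)] // single up-front check instead of one per byte
	for n, v := range in {
		out[n] ^= mt[v]
	}
}
```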
Frank Wessels 79aee05119 AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
This adds an AVX512-accelerated code path for Intel CPUs, giving per-core speedups of up to 4x compared to AVX2, as shown in the following table:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00
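An AVX512 path has to be gated at runtime. The library does its own CPU feature detection; the sketch below only shows how such gating typically looks using golang.org/x/sys/cpu, and the exact feature set an encode loop needs is an assumption.

```
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// hasAVX512 reports whether the base AVX512 features an accelerated
// encode loop would typically need are present.
func hasAVX512() bool {
	return cpu.X86.HasAVX512F && cpu.X86.HasAVX512BW && cpu.X86.HasAVX512VL
}

func main() {
	if hasAVX512() {
		fmt.Println("using AVX512 code path")
	} else {
		fmt.Println("falling back to AVX2/SSSE3/pure Go")
	}
}
```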
Frank Wessels 8885f3a1c7 Feature/ppc support (#88)
Add accelerated PPC support.
2018-12-18 20:39:59 +01:00
Klaus Post 454fd91890
Maintenance updates. (#86)
* Add gcc go build tags.
* Update Travis.
* Fix typo
2018-11-12 13:25:55 +01:00
Frank Wessels 7b88f42e61 Add NEON support for ARM64 (#62)
* Add support for arm64 using NEON instructions

Specifically, it uses the PMULL/PMULL2 polynomial multiplication instructions, followed by a reduction step (actually two steps).

* Add ARM performance numbers

* Formatting for performance table

* Refactoring of NEON version and 256-bit wide version

* Expand test slice beyond 32 (for AVX2 and NEON) and test galMulSliceXor explicitly.

* Fix ARM code that was missing a function.

* Fix missing newline
2017-08-26 11:47:42 +02:00
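The PMULL-then-reduce sequence is the hardware form of ordinary GF(2^8) multiplication: first a carry-less (polynomial) multiply, then reduction modulo the field polynomial. A pure-Go worked equivalent, assuming the 0x11D polynomial commonly used for Reed-Solomon:

```
package gfsketch

// gfMul multiplies a and b in GF(2^8), mirroring the two NEON steps:
// a carry-less multiply followed by reduction modulo x^8+x^4+x^3+x^2+1 (0x11D).
func gfMul(a, b byte) byte {
	var prod uint16
	for i := uint(0); i < 8; i++ { // carry-less (polynomial) multiplication
		if b&(1<<i) != 0 {
			prod ^= uint16(a) << i
		}
	}
	for i := 15; i >= 8; i-- { // reduction step(s)
		if prod&(1<<uint(i)) != 0 {
			prod ^= 0x11D << uint(i-8)
		}
	}
	return byte(prod)
}
```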
chenzhongtao d78bf472d8 Add Update parity function (#60)
Add Update parity function
2017-08-20 11:42:39 +02:00
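The point of Update is to refresh parity from just the data shards that changed, instead of re-encoding everything. A usage sketch, assuming the Update(shards, newDatashards) form added here, where unchanged data shards are passed as nil:

```
package main

import (
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	enc, err := reedsolomon.New(10, 3)
	if err != nil {
		log.Fatal(err)
	}

	shards := make([][]byte, 13)
	for i := range shards {
		shards[i] = make([]byte, 1024)
	}
	if err := enc.Encode(shards); err != nil { // initial parity
		log.Fatal(err)
	}

	// Only data shard 2 changes; all other entries stay nil.
	newData := make([][]byte, 10)
	newData[2] = make([]byte, 1024)
	newData[2][0] = 0xFF

	// Parity shards in `shards` are rewritten using just the delta.
	if err := enc.Update(shards, newData); err != nil {
		log.Fatal(err)
	}
}
```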
Klaus Post 5abf0ee302 Add options (#46)
* Add options

Make constants changeable as options.

The API remains backwards compatible.

* Update documentation.

* Fix line endings

* fmt

* fmt

* Use functions for parameters.

Much neater.
2017-02-19 11:13:22 +01:00
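"Use functions for parameters" is the functional-options pattern, which is also why the API stays backwards compatible: the variadic opts can simply be omitted. A generic sketch of the shape; the names and defaults below are illustrative, not the package's actual option set.

```
package optsketch

// options collects tunables that used to be package-level constants.
type options struct {
	maxGoroutines int
	minSplitSize  int
}

// Option mutates the configuration before the encoder is built.
type Option func(*options)

// WithMaxGoroutines caps the number of goroutines used per call.
func WithMaxGoroutines(n int) Option {
	return func(o *options) { o.maxGoroutines = n }
}

type Encoder struct{ opt options }

// New keeps its old two-argument form; options are strictly additive.
func New(dataShards, parityShards int, opts ...Option) (*Encoder, error) {
	o := options{maxGoroutines: 256, minSplitSize: 512} // illustrative defaults
	for _, opt := range opts {
		opt(&o)
	}
	return &Encoder{opt: o}, nil
}
```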
Klaus Post f1c2cf4160 Don't use assembler on app engine. 2015-06-21 22:54:13 +02:00
Klaus Post 5aa37c3492 Add AMD64 SSE3 Galois multiplication. Approximately 5-10x faster.
```
benchmark                         old MB/s     new MB/s     speedup
BenchmarkEncode10x2x10000         333.31       5827.17      17.48x
BenchmarkEncode10x2x10000-2       431.20       2802.53      6.50x
BenchmarkEncode10x2x10000-4       553.98       2432.95      4.39x
BenchmarkEncode10x2x10000-8       585.79       3469.61      5.92x
BenchmarkEncode100x20x10000       32.59        583.40       17.90x
BenchmarkEncode100x20x10000-2     59.52        726.70       12.21x
BenchmarkEncode100x20x10000-4     108.04       1363.25      12.62x
BenchmarkEncode100x20x10000-8     113.76       1274.62      11.20x
BenchmarkEncode17x3x1M            215.28       3141.85      14.59x
BenchmarkEncode17x3x1M-2          398.76       3650.12      9.15x
BenchmarkEncode17x3x1M-4          655.32       6071.11      9.26x
BenchmarkEncode17x3x1M-8          832.16       6616.47      7.95x
BenchmarkEncode10x4x16M           154.48       1357.30      8.79x
BenchmarkEncode10x4x16M-2         295.62       2377.92      8.04x
BenchmarkEncode10x4x16M-4         529.89       3519.49      6.64x
BenchmarkEncode10x4x16M-8         632.11       4521.90      7.15x
BenchmarkEncode5x2x1M             327.87       4879.09      14.88x
BenchmarkEncode5x2x1M-2           576.11       2599.20      4.51x
BenchmarkEncode5x2x1M-4           1043.65      3559.12      3.41x
BenchmarkEncode5x2x1M-8           1227.77      4255.34      3.47x
BenchmarkEncode10x2x1M            321.24       4574.68      14.24x
BenchmarkEncode10x2x1M-2          587.73       3100.28      5.28x
BenchmarkEncode10x2x1M-4          1101.96      4770.32      4.33x
BenchmarkEncode10x2x1M-8          1217.08      5812.17      4.78x
BenchmarkEncode10x4x1M            155.34       2037.27      13.11x
BenchmarkEncode10x4x1M-2          298.38       2470.97      8.28x
BenchmarkEncode10x4x1M-4          548.67       3603.15      6.57x
BenchmarkEncode10x4x1M-8          625.23       4827.42      7.72x
BenchmarkEncode50x20x1M           31.37        347.65       11.08x
BenchmarkEncode50x20x1M-2         59.81        713.28       11.93x
BenchmarkEncode50x20x1M-4         105.34       1175.47      11.16x
BenchmarkEncode50x20x1M-8         123.84       1491.91      12.05x
BenchmarkEncode17x3x16M           209.55       1861.59      8.88x
BenchmarkEncode17x3x16M-2         394.19       3331.73      8.45x
BenchmarkEncode17x3x16M-4         643.30       4942.74      7.68x
BenchmarkEncode17x3x16M-8         839.64       6213.43      7.40x
```
2015-06-21 21:23:22 +02:00
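In scalar terms, the accelerated multiply splits each byte into low and high nibbles and XORs two 16-entry table lookups; PSHUFB simply performs those lookups on 16 bytes at a time, which is where the 5-10x comes from. A sketch of the nibble-table math, reusing the gfMul helper sketched above (the table-building names here are hypothetical):

```
// buildNibbleTables precomputes c*v and c*(v<<4) for all 16 nibble values,
// using the gfMul reference multiply from the GF(2^8) sketch above.
func buildNibbleTables(c byte) (low, high [16]byte) {
	for v := 0; v < 16; v++ {
		low[v] = gfMul(c, byte(v))
		high[v] = gfMul(c, byte(v<<4))
	}
	return
}

// galMulSliceNibble multiplies every byte of in by c. Since multiplication by
// a constant is linear over GF(2), c*b = low[b&0xF] ^ high[b>>4]; the SIMD
// version performs exactly these lookups 16 bytes per PSHUFB.
func galMulSliceNibble(c byte, in, out []byte) {
	low, high := buildNibbleTables(c)
	out = out[:len(in)]
	for i, b := range in {
		out[i] = low[b&0xF] ^ high[b>>4]
	}
}
```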