reedsolomon-go

Commit Graph

Author	SHA1	Message	Date
Klaus Post	3a82d28edb	Add GF16 AVX2, AVX512 and SSSE3 (#193) * Add GF16 AVX2 * Add SSSE3 fallback. * Fix reconstruction was skipped if first shard was empty. * Combine lookups in pure Go * Faster xor on pure Go. * Add 4way butterfly AVX2. * Add fftDIT4 avx2. Add avx512 version. Add noescape. * Remove +build space. Do size varied 800x200 bench. * Use VPTERNLOGD for avx512. * Remove refMulAdd inner loop bounds checks. ~10-20% faster	2022-07-26 12:37:28 +02:00
Klaus Post	5593e2b2dd	Unroll pure go xor loop (#172) * Unroll pure go xor loop Testing with `go test -bench=x1x -tags=noasm -short` ``` before: BenchmarkEncode2x1x1M-32 13658 87980 ns/op 35754.96 MB/s after: BenchmarkEncode2x1x1M-32 21633 55498 ns/op 56682.24 MB/s ```	2021-12-02 16:39:56 +01:00
Klaus Post	7761c8f7cd	Use Workflows (#169) * Use Workflows * Go 1.17 build tags * Do races separately.	2021-09-01 18:55:02 +02:00
Frank Wessels	d5afb5f48e	Faster arm64 implementation that does not use PMULL instruction (#140) * Faster arm64 implementation that does not use PMULL instruction * Add NEON version for sliceXor	2020-05-13 10:24:22 +02:00
Klaus Post	cb7a0b5aef	Do fast by one multiplication (#130) When multiplying by one we can use faster math.	2020-05-06 11:14:25 +02:00
Klaus Post	f525ef0450	Clean up build tags (#126) Move non-amd64 code to a separate file and remove references in other files. Fixes #125	2020-05-04 20:06:47 +02:00
Klaus Post	a9588190c0	Optimize pure Go version. (#96) * Optimize pure Go version. * Update docs. Add Go 1.12 CI * Avoid dst bounds check when using noasm ~ 40-50% faster. * Convert multiply table to a slice whenever used. * Split on 32 byte boundaries instead of 16 byte.	2019-03-08 10:49:27 +01:00
Frank Wessels	79aee05119	AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91) The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table: ``` $ benchcmp avx2.txt avx512.txt benchmark AVX2 MB/s AVX512 MB/s speedup BenchmarkEncode8x8x1M-72 1681.35 4125.64 2.45x BenchmarkEncode8x4x8M-72 1529.36 5507.97 3.60x BenchmarkEncode8x8x8M-72 791.16 2952.29 3.73x BenchmarkEncode8x8x32M-72 573.26 2168.61 3.78x BenchmarkEncode12x4x12M-72 1234.41 4912.37 3.98x BenchmarkEncode16x4x16M-72 1189.59 5138.01 4.32x BenchmarkEncode24x8x24M-72 690.68 2583.70 3.74x BenchmarkEncode24x8x48M-72 674.20 2643.31 3.92x ```	2019-02-10 11:17:23 +01:00
Klaus Post	454fd91890	Maintenance updates. (#86) * Add gcc go build tags. * Update Travis. * Fix typo	2018-11-12 13:25:55 +01:00
Frank Wessels	7b88f42e61	Add NEON support for ARM64 (#62) * Add support for arm64 using NEON instructions Specifically using the PMULL/PMULL2 polynomial multiplication instructions followed by a reduction step (actually two steps). * Add ARM performance numbers * Formatting for performance table * Refactoring of NEON version and 256-bit wide version * Expand test slice beyond 32 (for AVX2 and NEON) and test galMulSliceXor explicitly. * Fix ARM code with missing function. * Fix missing newline	2017-08-26 11:47:42 +02:00

10 Commits (jerasure-matrix)
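
Two of the commits above (5593e2b2dd and cb7a0b5aef) concern the pure Go XOR path. As a rough illustration only, and not the repository's actual code, the sketch below shows what unrolling a byte-wise XOR loop can look like. Multiplying by one in a Galois field leaves the input unchanged, so a multiply-and-accumulate with coefficient one reduces to this plain XOR. The function names `sliceXorSimple` and `sliceXorUnrolled` and the 32-byte block size are assumptions made for this example.

```go
package main

import "encoding/binary"

// sliceXorSimple XORs in into out one byte at a time.
// Hypothetical baseline, not taken from the repository.
func sliceXorSimple(in, out []byte) {
	n := len(in)
	if len(out) < n {
		n = len(out)
	}
	for i := 0; i < n; i++ {
		out[i] ^= in[i]
	}
}

// sliceXorUnrolled XORs in into out, handling 32 bytes per loop
// iteration as four 64-bit words, then finishing the tail bytewise.
// Hypothetical sketch of loop unrolling; the block size is an assumption.
func sliceXorUnrolled(in, out []byte) {
	n := len(in)
	if len(out) < n {
		n = len(out)
	}
	in, out = in[:n], out[:n]
	for len(in) >= 32 {
		// Reslicing to a fixed length lets the compiler drop
		// bounds checks on the constant offsets below.
		a, b := in[:32], out[:32]
		v0 := binary.LittleEndian.Uint64(b[0:8]) ^ binary.LittleEndian.Uint64(a[0:8])
		v1 := binary.LittleEndian.Uint64(b[8:16]) ^ binary.LittleEndian.Uint64(a[8:16])
		v2 := binary.LittleEndian.Uint64(b[16:24]) ^ binary.LittleEndian.Uint64(a[16:24])
		v3 := binary.LittleEndian.Uint64(b[24:32]) ^ binary.LittleEndian.Uint64(a[24:32])
		binary.LittleEndian.PutUint64(b[0:8], v0)
		binary.LittleEndian.PutUint64(b[8:16], v1)
		binary.LittleEndian.PutUint64(b[16:24], v2)
		binary.LittleEndian.PutUint64(b[24:32], v3)
		in, out = in[32:], out[32:]
	}
	// Remaining 0..31 bytes.
	for i := range in {
		out[i] ^= in[i]
	}
}
```

Both functions should produce identical output for any input; whether the unrolled form is actually faster depends on the compiler and CPU, so it is worth benchmarking rather than assuming.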