Klaus Post
7b7dbe6919
Improve fwht speed (#198)
* Send `*[65536]ffe` instead of a slice, so 16-bit lookups can be done without bounds checks.
* Unroll fwht4
* Move `s2` out of loop.
* Load values instead of modifying pointers.
Before:
```
BenchmarkDecode1K/16x16-32 1029 1175899 ns/op 27.87 MB/s 16410 B/op 17 allocs/op
BenchmarkDecode1K/32x32-32 1023 1184744 ns/op 55.32 MB/s 32794 B/op 33 allocs/op
BenchmarkDecode1K/64x64-32 979 1240467 ns/op 105.66 MB/s 65701 B/op 65 allocs/op
BenchmarkDecode1K/128x128-32 922 1314928 ns/op 199.36 MB/s 131703 B/op 129 allocs/op
BenchmarkDecode1K/256x256-32 792 1530508 ns/op 342.56 MB/s 263548 B/op 258 allocs/op
```
After:
```
BenchmarkDecode1K/16x16-32 1503 798172 ns/op 41.05 MB/s 16408 B/op 17 allocs/op
BenchmarkDecode1K/32x32-32 1483 804507 ns/op 81.46 MB/s 32792 B/op 33 allocs/op
BenchmarkDecode1K/64x64-32 1408 852737 ns/op 153.71 MB/s 65658 B/op 65 allocs/op
BenchmarkDecode1K/128x128-32 1315 917534 ns/op 285.70 MB/s 131513 B/op 129 allocs/op
BenchmarkDecode1K/256x256-32 1069 1115760 ns/op 469.89 MB/s 263689 B/op 258 allocs/op
```
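The first bullet above can be illustrated with a minimal sketch. Only the `ffe` name and the `*[65536]ffe` type come from the commit message; `lookupSlice`, `lookupArray` and the table contents are hypothetical and are not the library's code. The idea is that a `uint16` index can never exceed 65535, so indexing a fixed-size array of 65536 entries needs no bounds check, while indexing a slice of unknown length does.

```go
package main

import "fmt"

// ffe is a 16-bit field element. The name follows the commit message;
// everything else in this file is illustrative.
type ffe uint16

// lookupSlice indexes a slice. The compiler does not know len(table),
// so each table[i] generally carries a bounds check.
func lookupSlice(table []ffe, idx []uint16) (sum ffe) {
	for _, i := range idx {
		sum ^= table[i]
	}
	return sum
}

// lookupArray indexes through a pointer to a fixed-size array.
// A uint16 index is always below 65536, so the check can be dropped.
func lookupArray(table *[65536]ffe, idx []uint16) (sum ffe) {
	for _, i := range idx {
		sum ^= table[i]
	}
	return sum
}

func main() {
	var table [65536]ffe
	for i := range table {
		table[i] = ffe(i * 3) // arbitrary contents, truncated to 16 bits
	}
	idx := []uint16{0, 1, 1234, 65535}
	// Both variants compute the same value; only the generated code differs.
	fmt.Println(lookupSlice(table[:], idx) == lookupArray(&table, idx))
}
```

Building with `go build -gcflags=-d=ssa/check_bce/debug=1` should list the remaining bounds checks, which is one way to confirm the difference on a given Go version.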
2022-07-29 16:26:51 +02:00
Klaus Post
9c824807d6
docs: Fix comments (#195)
2022-07-27 14:18:58 +02:00
Klaus Post
77188e96d2
Add GF16 Split/Join (#194)
* Add GF16 Split/Join
Also check if we have enough shards when reconstructing.
2022-07-26 18:14:03 +02:00
Klaus Post
3a82d28edb
Add GF16 AVX2, AVX512 and SSSE3 (#193)
* Add GF16 AVX2
* Add SSSE3 fallback.
* Fix reconstruction being skipped if the first shard was empty.
* Combine lookups in pure Go
* Faster xor on pure Go.
* Add 4way butterfly AVX2.
* Add fftDIT4 avx2. Add avx512 version. Add noescape.
* Remove +build space. Add a size-varied 800x200 benchmark.
* Use VPTERNLOGD for avx512.
* Remove refMulAdd inner loop bounds checks. ~10-20% faster
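As a rough illustration of the "Faster xor on pure Go" and bounds-check bullets above, the sketch below XORs 8 bytes per iteration and trims both slices to a common length up front so the inner loop indexes stay provably in range. `xorBytes` is a hypothetical name and this is not the library's actual routine.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorBytes sets dst[i] ^= src[i] for the first min(len(dst), len(src)) bytes.
// It handles 8 bytes per iteration and finishes the tail one byte at a time.
func xorBytes(dst, src []byte) {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	// Equal lengths let the compiler reason about both slices together.
	dst, src = dst[:n], src[:n]
	for len(dst) >= 8 {
		d := binary.LittleEndian.Uint64(dst)
		s := binary.LittleEndian.Uint64(src)
		binary.LittleEndian.PutUint64(dst, d^s)
		dst, src = dst[8:], src[8:]
	}
	for i := range dst {
		dst[i] ^= src[i]
	}
}

func main() {
	dst := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
	src := []byte{11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
	xorBytes(dst, src)
	fmt.Println(dst)
}
```

The byte order passed to `encoding/binary` does not matter for XOR, as long as the load and the store use the same one.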
2022-07-26 12:37:28 +02:00
Elias Naur
49be604db0
Add an efficient implementation of shard counts up to 65536 (#191)
This is an O(n*log n) implementation of Reed-Solomon
codes, ported from the C++ library https://github.com/catid/leopard.
The implementation is based on the paper
"Novel Polynomial Basis with Fast Fourier Transform
and Its Application to Reed-Solomon Erasure Codes".
Several TODOs are left for future commits:
- Performance optimizations, in particular SIMD and multiple goroutines
- Documentation
- Detailed Tests
- Merging of reedSolomonFF16 and reedSolomon types
- Turn the straight C++ port into more idiomatic Go
This change also bumps one race testing timeout, to ensure adequate
time on CI.
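For context on the 65536 limit: a 16-bit Galois field has exactly 65536 elements, which bounds the number of distinct shards. The sketch below shows textbook log/exp-table multiplication in GF(2^16). It is not the ported code: the reduction polynomial 0x1002D is an assumption (believed to match Leopard's 16-bit field), the table layout is the plain one rather than whatever basis the port uses, and all names are hypothetical.

```go
package main

import "fmt"

const (
	order      = 1 << 16   // number of field elements
	modulus    = order - 1 // size of the multiplicative group
	polynomial = 0x1002D   // assumed reduction polynomial
)

var (
	expTable [order]uint16 // expTable[i] = x^i
	logTable [order]uint16 // logTable[x^i] = i, undefined for 0
)

// init fills both tables by repeatedly multiplying by x (a left shift)
// and reducing with the polynomial whenever bit 16 becomes set.
func init() {
	x := 1
	for i := 0; i < modulus; i++ {
		expTable[i] = uint16(x)
		logTable[x] = uint16(i)
		x <<= 1
		if x >= order {
			x ^= polynomial
		}
	}
	expTable[modulus] = expTable[0]
}

// mul multiplies two field elements by adding their logarithms.
func mul(a, b uint16) uint16 {
	if a == 0 || b == 0 {
		return 0
	}
	sum := int(logTable[a]) + int(logTable[b])
	if sum >= modulus {
		sum -= modulus
	}
	return expTable[sum]
}

func main() {
	// x * x^15 = x^16, which reduces to the low bits of the polynomial.
	fmt.Printf("%#x\n", mul(2, 0x8000)) // 0x2d
}
```

Addition in this field is plain XOR, so a multiply-add over a shard reduces to table lookups and XORs.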
2022-07-21 14:27:10 +02:00