Commit Graph

5 Commits (7b7dbe6919e4bcfa7b60dd540be5987580f4037b)

Author SHA1 Message Date
Klaus Post 7b7dbe6919
Improve fwht speed (#198)

* Pass `*[65536]ffe` instead of a slice, so 16-bit lookups can be done without bounds checks.
* Unroll fwht4
* Move `s2` out of the loop.
* Load values instead of modifying pointers.
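The first bullet relies on Go's bounds-check elimination: indexing a pointer to a 65536-entry array with a `uint16` can never go out of range, so the compiler can drop the check, whereas indexing a slice cannot be proven safe. A minimal sketch of the idea, where `ffe` is a hypothetical stand-in for the library's 16-bit element type and both lookup functions are illustrative, not the library's code:

```go
package main

import "fmt"

// ffe is a hypothetical stand-in for a 16-bit field element type.
type ffe uint16

// lookupSlice indexes a slice: the compiler cannot prove idx < len(table),
// so it keeps a bounds check on the access.
func lookupSlice(table []ffe, idx uint16) ffe {
	return table[idx]
}

// lookupArray indexes a pointer to a 65536-entry array with a uint16.
// Every uint16 value is < 65536, so no bounds check is needed.
func lookupArray(table *[65536]ffe, idx uint16) ffe {
	return table[idx]
}

func main() {
	var table [65536]ffe
	for i := range table {
		table[i] = ffe(i ^ 0xFFFF)
	}
	fmt.Println(lookupSlice(table[:], 1), lookupArray(&table, 1))
}
```

Building with `go build -gcflags='-d=ssa/check_bce/debug=1'` should report a remaining bounds check only for the access in `lookupSlice`.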

Before:
```
BenchmarkDecode1K/16x16-32                  1029           1175899 ns/op          27.87 MB/s       16410 B/op         17 allocs/op
BenchmarkDecode1K/32x32-32                  1023           1184744 ns/op          55.32 MB/s       32794 B/op         33 allocs/op
BenchmarkDecode1K/64x64-32                   979           1240467 ns/op         105.66 MB/s       65701 B/op         65 allocs/op
BenchmarkDecode1K/128x128-32                 922           1314928 ns/op         199.36 MB/s      131703 B/op        129 allocs/op
BenchmarkDecode1K/256x256-32                 792           1530508 ns/op         342.56 MB/s      263548 B/op        258 allocs/op
```

After:
```
BenchmarkDecode1K/16x16-32                  1503            798172 ns/op          41.05 MB/s       16408 B/op         17 allocs/op
BenchmarkDecode1K/32x32-32                  1483            804507 ns/op          81.46 MB/s       32792 B/op         33 allocs/op
BenchmarkDecode1K/64x64-32                  1408            852737 ns/op         153.71 MB/s       65658 B/op         65 allocs/op
BenchmarkDecode1K/128x128-32                1315            917534 ns/op         285.70 MB/s      131513 B/op        129 allocs/op
BenchmarkDecode1K/256x256-32                1069           1115760 ns/op         469.89 MB/s      263689 B/op        258 allocs/op
```
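The two benchmark runs can be compared by dividing the old ns/op by the new ns/op for each shard configuration (MB/s is derived from ns/op, so it gives the same ratio). A small sketch of that arithmetic, using the figures from the commit message; the `run` type and `speedup` helper are illustrative names:

```go
package main

import "fmt"

// run pairs one benchmark name with its ns/op before and after the change.
type run struct {
	name              string
	beforeNs, afterNs float64
}

// speedup is how many times faster the new code is: old time / new time.
func speedup(r run) float64 { return r.beforeNs / r.afterNs }

func main() {
	// ns/op values copied from the benchmark output in the commit message.
	runs := []run{
		{"16x16", 1175899, 798172},
		{"32x32", 1184744, 804507},
		{"64x64", 1240467, 852737},
		{"128x128", 1314928, 917534},
		{"256x256", 1530508, 1115760},
	}
	for _, r := range runs {
		fmt.Printf("%-8s %.2fx\n", r.name, speedup(r))
	}
}
```

This works out to roughly 1.37x (256x256) up to 1.47x (16x16), so the gain is largest when fewer shards are involved.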
2022-07-29 16:26:51 +02:00
Klaus Post 9c824807d6
docs: Fix comments (#195)
2022-07-27 14:18:58 +02:00
Klaus Post 77188e96d2
Add GF16 Split/Join (#194)
* Add GF16 Split/Join

Also check if we have enough shards when reconstructing.
2022-07-26 18:14:03 +02:00
Klaus Post 3a82d28edb
Add GF16 AVX2, AVX512 and SSSE3 (#193)
* Add GF16 AVX2
* Add SSSE3 fallback.
* Fix reconstruction being skipped if the first shard was empty.
* Combine lookups in pure Go
* Faster XOR in pure Go.
* Add 4-way butterfly AVX2.
* Add fftDIT4 avx2. Add avx512 version. Add noescape.
* Remove +build space. Add a size-varied 800x200 benchmark.
* Use VPTERNLOGD for avx512.
* Remove refMulAdd inner-loop bounds checks (~10-20% faster).
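The commit does not say how the pure Go XOR was made faster; a common technique is to XOR a 64-bit word at a time instead of a single byte. A sketch of that general approach, with `xorInto` as a hypothetical helper rather than the library's actual code:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorInto computes dst[i] ^= src[i] for the first len(dst) bytes,
// processing 8 bytes per iteration and finishing the tail byte by byte.
func xorInto(dst, src []byte) {
	n := len(dst)
	if len(src) < n {
		panic("xorInto: src shorter than dst")
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		// Fixed-length 8-byte subslices let the compiler hoist bounds checks.
		d := dst[i : i+8 : i+8]
		s := src[i : i+8 : i+8]
		v := binary.LittleEndian.Uint64(d) ^ binary.LittleEndian.Uint64(s)
		binary.LittleEndian.PutUint64(d, v)
	}
	// Handle the remaining 0-7 bytes one at a time.
	for ; i < n; i++ {
		dst[i] ^= src[i]
	}
}

func main() {
	dst := []byte("erasure coding!")
	src := []byte("reed-solomon ff16") // longer than dst; only len(dst) bytes are used
	xorInto(dst, src)
	fmt.Printf("%x\n", dst)
}
```

On Go 1.20 and later, the standard library's `crypto/subtle.XORBytes` provides a similar primitive.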
2022-07-26 12:37:28 +02:00
Elias Naur 49be604db0
Add an efficient implementation of shard counts up to 65536 (#191)
This is an O(n log n) implementation of Reed-Solomon
codes, ported from the C++ library https://github.com/catid/leopard.

The implementation is based on the paper

"Novel Polynomial Basis with Fast Fourier Transform
and Its Application to Reed-Solomon Erasure Codes"
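The limit of 65536 shards follows from doing the arithmetic in GF(2^16), which has 2^16 = 65536 elements, so at most that many distinct shard positions exist. The FFT itself is beyond a short sketch; below is only the field multiplication it builds on, done as carry-less shift-and-add with reduction. The polynomial x^16 + x^5 + x^3 + x^2 + 1 (0x1002D) is an assumption chosen for illustration, and leopard's actual element representation may differ:

```go
package main

import "fmt"

// poly is x^16 + x^5 + x^3 + x^2 + 1; an assumed reduction polynomial.
const poly = 0x1002D

// gfMul multiplies two GF(2^16) elements: addition is XOR, and the running
// multiple of a is reduced modulo poly whenever its degree reaches 16.
func gfMul(a, b uint16) uint16 {
	x, y := uint32(a), uint32(b)
	var r uint32
	for y != 0 {
		if y&1 != 0 {
			r ^= x // add the current multiple of a
		}
		y >>= 1
		x <<= 1
		if x&0x10000 != 0 {
			x ^= poly // x^16 is replaced by x^5 + x^3 + x^2 + 1
		}
	}
	return uint16(r)
}

func main() {
	fmt.Printf("%#x\n", gfMul(0x1234, 0xABCD))
	// x^15 * x = x^16, which reduces to x^5 + x^3 + x^2 + 1.
	fmt.Printf("%#x\n", gfMul(0x8000, 0x0002))
}
```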

Several TODOs are left for future commits:

- Performance optimizations, in particular SIMD and multiple goroutines
- Documentation
- Detailed tests
- Merging of reedSolomonFF16 and reedSolomon types
- Turn the straight C++ port into more idiomatic Go

This change also increases one race-test timeout to ensure adequate
time on CI.
2022-07-21 14:27:10 +02:00