Commit Graph

8 Commits (101092fa3bb75aba15f0399e4b3927f3bf1233fa)

Author SHA1 Message Date
Klaus Post 0883d2f011 Only enable AVX512 on AMD64
Fixes #102
2019-05-26 12:12:55 +02:00
Frank Wessels 79aee05119 AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00
Klaus Post f5e73dcfe2 Split blocks into size divisible by 16
Older systems (typically without AVX2) are more sensitive to misaligned load+stores.

Add parameter to automatically set the number of goroutines.

name                  old time/op    new time/op    delta
Encode10x2x10000-8      18.4µs ± 1%    16.1µs ± 1%  -12.43%    (p=0.000 n=9+9)
Encode100x20x10000-8     692µs ± 1%     608µs ± 1%  -12.10%  (p=0.000 n=10+10)
Encode17x3x1M-8         1.78ms ± 5%    1.49ms ± 1%  -16.63%  (p=0.000 n=10+10)
Encode10x4x16M-8        21.5ms ± 5%    19.6ms ± 4%   -8.74%   (p=0.000 n=10+9)
Encode5x2x1M-8           343µs ± 2%     267µs ± 2%  -22.22%   (p=0.000 n=9+10)
Encode10x2x1M-8          858µs ± 5%     701µs ± 5%  -18.34%  (p=0.000 n=10+10)
Encode10x4x1M-8         1.34ms ± 1%    1.16ms ± 1%  -13.19%    (p=0.000 n=9+9)
Encode50x20x1M-8        30.3ms ± 4%    25.0ms ± 2%  -17.51%   (p=0.000 n=10+8)
Encode17x3x16M-8        26.9ms ± 1%    24.5ms ± 4%   -9.13%   (p=0.000 n=8+10)

name                  old speed      new speed      delta
Encode10x2x10000-8    5.45GB/s ± 1%  6.22GB/s ± 1%  +14.20%    (p=0.000 n=9+9)
Encode100x20x10000-8  1.44GB/s ± 1%  1.64GB/s ± 1%  +13.77%  (p=0.000 n=10+10)
Encode17x3x1M-8       10.0GB/s ± 5%  12.0GB/s ± 1%  +19.88%  (p=0.000 n=10+10)
Encode10x4x16M-8      7.81GB/s ± 5%  8.56GB/s ± 5%   +9.58%   (p=0.000 n=10+9)
Encode5x2x1M-8        15.3GB/s ± 2%  19.6GB/s ± 2%  +28.57%   (p=0.000 n=9+10)
Encode10x2x1M-8       12.2GB/s ± 5%  15.0GB/s ± 5%  +22.45%  (p=0.000 n=10+10)
Encode10x4x1M-8       7.84GB/s ± 1%  9.03GB/s ± 1%  +15.19%    (p=0.000 n=9+9)
Encode50x20x1M-8      1.73GB/s ± 4%  2.09GB/s ± 4%  +20.59%   (p=0.000 n=10+9)
Encode17x3x16M-8      10.6GB/s ± 1%  11.7GB/s ± 4%  +10.12%   (p=0.000 n=8+10)
2017-11-18 22:00:55 +01:00
Klaus Post 61c22eab55 Cauchy Matrix option (#70)
* Experimental Cauchy Matrix

Experimental support for Cauchy style matrix

http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf

All matrices appear reversible.

* Remove Go 1.5 and 1.6 from CI tests.

* Fix comment.

* Increase max number of goroutines+docs.
2017-10-01 14:02:11 +02:00
chenzhongtao d78bf472d8 add Update parity function (#60)
Add Update parity function
2017-08-20 11:42:39 +02:00
Fred Akalin 18d548df63 Add support for PAR1 (#55)
PAR1 is a file format which uses a Reed-Solomon code similar
to the current one, except it uses a different (flawed) coding
matrix.

Add support for it via a WithPAR1Matrix option, so that this code
can be used to encode/decode PAR1 files. Also add the option to
existing tests, and add a test demonstrating the flaw in PAR1's
coding matrix.

Also fix an mistakenly inverted test in testOpts().

Incidentally, PAR1 is obsoleted by PAR2, which uses GF(2^16)
and tries to fix the flaw in the coding matrix; however, PAR2's
coding matrix is still flawed! The real solution is to build the
coding matrix like in this repository.

PAR1 spec:
http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html

Paper describing the (flawed) Reed-Solomon code used by PAR1:
http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.html
2017-06-20 20:24:57 +02:00
Klaus Post dde6ad55c5 Set correct field in WithMinSplitSize
Fixes #51
2017-05-28 12:38:06 +02:00
Klaus Post 5abf0ee302 Add options (#46)
* Add options

Make constants changeable as options.

The API remains backwards compatible.

* Update documentation.

* Fix line endings

* fmt

* fmt

* Use functions for parameters.

Much neater.
2017-02-19 11:13:22 +01:00