18 Commits (b933ef1add0b9ebb756b52a03ba5017b68c486e4)

Author SHA1 Message Date
Vitaliy Filippov b933ef1add Implement jerasure algorithm of matrix generation for interoperability 2022-08-15 14:30:30 +03:00
Vitaliy Filippov 10e7890be7
Add custom coding matrix support (#187)
Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>
2022-06-17 11:43:51 +02:00
Vitaliy Filippov 1ef513248a
Publish withSSE/withAVX options (#186)
Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>
2022-06-17 11:42:53 +02:00
Klaus Post ab26eb4126
Add WithInversionCache and use pointer methods (#160)
There appear to be writes to value receivers.

Add `WithInversionCache(bool)` to disable cache.

Fixes #159
2021-01-13 10:21:28 +01:00
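
The "writes to value receivers" problem mentioned in ab26eb4126 can be illustrated with a minimal, standalone sketch. The `cache` type and its methods are hypothetical and are not taken from the library; they only show why a write made through a value receiver is lost while the same write through a pointer receiver persists.

```go
package main

import "fmt"

// cache is a hypothetical stand-in for a struct holding mutable state.
type cache struct{ hits int }

// recordByValue has a value receiver: it increments a copy of the struct,
// so the caller's value is left unchanged.
func (c cache) recordByValue() { c.hits++ }

// recordByPointer has a pointer receiver: the increment is applied to the
// caller's struct.
func (c *cache) recordByPointer() { c.hits++ }

func main() {
	var c cache

	c.recordByValue()
	fmt.Println(c.hits) // prints 0: the write went to a copy

	c.recordByPointer()
	fmt.Println(c.hits) // prints 1: the write is visible
}
```

Switching to pointer methods, as the commit title describes, is the usual fix for this class of bug.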
Klaus Post 519603f6e1
Update packages (#154)
* Update packages

Update cpuid and clean up generated.
2020-12-09 22:56:01 +01:00
Klaus Post cf8495259a
Add pure XOR for 1 parity (#138)
WithFastOneParityMatrix switches the matrix to a simple XOR when there is only one parity shard.
The PAR1 matrix already has this property, so the option has little effect there.
2020-05-13 11:10:58 +02:00
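
As a rough illustration of the single-parity XOR idea described in cf8495259a, the sketch below computes one parity shard as the byte-wise XOR of the data shards and then rebuilds a lost shard from the survivors. The helper name `xorParity` is hypothetical; this is not the library's implementation, only the underlying arithmetic.

```go
package main

import (
	"bytes"
	"fmt"
)

// xorParity returns the byte-wise XOR of all shards.
// All shards are assumed to have the same, non-zero length.
func xorParity(shards [][]byte) []byte {
	parity := make([]byte, len(shards[0]))
	for _, s := range shards {
		for i, b := range s {
			parity[i] ^= b
		}
	}
	return parity
}

func main() {
	data := [][]byte{[]byte("abcd"), []byte("efgh"), []byte("ijkl")}
	parity := xorParity(data)

	// Pretend data[1] was lost. XOR-ing the parity with the surviving
	// data shards cancels them out and leaves the missing shard.
	recovered := xorParity([][]byte{data[0], data[2], parity})
	fmt.Println(bytes.Equal(recovered, data[1])) // prints "true"
}
```

Because XOR is its own inverse, encoding and single-shard reconstruction use the same routine, with no Galois field multiplications.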
Klaus Post abb309aca7
Fix stream allocations (#129)
Numbers speak for themselves:

```
benchmark                                old ns/op     new ns/op     delta
BenchmarkStreamEncode10x2x10000-32       4792420       7937          -99.83%
BenchmarkStreamEncode100x20x10000-32     38424066      473285        -98.77%
BenchmarkStreamEncode17x3x1M-32          8195036       1482191       -81.91%
BenchmarkStreamEncode10x4x16M-32         21356715      18051773      -15.47%
BenchmarkStreamEncode5x2x1M-32           3295827       412301        -87.49%
BenchmarkStreamEncode10x2x1M-32          5249011       798828        -84.78%
BenchmarkStreamEncode10x4x1M-32          6392974       904818        -85.85%
BenchmarkStreamEncode50x20x1M-32         29083474      7199282       -75.25%
BenchmarkStreamEncode17x3x16M-32         32451850      28036421      -13.61%
BenchmarkStreamVerify10x2x10000-32       4858416       12988         -99.73%
BenchmarkStreamVerify50x5x50000-32       17047361      377003        -97.79%
BenchmarkStreamVerify10x2x1M-32          4869964       887214        -81.78%
BenchmarkStreamVerify5x2x1M-32           3282999       591669        -81.98%
BenchmarkStreamVerify10x4x1M-32          5824392       1230888       -78.87%
BenchmarkStreamVerify50x20x1M-32         27301648      6204613       -77.27%
BenchmarkStreamVerify10x4x16M-32         8508963       18845695      +121.48%

benchmark                                old MB/s     new MB/s     speedup
BenchmarkStreamEncode10x2x10000-32       20.87        12599.82     603.73x
BenchmarkStreamEncode100x20x10000-32     26.03        2112.89      81.17x
BenchmarkStreamEncode17x3x1M-32          2175.19      12026.65     5.53x
BenchmarkStreamEncode10x4x16M-32         7855.71      9293.94      1.18x
BenchmarkStreamEncode5x2x1M-32           1590.76      12716.14     7.99x
BenchmarkStreamEncode10x2x1M-32          1997.66      13126.43     6.57x
BenchmarkStreamEncode10x4x1M-32          1640.20      11588.81     7.07x
BenchmarkStreamEncode50x20x1M-32         1802.70      7282.50      4.04x
BenchmarkStreamEncode17x3x16M-32         8788.80      10172.93     1.16x
BenchmarkStreamVerify10x2x10000-32       20.58        7699.20      374.11x
BenchmarkStreamVerify50x5x50000-32       293.30       13262.49     45.22x
BenchmarkStreamVerify10x2x1M-32          2153.15      11818.75     5.49x
BenchmarkStreamVerify5x2x1M-32           1596.98      8861.17      5.55x
BenchmarkStreamVerify10x4x1M-32          1800.32      8518.86      4.73x
BenchmarkStreamVerify50x20x1M-32         1920.35      8449.97      4.40x
BenchmarkStreamVerify10x4x16M-32         19717.11     8902.41      0.45x
```
2020-05-05 16:35:35 +02:00
Klaus Post 65df535980
Make single goroutine encodes more efficient (#122)
Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines.

```
λ benchcmp before.txt after.txt
benchmark                          old ns/op     new ns/op     delta
BenchmarkParallel_8x8x05M-16       675225        321053        -52.45%
BenchmarkParallel_20x10x05M-16     3471988       600740        -82.70%
BenchmarkParallel_8x8x1M-16        3948606       728093        -81.56%
BenchmarkParallel_8x8x8M-16        47361588      5976467       -87.38%
BenchmarkParallel_8x8x32M-16       195044200     24365474      -87.51%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x05M-16       6211.71      13064.22     2.10x
BenchmarkParallel_20x10x05M-16     3020.10      17454.73     5.78x
BenchmarkParallel_8x8x1M-16        2124.45      11521.34     5.42x
BenchmarkParallel_8x8x8M-16        1416.95      11228.85     7.92x
BenchmarkParallel_8x8x32M-16       1376.28      11017.04     8.00x

```
2020-05-03 19:37:22 +02:00
Klaus Post c3634dce94
Use CPU cache to set minSplitSize (#117)
Use L1 cache size to set default split size.
2020-04-22 16:12:18 +02:00
Klaus Post d2cfcb8065
Add commandline arg to disable asm for tests. (#116)
* Add commandline test args
2020-04-22 15:38:21 +02:00
Klaus Post 0883d2f011 Only enable AVX512 on AMD64
Fixes #102
2019-05-26 12:12:55 +02:00
Frank Wessels 79aee05119 AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00
Klaus Post f5e73dcfe2 Split blocks into size divisible by 16
Older systems (typically without AVX2) are more sensitive to misaligned load+stores.

Add parameter to automatically set the number of goroutines.

name                  old time/op    new time/op    delta
Encode10x2x10000-8      18.4µs ± 1%    16.1µs ± 1%  -12.43%    (p=0.000 n=9+9)
Encode100x20x10000-8     692µs ± 1%     608µs ± 1%  -12.10%  (p=0.000 n=10+10)
Encode17x3x1M-8         1.78ms ± 5%    1.49ms ± 1%  -16.63%  (p=0.000 n=10+10)
Encode10x4x16M-8        21.5ms ± 5%    19.6ms ± 4%   -8.74%   (p=0.000 n=10+9)
Encode5x2x1M-8           343µs ± 2%     267µs ± 2%  -22.22%   (p=0.000 n=9+10)
Encode10x2x1M-8          858µs ± 5%     701µs ± 5%  -18.34%  (p=0.000 n=10+10)
Encode10x4x1M-8         1.34ms ± 1%    1.16ms ± 1%  -13.19%    (p=0.000 n=9+9)
Encode50x20x1M-8        30.3ms ± 4%    25.0ms ± 2%  -17.51%   (p=0.000 n=10+8)
Encode17x3x16M-8        26.9ms ± 1%    24.5ms ± 4%   -9.13%   (p=0.000 n=8+10)

name                  old speed      new speed      delta
Encode10x2x10000-8    5.45GB/s ± 1%  6.22GB/s ± 1%  +14.20%    (p=0.000 n=9+9)
Encode100x20x10000-8  1.44GB/s ± 1%  1.64GB/s ± 1%  +13.77%  (p=0.000 n=10+10)
Encode17x3x1M-8       10.0GB/s ± 5%  12.0GB/s ± 1%  +19.88%  (p=0.000 n=10+10)
Encode10x4x16M-8      7.81GB/s ± 5%  8.56GB/s ± 5%   +9.58%   (p=0.000 n=10+9)
Encode5x2x1M-8        15.3GB/s ± 2%  19.6GB/s ± 2%  +28.57%   (p=0.000 n=9+10)
Encode10x2x1M-8       12.2GB/s ± 5%  15.0GB/s ± 5%  +22.45%  (p=0.000 n=10+10)
Encode10x4x1M-8       7.84GB/s ± 1%  9.03GB/s ± 1%  +15.19%    (p=0.000 n=9+9)
Encode50x20x1M-8      1.73GB/s ± 4%  2.09GB/s ± 4%  +20.59%   (p=0.000 n=10+9)
Encode17x3x16M-8      10.6GB/s ± 1%  11.7GB/s ± 4%  +10.12%   (p=0.000 n=8+10)
2017-11-18 22:00:55 +01:00
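
One way to picture "split blocks into size divisible by 16" from f5e73dcfe2 is the sketch below. It assumes a shard of `n` bytes is divided among a number of workers and rounds the per-worker chunk up to a multiple of 16, so every chunk starts at an offset that is a multiple of 16. The function `splitSize` and its parameters are hypothetical, not the library's code.

```go
package main

import "fmt"

// splitSize returns a per-worker chunk size for n bytes divided among
// `workers` goroutines, rounded up to a multiple of 16.
func splitSize(n, workers int) int {
	if workers < 1 {
		workers = 1
	}
	per := (n + workers - 1) / workers // ceiling division
	return (per + 15) &^ 15            // round up to a multiple of 16
}

func main() {
	n, workers := 10000, 3
	size := splitSize(n, workers)

	// Walk the shard in chunks; only the final chunk may be shorter.
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		fmt.Printf("chunk [%d:%d) len=%d\n", start, end, end-start)
	}
}
```

Only the final chunk can have a length that is not a multiple of 16; all chunk start offsets stay aligned relative to the beginning of the shard.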
Klaus Post 61c22eab55 Cauchy Matrix option (#70)
* Experimental Cauchy Matrix

Experimental support for Cauchy style matrix

http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf

All matrices appear reversible.

* Remove Go 1.5 and 1.6 from CI tests.

* Fix comment.

* Increase max number of goroutines+docs.
2017-10-01 14:02:11 +02:00
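
For readers unfamiliar with Cauchy-style matrices as mentioned in 61c22eab55, here is a minimal sketch of how parity rows of such a matrix can be built over GF(2^8). Each entry is 1 / (x_r XOR y_c), where the row labels x_r and column labels y_c come from disjoint sets. The reduction polynomial 0x11d, the choice of labels, and all function names are assumptions of this sketch rather than details confirmed by the commit.

```go
package main

import "fmt"

// gfMul multiplies two elements of GF(2^8), reducing by the polynomial
// x^8+x^4+x^3+x^2+1 (0x11d). The polynomial is an assumption of this sketch.
func gfMul(a, b byte) byte {
	var p byte
	for b > 0 {
		if b&1 == 1 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d // low 8 bits of 0x11d
		}
		b >>= 1
	}
	return p
}

// gfInv returns the multiplicative inverse of a nonzero element by
// computing a^254, since a^255 = 1 for every nonzero a in GF(2^8).
func gfInv(a byte) byte {
	result := byte(1)
	for i := 0; i < 254; i++ {
		result = gfMul(result, a)
	}
	return result
}

// cauchyParityRows builds parity rows with entry [r][c] = 1/(x_r XOR y_c),
// using x_r = dataShards+r and y_c = c. The label sets are disjoint, so
// x_r XOR y_c is never zero. Requires dataShards+parityShards <= 256.
func cauchyParityRows(dataShards, parityShards int) [][]byte {
	rows := make([][]byte, parityShards)
	for r := range rows {
		rows[r] = make([]byte, dataShards)
		for c := range rows[r] {
			rows[r][c] = gfInv(byte(dataShards+r) ^ byte(c))
		}
	}
	return rows
}

func main() {
	for _, row := range cauchyParityRows(4, 2) {
		fmt.Println(row)
	}
}
```

Stacking these rows under an identity matrix gives a systematic coding matrix. Every square submatrix of a Cauchy matrix is invertible, which is the property behind the commit's remark that all matrices appear reversible.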
chenzhongtao d78bf472d8 add Update parity function (#60)
Add Update parity function
2017-08-20 11:42:39 +02:00
Fred Akalin 18d548df63 Add support for PAR1 (#55)
PAR1 is a file format which uses a Reed-Solomon code similar
to the current one, except it uses a different (flawed) coding
matrix.

Add support for it via a WithPAR1Matrix option, so that this code
can be used to encode/decode PAR1 files. Also add the option to
existing tests, and add a test demonstrating the flaw in PAR1's
coding matrix.

Also fix a mistakenly inverted test in testOpts().

Incidentally, PAR1 is obsoleted by PAR2, which uses GF(2^16)
and tries to fix the flaw in the coding matrix; however, PAR2's
coding matrix is still flawed! The real solution is to build the
coding matrix like in this repository.

PAR1 spec:
http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html

Paper describing the (flawed) Reed-Solomon code used by PAR1:
http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.html
2017-06-20 20:24:57 +02:00
Klaus Post dde6ad55c5 Set correct field in WithMinSplitSize
Fixes #51
2017-05-28 12:38:06 +02:00
Klaus Post 5abf0ee302 Add options (#46)
* Add options

Make constants changeable as options.

The API remains backwards compatible.

* Update documentation.

* Fix line endings

* fmt

* fmt

* Use functions for parameters.

Much neater.
2017-02-19 11:13:22 +01:00
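
The "Use functions for parameters" step in 5abf0ee302 reads like the functional options pattern. A minimal sketch of that pattern follows, under the assumption that former constants become fields on an options struct with defaults. The struct, the default values, and `newOptions` are hypothetical; the two `With...` names are reused only for illustration and are not guaranteed to match the library's signatures.

```go
package main

import "fmt"

// options holds tunables that would previously have been constants.
// Field names and defaults are hypothetical.
type options struct {
	maxGoroutines int
	minSplitSize  int
}

// Option mutates an options value.
type Option func(*options)

// WithMaxGoroutines overrides the goroutine limit.
func WithMaxGoroutines(n int) Option {
	return func(o *options) { o.maxGoroutines = n }
}

// WithMinSplitSize overrides the minimum split size. Each option must set
// its own field; setting the wrong one silently changes another tunable.
func WithMinSplitSize(n int) Option {
	return func(o *options) { o.minSplitSize = n }
}

// newOptions applies opts on top of the defaults, so callers that pass no
// options keep the previous behaviour.
func newOptions(opts ...Option) options {
	o := options{maxGoroutines: 50, minSplitSize: 512}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", newOptions())
	fmt.Printf("%+v\n", newOptions(WithMinSplitSize(1024)))
}
```

Because the options are variadic, existing call sites that pass none compile unchanged, which is consistent with the commit's note that the API remains backwards compatible.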