Commit Graph

51 Commits (0e7f9a6a6f2503191dce1116fb4a45c3c43a1d9c)

Author SHA1 Message Date
Shawn Zivontsis 0e7f9a6a6f
Allow zero parity shards (#161) 2021-03-08 16:13:24 +01:00
Klaus Post ab26eb4126
Add WithInversionCache and use pointer methods (#160)
There appear to be writes to value receivers.

Add `WithInversionCache(bool)` to disable cache.

Fixes #159
2021-01-13 10:21:28 +01:00
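
For context on the "writes to value receivers" problem mentioned in the commit above, here is a minimal sketch of the general Go behaviour. The `cache` type and its methods are hypothetical and are not taken from the repository; they only show why a write through a value receiver is lost.

```go
package main

import "fmt"

// cache is a hypothetical stand-in for a struct that tracks state.
type cache struct {
	hits int
}

// recordByValue has a value receiver: it increments a copy of c,
// so the caller's struct is left unchanged.
func (c cache) recordByValue() { c.hits++ }

// recordByPointer has a pointer receiver: it increments the caller's c.
func (c *cache) recordByPointer() { c.hits++ }

func main() {
	c := cache{}
	c.recordByValue()
	fmt.Println(c.hits) // 0: the write went to a copy
	c.recordByPointer()
	fmt.Println(c.hits) // 1: the write reached the original
}
```

Switching methods to pointer receivers, as the commit title describes, is the usual way to make such writes stick.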
Klaus Post 7c8682430c
tests: Set full data size as number of bytes (#157)
* Clean up deps.
* tests: Set full data size as number of bytes

Use total data size (data+parity) as benchmark sizes for more consistent benchmarks.
2020-12-18 09:09:17 +01:00
Klaus Post 519603f6e1
Update packages (#154)
* Update packages

Update cpuid and clean up generated.
2020-12-09 22:56:01 +01:00
Klaus Post 653e76aa26
Faster AVX2 encoding (#153)
* Remove 50% of bounds checks when copying.
* Use RIP only addressing, free one register.

```
benchmark                                 old MB/s      new MB/s      speedup
BenchmarkGalois128K-32                    57663.49      58005.87      1.01x
BenchmarkGalois1M-32                      49479.31      49848.29      1.01x
BenchmarkGaloisXor128K-32                 46310.69      46501.88      1.00x
BenchmarkGaloisXor1M-32                   43804.86      43984.39      1.00x
BenchmarkEncode10x2x10000-32              25926.93      27457.75      1.06x
BenchmarkEncode100x20x10000-32            2635.82       2818.95       1.07x
BenchmarkEncode17x3x1M-32                 63215.11      61576.76      0.97x
BenchmarkEncode10x4x16M-32                19551.54      19505.07      1.00x
BenchmarkEncode5x2x1M-32                  79612.06      81985.14      1.03x
BenchmarkEncode10x2x1M-32                 121478.29     127739.41     1.05x
BenchmarkEncode10x4x1M-32                 70757.61      74423.67      1.05x
BenchmarkEncode50x20x1M-32                19811.96      20103.32      1.01x
BenchmarkEncode17x3x16M-32                27202.10      27825.34      1.02x
BenchmarkEncode_8x4x8M-32                 19029.04      19701.31      1.04x
BenchmarkEncode_12x4x12M-32               22449.87      22480.51      1.00x
BenchmarkEncode_16x4x16M-32               24536.74      24672.24      1.01x
BenchmarkEncode_16x4x32M-32               24381.34      24981.99      1.02x
BenchmarkEncode_16x4x64M-32               24717.69      25086.94      1.01x
BenchmarkEncode_8x5x8M-32                 16763.51      17154.04      1.02x
BenchmarkEncode_8x6x8M-32                 15067.22      15205.87      1.01x
BenchmarkEncode_8x7x8M-32                 13156.38      13589.40      1.03x
BenchmarkEncode_8x9x8M-32                 11363.74      11523.70      1.01x
BenchmarkEncode_8x10x8M-32                10359.37      10474.91      1.01x
BenchmarkEncode_8x11x8M-32                9627.07       9463.24       0.98x
BenchmarkEncode_8x8x05M-32                30104.80      32634.89      1.08x
BenchmarkEncode_8x8x1M-32                 36497.28      36425.88      1.00x
BenchmarkEncode_8x8x8M-32                 12186.19      11602.41      0.95x
BenchmarkEncode_8x8x32M-32                11670.72      11413.71      0.98x
BenchmarkEncode_24x8x24M-32               21709.83      21652.50      1.00x
BenchmarkEncode_24x8x48M-32               22494.40      22280.59      0.99x
BenchmarkVerify10x2x10000-32              10567.56      10483.91      0.99x
BenchmarkVerify50x5x50000-32              28102.84      27923.63      0.99x
BenchmarkVerify10x2x1M-32                 30298.33      30106.18      0.99x
BenchmarkVerify5x2x1M-32                  16115.91      15847.03      0.98x
BenchmarkVerify10x4x1M-32                 15382.13      14852.68      0.97x
BenchmarkVerify50x20x1M-32                8476.02       8466.24       1.00x
BenchmarkVerify10x4x16M-32                15101.03      15434.71      1.02x
BenchmarkReconstruct10x2x10000-32         26228.18      26960.19      1.03x
BenchmarkReconstruct50x5x50000-32         31091.42      30975.82      1.00x
BenchmarkReconstruct10x2x1M-32            58548.87      60281.92      1.03x
BenchmarkReconstruct5x2x1M-32             39499.23      41791.80      1.06x
BenchmarkReconstruct10x4x1M-32            41448.60      43053.15      1.04x
BenchmarkReconstruct50x20x1M-32           17185.99      17354.67      1.01x
BenchmarkReconstruct10x4x16M-32           18798.60      18847.43      1.00x
BenchmarkReconstructData10x2x10000-32     27208.48      27538.38      1.01x
BenchmarkReconstructData50x5x50000-32     32135.65      32078.91      1.00x
BenchmarkReconstructData10x2x1M-32        63180.19      67332.17      1.07x
BenchmarkReconstructData5x2x1M-32         47532.85      49932.17      1.05x
BenchmarkReconstructData10x4x1M-32        50059.14      52323.15      1.05x
BenchmarkReconstructData50x20x1M-32       26679.75      26714.11      1.00x
BenchmarkReconstructData10x4x16M-32       24854.99      24527.23      0.99x
BenchmarkReconstructP10x2x10000-32        115089.87     113229.75     0.98x
BenchmarkReconstructP10x5x20000-32        129838.75     132871.10     1.02x
BenchmarkParallel_8x8x64K-32              69951.43      69980.44      1.00x
BenchmarkParallel_8x8x05M-32              11752.94      11724.35      1.00x
BenchmarkParallel_20x10x05M-32            18553.93      18613.33      1.00x
BenchmarkParallel_8x8x1M-32               11639.19      11746.86      1.01x
BenchmarkParallel_8x8x8M-32               11799.36      11685.63      0.99x
BenchmarkParallel_8x8x32M-32              11510.94      11791.72      1.02x
BenchmarkParallel_8x3x1M-32               20268.92      20678.21      1.02x
BenchmarkParallel_8x4x1M-32               17616.05      17856.17      1.01x
BenchmarkParallel_8x5x1M-32               15590.87      15872.42      1.02x
BenchmarkStreamEncode10x2x10000-32        14917.08      15408.39      1.03x
BenchmarkStreamEncode100x20x10000-32      2014.81       2077.31       1.03x
BenchmarkStreamEncode17x3x1M-32           11839.37      12434.80      1.05x
BenchmarkStreamEncode10x4x16M-32          9151.14       9206.98       1.01x
BenchmarkStreamEncode5x2x1M-32            13598.55      13663.56      1.00x
BenchmarkStreamEncode10x2x1M-32           13192.91      13453.41      1.02x
BenchmarkStreamEncode10x4x1M-32           12109.90      12050.68      1.00x
BenchmarkStreamEncode50x20x1M-32          8640.73       8370.10       0.97x
BenchmarkStreamEncode17x3x16M-32          10473.17      10527.04      1.01x
BenchmarkStreamVerify10x2x10000-32        7032.23       7128.82       1.01x
BenchmarkStreamVerify50x5x50000-32        13023.46      13109.31      1.01x
BenchmarkStreamVerify10x2x1M-32           11941.63      11949.91      1.00x
BenchmarkStreamVerify5x2x1M-32            8029.93       8263.39       1.03x
BenchmarkStreamVerify10x4x1M-32           8137.82       8271.11       1.02x
BenchmarkStreamVerify50x20x1M-32          7378.87       7708.81       1.04x
BenchmarkStreamVerify10x4x16M-32          8973.18       8955.29       1.00x
```
2020-11-10 14:39:23 +01:00
Klaus Post 7daa20bf74
Generate AVX2 code (#141)
Replaces AVX2 up to 10x8 configurations with specific generated functions.

If code size is a concern, `-tags=nogen` can be used.

Biggest speedup when not memory constrained.
```
benchmark                                old MB/s      new MB/s      speedup
BenchmarkEncode_8x5x8M                   5895.75       9648.18       1.64x
BenchmarkEncode_8x5x8M-4                 16773.41      17220.67      1.03x
BenchmarkEncode_8x5x8M-16                18263.12      17176.28      0.94x
BenchmarkEncode_8x6x8M                   5075.89       8548.39       1.68x
BenchmarkEncode_8x6x8M-4                 14559.83      15370.95      1.06x
BenchmarkEncode_8x6x8M-16                16183.37      15291.98      0.94x
BenchmarkEncode_8x7x8M                   4481.18       7015.60       1.57x
BenchmarkEncode_8x7x8M-4                 12835.35      13695.90      1.07x
BenchmarkEncode_8x7x8M-16                14246.94      13737.36      0.96x
BenchmarkEncode_8x8x05M                  5569.95       7947.70       1.43x
BenchmarkEncode_8x8x05M-4                17334.91      25271.37      1.46x
BenchmarkEncode_8x8x05M-16               29349.42      35043.36      1.19x
BenchmarkEncode_8x8x1M                   4830.58       7891.32       1.63x
BenchmarkEncode_8x8x1M-4                 17531.36      27371.42      1.56x
BenchmarkEncode_8x8x1M-16                29593.98      39241.09      1.33x
BenchmarkEncode_8x8x8M                   3953.66       6584.26       1.67x
BenchmarkEncode_8x8x8M-4                 11527.34      12331.23      1.07x
BenchmarkEncode_8x8x8M-16                12718.89      12173.08      0.96x
BenchmarkEncode_8x8x32M                  3927.51       6195.91       1.58x
BenchmarkEncode_8x8x32M-4                11490.85      11424.39      0.99x
BenchmarkEncode_8x8x32M-16               12506.09      11888.55      0.95x

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x64K          5490.24      6959.57      1.27x
BenchmarkParallel_8x8x64K-4        21078.94     29557.51     1.40x
BenchmarkParallel_8x8x64K-16       57508.45     73672.54     1.28x
BenchmarkParallel_8x8x1M           4755.49      7667.84      1.61x
BenchmarkParallel_8x8x1M-4         11818.66     12013.49     1.02x
BenchmarkParallel_8x8x1M-16        12923.12     12109.42     0.94x
BenchmarkParallel_8x8x8M           3973.94      6525.85      1.64x
BenchmarkParallel_8x8x8M-4         11725.68     11312.46     0.96x
BenchmarkParallel_8x8x8M-16        12608.20     11484.98     0.91x
BenchmarkParallel_8x3x1M           14139.71     17993.04     1.27x
BenchmarkParallel_8x3x1M-4         21805.97     23053.92     1.06x
BenchmarkParallel_8x3x1M-16        24673.05     23596.71     0.96x
BenchmarkParallel_8x4x1M           10617.88     14474.54     1.36x
BenchmarkParallel_8x4x1M-4         18635.82     18965.65     1.02x
BenchmarkParallel_8x4x1M-16        21518.12     20171.47     0.94x
BenchmarkParallel_8x5x1M           8669.88      11833.96     1.36x
BenchmarkParallel_8x5x1M-4         16321.00     17500.30     1.07x
BenchmarkParallel_8x5x1M-16        17267.16     17191.04     1.00x
```
2020-05-20 12:48:34 +02:00
Klaus Post cf8495259a
Add pure XOR for 1 parity (#138)
WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard.
The PAR1 matrix already has this property so it has little effect there.
2020-05-13 11:10:58 +02:00
Klaus Post 2df03bd4d1
Ci test more archs (#135)
* ci: test more architectures
2020-05-09 10:35:17 +02:00
Klaus Post 696c4018f8
bench: Fix reconstruct benchmarks (#133)
Always corrupt at least one shard and don't shuffle shards.
2020-05-06 15:42:49 +02:00
Frank Wessels 1b9e129671
Avx512 parallel81 (#131)
* AVX512 routine for 8x1 parallel processing (WIP)

* Testing and integration of Parallel81 assembly routine
2020-05-06 12:32:31 +02:00
Klaus Post cb7a0b5aef
Do fast by one multiplication (#130)
When multiplying by one we can use faster math.
2020-05-06 11:14:25 +02:00
Klaus Post 65df535980
Make single goroutine encodes more efficient (#122)
Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines.

```
λ benchcmp before.txt after.txt
benchmark                          old ns/op     new ns/op     delta
BenchmarkParallel_8x8x05M-16       675225        321053        -52.45%
BenchmarkParallel_20x10x05M-16     3471988       600740        -82.70%
BenchmarkParallel_8x8x1M-16        3948606       728093        -81.56%
BenchmarkParallel_8x8x8M-16        47361588      5976467       -87.38%
BenchmarkParallel_8x8x32M-16       195044200     24365474      -87.51%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x05M-16       6211.71      13064.22     2.10x
BenchmarkParallel_20x10x05M-16     3020.10      17454.73     5.78x
BenchmarkParallel_8x8x1M-16        2124.45      11521.34     5.42x
BenchmarkParallel_8x8x8M-16        1416.95      11228.85     7.92x
BenchmarkParallel_8x8x32M-16       1376.28      11017.04     8.00x

```
2020-05-03 19:37:22 +02:00
Klaus Post d2cfcb8065
Add commandline arg to disable asm for tests. (#116)
* Add commandline test args
2020-04-22 15:38:21 +02:00
Klaus Post 0abe9de20c
Update tests (#115)
Don't create new slices.
2020-02-21 11:30:44 -08:00
dssysolyatin ec2eb9fb8c Split: Reduce memory allocation (#103)
* [Split] Reduce memory allocation in Split function
2019-06-25 16:28:24 +02:00
Klaus Post f5e73dcfe2 Split blocks into size divisible by 16
Older systems (typically without AVX2) are more sensitive to misaligned load+stores.

Add parameter to automatically set the number of goroutines.

name                  old time/op    new time/op    delta
Encode10x2x10000-8      18.4µs ± 1%    16.1µs ± 1%  -12.43%    (p=0.000 n=9+9)
Encode100x20x10000-8     692µs ± 1%     608µs ± 1%  -12.10%  (p=0.000 n=10+10)
Encode17x3x1M-8         1.78ms ± 5%    1.49ms ± 1%  -16.63%  (p=0.000 n=10+10)
Encode10x4x16M-8        21.5ms ± 5%    19.6ms ± 4%   -8.74%   (p=0.000 n=10+9)
Encode5x2x1M-8           343µs ± 2%     267µs ± 2%  -22.22%   (p=0.000 n=9+10)
Encode10x2x1M-8          858µs ± 5%     701µs ± 5%  -18.34%  (p=0.000 n=10+10)
Encode10x4x1M-8         1.34ms ± 1%    1.16ms ± 1%  -13.19%    (p=0.000 n=9+9)
Encode50x20x1M-8        30.3ms ± 4%    25.0ms ± 2%  -17.51%   (p=0.000 n=10+8)
Encode17x3x16M-8        26.9ms ± 1%    24.5ms ± 4%   -9.13%   (p=0.000 n=8+10)

name                  old speed      new speed      delta
Encode10x2x10000-8    5.45GB/s ± 1%  6.22GB/s ± 1%  +14.20%    (p=0.000 n=9+9)
Encode100x20x10000-8  1.44GB/s ± 1%  1.64GB/s ± 1%  +13.77%  (p=0.000 n=10+10)
Encode17x3x1M-8       10.0GB/s ± 5%  12.0GB/s ± 1%  +19.88%  (p=0.000 n=10+10)
Encode10x4x16M-8      7.81GB/s ± 5%  8.56GB/s ± 5%   +9.58%   (p=0.000 n=10+9)
Encode5x2x1M-8        15.3GB/s ± 2%  19.6GB/s ± 2%  +28.57%   (p=0.000 n=9+10)
Encode10x2x1M-8       12.2GB/s ± 5%  15.0GB/s ± 5%  +22.45%  (p=0.000 n=10+10)
Encode10x4x1M-8       7.84GB/s ± 1%  9.03GB/s ± 1%  +15.19%    (p=0.000 n=9+9)
Encode50x20x1M-8      1.73GB/s ± 4%  2.09GB/s ± 4%  +20.59%   (p=0.000 n=10+9)
Encode17x3x16M-8      10.6GB/s ± 1%  11.7GB/s ± 4%  +10.12%   (p=0.000 n=8+10)
2017-11-18 22:00:55 +01:00
Klaus Post 61c22eab55 Cauchy Matrix option (#70)
* Experimental Cauchy Matrix

Experimental support for Cauchy style matrix

http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf

All matrices appear reversible.

* Remove Go 1.5 and 1.6 from CI tests.

* Fix comment.

* Increase max number of goroutines+docs.
2017-10-01 14:02:11 +02:00
David Reiss ddcafc661e Allow reconstructing into pre-allocated memory. (#66)
This changes the interface of Reconstruct and ReconstructData to accept
slices of zero length but sufficient capacity for shards to reconstruct,
and reslices them instead of allocating new memory.
2017-09-20 21:08:24 +02:00
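
The reslicing behaviour described in the commit above relies on a general property of Go slices: a zero-length slice with enough capacity can be extended in place without allocating. A minimal sketch, assuming a hypothetical `fill` helper rather than the library's own functions:

```go
package main

import "fmt"

// fill is a hypothetical helper that writes n bytes into dst.
// If dst has enough capacity it is resliced and reused;
// otherwise a new buffer is allocated.
func fill(dst []byte, n int) []byte {
	if cap(dst) >= n {
		dst = dst[:n] // reuse the caller's memory
	} else {
		dst = make([]byte, n)
	}
	for i := range dst {
		dst[i] = byte(i)
	}
	return dst
}

func main() {
	buf := make([]byte, 0, 8) // zero length, capacity 8
	out := fill(buf, 4)

	// out shares its backing array with buf, so no allocation happened.
	fmt.Println(len(out), cap(out), &out[0] == &buf[:1][0]) // 4 8 true
}
```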
chenzhongtao d78bf472d8 add Update parity function (#60)
Add Update parity function
2017-08-20 11:42:39 +02:00
Klaus Post dc6af2dce5 Minor cleanup (#61)
* Remove some benchmarks
* Format tables a bit.
* Doc cleanup
2017-08-13 22:38:27 +02:00
Frank Wessels 0de37d7697 Add ReconstructData interface method (#57)
* Add ReconstructData interface method to allow reconstruction of any missing data shards
* Add support for reconstructing only the data shards to StreamEncoder.Reconstruct()
2017-07-20 12:15:46 +02:00
Fred Akalin 18d548df63 Add support for PAR1 (#55)
PAR1 is a file format which uses a Reed-Solomon code similar
to the current one, except it uses a different (flawed) coding
matrix.

Add support for it via a WithPAR1Matrix option, so that this code
can be used to encode/decode PAR1 files. Also add the option to
existing tests, and add a test demonstrating the flaw in PAR1's
coding matrix.

Also fix a mistakenly inverted test in testOpts().

Incidentally, PAR1 is obsoleted by PAR2, which uses GF(2^16)
and tries to fix the flaw in the coding matrix; however, PAR2's
coding matrix is still flawed! The real solution is to build the
coding matrix like in this repository.

PAR1 spec:
http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html

Paper describing the (flawed) Reed-Solomon code used by PAR1:
http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.html
2017-06-20 20:24:57 +02:00
Fred Akalin 87c4e5ae75 Allow 256 total shards (#54)
* Allow 256 total shards
2017-06-19 11:26:52 +02:00
Klaus Post 5abf0ee302 Add options (#46)
* Add options

Make constants changeable as options.

The API remains backwards compatible.

* Update documentation.

* Fix line endings

* fmt

* fmt

* Use functions for parameters.

Much neater.
2017-02-19 11:13:22 +01:00
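
"Use functions for parameters" most likely refers to the functional options pattern. The sketch below shows that pattern in general terms; the names `options`, `withWorkers`, `withMinSplit`, and `newConfig`, as well as the default values, are hypothetical and not the library's API.

```go
package main

import "fmt"

// options holds tunable settings; the fields and defaults are hypothetical.
type options struct {
	workers  int
	minSplit int
}

// Option is a function that adjusts one setting.
type Option func(*options)

// withWorkers overrides the number of workers.
func withWorkers(n int) Option {
	return func(o *options) { o.workers = n }
}

// withMinSplit overrides the minimum split size.
func withMinSplit(n int) Option {
	return func(o *options) { o.minSplit = n }
}

// newConfig starts from defaults and applies any options passed in.
// Because opts is variadic, callers that pass no options still compile.
func newConfig(opts ...Option) options {
	o := options{workers: 4, minSplit: 512}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Println(newConfig())                // {4 512}
	fmt.Println(newConfig(withWorkers(16))) // {16 512}
}
```

A variadic options parameter is one way an API can gain configuration while staying backwards compatible, as the commit notes.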
Peter C c54154da9e Add Inverse Matrix caching in a Thread-Safe Lookup Tree (#36)
* Add matrix inversion caching
* Benchmark and Parallel Benchmark tests for Reconstruct
2016-09-12 21:31:07 +02:00
Christian Muehlhaeuser b1c8b4b073 Make Join return an error if a reconstruction is required first
If one or more required data shards are still nil and we can't correctly join
them before a reconstruction, return ErrReconstructRequired.
2016-08-05 19:23:08 +02:00
Harshavardhana ba30981088 Add checks for data and parity to not exceed 255 shards in total.
Fixes #16
2016-06-03 01:31:01 -07:00
xiaost 9f0bea8a29 Tests: backport go1.6 rand.Read for speedup tests 2016-04-07 18:34:47 +08:00
klauspost 976a24f33b Move examples to separate file/package
This makes the reedsolomon package prefix show up in the documentation examples.

+ StreamEncoder example.
2015-11-03 12:12:42 +01:00
lukechampine 86bd0f239b seed RNG in TestSplitJoin 2015-08-08 18:20:40 -04:00
lukechampine 458f451fc2 add codeSomeShardsP test 2015-08-08 13:52:00 -04:00
lukechampine bb7bd0036a fully test Split/Join functions 2015-08-08 13:51:11 -04:00
lukechampine 64b705bbf6 fully test Reconstruct function
Well, I can't figure out how to trigger the Invert error.
It may not be possible; need more domain knowledge to be sure.
2015-08-08 13:50:18 -04:00
lukechampine f81ea8daaf fully test Verify function 2015-08-08 13:50:18 -04:00
lukechampine 0238782585 fully test Encode function 2015-08-08 13:50:18 -04:00
lukechampine 10fbe96890 use slice literal 2015-08-06 22:56:32 -04:00
lukechampine 640ab74d9d fully test the New function 2015-08-06 22:47:11 -04:00
klauspost d31049df42 Add another example that shows that sets can be xor'ed and still remain valid. 2015-06-23 14:35:16 +02:00
klauspost 8ebf356efb The number of data shards must be below 257. Check that and update documentation. 2015-06-23 13:39:57 +02:00
klauspost 6861078d3b Add more information to example. 2015-06-22 15:52:10 +02:00
klauspost 0cb21eccc5 Rename example function. 2015-06-22 15:48:52 +02:00
klauspost 7794948a5b Add split/merge example. 2015-06-22 15:44:22 +02:00
Klaus Post 619e2b7d65 Add benchmark with 17 data shards and 3 parity shards with 16MB each, and correct comments. 2015-06-21 17:07:17 +02:00
Klaus Post ab50161bb9 Update benchmarks. 2015-06-20 20:51:26 +02:00
Klaus Post 36a0e57744 Begin docs. 2015-06-20 13:10:51 +02:00
Klaus Post d54843ee41 Add Encoder example (and test) 2015-06-20 11:29:26 +02:00
Klaus Post c5de03551c Minor adjustments for golint. 2015-06-20 10:11:33 +02:00
Klaus Post cf70107291 Add verification test that also tests failure. 2015-06-19 19:20:44 +02:00
Klaus Post e3aca6cd9d Shorten the variable names and make an encoder interface, so it isn't possible to create it without calling New. 2015-06-19 18:54:58 +02:00
Klaus Post 67f8d8b8c7 Add another benchmark. 2015-06-19 18:25:48 +02:00