61 Commits (0e7f9a6a6f2503191dce1116fb4a45c3c43a1d9c)

Author SHA1 Message Date
Shawn Zivontsis 0e7f9a6a6f
Allow zero parity shards (#161) 2021-03-08 16:13:24 +01:00
Klaus Post ab26eb4126
Add WithInversionCache and use pointer methods (#160)
There appear to be writes to value receivers.

Add `WithInversionCache(bool)` to disable cache.

Fixes #159
2021-01-13 10:21:28 +01:00
Klaus Post 519603f6e1
Update packages (#154)
* Update packages

Update cpuid and clean up generated.
2020-12-09 22:56:01 +01:00
Klaus Post 653e76aa26
Faster AVX2 encoding (#153)
* Remove 50% of bounds checks when copying.
* Use RIP only addressing, free one register.

```
benchmark                                 old MB/s      new MB/s      speedup
BenchmarkGalois128K-32                    57663.49      58005.87      1.01x
BenchmarkGalois1M-32                      49479.31      49848.29      1.01x
BenchmarkGaloisXor128K-32                 46310.69      46501.88      1.00x
BenchmarkGaloisXor1M-32                   43804.86      43984.39      1.00x
BenchmarkEncode10x2x10000-32              25926.93      27457.75      1.06x
BenchmarkEncode100x20x10000-32            2635.82       2818.95       1.07x
BenchmarkEncode17x3x1M-32                 63215.11      61576.76      0.97x
BenchmarkEncode10x4x16M-32                19551.54      19505.07      1.00x
BenchmarkEncode5x2x1M-32                  79612.06      81985.14      1.03x
BenchmarkEncode10x2x1M-32                 121478.29     127739.41     1.05x
BenchmarkEncode10x4x1M-32                 70757.61      74423.67      1.05x
BenchmarkEncode50x20x1M-32                19811.96      20103.32      1.01x
BenchmarkEncode17x3x16M-32                27202.10      27825.34      1.02x
BenchmarkEncode_8x4x8M-32                 19029.04      19701.31      1.04x
BenchmarkEncode_12x4x12M-32               22449.87      22480.51      1.00x
BenchmarkEncode_16x4x16M-32               24536.74      24672.24      1.01x
BenchmarkEncode_16x4x32M-32               24381.34      24981.99      1.02x
BenchmarkEncode_16x4x64M-32               24717.69      25086.94      1.01x
BenchmarkEncode_8x5x8M-32                 16763.51      17154.04      1.02x
BenchmarkEncode_8x6x8M-32                 15067.22      15205.87      1.01x
BenchmarkEncode_8x7x8M-32                 13156.38      13589.40      1.03x
BenchmarkEncode_8x9x8M-32                 11363.74      11523.70      1.01x
BenchmarkEncode_8x10x8M-32                10359.37      10474.91      1.01x
BenchmarkEncode_8x11x8M-32                9627.07       9463.24       0.98x
BenchmarkEncode_8x8x05M-32                30104.80      32634.89      1.08x
BenchmarkEncode_8x8x1M-32                 36497.28      36425.88      1.00x
BenchmarkEncode_8x8x8M-32                 12186.19      11602.41      0.95x
BenchmarkEncode_8x8x32M-32                11670.72      11413.71      0.98x
BenchmarkEncode_24x8x24M-32               21709.83      21652.50      1.00x
BenchmarkEncode_24x8x48M-32               22494.40      22280.59      0.99x
BenchmarkVerify10x2x10000-32              10567.56      10483.91      0.99x
BenchmarkVerify50x5x50000-32              28102.84      27923.63      0.99x
BenchmarkVerify10x2x1M-32                 30298.33      30106.18      0.99x
BenchmarkVerify5x2x1M-32                  16115.91      15847.03      0.98x
BenchmarkVerify10x4x1M-32                 15382.13      14852.68      0.97x
BenchmarkVerify50x20x1M-32                8476.02       8466.24       1.00x
BenchmarkVerify10x4x16M-32                15101.03      15434.71      1.02x
BenchmarkReconstruct10x2x10000-32         26228.18      26960.19      1.03x
BenchmarkReconstruct50x5x50000-32         31091.42      30975.82      1.00x
BenchmarkReconstruct10x2x1M-32            58548.87      60281.92      1.03x
BenchmarkReconstruct5x2x1M-32             39499.23      41791.80      1.06x
BenchmarkReconstruct10x4x1M-32            41448.60      43053.15      1.04x
BenchmarkReconstruct50x20x1M-32           17185.99      17354.67      1.01x
BenchmarkReconstruct10x4x16M-32           18798.60      18847.43      1.00x
BenchmarkReconstructData10x2x10000-32     27208.48      27538.38      1.01x
BenchmarkReconstructData50x5x50000-32     32135.65      32078.91      1.00x
BenchmarkReconstructData10x2x1M-32        63180.19      67332.17      1.07x
BenchmarkReconstructData5x2x1M-32         47532.85      49932.17      1.05x
BenchmarkReconstructData10x4x1M-32        50059.14      52323.15      1.05x
BenchmarkReconstructData50x20x1M-32       26679.75      26714.11      1.00x
BenchmarkReconstructData10x4x16M-32       24854.99      24527.23      0.99x
BenchmarkReconstructP10x2x10000-32        115089.87     113229.75     0.98x
BenchmarkReconstructP10x5x20000-32        129838.75     132871.10     1.02x
BenchmarkParallel_8x8x64K-32              69951.43      69980.44      1.00x
BenchmarkParallel_8x8x05M-32              11752.94      11724.35      1.00x
BenchmarkParallel_20x10x05M-32            18553.93      18613.33      1.00x
BenchmarkParallel_8x8x1M-32               11639.19      11746.86      1.01x
BenchmarkParallel_8x8x8M-32               11799.36      11685.63      0.99x
BenchmarkParallel_8x8x32M-32              11510.94      11791.72      1.02x
BenchmarkParallel_8x3x1M-32               20268.92      20678.21      1.02x
BenchmarkParallel_8x4x1M-32               17616.05      17856.17      1.01x
BenchmarkParallel_8x5x1M-32               15590.87      15872.42      1.02x
BenchmarkStreamEncode10x2x10000-32        14917.08      15408.39      1.03x
BenchmarkStreamEncode100x20x10000-32      2014.81       2077.31       1.03x
BenchmarkStreamEncode17x3x1M-32           11839.37      12434.80      1.05x
BenchmarkStreamEncode10x4x16M-32          9151.14       9206.98       1.01x
BenchmarkStreamEncode5x2x1M-32            13598.55      13663.56      1.00x
BenchmarkStreamEncode10x2x1M-32           13192.91      13453.41      1.02x
BenchmarkStreamEncode10x4x1M-32           12109.90      12050.68      1.00x
BenchmarkStreamEncode50x20x1M-32          8640.73       8370.10       0.97x
BenchmarkStreamEncode17x3x16M-32          10473.17      10527.04      1.01x
BenchmarkStreamVerify10x2x10000-32        7032.23       7128.82       1.01x
BenchmarkStreamVerify50x5x50000-32        13023.46      13109.31      1.01x
BenchmarkStreamVerify10x2x1M-32           11941.63      11949.91      1.00x
BenchmarkStreamVerify5x2x1M-32            8029.93       8263.39       1.03x
BenchmarkStreamVerify10x4x1M-32           8137.82       8271.11       1.02x
BenchmarkStreamVerify50x20x1M-32          7378.87       7708.81       1.04x
BenchmarkStreamVerify10x4x16M-32          8973.18       8955.29       1.00x
```
2020-11-10 14:39:23 +01:00
Klaus Post 7daa20bf74
Generate AVX2 code (#141)
Replaces AVX2 up to 10x8 configurations with specific generated functions.

If code size is a concern `-tags=nogen` can be used.

Biggest speedup when not memory constrained.
```
benchmark                                old MB/s      new MB/s      speedup
BenchmarkEncode_8x5x8M                   5895.75       9648.18       1.64x
BenchmarkEncode_8x5x8M-4                 16773.41      17220.67      1.03x
BenchmarkEncode_8x5x8M-16                18263.12      17176.28      0.94x
BenchmarkEncode_8x6x8M                   5075.89       8548.39       1.68x
BenchmarkEncode_8x6x8M-4                 14559.83      15370.95      1.06x
BenchmarkEncode_8x6x8M-16                16183.37      15291.98      0.94x
BenchmarkEncode_8x7x8M                   4481.18       7015.60       1.57x
BenchmarkEncode_8x7x8M-4                 12835.35      13695.90      1.07x
BenchmarkEncode_8x7x8M-16                14246.94      13737.36      0.96x
BenchmarkEncode_8x8x05M                  5569.95       7947.70       1.43x
BenchmarkEncode_8x8x05M-4                17334.91      25271.37      1.46x
BenchmarkEncode_8x8x05M-16               29349.42      35043.36      1.19x
BenchmarkEncode_8x8x1M                   4830.58       7891.32       1.63x
BenchmarkEncode_8x8x1M-4                 17531.36      27371.42      1.56x
BenchmarkEncode_8x8x1M-16                29593.98      39241.09      1.33x
BenchmarkEncode_8x8x8M                   3953.66       6584.26       1.67x
BenchmarkEncode_8x8x8M-4                 11527.34      12331.23      1.07x
BenchmarkEncode_8x8x8M-16                12718.89      12173.08      0.96x
BenchmarkEncode_8x8x32M                  3927.51       6195.91       1.58x
BenchmarkEncode_8x8x32M-4                11490.85      11424.39      0.99x
BenchmarkEncode_8x8x32M-16               12506.09      11888.55      0.95x

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x64K          5490.24      6959.57      1.27x
BenchmarkParallel_8x8x64K-4        21078.94     29557.51     1.40x
BenchmarkParallel_8x8x64K-16       57508.45     73672.54     1.28x
BenchmarkParallel_8x8x1M           4755.49      7667.84      1.61x
BenchmarkParallel_8x8x1M-4         11818.66     12013.49     1.02x
BenchmarkParallel_8x8x1M-16        12923.12     12109.42     0.94x
BenchmarkParallel_8x8x8M           3973.94      6525.85      1.64x
BenchmarkParallel_8x8x8M-4         11725.68     11312.46     0.96x
BenchmarkParallel_8x8x8M-16        12608.20     11484.98     0.91x
BenchmarkParallel_8x3x1M           14139.71     17993.04     1.27x
BenchmarkParallel_8x3x1M-4         21805.97     23053.92     1.06x
BenchmarkParallel_8x3x1M-16        24673.05     23596.71     0.96x
BenchmarkParallel_8x4x1M           10617.88     14474.54     1.36x
BenchmarkParallel_8x4x1M-4         18635.82     18965.65     1.02x
BenchmarkParallel_8x4x1M-16        21518.12     20171.47     0.94x
BenchmarkParallel_8x5x1M           8669.88      11833.96     1.36x
BenchmarkParallel_8x5x1M-4         16321.00     17500.30     1.07x
BenchmarkParallel_8x5x1M-16        17267.16     17191.04     1.00x
```
2020-05-20 12:48:34 +02:00
Klaus Post cf8495259a
Add pure XOR for 1 parity (#138)
WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard.
The PAR1 matrix already has this property so it has little effect there.
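
A minimal sketch of why a single parity shard can be a plain XOR, assuming equally sized shards. The helper names `xorParity` and `recoverShard` are hypothetical and not part of the package; the sketch only illustrates the idea behind the option.

```go
package main

import "fmt"

// xorParity returns a single parity shard that is the byte-wise XOR of all
// data shards. All shards are assumed to have the same length.
func xorParity(data [][]byte) []byte {
	parity := make([]byte, len(data[0]))
	for _, shard := range data {
		for i, b := range shard {
			parity[i] ^= b
		}
	}
	return parity
}

// recoverShard rebuilds the one missing data shard (marked nil) by XOR-ing
// the parity shard with every surviving data shard.
func recoverShard(data [][]byte, parity []byte) []byte {
	out := append([]byte(nil), parity...)
	for _, shard := range data {
		if shard == nil {
			continue
		}
		for i, b := range shard {
			out[i] ^= b
		}
	}
	return out
}

func main() {
	data := [][]byte{[]byte("abcd"), []byte("efgh"), []byte("ijkl")}
	parity := xorParity(data)

	lost := data[1]
	data[1] = nil // simulate losing one data shard
	fmt.Println(string(recoverShard(data, parity)) == string(lost)) // prints true
}
```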
2020-05-13 11:10:58 +02:00
Klaus Post 151d8c7a05
Tweak concurrency (#132) 2020-05-06 15:42:30 +02:00
Klaus Post cb7a0b5aef
Do fast by one multiplication (#130)
When multiplying by one we can use faster math.
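
A rough sketch of the idea, not the package's actual code: in GF(2^8), multiplying by one returns the input unchanged, so the multiply-and-XOR step can skip the field multiplication. The function names and the reduction polynomial 0x11d are assumptions made for this illustration.

```go
package main

import "fmt"

// gfMul multiplies two elements of GF(2^8) bit by bit.
// The reduction polynomial 0x11d is an assumption of this sketch.
func gfMul(a, b byte) byte {
	var p byte
	for b > 0 {
		if b&1 == 1 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d
		}
		b >>= 1
	}
	return p
}

// mulSliceXor computes out[i] ^= c * in[i] in GF(2^8).
// When c == 1 the product is the input itself, so a plain XOR suffices.
func mulSliceXor(c byte, in, out []byte) {
	if c == 1 {
		for i, v := range in {
			out[i] ^= v
		}
		return
	}
	for i, v := range in {
		out[i] ^= gfMul(c, v)
	}
}

func main() {
	in := []byte{1, 2, 3, 0x80}
	out := make([]byte, len(in))
	mulSliceXor(1, in, out)
	fmt.Println(out) // prints [1 2 3 128]
}
```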
2020-05-06 11:14:25 +02:00
Klaus Post 0e9e10435f
avx2: Add 64 bytes per loop processing (#128)
* avx2: Add 64 bytes per loop processing

Not super clean benchmark run, but `BenchmarkGalois` is consistently faster.

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkGalois128K-32                    2551          2261          -11.37%
BenchmarkGalois1M-32                      22492         21107         -6.16%
BenchmarkGaloisXor128K-32                 2972          2808          -5.52%
BenchmarkGaloisXor1M-32                   25181         23951         -4.88%
BenchmarkEncode10x2x10000-32              5081          4722          -7.07%
BenchmarkEncode100x20x10000-32            383800        346655        -9.68%
BenchmarkEncode17x3x1M-32                 264806        263191        -0.61%
BenchmarkEncode10x4x16M-32                8337857       8376910       +0.47%
BenchmarkEncode5x2x1M-32                  77119         73598         -4.57%
BenchmarkEncode10x2x1M-32                 108424        102423        -5.53%
BenchmarkEncode10x4x1M-32                 194427        184301        -5.21%
BenchmarkEncode50x20x1M-32                3870301       3747639       -3.17%
BenchmarkEncode17x3x16M-32                10617586      10602449      -0.14%
BenchmarkEncode_8x4x8M-32                 3227254       3229451       +0.07%
BenchmarkEncode_12x4x12M-32               6841898       6847261       +0.08%
BenchmarkEncode_16x4x16M-32               11153469      11048738      -0.94%
BenchmarkEncode_16x4x32M-32               21947506      21826647      -0.55%
BenchmarkEncode_16x4x64M-32               43163608      42971338      -0.45%
BenchmarkEncode_8x5x8M-32                 3856675       3780730       -1.97%
BenchmarkEncode_8x6x8M-32                 4322023       4437109       +2.66%
BenchmarkEncode_8x7x8M-32                 5011434       4959623       -1.03%
BenchmarkEncode_8x9x8M-32                 6243694       6098824       -2.32%
BenchmarkEncode_8x10x8M-32                6724456       6657099       -1.00%
BenchmarkEncode_8x11x8M-32                7207693       7340332       +1.84%
BenchmarkEncode_8x8x05M-32                176877        172183        -2.65%
BenchmarkEncode_8x8x1M-32                 309716        301743        -2.57%
BenchmarkEncode_8x8x8M-32                 5498952       5489078       -0.18%
BenchmarkEncode_8x8x32M-32                22630195      22557074      -0.32%
BenchmarkEncode_24x8x24M-32               28488886      28220702      -0.94%
BenchmarkEncode_24x8x48M-32               56124735      54862495      -2.25%
BenchmarkVerify10x2x10000-32              9874          9356          -5.25%
BenchmarkVerify50x5x50000-32              175610        159735        -9.04%
BenchmarkVerify10x2x1M-32                 331276        311726        -5.90%
BenchmarkVerify5x2x1M-32                  265466        248075        -6.55%
BenchmarkVerify10x4x1M-32                 701627        606420        -13.57%
BenchmarkVerify50x20x1M-32                4338171       4245635       -2.13%
BenchmarkVerify10x4x16M-32                12312830      11932698      -3.09%
BenchmarkReconstruct10x2x10000-32         1594          1504          -5.65%
BenchmarkReconstruct50x5x50000-32         95101         79558         -16.34%
BenchmarkReconstruct10x2x1M-32            38479         37225         -3.26%
BenchmarkReconstruct5x2x1M-32             30968         30013         -3.08%
BenchmarkReconstruct10x4x1M-32            81630         75350         -7.69%
BenchmarkReconstruct50x20x1M-32           1136952       1040156       -8.51%
BenchmarkReconstruct10x4x16M-32           685408        656484        -4.22%
BenchmarkReconstructData10x2x10000-32     1609          1486          -7.64%
BenchmarkReconstructData50x5x50000-32     87090         71512         -17.89%
BenchmarkReconstructData10x2x1M-32        31497         30347         -3.65%
BenchmarkReconstructData5x2x1M-32         23379         22611         -3.28%
BenchmarkReconstructData10x4x1M-32        63853         61035         -4.41%
BenchmarkReconstructData50x20x1M-32       1048807       966201        -7.88%
BenchmarkReconstructData10x4x16M-32       866658        892252        +2.95%
BenchmarkReconstructP10x2x10000-32        544           540           -0.74%
BenchmarkReconstructP10x5x20000-32        1242          1206          -2.90%
BenchmarkSplit10x4x160M-32                2735508       2743214       +0.28%
BenchmarkSplit5x2x5M-32                   276232        288523        +4.45%
BenchmarkSplit10x2x1M-32                  44389         45517         +2.54%
BenchmarkSplit10x4x10M-32                 477282        460888        -3.43%
BenchmarkSplit50x20x50M-32                1608821       1602105       -0.42%
BenchmarkSplit17x3x272M-32                2035932       2034705       -0.06%
BenchmarkParallel_8x8x05M-32              346733        351837        +1.47%
BenchmarkParallel_20x10x05M-32            577127        586232        +1.58%
BenchmarkParallel_8x8x1M-32               722453        729294        +0.95%
BenchmarkParallel_8x8x8M-32               5717650       5817130       +1.74%
BenchmarkParallel_8x8x32M-32              22914260      24132696      +5.32%
BenchmarkStreamEncode10x2x10000-32        6703131       7141021       +6.53%
BenchmarkStreamEncode100x20x10000-32      38175873      39767386      +4.17%
BenchmarkStreamEncode17x3x1M-32           8920549       9218973       +3.35%
BenchmarkStreamEncode10x4x16M-32          21841702      21784898      -0.26%
BenchmarkStreamEncode5x2x1M-32            4088001       3247404       -20.56%
BenchmarkStreamEncode10x2x1M-32           5860652       5932381       +1.22%
BenchmarkStreamEncode10x4x1M-32           7555172       7589960       +0.46%
BenchmarkStreamEncode50x20x1M-32          30006814      30250054      +0.81%
BenchmarkStreamEncode17x3x16M-32          32757489      32818254      +0.19%
BenchmarkStreamVerify10x2x10000-32        6714996       6831093       +1.73%
BenchmarkStreamVerify50x5x50000-32        18525904      18761767      +1.27%
BenchmarkStreamVerify10x2x1M-32           5232278       5444148       +4.05%
BenchmarkStreamVerify5x2x1M-32            3673843       3755283       +2.22%
BenchmarkStreamVerify10x4x1M-32           7184419       7185293       +0.01%
BenchmarkStreamVerify50x20x1M-32          28441187      28574766      +0.47%
BenchmarkStreamVerify10x4x16M-32          8538440       8668614       +1.52%

benchmark                                 old MB/s      new MB/s      speedup
BenchmarkGalois128K-32                    51374.59      57976.36      1.13x
BenchmarkGalois1M-32                      46620.03      49679.10      1.07x
BenchmarkGaloisXor128K-32                 44106.22      46671.56      1.06x
BenchmarkGaloisXor1M-32                   41641.82      43779.89      1.05x
BenchmarkEncode10x2x10000-32              19682.61      21176.81      1.08x
BenchmarkEncode100x20x10000-32            2605.52       2884.71       1.11x
BenchmarkEncode17x3x1M-32                 67316.54      67729.50      1.01x
BenchmarkEncode10x4x16M-32                20121.74      20027.93      1.00x
BenchmarkEncode5x2x1M-32                  67984.17      71236.47      1.05x
BenchmarkEncode10x2x1M-32                 96710.29      102377.00     1.06x
BenchmarkEncode10x4x1M-32                 53931.74      56894.82      1.05x
BenchmarkEncode50x20x1M-32                13546.44      13989.82      1.03x
BenchmarkEncode17x3x16M-32                26862.29      26900.64      1.00x
BenchmarkEncode_8x4x8M-32                 20794.42      20780.27      1.00x
BenchmarkEncode_12x4x12M-32               22069.16      22051.88      1.00x
BenchmarkEncode_16x4x16M-32               24067.44      24295.58      1.01x
BenchmarkEncode_16x4x32M-32               24461.59      24597.04      1.01x
BenchmarkEncode_16x4x64M-32               24876.09      24987.40      1.00x
BenchmarkEncode_8x5x8M-32                 17400.71      17750.24      1.02x
BenchmarkEncode_8x6x8M-32                 15527.19      15124.46      0.97x
BenchmarkEncode_8x7x8M-32                 13391.15      13531.04      1.01x
BenchmarkEncode_8x9x8M-32                 10748.26      11003.58      1.02x
BenchmarkEncode_8x10x8M-32                9979.82       10080.80      1.01x
BenchmarkEncode_8x11x8M-32                9310.73       9142.48       0.98x
BenchmarkEncode_8x8x05M-32                23713.12      24359.50      1.03x
BenchmarkEncode_8x8x1M-32                 27084.87      27800.50      1.03x
BenchmarkEncode_8x8x8M-32                 12203.94      12225.89      1.00x
BenchmarkEncode_8x8x32M-32                11861.83      11900.28      1.00x
BenchmarkEncode_24x8x24M-32               21200.54      21402.01      1.01x
BenchmarkEncode_24x8x48M-32               21522.77      22017.95      1.02x
BenchmarkVerify10x2x10000-32              10127.24      10688.01      1.06x
BenchmarkVerify50x5x50000-32              28472.25      31301.75      1.10x
BenchmarkVerify10x2x1M-32                 31652.63      33637.74      1.06x
BenchmarkVerify5x2x1M-32                  19749.74      21134.27      1.07x
BenchmarkVerify10x4x1M-32                 14944.92      17291.25      1.16x
BenchmarkVerify50x20x1M-32                12085.46      12348.87      1.02x
BenchmarkVerify10x4x16M-32                13625.80      14059.87      1.03x
BenchmarkReconstruct10x2x10000-32         62723.68      66470.81      1.06x
BenchmarkReconstruct50x5x50000-32         52575.87      62847.32      1.20x
BenchmarkReconstruct10x2x1M-32            272507.04     281685.84     1.03x
BenchmarkReconstruct5x2x1M-32             169299.03     174685.39     1.03x
BenchmarkReconstruct10x4x1M-32            128455.17     139161.42     1.08x
BenchmarkReconstruct50x20x1M-32           46113.48      50404.73      1.09x
BenchmarkReconstruct10x4x16M-32           244777.11     255561.72     1.04x
BenchmarkReconstructData10x2x10000-32     62160.46      67305.98      1.08x
BenchmarkReconstructData50x5x50000-32     57411.81      69917.97      1.22x
BenchmarkReconstructData10x2x1M-32        332909.82     345526.29     1.04x
BenchmarkReconstructData5x2x1M-32         224254.60     231868.74     1.03x
BenchmarkReconstructData10x4x1M-32        164216.61     171799.68     1.05x
BenchmarkReconstructData50x20x1M-32       49988.98      54262.82      1.09x
BenchmarkReconstructData10x4x16M-32       193585.15     188032.29     0.97x
BenchmarkReconstructP10x2x10000-32        183806.57     185284.57     1.01x
BenchmarkReconstructP10x5x20000-32        160985.46     165852.51     1.03x
BenchmarkParallel_8x8x05M-32              12096.63      11921.17      0.99x
BenchmarkParallel_20x10x05M-32            18168.91      17886.72      0.98x
BenchmarkParallel_8x8x1M-32               11611.28      11502.36      0.99x
BenchmarkParallel_8x8x8M-32               11737.14      11536.42      0.98x
BenchmarkParallel_8x8x32M-32              11714.78      11123.31      0.95x
BenchmarkStreamEncode10x2x10000-32        14.92         14.00         0.94x
BenchmarkStreamEncode100x20x10000-32      26.19         25.15         0.96x
BenchmarkStreamEncode17x3x1M-32           1998.28       1933.60       0.97x
BenchmarkStreamEncode10x4x16M-32          7681.28       7701.31       1.00x
BenchmarkStreamEncode5x2x1M-32            1282.50       1614.48       1.26x
BenchmarkStreamEncode10x2x1M-32           1789.18       1767.55       0.99x
BenchmarkStreamEncode10x4x1M-32           1387.89       1381.53       1.00x
BenchmarkStreamEncode50x20x1M-32          1747.23       1733.18       0.99x
BenchmarkStreamEncode17x3x16M-32          8706.79       8690.67       1.00x
BenchmarkStreamVerify10x2x10000-32        14.89         14.64         0.98x
BenchmarkStreamVerify50x5x50000-32        269.89        266.50        0.99x
BenchmarkStreamVerify10x2x1M-32           2004.05       1926.06       0.96x
BenchmarkStreamVerify5x2x1M-32            1427.08       1396.13       0.98x
BenchmarkStreamVerify10x4x1M-32           1459.51       1459.34       1.00x
BenchmarkStreamVerify50x20x1M-32          1843.41       1834.79       1.00x
BenchmarkStreamVerify10x4x16M-32          19649.04      19353.98      0.98x
```
2020-05-05 16:36:01 +02:00
Klaus Post de70cc155f
AVX512 parallel processing (#120)
Do concurrent processing in AVX512 mode and split jobs by cache size.
2020-05-04 09:17:40 +02:00
Klaus Post 65df535980
Make single goroutine encodes more efficient (#122)
Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines.

```
λ benchcmp before.txt after.txt
benchmark                          old ns/op     new ns/op     delta
BenchmarkParallel_8x8x05M-16       675225        321053        -52.45%
BenchmarkParallel_20x10x05M-16     3471988       600740        -82.70%
BenchmarkParallel_8x8x1M-16        3948606       728093        -81.56%
BenchmarkParallel_8x8x8M-16        47361588      5976467       -87.38%
BenchmarkParallel_8x8x32M-16       195044200     24365474      -87.51%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x05M-16       6211.71      13064.22     2.10x
BenchmarkParallel_20x10x05M-16     3020.10      17454.73     5.78x
BenchmarkParallel_8x8x1M-16        2124.45      11521.34     5.42x
BenchmarkParallel_8x8x8M-16        1416.95      11228.85     7.92x
BenchmarkParallel_8x8x32M-16       1376.28      11017.04     8.00x

```
2020-05-03 19:37:22 +02:00
Klaus Post c3634dce94
Use CPU cache to set minSplitSize (#117)
Use L1 cache size to set default split size.
2020-04-22 16:12:18 +02:00
Andreas Auernhammer 1f1369aa84 limit capacity of shards to shard size (#109)
This commit limits the capacity (in addition
to the length) of each shard to the shard size.

Before this change the following code behaves in
an unexpected way:
```
shards := encoder.Split(buffer)
// ...
shards[0] = shards[0][:cap(shards[0])]
```

Instead of restoring the length of `shards[0]` to
the shard size, it assigns the entire memory of `buffer`
to `shards[0]`.
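
The difference can be reproduced with plain slices. The sketch below is an illustration only: `splitLenOnly` and `splitLenCap` are hypothetical stand-ins, and the use of Go's three-index slice expression is an assumption about how such a capacity limit could be applied.

```go
package main

import "fmt"

// splitLenOnly slices buf into n shards, limiting only the length.
// Each shard can still be resliced into its neighbours' memory.
func splitLenOnly(buf []byte, n int) [][]byte {
	size := len(buf) / n
	shards := make([][]byte, n)
	for i := range shards {
		shards[i] = buf[i*size : (i+1)*size]
	}
	return shards
}

// splitLenCap uses a three-index slice expression so that the capacity
// of each shard is limited to the shard size as well.
func splitLenCap(buf []byte, n int) [][]byte {
	size := len(buf) / n
	shards := make([][]byte, n)
	for i := range shards {
		shards[i] = buf[i*size : (i+1)*size : (i+1)*size]
	}
	return shards
}

func main() {
	buf := make([]byte, 12)
	a := splitLenOnly(buf, 3)
	b := splitLenCap(buf, 3)
	fmt.Println(len(a[0]), cap(a[0])) // prints 4 12
	fmt.Println(len(b[0]), cap(b[0])) // prints 4 4
}
```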
2019-09-27 16:30:26 -07:00
dssysolyatin 7890684129 Improve quick check for case when dataOnly is true (#105) 2019-06-25 16:30:44 +02:00
dssysolyatin ec2eb9fb8c Split: Reduce memory allocation (#103)
* [Split] Reduce memory allocation in Split function
2019-06-25 16:28:24 +02:00
Klaus Post a9588190c0
Optimize pure Go version. (#96)
* Optimize pure Go version.
* Update docs. Add Go 1.12 CI

* Avoid dst bounds check when using noasm ~ 40-50% faster.
* Convert multiply table to a slice whenever used.
* Split on 32 byte boundaries instead of 16 byte.
2019-03-08 10:49:27 +01:00
Klaus Post 09979cdf93 Start documentation with method name.
Replaces #92
2019-02-15 15:31:43 +01:00
Frank Wessels 79aee05119 AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00
Klaus Post 278ba25f43 Pre-slice input. 2018-11-16 00:23:56 +01:00
Klaus Post f5e73dcfe2 Split blocks into size divisible by 16
Older systems (typically without AVX2) are more sensitive to misaligned load+stores.
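
As a rough illustration, a per-goroutine block size can be rounded up to a multiple of 16 with a bit mask. The helper `splitSize` and its parameters are hypothetical; the package may compute its block size differently.

```go
package main

import "fmt"

// splitSize returns the per-goroutine block size for a shard of
// shardLen bytes, rounded up to a multiple of 16.
func splitSize(shardLen, goroutines int) int {
	per := (shardLen + goroutines - 1) / goroutines // ceiling division
	return (per + 15) &^ 15                         // round up to 16
}

func main() {
	fmt.Println(splitSize(10000, 8)) // prints 1264
}
```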

Add parameter to automatically set the number of goroutines.

name                  old time/op    new time/op    delta
Encode10x2x10000-8      18.4µs ± 1%    16.1µs ± 1%  -12.43%    (p=0.000 n=9+9)
Encode100x20x10000-8     692µs ± 1%     608µs ± 1%  -12.10%  (p=0.000 n=10+10)
Encode17x3x1M-8         1.78ms ± 5%    1.49ms ± 1%  -16.63%  (p=0.000 n=10+10)
Encode10x4x16M-8        21.5ms ± 5%    19.6ms ± 4%   -8.74%   (p=0.000 n=10+9)
Encode5x2x1M-8           343µs ± 2%     267µs ± 2%  -22.22%   (p=0.000 n=9+10)
Encode10x2x1M-8          858µs ± 5%     701µs ± 5%  -18.34%  (p=0.000 n=10+10)
Encode10x4x1M-8         1.34ms ± 1%    1.16ms ± 1%  -13.19%    (p=0.000 n=9+9)
Encode50x20x1M-8        30.3ms ± 4%    25.0ms ± 2%  -17.51%   (p=0.000 n=10+8)
Encode17x3x16M-8        26.9ms ± 1%    24.5ms ± 4%   -9.13%   (p=0.000 n=8+10)

name                  old speed      new speed      delta
Encode10x2x10000-8    5.45GB/s ± 1%  6.22GB/s ± 1%  +14.20%    (p=0.000 n=9+9)
Encode100x20x10000-8  1.44GB/s ± 1%  1.64GB/s ± 1%  +13.77%  (p=0.000 n=10+10)
Encode17x3x1M-8       10.0GB/s ± 5%  12.0GB/s ± 1%  +19.88%  (p=0.000 n=10+10)
Encode10x4x16M-8      7.81GB/s ± 5%  8.56GB/s ± 5%   +9.58%   (p=0.000 n=10+9)
Encode5x2x1M-8        15.3GB/s ± 2%  19.6GB/s ± 2%  +28.57%   (p=0.000 n=9+10)
Encode10x2x1M-8       12.2GB/s ± 5%  15.0GB/s ± 5%  +22.45%  (p=0.000 n=10+10)
Encode10x4x1M-8       7.84GB/s ± 1%  9.03GB/s ± 1%  +15.19%    (p=0.000 n=9+9)
Encode50x20x1M-8      1.73GB/s ± 4%  2.09GB/s ± 4%  +20.59%   (p=0.000 n=10+9)
Encode17x3x16M-8      10.6GB/s ± 1%  11.7GB/s ± 4%  +10.12%   (p=0.000 n=8+10)
2017-11-18 22:00:55 +01:00
Klaus Post 61c22eab55 Cauchy Matrix option (#70)
* Experimental Cauchy Matrix

Experimental support for Cauchy style matrix

http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf

All matrices appear reversible.

* Remove Go 1.5 and 1.6 from CI tests.

* Fix comment.

* Increase max number of goroutines+docs.
2017-10-01 14:02:11 +02:00
David Reiss ddcafc661e Allow reconstructing into pre-allocated memory. (#66)
This changes the interface of Reconstruct and ReconstructData to accept
slices of zero length but sufficient capacity for shards to reconstruct,
and reslices them instead of allocating new memory.
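
A minimal sketch of the reslicing idea, assuming a helper named `shardBuffer` (hypothetical, not part of the package): a zero-length shard with enough capacity is reused, anything else gets fresh memory.

```go
package main

import "fmt"

// shardBuffer returns a buffer of length size for a missing shard.
// A zero-length shard with enough capacity is resliced and reused;
// otherwise new memory is allocated.
func shardBuffer(shard []byte, size int) []byte {
	if len(shard) == 0 && cap(shard) >= size {
		return shard[:size]
	}
	return make([]byte, size)
}

func main() {
	backing := make([]byte, 0, 8) // pre-allocated by the caller
	reused := shardBuffer(backing, 8)
	fresh := shardBuffer(nil, 8)
	fmt.Println(len(reused), cap(reused), len(fresh)) // prints 8 8 8
}
```

A caller that keeps its own buffer pool could pass `buf[:0]` for each missing shard to avoid per-call allocations.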
2017-09-20 21:08:24 +02:00
chenzhongtao d78bf472d8 add Update parity function (#60)
Add Update parity function
2017-08-20 11:42:39 +02:00
Andreas Auernhammer 48a4fd05f1 fix unnecessary memory alloc in Split (#59)
Split divides the data into `DataShards` blocks and allocates all parity blocks.

This change adds a check whether the capacity of data is large enough to hold all
data and parity blocks. It only allocates parity blocks if necessary.
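
The capacity check could look roughly like the sketch below. It is a simplification under stated assumptions: the function `split` is hypothetical, and when the capacity is too small it allocates one new buffer for all shards, whereas the commit describes allocating only the parity blocks.

```go
package main

import "fmt"

// split returns dataShards+parityShards equally sized slices. It reuses the
// memory behind data when cap(data) is large enough and allocates otherwise.
func split(data []byte, dataShards, parityShards int) [][]byte {
	perShard := (len(data) + dataShards - 1) / dataShards
	total := perShard * (dataShards + parityShards)

	var buf []byte
	if cap(data) >= total {
		buf = data[:total]
		for i := len(data); i < total; i++ {
			buf[i] = 0 // clear padding and parity area
		}
	} else {
		buf = make([]byte, total)
		copy(buf, data)
	}

	shards := make([][]byte, dataShards+parityShards)
	for i := range shards {
		shards[i] = buf[i*perShard : (i+1)*perShard]
	}
	return shards
}

func main() {
	roomy := make([]byte, 8, 12) // capacity covers 2 data + 1 parity shard
	shards := split(roomy, 2, 1)
	shards[0][0] = 9
	fmt.Println(len(shards), roomy[0]) // prints 3 9
}
```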
2017-07-22 16:16:58 +02:00
Frank Wessels 0de37d7697 Add ReconstructData interface method (#57)
* Add ReconstructData interface method to allow reconstruction of any missing data shards
* Add support for just reconstructing data shards only to StreamEncoder.Reconstruct()
2017-07-20 12:15:46 +02:00
Klaus Post 0dd0a0e50c Fix error grammar
Fixes #56
2017-07-16 17:00:58 +02:00
Fred Akalin 18d548df63 Add support for PAR1 (#55)
PAR1 is a file format which uses a Reed-Solomon code similar
to the current one, except it uses a different (flawed) coding
matrix.

Add support for it via a WithPAR1Matrix option, so that this code
can be used to encode/decode PAR1 files. Also add the option to
existing tests, and add a test demonstrating the flaw in PAR1's
coding matrix.

Also fix a mistakenly inverted test in testOpts().

Incidentally, PAR1 is obsoleted by PAR2, which uses GF(2^16)
and tries to fix the flaw in the coding matrix; however, PAR2's
coding matrix is still flawed! The real solution is to build the
coding matrix like in this repository.

PAR1 spec:
http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html

Paper describing the (flawed) Reed-Solomon code used by PAR1:
http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.html
2017-06-20 20:24:57 +02:00
Fred Akalin 87c4e5ae75 Allow 256 total shards (#54)
* Allow 256 total shards
2017-06-19 11:26:52 +02:00
Klaus Post 5abf0ee302 Add options (#46)
* Add options

Make constants changeable as options.

The API remains backwards compatible.

* Update documentation.

* Fix line endings

* fmt

* fmt

* Use functions for parameters.

Much neater.
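
For readers unfamiliar with the pattern, "functions for parameters" usually means Go's functional options. The sketch below is generic: the field names, option names, and default values are assumptions chosen for illustration and are not taken from the commit.

```go
package main

import "fmt"

// options holds tunables that would otherwise be constants.
// Field names and defaults are illustrative only.
type options struct {
	maxGoroutines int
	minSplitSize  int
}

// Option modifies an options value.
type Option func(*options)

// WithMaxGoroutines overrides the goroutine limit.
func WithMaxGoroutines(n int) Option {
	return func(o *options) { o.maxGoroutines = n }
}

// WithMinSplitSize overrides the minimum block size.
func WithMinSplitSize(n int) Option {
	return func(o *options) { o.minSplitSize = n }
}

// newOptions starts from the defaults and applies each option in order,
// so callers that pass no options keep the old behaviour.
func newOptions(opts ...Option) options {
	o := options{maxGoroutines: 50, minSplitSize: 512}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Println(newOptions())                     // prints {50 512}
	fmt.Println(newOptions(WithMaxGoroutines(4))) // prints {4 512}
}
```

Because the constructor takes a variadic list, existing callers compile unchanged, which is one way an API can stay backwards compatible while gaining options.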
2017-02-19 11:13:22 +01:00
Peter C c54154da9e Add Inverse Matrix caching in a Thread-Safe Lookup Tree (#36)
* Add matrix inversion caching
* Benchmark and Parallel Benchmark tests for Reconstruct
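
A simplified sketch of thread-safe inversion caching. Note the swap: instead of the lookup tree named in the commit title, this illustration uses a map guarded by a `sync.RWMutex`, keyed by the indices of the missing shards. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// inversionCache stores inverted matrices keyed by the set of missing shards.
type inversionCache struct {
	mu sync.RWMutex
	m  map[string][][]byte
}

func newInversionCache() *inversionCache {
	return &inversionCache{m: make(map[string][][]byte)}
}

// key encodes the indices of the missing shards as a string.
func key(missing []int) string { return fmt.Sprint(missing) }

// get returns the cached matrix for this failure pattern, if any.
func (c *inversionCache) get(missing []int) ([][]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	inv, ok := c.m[key(missing)]
	return inv, ok
}

// put stores an inverted matrix for this failure pattern.
func (c *inversionCache) put(missing []int, inv [][]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key(missing)] = inv
}

func main() {
	c := newInversionCache()
	c.put([]int{1, 3}, [][]byte{{1, 0}, {0, 1}})
	_, hit := c.get([]int{1, 3})
	_, miss := c.get([]int{2})
	fmt.Println(hit, miss) // prints true false
}
```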
2016-09-12 21:31:07 +02:00
Christian Muehlhaeuser b1c8b4b073 Make Join return an error if a reconstruction is required first
If one or more required data shards are still nil and we can't correctly join
them before a reconstruction, return ErrReconstructRequired.
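
The check described above might be sketched as follows. The function `join` here is a simplified stand-in with a hypothetical signature, and the error text is an assumption; only the name `ErrReconstructRequired` comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrReconstructRequired mirrors the error named in the commit message;
// the message text is an assumption of this sketch.
var ErrReconstructRequired = errors.New("reconstruction required: one or more data shards are nil")

// join concatenates the first dataShards shards and refuses to do so
// when any of them is still nil.
func join(shards [][]byte, dataShards int) ([]byte, error) {
	var out []byte
	for _, s := range shards[:dataShards] {
		if s == nil {
			return nil, ErrReconstructRequired
		}
		out = append(out, s...)
	}
	return out, nil
}

func main() {
	_, err := join([][]byte{[]byte("ab"), nil, []byte("p")}, 2)
	fmt.Println(errors.Is(err, ErrReconstructRequired)) // prints true
}
```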
2016-08-05 19:23:08 +02:00
Harshavardhana ba30981088 Add checks for data and parity to not exceed 255 shards in total.
Fixes #16
2016-06-03 01:31:01 -07:00
Klaus Post 4fadad8564 Update reedsolomon.go
Fix comment
2016-05-01 12:00:51 +02:00
Klaus Post ed06f926b9 Merge pull request #20 from harshavardhana/fix
ErrShortData shouldn't be returned for data less than dataShards.
2016-05-01 11:58:47 +02:00
Harshavardhana df175d2921 ErrShortData shouldn't be returned for data less than dataShards.
The reasoning behind this is that, with 10 data blocks
and 10 parity blocks, rejecting input files smaller than
10 bytes doesn't seem like the right approach.

Most erasure subsystems will have static data and parity blocks;
in such cases erroring out is not correct, since reedsolomon itself
doesn't impose this limitation (please correct me here if I am
wrong :-)).

So removing the check itself is not a problem since most of the
data after the split would be padded with zeros, which is okay
and should be left as application optimization if they wish to
pack small files in this range.

ErrShortData will be still returned in case if the size of data
is empty, or in case of streaming if the size == 0.
2016-04-29 20:38:45 -07:00
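The behaviour argued for in the commit above can be sketched as follows: input shorter than the number of data shards is zero-padded, and only empty input is rejected. `splitPadded` and `errShortData` are hypothetical names for this illustration; the package's own Split may differ in detail.

```go
package main

import (
	"errors"
	"fmt"
)

// errShortData is a stand-in for the library's ErrShortData.
var errShortData = errors.New("not enough data to fill the requested shards")

// splitPadded splits data into dataShards equally sized shards.
// The tail is padded with zeros; only empty input is an error.
func splitPadded(data []byte, dataShards int) ([][]byte, error) {
	if dataShards <= 0 {
		return nil, errors.New("dataShards must be positive")
	}
	if len(data) == 0 {
		return nil, errShortData
	}
	// Round up so every input byte fits into some shard.
	perShard := (len(data) + dataShards - 1) / dataShards
	padded := make([]byte, perShard*dataShards) // zero-initialised
	copy(padded, data)

	shards := make([][]byte, dataShards)
	for i := range shards {
		shards[i] = padded[i*perShard : (i+1)*perShard]
	}
	return shards, nil
}

func main() {
	// 3 bytes across 10 data shards: three 1-byte shards carry data,
	// the remaining seven hold a single zero byte each.
	shards, err := splitPadded([]byte("abc"), 10)
	fmt.Println(len(shards), len(shards[0]), err)
}
```

The caller has to remember the original length, since the padding cannot be told apart from trailing zero bytes in the data.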
Harshavardhana 0b630aea27 use bytes.Equal rather than bytes.Compare 2016-04-29 14:12:03 -07:00
xiaost 4048a541c8 Optimized the number of encoding & decoding goroutines
hardware: E5-2630 v2  (Intel x86-64 with ssse3)
software: linux, go1.6, GOMAXPROCS=2

Performances                          before          after         change

BenchmarkEncode10x2x10000-2           2884.95 MB/s    2837.93 MB/s  0.98x
BenchmarkEncode100x20x10000-2          593.93 MB/s     577.17 MB/s  0.97x
BenchmarkEncode17x3x1M-2              2903.74 MB/s    5197.99 MB/s  1.80x
BenchmarkEncode10x4x16M-2             1992.13 MB/s    3689.69 MB/s  1.85x
BenchmarkEncode5x2x1M-2               2883.78 MB/s    7506.19 MB/s  2.60x
BenchmarkEncode10x2x1M-2              3205.63 MB/s    7848.12 MB/s  2.45x
BenchmarkEncode10x4x1M-2              2218.35 MB/s    3998.35 MB/s  1.80x
BenchmarkEncode50x20x1M-2              579.24 MB/s     641.08 MB/s  1.11x
BenchmarkEncode17x3x16M-2             2652.36 MB/s    4775.41 MB/s  1.80x
BenchmarkVerify10x2x10000-2           1327.27 MB/s    1837.41 MB/s  1.38x
BenchmarkVerify50x5x50000-2           1481.89 MB/s    2684.57 MB/s  1.81x
BenchmarkVerify10x2x1M-2              1553.91 MB/s    5704.71 MB/s  3.67x
BenchmarkVerify5x2x1M-2                939.90 MB/s    4949.30 MB/s  5.26x
BenchmarkVerify10x4x1M-2               956.89 MB/s    3191.01 MB/s  3.33x
BenchmarkVerify50x20x1M-2              490.49 MB/s     823.46 MB/s  1.68x
BenchmarkVerify10x4x16M-2             1078.03 MB/s    3196.97 MB/s  2.97x
BenchmarkStreamEncode10x2x10000-2        2.40 MB/s      12.10 MB/s  5.04x
BenchmarkStreamEncode100x20x10000-2      6.72 MB/s      10.72 MB/s  1.60x
BenchmarkStreamEncode17x3x1M-2         390.75 MB/s     845.08 MB/s  2.16x
BenchmarkStreamEncode10x4x16M-2       1175.93 MB/s    1803.71 MB/s  1.53x
BenchmarkStreamEncode5x2x1M-2          207.85 MB/s     790.02 MB/s  3.80x
BenchmarkStreamEncode10x2x1M-2         296.77 MB/s     872.41 MB/s  2.94x
BenchmarkStreamEncode10x4x1M-2         264.43 MB/s     699.25 MB/s  2.64x
BenchmarkStreamEncode50x20x1M-2        284.93 MB/s     414.65 MB/s  1.46x
BenchmarkStreamEncode17x3x16M-2       1439.13 MB/s    1933.42 MB/s  1.34x
BenchmarkStreamVerify10x2x10000-2        2.33 MB/s      12.07 MB/s  5.18x
BenchmarkStreamVerify50x5x50000-2       86.53 MB/s     136.02 MB/s  1.57x
BenchmarkStreamVerify10x2x1M-2         315.65 MB/s     909.44 MB/s  2.88x
BenchmarkStreamVerify5x2x1M-2          180.45 MB/s     772.42 MB/s  4.28x
BenchmarkStreamVerify10x4x1M-2         310.35 MB/s     779.26 MB/s  2.51x
BenchmarkStreamVerify50x20x1M-2        547.23 MB/s     773.74 MB/s  1.41x
BenchmarkStreamVerify10x4x16M-2       4128.01 MB/s    6606.43 MB/s  1.60x
2016-04-12 15:41:22 +08:00
klauspost 180472d98f Make documentation conform to go vet. 2015-11-03 12:09:36 +01:00
lukechampine 295bf27a3d fix Split panic 2015-08-08 16:38:55 -04:00
lukechampine 0bd572bc5b tweak Split/Join functions 2015-08-08 13:51:12 -04:00
lukechampine 64b705bbf6 fully test Reconstruct function
Well, I can't figure out how to trigger the Invert error.
It may not be possible; need more domain knowledge to be sure.
2015-08-08 13:50:18 -04:00
lukechampine cf985d4451 remove unreachable checkShards case
this case would be caught by shardSize anyway
2015-08-08 13:50:18 -04:00
lukechampine 5784cfa7ff remove impossible errors 2015-08-06 22:46:27 -04:00
klauspost 8ebf356efb The number of data shards must be below 257. Check that and update documentation. 2015-06-23 13:39:57 +02:00
klauspost 5c2ef3ae72 Always check/return errors. 2015-06-23 12:16:26 +02:00
klauspost 7381e0b7b5 - Only run multiple goroutines if size is bigger than splitsize.
- Update docs
2015-06-23 11:18:29 +02:00
klauspost 83703c37ac Add package documentation and clarify interface docs. 2015-06-22 15:12:05 +02:00
Klaus Post 5aa37c3492 Add AMD64 SSE3 Galois multiplication. Approximately 5-10x faster.
benchmark                         old MB/s     new MB/s     speedup
BenchmarkEncode10x2x10000         333.31       5827.17      17.48x
BenchmarkEncode10x2x10000-2       431.20       2802.53      6.50x
BenchmarkEncode10x2x10000-4       553.98       2432.95      4.39x
BenchmarkEncode10x2x10000-8       585.79       3469.61      5.92x
BenchmarkEncode100x20x10000       32.59        583.40       17.90x
BenchmarkEncode100x20x10000-2     59.52        726.70       12.21x
BenchmarkEncode100x20x10000-4     108.04       1363.25      12.62x
BenchmarkEncode100x20x10000-8     113.76       1274.62      11.20x
BenchmarkEncode17x3x1M            215.28       3141.85      14.59x
BenchmarkEncode17x3x1M-2          398.76       3650.12      9.15x
BenchmarkEncode17x3x1M-4          655.32       6071.11      9.26x
BenchmarkEncode17x3x1M-8          832.16       6616.47      7.95x
BenchmarkEncode10x4x16M           154.48       1357.30      8.79x
BenchmarkEncode10x4x16M-2         295.62       2377.92      8.04x
BenchmarkEncode10x4x16M-4         529.89       3519.49      6.64x
BenchmarkEncode10x4x16M-8         632.11       4521.90      7.15x
BenchmarkEncode5x2x1M             327.87       4879.09      14.88x
BenchmarkEncode5x2x1M-2           576.11       2599.20      4.51x
BenchmarkEncode5x2x1M-4           1043.65      3559.12      3.41x
BenchmarkEncode5x2x1M-8           1227.77      4255.34      3.47x
BenchmarkEncode10x2x1M            321.24       4574.68      14.24x
BenchmarkEncode10x2x1M-2          587.73       3100.28      5.28x
BenchmarkEncode10x2x1M-4          1101.96      4770.32      4.33x
BenchmarkEncode10x2x1M-8          1217.08      5812.17      4.78x
BenchmarkEncode10x4x1M            155.34       2037.27      13.11x
BenchmarkEncode10x4x1M-2          298.38       2470.97      8.28x
BenchmarkEncode10x4x1M-4          548.67       3603.15      6.57x
BenchmarkEncode10x4x1M-8          625.23       4827.42      7.72x
BenchmarkEncode50x20x1M           31.37        347.65       11.08x
BenchmarkEncode50x20x1M-2         59.81        713.28       11.93x
BenchmarkEncode50x20x1M-4         105.34       1175.47      11.16x
BenchmarkEncode50x20x1M-8         123.84       1491.91      12.05x
BenchmarkEncode17x3x16M           209.55       1861.59      8.88x
BenchmarkEncode17x3x16M-2         394.19       3331.73      8.45x
BenchmarkEncode17x3x16M-4         643.30       4942.74      7.68x
BenchmarkEncode17x3x16M-8         839.64       6213.43      7.40x
2015-06-21 21:23:22 +02:00
Klaus Post 17e9fa30f0 Add Join function for joining data shards. 2015-06-21 13:25:12 +02:00
Klaus Post 437e364842 Adjust splitsize:
benchmark                         old ns/op     new ns/op     delta
BenchmarkEncode10x2x10000-2       243613        229413        -5.83%
BenchmarkEncode100x20x10000-2     23041318      19311104      -16.19%
BenchmarkEncode17x3x1M-2          54469780      49602836      -8.94%
BenchmarkEncode10x4x16M-2         674538600     647037000     -4.08%

Bigger sizes (1024) yield less speedup.
2015-06-20 20:32:52 +02:00