Commit Graph

9 Commits (jerasure-matrix)

Author SHA1 Message Date
Klaus Post cb7a0b5aef
Add a fast path for multiplication by one (#130)
Multiplying by one leaves the input unchanged, so that case can skip the general multiply and use faster math.
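
The idea can be sketched in Go as follows. This is a minimal illustration, not the code from the commit: the names `galMul`, `mulSlice` and `mulSliceXor` are hypothetical, and the field polynomial 0x11d is an assumption. Since 1 is the multiplicative identity in GF(2^8), a plain multiply becomes a copy and a multiply-accumulate becomes an XOR.

```go
package main

import "fmt"

// galMul multiplies two elements of GF(2^8) bit by bit.
// Assumption: the field polynomial is x^8+x^4+x^3+x^2+1 (0x11d).
func galMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d // reduce after the top bit is shifted out
		}
		b >>= 1
	}
	return p
}

// mulSlice (hypothetical name) writes c*in[i] to out[i].
func mulSlice(c byte, in, out []byte) {
	out = out[:len(in)]
	if c == 1 {
		copy(out, in) // 1*x == x, so no field math is needed
		return
	}
	for i, v := range in {
		out[i] = galMul(c, v)
	}
}

// mulSliceXor (hypothetical name) XORs c*in[i] into out[i].
func mulSliceXor(c byte, in, out []byte) {
	out = out[:len(in)]
	if c == 1 {
		for i, v := range in {
			out[i] ^= v // 1*x == x, so this is a plain XOR
		}
		return
	}
	for i, v := range in {
		out[i] ^= galMul(c, v)
	}
}

func main() {
	in := []byte{1, 2, 3, 250}
	out := make([]byte, len(in))
	mulSlice(1, in, out)
	fmt.Println(out) // [1 2 3 250]
}
```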
2020-05-06 11:14:25 +02:00
Klaus Post d069fb1019
Remove a bounds check in pure Go (#123)
About 40% faster on the pure Go multiply operation (see BenchmarkGalois128K and BenchmarkGalois1M below).

```
benchmark                                old ns/op     new ns/op     delta
BenchmarkParallel_8x8x05M-8              2990849       2763554       -7.60%
BenchmarkParallel_8x8x1M-8               4941575       5061619       +2.43%
BenchmarkParallel_8x8x8M-8               34257722      33192541      -3.11%
BenchmarkParallel_8x8x32M-8              143157262     131654688     -8.03%
BenchmarkGalois128K-8                    64201         38374         -40.23%
BenchmarkGalois1M-8                      507053        307236        -39.41%
BenchmarkGaloisXor128K-8                 63815         63157         -1.03%
BenchmarkGaloisXor1M-8                   506369        505641        -0.14%
BenchmarkEncode10x2x10000-8              96414         92781         -3.77%
BenchmarkEncode100x20x10000-8            3188549       3238299       +1.56%
BenchmarkEncode17x3x1M-8                 3741349       3633535       -2.88%
BenchmarkEncode10x4x16M-8                41628596      40306100      -3.18%
BenchmarkEncode5x2x1M-8                  724162        699137        -3.46%
BenchmarkEncode10x2x1M-8                 1451401       1423224       -1.94%
BenchmarkEncode10x4x1M-8                 2839382       2740249       -3.49%
BenchmarkEncode50x20x1M-8                68415407      67015156      -2.05%
BenchmarkEncode17x3x16M-8                53734221      51784418      -3.63%
BenchmarkEncode_8x4x8M-8                 16826004      16013691      -4.83%
BenchmarkEncode_12x4x12M-8               37544203      36392439      -3.07%
BenchmarkEncode_16x4x16M-8               66070450      69062838      +4.53%
BenchmarkEncode_16x4x32M-8               133905200     130529500     -2.52%
BenchmarkEncode_16x4x64M-8               281313400     265809900     -5.51%
BenchmarkEncode_8x5x8M-8                 20789000      19866553      -4.44%
BenchmarkEncode_8x6x8M-8                 25027385      25087290      +0.24%
BenchmarkEncode_8x7x8M-8                 29156578      28231372      -3.17%
BenchmarkEncode_8x9x8M-8                 37286413      37383431      +0.26%
BenchmarkEncode_8x10x8M-8                41722722      39786752      -4.64%
BenchmarkEncode_8x11x8M-8                45692118      43409812      -4.99%
BenchmarkEncode_8x8x05M-8                2358946       2298631       -2.56%
BenchmarkEncode_8x8x1M-8                 4551026       4357599       -4.25%
BenchmarkEncode_8x8x8M-8                 33596074      31951653      -4.89%
BenchmarkEncode_8x8x32M-8                135030488     127382850     -5.66%
BenchmarkEncode_24x8x24M-8               297317050     301777575     +1.50%
BenchmarkEncode_24x8x48M-8               611638100     596134400     -2.53%
BenchmarkVerify10x2x10000-8              103723        103523        -0.19%
BenchmarkVerify50x5x50000-8              2170780       2148170       -1.04%
BenchmarkVerify10x2x1M-8                 1693351       1676973       -0.97%
BenchmarkVerify5x2x1M-8                  997721        995888        -0.18%
BenchmarkVerify10x4x1M-8                 3354687       3296939       -1.72%
BenchmarkVerify50x20x1M-8                67491300      66890056      -0.89%
BenchmarkVerify10x4x16M-8                44195152      44356146      +0.36%
BenchmarkReconstruct10x2x10000-8         24720         23373         -5.45%
BenchmarkReconstruct50x5x50000-8         880988        858684        -2.53%
BenchmarkReconstruct10x2x1M-8            387655        368900        -4.84%
BenchmarkReconstruct5x2x1M-8             191067        175841        -7.97%
BenchmarkReconstruct10x4x1M-8            1040639       1004731       -3.45%
BenchmarkReconstruct50x20x1M-8           28507103      28467956      -0.14%
BenchmarkReconstruct10x4x16M-8           15829872      15225654      -3.82%
BenchmarkReconstructData10x2x10000-8     24369         23374         -4.08%
BenchmarkReconstructData50x5x50000-8     865039        852456        -1.45%
BenchmarkReconstructData10x2x1M-8        383240        366751        -4.30%
BenchmarkReconstructData5x2x1M-8         183644        170444        -7.19%
BenchmarkReconstructData10x4x1M-8        1010537       969151        -4.10%
BenchmarkReconstructData50x20x1M-8       28288428      28051051      -0.84%
BenchmarkReconstructData10x4x16M-8       15048840      14443250      -4.02%
BenchmarkReconstructP10x2x10000-8        3219          3122          -3.01%
BenchmarkReconstructP10x5x20000-8        23574         22704         -3.69%
BenchmarkSplit10x4x160M-8                2822150       2735071       -3.09%
BenchmarkSplit5x2x5M-8                   409699        311346        -24.01%
BenchmarkSplit10x2x1M-8                  43767         40247         -8.04%
BenchmarkSplit10x4x10M-8                 741097        566888        -23.51%
BenchmarkSplit50x20x50M-8                1913475       1682060       -12.09%
BenchmarkSplit17x3x272M-8                2059505       2095628       +1.75%
BenchmarkStreamEncode10x2x10000-8        8517255       5226284       -38.64%
BenchmarkStreamEncode100x20x10000-8      41903836      40969212      -2.23%
BenchmarkStreamEncode17x3x1M-8           12038007      14129765      +17.38%
BenchmarkStreamEncode10x4x16M-8          56512840      54821895      -2.99%
BenchmarkStreamEncode5x2x1M-8            5326508       3966411       -25.53%
BenchmarkStreamEncode10x2x1M-8           6924358       6589396       -4.84%
BenchmarkStreamEncode10x4x1M-8           9016080       8459049       -6.18%
BenchmarkStreamEncode50x20x1M-8          93583042      94021200      +0.47%
BenchmarkStreamEncode17x3x16M-8          76643714      74750193      -2.47%
BenchmarkStreamVerify10x2x10000-8        8311646       5162179       -37.89%
BenchmarkStreamVerify50x5x50000-8        19015944      18352626      -3.49%
BenchmarkStreamVerify10x2x1M-8           5738380       5441592       -5.17%
BenchmarkStreamVerify5x2x1M-8            3462751       3328057       -3.89%
BenchmarkStreamVerify10x4x1M-8           6735717       6381116       -5.26%
BenchmarkStreamVerify50x20x1M-8          29844543      29416921      -1.43%
BenchmarkStreamVerify10x4x16M-8          8512699       8375778       -1.61%

benchmark                                old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x05M-8              1402.38      1517.72      1.08x
BenchmarkParallel_8x8x1M-8               1697.56      1657.30      0.98x
BenchmarkParallel_8x8x8M-8               1958.94      2021.81      1.03x
BenchmarkParallel_8x8x32M-8              1875.11      2038.94      1.09x
BenchmarkGalois128K-8                    2041.59      3415.64      1.67x
BenchmarkGalois1M-8                      2067.98      3412.93      1.65x
BenchmarkGaloisXor128K-8                 2053.92      2075.33      1.01x
BenchmarkGaloisXor1M-8                   2070.77      2073.76      1.00x
BenchmarkEncode10x2x10000-8              1037.19      1077.81      1.04x
BenchmarkEncode100x20x10000-8            313.62       308.80       0.98x
BenchmarkEncode17x3x1M-8                 4764.54      4905.91      1.03x
BenchmarkEncode10x4x16M-8                4030.21      4162.45      1.03x
BenchmarkEncode5x2x1M-8                  7239.93      7499.07      1.04x
BenchmarkEncode10x2x1M-8                 7224.58      7367.61      1.02x
BenchmarkEncode10x4x1M-8                 3692.97      3826.57      1.04x
BenchmarkEncode50x20x1M-8                766.33       782.34       1.02x
BenchmarkEncode17x3x16M-8                5307.84      5507.69      1.04x
BenchmarkEncode_8x4x8M-8                 3988.40      4190.72      1.05x
BenchmarkEncode_12x4x12M-8               4021.79      4149.07      1.03x
BenchmarkEncode_16x4x16M-8               4062.87      3886.83      0.96x
BenchmarkEncode_16x4x32M-8               4009.34      4113.02      1.03x
BenchmarkEncode_16x4x64M-8               3816.89      4039.51      1.06x
BenchmarkEncode_8x5x8M-8                 3228.09      3377.98      1.05x
BenchmarkEncode_8x6x8M-8                 2681.42      2675.01      1.00x
BenchmarkEncode_8x7x8M-8                 2301.67      2377.10      1.03x
BenchmarkEncode_8x9x8M-8                 1799.82      1795.15      1.00x
BenchmarkEncode_8x10x8M-8                1608.45      1686.71      1.05x
BenchmarkEncode_8x11x8M-8                1468.72      1545.94      1.05x
BenchmarkEncode_8x8x05M-8                1778.04      1824.70      1.03x
BenchmarkEncode_8x8x1M-8                 1843.23      1925.05      1.04x
BenchmarkEncode_8x8x8M-8                 1997.52      2100.33      1.05x
BenchmarkEncode_8x8x32M-8                1987.96      2107.31      1.06x
BenchmarkEncode_24x8x24M-8               2031.43      2001.41      0.99x
BenchmarkEncode_24x8x48M-8               1974.96      2026.32      1.03x
BenchmarkVerify10x2x10000-8              964.10       965.97       1.00x
BenchmarkVerify50x5x50000-8              2303.32      2327.56      1.01x
BenchmarkVerify10x2x1M-8                 6192.31      6252.79      1.01x
BenchmarkVerify5x2x1M-8                  5254.86      5264.53      1.00x
BenchmarkVerify10x4x1M-8                 3125.70      3180.45      1.02x
BenchmarkVerify50x20x1M-8                776.82       783.81       1.01x
BenchmarkVerify10x4x16M-8                3796.17      3782.39      1.00x
BenchmarkReconstruct10x2x10000-8         4045.30      4278.40      1.06x
BenchmarkReconstruct50x5x50000-8         5675.45      5822.87      1.03x
BenchmarkReconstruct10x2x1M-8            27049.21     28424.40     1.05x
BenchmarkReconstruct5x2x1M-8             27440.02     29815.96     1.09x
BenchmarkReconstruct10x4x1M-8            10076.27     10436.39     1.04x
BenchmarkReconstruct50x20x1M-8           1839.15      1841.68      1.00x
BenchmarkReconstruct10x4x16M-8           10598.45     11019.04     1.04x
BenchmarkReconstructData10x2x10000-8     4103.60      4278.25      1.04x
BenchmarkReconstructData50x5x50000-8     5780.09      5865.40      1.01x
BenchmarkReconstructData10x2x1M-8        27360.79     28590.95     1.04x
BenchmarkReconstructData5x2x1M-8         28549.19     30760.16     1.08x
BenchmarkReconstructData10x4x1M-8        10376.42     10819.53     1.04x
BenchmarkReconstructData50x20x1M-8       1853.37      1869.05      1.01x
BenchmarkReconstructData10x4x16M-8       11148.51     11615.96     1.04x
BenchmarkReconstructP10x2x10000-8        31068.70     32026.22     1.03x
BenchmarkReconstructP10x5x20000-8        8484.08      8808.93      1.04x
BenchmarkStreamEncode10x2x10000-8        11.74        19.13        1.63x
BenchmarkStreamEncode100x20x10000-8      23.86        24.41        1.02x
BenchmarkStreamEncode17x3x1M-8           1480.79      1261.58      0.85x
BenchmarkStreamEncode10x4x16M-8          2968.74      3060.31      1.03x
BenchmarkStreamEncode5x2x1M-8            984.30       1321.82      1.34x
BenchmarkStreamEncode10x2x1M-8           1514.33      1591.31      1.05x
BenchmarkStreamEncode10x4x1M-8           1163.01      1239.59      1.07x
BenchmarkStreamEncode50x20x1M-8          560.24       557.63       1.00x
BenchmarkStreamEncode17x3x16M-8          3721.28      3815.54      1.03x
BenchmarkStreamVerify10x2x10000-8        12.03        19.37        1.61x
BenchmarkStreamVerify50x5x50000-8        262.94       272.44       1.04x
BenchmarkStreamVerify10x2x1M-8           1827.30      1926.97      1.05x
BenchmarkStreamVerify5x2x1M-8            1514.08      1575.36      1.04x
BenchmarkStreamVerify10x4x1M-8           1556.74      1643.25      1.06x
BenchmarkStreamVerify50x20x1M-8          1756.73      1782.27      1.01x
BenchmarkStreamVerify10x4x16M-8          19708.46     20030.64     1.02x
```
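
The commit message does not say which check was removed. One common way to drop a per-iteration bounds check in Go is to reslice the output to the input length before the loop, so the compiler can prove every index is in range. The sketch below assumes that technique; `doubleTable`, `mulSliceChecked` and `mulSliceHinted` are hypothetical names, and the table (multiplication by 2 with polynomial 0x11d) is only there to give the loop something to look up.

```go
package main

import (
	"bytes"
	"fmt"
)

// doubleTable builds a lookup table for multiplication by 2 in GF(2^8).
// Assumption: the field polynomial is 0x11d.
func doubleTable() *[256]byte {
	var t [256]byte
	for x := range t {
		v := byte(x) << 1
		if x&0x80 != 0 {
			v ^= 0x1d
		}
		t[x] = v
	}
	return &t
}

// mulSliceChecked indexes out without any length hint, so the compiler
// has to keep a bounds check on out[i] inside the loop.
func mulSliceChecked(mt *[256]byte, in, out []byte) {
	for i, v := range in {
		out[i] = mt[v]
	}
}

// mulSliceHinted reslices out once. The reslice panics up front if out is
// too short, and afterwards len(out) == len(in) is known, which lets the
// compiler elide the check on out[i].
func mulSliceHinted(mt *[256]byte, in, out []byte) {
	out = out[:len(in)]
	for i := range in {
		out[i] = mt[in[i]] // a byte index into a [256]byte never needs a check
	}
}

func main() {
	mt := doubleTable()
	in := []byte{0, 1, 0x80, 0x8e, 0xff}
	a := make([]byte, len(in))
	b := make([]byte, len(in))
	mulSliceChecked(mt, in, a)
	mulSliceHinted(mt, in, b)
	fmt.Println(bytes.Equal(a, b)) // true
}
```

Both functions compute the same result; only the generated code differs. Building with `-gcflags=-d=ssa/check_bce/debug=1` should list the bounds checks that remain.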
2020-05-03 19:38:55 +02:00
Frank Wessels 79aee05119
AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
Encoding is now accelerated with AVX512 on Intel CPUs. Per-core speedups over AVX2 range from about 2.5x to 4.3x, as the following table shows:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00
Frank Wessels 3610933d2f
Use AVX2 SIMD assembly instructions instead of raw BYTE sequences (#73)
2017-11-18 16:17:10 +01:00
Frank Wessels 7b88f42e61
Add NEON support for ARM64 (#62)
* Add support for arm64 using NEON instructions

This uses the PMULL/PMULL2 polynomial multiplication instructions, followed by a reduction that takes two steps.

* Add ARM performance numbers

* Formatting for performance table

* Refactoring of NEON version and 256-bit wide version

* Expand test slice beyond 32 (for AVX2 and NEON) and test galMulSliceXor explicitly.

* Fix ARM code with missing function.

* Fix missing newline
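
A scalar model in Go may help show why a polynomial multiply needs a reduction, and why that reduction takes two folds. This is a sketch, not the NEON code: `clmul8`, `reduce` and `mul` are hypothetical names, and the field polynomial 0x11d is an assumption. The carry-less product of two bytes can be up to 15 bits wide. Because x^8 is congruent to 0x1d modulo the polynomial, the high part can be multiplied by 0x1d and folded back in. The first fold can still leave bits 8 to 10 set, so a second fold is needed.

```go
package main

import "fmt"

// clmul8 is a carry-less (polynomial) multiply of two bytes, a scalar
// stand-in for what a polynomial multiply instruction does per lane.
func clmul8(a, b byte) uint16 {
	var p uint16
	for i := uint(0); i < 8; i++ {
		if b&(1<<i) != 0 {
			p ^= uint16(a) << i
		}
	}
	return p
}

// reduce brings a product of up to 15 bits back into GF(2^8).
// Assumption: the field polynomial is 0x11d, so x^8 == 0x1d (mod poly).
func reduce(p uint16) byte {
	// Step 1: fold bits 8..14. The result may still have bits 8..10 set.
	hi := byte(p >> 8)
	p = (p & 0xff) ^ clmul8(hi, 0x1d)
	// Step 2: fold the remaining high bits. The result now fits in 8 bits.
	hi = byte(p >> 8)
	p = (p & 0xff) ^ clmul8(hi, 0x1d)
	return byte(p)
}

// mul multiplies two field elements: polynomial multiply, then reduce.
func mul(a, b byte) byte {
	return reduce(clmul8(a, b))
}

func main() {
	fmt.Println(mul(2, 0x80)) // 29
	fmt.Println(mul(2, 0x8e)) // 1
}
```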
2017-08-26 11:47:42 +02:00
chenzhongtao d78bf472d8
Add Update parity function (#60)
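
The commit message gives no detail on how the update works. A standard approach for linear codes, sketched here under that assumption, is the identity parity' = parity XOR coef * (old XOR new): when one data shard changes, each parity shard can be patched from the difference alone instead of re-encoding every shard. All names (`galMul`, `encodeParity`, `updateParity`), the coefficients, and the polynomial 0x11d are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// galMul multiplies two elements of GF(2^8).
// Assumption: the field polynomial is 0x11d.
func galMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d
		}
		b >>= 1
	}
	return p
}

// encodeParity computes one parity shard as the sum of coefs[k]*data[k].
func encodeParity(coefs []byte, data [][]byte) []byte {
	p := make([]byte, len(data[0]))
	for k, shard := range data {
		for i, v := range shard {
			p[i] ^= galMul(coefs[k], v)
		}
	}
	return p
}

// updateParity patches parity in place after one data shard changed
// from oldData to newData; coef is that shard's matrix coefficient.
func updateParity(coef byte, oldData, newData, parity []byte) {
	for i := range parity {
		parity[i] ^= galMul(coef, oldData[i]^newData[i])
	}
}

func main() {
	coefs := []byte{3, 7} // illustrative coefficients, not a real encoding matrix
	d0 := []byte{10, 20, 30}
	d1 := []byte{40, 50, 60}
	parity := encodeParity(coefs, [][]byte{d0, d1})

	n1 := []byte{41, 0, 255} // replacement for d1
	updateParity(coefs[1], d1, n1, parity)

	full := encodeParity(coefs, [][]byte{d0, n1})
	fmt.Println(bytes.Equal(parity, full)) // true
}
```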
2017-08-20 11:42:39 +02:00
Klaus Post 5abf0ee302
Add options (#46)
* Add options

Make constants changeable as options.

The API remains backwards compatible.

* Update documentation.

* Fix line endings

* fmt

* fmt

* Use functions for parameters.

Much neater.
2017-02-19 11:13:22 +01:00
klauspost 1d6cefa204
Test array multiply.
2015-06-22 13:05:29 +02:00
klauspost 2b4171b9e6
Initial version
2015-06-19 16:31:24 +02:00