Klaus Post
cb7a0b5aef
Do fast by one multiplication (#130)
When multiplying by one we can use faster math.
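A minimal sketch of the idea, not the code from this commit: in GF(2^8) the element 1 is the multiplicative identity, so a slice multiply by 1 reduces to a copy, and a multiply-and-XOR by 1 reduces to a plain XOR. The function names and the reduction polynomial (0x11d) are assumptions made for illustration.

```go
package main

import "fmt"

// galMul multiplies two GF(2^8) elements with shift-and-xor.
// The reduction polynomial x^8+x^4+x^3+x^2+1 (0x11d) is an assumption
// of this sketch.
func galMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a&0x80 != 0
		a <<= 1
		if carry {
			a ^= 0x1d
		}
		b >>= 1
	}
	return p
}

// mulSlice computes out[i] = c * in[i]. Multiplying by one is a copy.
func mulSlice(c byte, in, out []byte) {
	out = out[:len(in)]
	if c == 1 {
		copy(out, in)
		return
	}
	for i, v := range in {
		out[i] = galMul(c, v)
	}
}

// mulSliceXor computes out[i] ^= c * in[i]. Multiplying by one is a plain XOR.
func mulSliceXor(c byte, in, out []byte) {
	out = out[:len(in)]
	if c == 1 {
		for i, v := range in {
			out[i] ^= v
		}
		return
	}
	for i, v := range in {
		out[i] ^= galMul(c, v)
	}
}

func main() {
	in := []byte{0, 1, 2, 0x80, 0xff}
	out := make([]byte, len(in))
	mulSlice(1, in, out)
	fmt.Println(out)
}
```

The fast paths skip the per-byte field multiplication entirely, which is where the saving would come from.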
2020-05-06 11:14:25 +02:00
Klaus Post
d069fb1019
Remove a bounds check in pure Go (#123)
40% faster on the pure operation.
```
benchmark old ns/op new ns/op delta
BenchmarkParallel_8x8x05M-8 2990849 2763554 -7.60%
BenchmarkParallel_8x8x1M-8 4941575 5061619 +2.43%
BenchmarkParallel_8x8x8M-8 34257722 33192541 -3.11%
BenchmarkParallel_8x8x32M-8 143157262 131654688 -8.03%
BenchmarkGalois128K-8 64201 38374 -40.23%
BenchmarkGalois1M-8 507053 307236 -39.41%
BenchmarkGaloisXor128K-8 63815 63157 -1.03%
BenchmarkGaloisXor1M-8 506369 505641 -0.14%
BenchmarkEncode10x2x10000-8 96414 92781 -3.77%
BenchmarkEncode100x20x10000-8 3188549 3238299 +1.56%
BenchmarkEncode17x3x1M-8 3741349 3633535 -2.88%
BenchmarkEncode10x4x16M-8 41628596 40306100 -3.18%
BenchmarkEncode5x2x1M-8 724162 699137 -3.46%
BenchmarkEncode10x2x1M-8 1451401 1423224 -1.94%
BenchmarkEncode10x4x1M-8 2839382 2740249 -3.49%
BenchmarkEncode50x20x1M-8 68415407 67015156 -2.05%
BenchmarkEncode17x3x16M-8 53734221 51784418 -3.63%
BenchmarkEncode_8x4x8M-8 16826004 16013691 -4.83%
BenchmarkEncode_12x4x12M-8 37544203 36392439 -3.07%
BenchmarkEncode_16x4x16M-8 66070450 69062838 +4.53%
BenchmarkEncode_16x4x32M-8 133905200 130529500 -2.52%
BenchmarkEncode_16x4x64M-8 281313400 265809900 -5.51%
BenchmarkEncode_8x5x8M-8 20789000 19866553 -4.44%
BenchmarkEncode_8x6x8M-8 25027385 25087290 +0.24%
BenchmarkEncode_8x7x8M-8 29156578 28231372 -3.17%
BenchmarkEncode_8x9x8M-8 37286413 37383431 +0.26%
BenchmarkEncode_8x10x8M-8 41722722 39786752 -4.64%
BenchmarkEncode_8x11x8M-8 45692118 43409812 -4.99%
BenchmarkEncode_8x8x05M-8 2358946 2298631 -2.56%
BenchmarkEncode_8x8x1M-8 4551026 4357599 -4.25%
BenchmarkEncode_8x8x8M-8 33596074 31951653 -4.89%
BenchmarkEncode_8x8x32M-8 135030488 127382850 -5.66%
BenchmarkEncode_24x8x24M-8 297317050 301777575 +1.50%
BenchmarkEncode_24x8x48M-8 611638100 596134400 -2.53%
BenchmarkVerify10x2x10000-8 103723 103523 -0.19%
BenchmarkVerify50x5x50000-8 2170780 2148170 -1.04%
BenchmarkVerify10x2x1M-8 1693351 1676973 -0.97%
BenchmarkVerify5x2x1M-8 997721 995888 -0.18%
BenchmarkVerify10x4x1M-8 3354687 3296939 -1.72%
BenchmarkVerify50x20x1M-8 67491300 66890056 -0.89%
BenchmarkVerify10x4x16M-8 44195152 44356146 +0.36%
BenchmarkReconstruct10x2x10000-8 24720 23373 -5.45%
BenchmarkReconstruct50x5x50000-8 880988 858684 -2.53%
BenchmarkReconstruct10x2x1M-8 387655 368900 -4.84%
BenchmarkReconstruct5x2x1M-8 191067 175841 -7.97%
BenchmarkReconstruct10x4x1M-8 1040639 1004731 -3.45%
BenchmarkReconstruct50x20x1M-8 28507103 28467956 -0.14%
BenchmarkReconstruct10x4x16M-8 15829872 15225654 -3.82%
BenchmarkReconstructData10x2x10000-8 24369 23374 -4.08%
BenchmarkReconstructData50x5x50000-8 865039 852456 -1.45%
BenchmarkReconstructData10x2x1M-8 383240 366751 -4.30%
BenchmarkReconstructData5x2x1M-8 183644 170444 -7.19%
BenchmarkReconstructData10x4x1M-8 1010537 969151 -4.10%
BenchmarkReconstructData50x20x1M-8 28288428 28051051 -0.84%
BenchmarkReconstructData10x4x16M-8 15048840 14443250 -4.02%
BenchmarkReconstructP10x2x10000-8 3219 3122 -3.01%
BenchmarkReconstructP10x5x20000-8 23574 22704 -3.69%
BenchmarkSplit10x4x160M-8 2822150 2735071 -3.09%
BenchmarkSplit5x2x5M-8 409699 311346 -24.01%
BenchmarkSplit10x2x1M-8 43767 40247 -8.04%
BenchmarkSplit10x4x10M-8 741097 566888 -23.51%
BenchmarkSplit50x20x50M-8 1913475 1682060 -12.09%
BenchmarkSplit17x3x272M-8 2059505 2095628 +1.75%
BenchmarkStreamEncode10x2x10000-8 8517255 5226284 -38.64%
BenchmarkStreamEncode100x20x10000-8 41903836 40969212 -2.23%
BenchmarkStreamEncode17x3x1M-8 12038007 14129765 +17.38%
BenchmarkStreamEncode10x4x16M-8 56512840 54821895 -2.99%
BenchmarkStreamEncode5x2x1M-8 5326508 3966411 -25.53%
BenchmarkStreamEncode10x2x1M-8 6924358 6589396 -4.84%
BenchmarkStreamEncode10x4x1M-8 9016080 8459049 -6.18%
BenchmarkStreamEncode50x20x1M-8 93583042 94021200 +0.47%
BenchmarkStreamEncode17x3x16M-8 76643714 74750193 -2.47%
BenchmarkStreamVerify10x2x10000-8 8311646 5162179 -37.89%
BenchmarkStreamVerify50x5x50000-8 19015944 18352626 -3.49%
BenchmarkStreamVerify10x2x1M-8 5738380 5441592 -5.17%
BenchmarkStreamVerify5x2x1M-8 3462751 3328057 -3.89%
BenchmarkStreamVerify10x4x1M-8 6735717 6381116 -5.26%
BenchmarkStreamVerify50x20x1M-8 29844543 29416921 -1.43%
BenchmarkStreamVerify10x4x16M-8 8512699 8375778 -1.61%
benchmark old MB/s new MB/s speedup
BenchmarkParallel_8x8x05M-8 1402.38 1517.72 1.08x
BenchmarkParallel_8x8x1M-8 1697.56 1657.30 0.98x
BenchmarkParallel_8x8x8M-8 1958.94 2021.81 1.03x
BenchmarkParallel_8x8x32M-8 1875.11 2038.94 1.09x
BenchmarkGalois128K-8 2041.59 3415.64 1.67x
BenchmarkGalois1M-8 2067.98 3412.93 1.65x
BenchmarkGaloisXor128K-8 2053.92 2075.33 1.01x
BenchmarkGaloisXor1M-8 2070.77 2073.76 1.00x
BenchmarkEncode10x2x10000-8 1037.19 1077.81 1.04x
BenchmarkEncode100x20x10000-8 313.62 308.80 0.98x
BenchmarkEncode17x3x1M-8 4764.54 4905.91 1.03x
BenchmarkEncode10x4x16M-8 4030.21 4162.45 1.03x
BenchmarkEncode5x2x1M-8 7239.93 7499.07 1.04x
BenchmarkEncode10x2x1M-8 7224.58 7367.61 1.02x
BenchmarkEncode10x4x1M-8 3692.97 3826.57 1.04x
BenchmarkEncode50x20x1M-8 766.33 782.34 1.02x
BenchmarkEncode17x3x16M-8 5307.84 5507.69 1.04x
BenchmarkEncode_8x4x8M-8 3988.40 4190.72 1.05x
BenchmarkEncode_12x4x12M-8 4021.79 4149.07 1.03x
BenchmarkEncode_16x4x16M-8 4062.87 3886.83 0.96x
BenchmarkEncode_16x4x32M-8 4009.34 4113.02 1.03x
BenchmarkEncode_16x4x64M-8 3816.89 4039.51 1.06x
BenchmarkEncode_8x5x8M-8 3228.09 3377.98 1.05x
BenchmarkEncode_8x6x8M-8 2681.42 2675.01 1.00x
BenchmarkEncode_8x7x8M-8 2301.67 2377.10 1.03x
BenchmarkEncode_8x9x8M-8 1799.82 1795.15 1.00x
BenchmarkEncode_8x10x8M-8 1608.45 1686.71 1.05x
BenchmarkEncode_8x11x8M-8 1468.72 1545.94 1.05x
BenchmarkEncode_8x8x05M-8 1778.04 1824.70 1.03x
BenchmarkEncode_8x8x1M-8 1843.23 1925.05 1.04x
BenchmarkEncode_8x8x8M-8 1997.52 2100.33 1.05x
BenchmarkEncode_8x8x32M-8 1987.96 2107.31 1.06x
BenchmarkEncode_24x8x24M-8 2031.43 2001.41 0.99x
BenchmarkEncode_24x8x48M-8 1974.96 2026.32 1.03x
BenchmarkVerify10x2x10000-8 964.10 965.97 1.00x
BenchmarkVerify50x5x50000-8 2303.32 2327.56 1.01x
BenchmarkVerify10x2x1M-8 6192.31 6252.79 1.01x
BenchmarkVerify5x2x1M-8 5254.86 5264.53 1.00x
BenchmarkVerify10x4x1M-8 3125.70 3180.45 1.02x
BenchmarkVerify50x20x1M-8 776.82 783.81 1.01x
BenchmarkVerify10x4x16M-8 3796.17 3782.39 1.00x
BenchmarkReconstruct10x2x10000-8 4045.30 4278.40 1.06x
BenchmarkReconstruct50x5x50000-8 5675.45 5822.87 1.03x
BenchmarkReconstruct10x2x1M-8 27049.21 28424.40 1.05x
BenchmarkReconstruct5x2x1M-8 27440.02 29815.96 1.09x
BenchmarkReconstruct10x4x1M-8 10076.27 10436.39 1.04x
BenchmarkReconstruct50x20x1M-8 1839.15 1841.68 1.00x
BenchmarkReconstruct10x4x16M-8 10598.45 11019.04 1.04x
BenchmarkReconstructData10x2x10000-8 4103.60 4278.25 1.04x
BenchmarkReconstructData50x5x50000-8 5780.09 5865.40 1.01x
BenchmarkReconstructData10x2x1M-8 27360.79 28590.95 1.04x
BenchmarkReconstructData5x2x1M-8 28549.19 30760.16 1.08x
BenchmarkReconstructData10x4x1M-8 10376.42 10819.53 1.04x
BenchmarkReconstructData50x20x1M-8 1853.37 1869.05 1.01x
BenchmarkReconstructData10x4x16M-8 11148.51 11615.96 1.04x
BenchmarkReconstructP10x2x10000-8 31068.70 32026.22 1.03x
BenchmarkReconstructP10x5x20000-8 8484.08 8808.93 1.04x
BenchmarkStreamEncode10x2x10000-8 11.74 19.13 1.63x
BenchmarkStreamEncode100x20x10000-8 23.86 24.41 1.02x
BenchmarkStreamEncode17x3x1M-8 1480.79 1261.58 0.85x
BenchmarkStreamEncode10x4x16M-8 2968.74 3060.31 1.03x
BenchmarkStreamEncode5x2x1M-8 984.30 1321.82 1.34x
BenchmarkStreamEncode10x2x1M-8 1514.33 1591.31 1.05x
BenchmarkStreamEncode10x4x1M-8 1163.01 1239.59 1.07x
BenchmarkStreamEncode50x20x1M-8 560.24 557.63 1.00x
BenchmarkStreamEncode17x3x16M-8 3721.28 3815.54 1.03x
BenchmarkStreamVerify10x2x10000-8 12.03 19.37 1.61x
BenchmarkStreamVerify50x5x50000-8 262.94 272.44 1.04x
BenchmarkStreamVerify10x2x1M-8 1827.30 1926.97 1.05x
BenchmarkStreamVerify5x2x1M-8 1514.08 1575.36 1.04x
BenchmarkStreamVerify10x4x1M-8 1556.74 1643.25 1.06x
BenchmarkStreamVerify50x20x1M-8 1756.73 1782.27 1.01x
BenchmarkStreamVerify10x4x16M-8 19708.46 20030.64 1.02x
```
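For readers unfamiliar with the technique, here is a hedged sketch of how a bounds check can be removed from a table-driven multiply loop in pure Go. It is not claimed to be the exact change in this commit. Reslicing `out` to `len(in)` lets the compiler see that `out[i]` is in range, and indexing a `*[256]byte` with a `byte` can never go out of range. The table layout, names, and polynomial (0x11d) are assumptions.

```go
package main

import "fmt"

// mulTable[c][v] holds c*v in GF(2^8); it is filled once in init.
var mulTable [256][256]byte

// galMul is a slow shift-and-xor multiply, used only to build the table.
func galMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a&0x80 != 0
		a <<= 1
		if carry {
			a ^= 0x1d
		}
		b >>= 1
	}
	return p
}

func init() {
	for c := 0; c < 256; c++ {
		for v := 0; v < 256; v++ {
			mulTable[c][v] = galMul(byte(c), byte(v))
		}
	}
}

// mulSlice computes out[i] = c * in[i] with table lookups.
func mulSlice(c byte, in, out []byte) {
	// Panics up front if out is too short, and gives the compiler
	// the fact len(out) == len(in) for the loop below.
	out = out[:len(in)]
	// Pointer to a fixed-size array: a byte index is always < 256.
	mt := &mulTable[c]
	for i, v := range in {
		out[i] = mt[v]
	}
}

func main() {
	in := []byte{1, 2, 3, 0x80}
	out := make([]byte, len(in))
	mulSlice(2, in, out)
	fmt.Println(out)
}
```

Whether a given check is actually eliminated depends on the Go version. Building with `-gcflags=-d=ssa/check_bce/debug=1` reports the bounds checks that remain.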
2020-05-03 19:38:55 +02:00
Frank Wessels
79aee05119
AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table:
```
$ benchcmp avx2.txt avx512.txt
benchmark AVX2 MB/s AVX512 MB/s speedup
BenchmarkEncode8x8x1M-72 1681.35 4125.64 2.45x
BenchmarkEncode8x4x8M-72 1529.36 5507.97 3.60x
BenchmarkEncode8x8x8M-72 791.16 2952.29 3.73x
BenchmarkEncode8x8x32M-72 573.26 2168.61 3.78x
BenchmarkEncode12x4x12M-72 1234.41 4912.37 3.98x
BenchmarkEncode16x4x16M-72 1189.59 5138.01 4.32x
BenchmarkEncode24x8x24M-72 690.68 2583.70 3.74x
BenchmarkEncode24x8x48M-72 674.20 2643.31 3.92x
```
2019-02-10 11:17:23 +01:00
Frank Wessels
3610933d2f
Use AVX2 SIMD assembly instructions in favor of BYTE sequences. (#73)
* Use AVX2 SIMD assembly instructions in favor of BYTE sequences.
2017-11-18 16:17:10 +01:00
Frank Wessels
7b88f42e61
Add NEON support for ARM64 (#62)
* Add support for arm64 using NEON instructions
Specifically using the PMULL/PMULL2 polynomial multiplication instructions followed by a reduction step (actually two steps).
* Add ARM performance numbers
* Formatting for performance table
* Refactoring of NEON version and 256-bit wide version
* Expand test slice beyond 32 (for AVX2 and NEON) and test galMulSliceXor explicitly.
* Fix ARM code with missing function.
* Fix missing newline
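The polynomial multiplication followed by a two-step reduction mentioned above can be modeled in scalar Go. This is a sketch of the arithmetic only, not the NEON assembly: `clmul` stands in for what PMULL does per byte lane, and the field polynomial 0x11d is an assumption. The first fold leaves up to 11 bits, which is why a second fold is needed.

```go
package main

import "fmt"

// poly is the low byte of x^8+x^4+x^3+x^2+1 (assumed field polynomial),
// i.e. the value that x^8 is congruent to.
const poly = 0x1d

// clmul is a carry-less (polynomial) multiply of two 8-bit values,
// giving a product of up to 15 bits.
func clmul(a, b byte) uint16 {
	var p uint16
	for i := uint(0); i < 8; i++ {
		if (b>>i)&1 != 0 {
			p ^= uint16(a) << i
		}
	}
	return p
}

// mulReduce multiplies in GF(2^8) as one carry-less multiply plus two folds.
func mulReduce(a, b byte) byte {
	p := clmul(a, b)                         // degree <= 14
	p = (p & 0xff) ^ clmul(byte(p>>8), poly) // first fold: degree <= 10
	p = (p & 0xff) ^ clmul(byte(p>>8), poly) // second fold: degree <= 6
	return byte(p)
}

// mulShift is a shift-and-xor reference used to cross-check mulReduce.
func mulShift(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a&0x80 != 0
		a <<= 1
		if carry {
			a ^= poly
		}
		b >>= 1
	}
	return p
}

func main() {
	ok := true
	for a := 0; a < 256; a++ {
		for b := 0; b < 256; b++ {
			if mulReduce(byte(a), byte(b)) != mulShift(byte(a), byte(b)) {
				ok = false
			}
		}
	}
	fmt.Println("all products agree:", ok)
}
```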
2017-08-26 11:47:42 +02:00
chenzhongtao
d78bf472d8
add Update parity function (#60)
Add Update parity function
2017-08-20 11:42:39 +02:00
Klaus Post
5abf0ee302
Add options (#46)
* Add options
Make constants changeable as options.
The API remains backwards compatible.
* Update documentation.
* Fix line endings
* fmt
* fmt
* Use functions for parameters.
Much neater.
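"Use functions for parameters" reads like the functional options pattern. The sketch below shows that pattern in general terms. Every name, field, and default in it is hypothetical and should not be read as this package's API. A variadic options parameter is also one way to keep an existing constructor call backwards compatible.

```go
package main

import "fmt"

// options holds tunables that used to be constants (hypothetical fields).
type options struct {
	maxWorkers int
	minSplit   int
}

// defaults are illustrative values only.
var defaults = options{maxWorkers: 8, minSplit: 512}

// Option mutates an options value; callers pass zero or more of these.
type Option func(*options)

// WithMaxWorkers is a hypothetical option setter.
func WithMaxWorkers(n int) Option {
	return func(o *options) { o.maxWorkers = n }
}

// WithMinSplit is a hypothetical option setter.
func WithMinSplit(n int) Option {
	return func(o *options) { o.minSplit = n }
}

// Encoder is a stand-in type for this sketch.
type Encoder struct {
	dataShards   int
	parityShards int
	o            options
}

// New keeps its original two-argument form working: opts is variadic,
// so New(10, 3) still compiles and uses the defaults.
func New(dataShards, parityShards int, opts ...Option) *Encoder {
	e := &Encoder{dataShards: dataShards, parityShards: parityShards, o: defaults}
	for _, opt := range opts {
		opt(&e.o)
	}
	return e
}

func main() {
	a := New(10, 3)
	b := New(10, 3, WithMaxWorkers(2), WithMinSplit(4096))
	fmt.Println(a.o.maxWorkers, a.o.minSplit)
	fmt.Println(b.o.maxWorkers, b.o.minSplit)
}
```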
2017-02-19 11:13:22 +01:00
klauspost
1d6cefa204
Test array multiply.
2015-06-22 13:05:29 +02:00
klauspost
2b4171b9e6
Initial version
2015-06-19 16:31:24 +02:00