Commit Graph

9 Commits (dc6af2dce5c0cc2306cc53b3cfcc2a40a3d10de7)

Author SHA1 Message Date
Frank 467733eb9c Add generated byte assembler using asm2plan9s
Add recompilable assembler using asm2plan9s
2016-07-06 21:06:00 +02:00
frankw d4000061f2 Removed unnecessary JMP instruction 2016-07-06 09:39:02 +02:00
klauspost efb98c83c7 Update asmfmt. 2016-01-11 14:44:44 +01:00
klauspost a3ee8967cb asmfmt assembler. 2015-12-14 14:57:49 +01:00
klauspost 627f48f59e Add AVX2 assembler functions.
Benchmarks on a VM (therefore a bit more noisy)

benchmark                             old ns/op     new ns/op     delta
BenchmarkEncode10x2x10000-8           58372         47421         -18.76%
BenchmarkEncode100x20x10000-8         2635444       1550511       -41.17%
BenchmarkEncode17x3x1M-8              3885495       2231034       -42.58%
BenchmarkEncode10x4x16M-8             24180221      21467661      -11.22%
BenchmarkEncode5x2x1M-8               2395287       2261452       -5.59%
BenchmarkEncode10x2x1M-8              2571278       2566560       -0.18%
BenchmarkEncode10x4x1M-8              3396774       3431916       +1.03%
BenchmarkEncode50x20x1M-8             27004601      20325731      -24.73%
BenchmarkEncode17x3x16M-8             29671393      23668596      -20.23%
BenchmarkVerify10x2x10000-8           109730        101519        -7.48%
BenchmarkVerify50x5x50000-8           3904166       3101568       -20.56%
BenchmarkVerify10x2x1M-8              4398490       4721719       +7.35%
BenchmarkVerify5x2x1M-8               3174574       3296440       +3.84%
BenchmarkVerify10x4x1M-8              5247394       5346667       +1.89%
BenchmarkVerify50x20x1M-8             35742777      26154681      -26.83%
BenchmarkVerify10x4x16M-8             52873512      54931253      +3.89%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkEncode10x2x10000-8           1713.14      2108.73      1.23x
BenchmarkEncode100x20x10000-8         379.44       644.95       1.70x
BenchmarkEncode17x3x1M-8              4587.78      7989.92      1.74x
BenchmarkEncode10x4x16M-8             6938.40      7815.11      1.13x
BenchmarkEncode5x2x1M-8               2188.83      2318.37      1.06x
BenchmarkEncode10x2x1M-8              4078.03      4085.53      1.00x
BenchmarkEncode10x4x1M-8              3086.98      3055.37      0.99x
BenchmarkEncode50x20x1M-8             1941.48      2579.43      1.33x
BenchmarkEncode17x3x16M-8             9612.38      12050.26     1.25x
BenchmarkVerify10x2x10000-8           911.32       985.03       1.08x
BenchmarkVerify50x5x50000-8           1280.68      1612.09      1.26x
BenchmarkVerify10x2x1M-8              2383.94      2220.75      0.93x
BenchmarkVerify5x2x1M-8               1651.52      1590.47      0.96x
BenchmarkVerify10x4x1M-8              1998.28      1961.18      0.98x
BenchmarkVerify50x20x1M-8             1466.84      2004.57      1.37x
BenchmarkVerify10x4x16M-8             3173.09      3054.22      0.96x
2015-12-14 14:12:09 +01:00
klauspost dc9cd67c8c PSHUFB is S(upplemental)-SSE3, not plain SSE3. 2015-06-24 16:57:38 +02:00
Klaus Post f1c2cf4160 Don't use assembler on app engine. 2015-06-21 22:54:13 +02:00
Klaus Post 1388bd44c4 Remove comma. Apparently that is a problem on Go tip. 2015-06-21 21:27:32 +02:00
Klaus Post 5aa37c3492 Add AMD64 SSE3 Galois multiplication. Approximately 5-10x faster.
BenchmarkEncode10x2x10000         333.31       5827.17      17.48x
BenchmarkEncode10x2x10000-2       431.20       2802.53      6.50x
BenchmarkEncode10x2x10000-4       553.98       2432.95      4.39x
BenchmarkEncode10x2x10000-8       585.79       3469.61      5.92x
BenchmarkEncode100x20x10000       32.59        583.40       17.90x
BenchmarkEncode100x20x10000-2     59.52        726.70       12.21x
BenchmarkEncode100x20x10000-4     108.04       1363.25      12.62x
BenchmarkEncode100x20x10000-8     113.76       1274.62      11.20x
BenchmarkEncode17x3x1M            215.28       3141.85      14.59x
BenchmarkEncode17x3x1M-2          398.76       3650.12      9.15x
BenchmarkEncode17x3x1M-4          655.32       6071.11      9.26x
BenchmarkEncode17x3x1M-8          832.16       6616.47      7.95x
BenchmarkEncode10x4x16M           154.48       1357.30      8.79x
BenchmarkEncode10x4x16M-2         295.62       2377.92      8.04x
BenchmarkEncode10x4x16M-4         529.89       3519.49      6.64x
BenchmarkEncode10x4x16M-8         632.11       4521.90      7.15x
BenchmarkEncode5x2x1M             327.87       4879.09      14.88x
BenchmarkEncode5x2x1M-2           576.11       2599.20      4.51x
BenchmarkEncode5x2x1M-4           1043.65      3559.12      3.41x
BenchmarkEncode5x2x1M-8           1227.77      4255.34      3.47x
BenchmarkEncode10x2x1M            321.24       4574.68      14.24x
BenchmarkEncode10x2x1M-2          587.73       3100.28      5.28x
BenchmarkEncode10x2x1M-4          1101.96      4770.32      4.33x
BenchmarkEncode10x2x1M-8          1217.08      5812.17      4.78x
BenchmarkEncode10x4x1M            155.34       2037.27      13.11x
BenchmarkEncode10x4x1M-2          298.38       2470.97      8.28x
BenchmarkEncode10x4x1M-4          548.67       3603.15      6.57x
BenchmarkEncode10x4x1M-8          625.23       4827.42      7.72x
BenchmarkEncode50x20x1M           31.37        347.65       11.08x
BenchmarkEncode50x20x1M-2         59.81        713.28       11.93x
BenchmarkEncode50x20x1M-4         105.34       1175.47      11.16x
BenchmarkEncode50x20x1M-8         123.84       1491.91      12.05x
BenchmarkEncode17x3x16M           209.55       1861.59      8.88x
BenchmarkEncode17x3x16M-2         394.19       3331.73      8.45x
BenchmarkEncode17x3x16M-4         643.30       4942.74      7.68x
BenchmarkEncode17x3x16M-8         839.64       6213.43      7.40x
2015-06-21 21:23:22 +02:00