Frank
|
467733eb9c
|
Add generated byte assembler using asm2plan9s
Add recompilable assembler using asm2plan9s
|
2016-07-06 21:06:00 +02:00 |
frankw
|
d4000061f2
|
Removed unnecessary JMP instruction
|
2016-07-06 09:39:02 +02:00 |
klauspost
|
efb98c83c7
|
Update asmfmt.
|
2016-01-11 14:44:44 +01:00 |
klauspost
|
a3ee8967cb
|
asmfmt assembler.
|
2015-12-14 14:57:49 +01:00 |
klauspost
|
627f48f59e
|
Add AVX2 assembler functions.
Benchmarks on a VM (therefore a bit more noisy)
benchmark                       old ns/op   new ns/op   delta
BenchmarkEncode10x2x10000-8     58372       47421       -18.76%
BenchmarkEncode100x20x10000-8   2635444     1550511     -41.17%
BenchmarkEncode17x3x1M-8        3885495     2231034     -42.58%
BenchmarkEncode10x4x16M-8       24180221    21467661    -11.22%
BenchmarkEncode5x2x1M-8         2395287     2261452     -5.59%
BenchmarkEncode10x2x1M-8        2571278     2566560     -0.18%
BenchmarkEncode10x4x1M-8        3396774     3431916     +1.03%
BenchmarkEncode50x20x1M-8       27004601    20325731    -24.73%
BenchmarkEncode17x3x16M-8       29671393    23668596    -20.23%
BenchmarkVerify10x2x10000-8     109730      101519      -7.48%
BenchmarkVerify50x5x50000-8     3904166     3101568     -20.56%
BenchmarkVerify10x2x1M-8        4398490     4721719     +7.35%
BenchmarkVerify5x2x1M-8         3174574     3296440     +3.84%
BenchmarkVerify10x4x1M-8        5247394     5346667     +1.89%
BenchmarkVerify50x20x1M-8       35742777    26154681    -26.83%
BenchmarkVerify10x4x16M-8       52873512    54931253    +3.89%

benchmark                       old MB/s    new MB/s    speedup
BenchmarkEncode10x2x10000-8     1713.14     2108.73     1.23x
BenchmarkEncode100x20x10000-8   379.44      644.95      1.70x
BenchmarkEncode17x3x1M-8        4587.78     7989.92     1.74x
BenchmarkEncode10x4x16M-8       6938.40     7815.11     1.13x
BenchmarkEncode5x2x1M-8         2188.83     2318.37     1.06x
BenchmarkEncode10x2x1M-8        4078.03     4085.53     1.00x
BenchmarkEncode10x4x1M-8        3086.98     3055.37     0.99x
BenchmarkEncode50x20x1M-8       1941.48     2579.43     1.33x
BenchmarkEncode17x3x16M-8       9612.38     12050.26    1.25x
BenchmarkVerify10x2x10000-8     911.32      985.03      1.08x
BenchmarkVerify50x5x50000-8     1280.68     1612.09     1.26x
BenchmarkVerify10x2x1M-8        2383.94     2220.75     0.93x
BenchmarkVerify5x2x1M-8         1651.52     1590.47     0.96x
BenchmarkVerify10x4x1M-8        1998.28     1961.18     0.98x
BenchmarkVerify50x20x1M-8       1466.84     2004.57     1.37x
BenchmarkVerify10x4x16M-8       3173.09     3054.22     0.96x
|
2015-12-14 14:12:09 +01:00 |
klauspost
|
dc9cd67c8c
|
PSHUFB is S(upplemental)-SSE3, not plain SSE3.
|
2015-06-24 16:57:38 +02:00 |
Klaus Post
|
f1c2cf4160
|
Don't use assembler on app engine.
|
2015-06-21 22:54:13 +02:00 |
Klaus Post
|
1388bd44c4
|
Remove comma. Apparently that is a problem on Go tip.
|
2015-06-21 21:27:32 +02:00 |
Klaus Post
|
5aa37c3492
|
Add AMD64 SSE3 Galois multiplication. Approximately 5-10x faster.
benchmark                       old MB/s    new MB/s    speedup
BenchmarkEncode10x2x10000       333.31      5827.17     17.48x
BenchmarkEncode10x2x10000-2     431.20      2802.53     6.50x
BenchmarkEncode10x2x10000-4     553.98      2432.95     4.39x
BenchmarkEncode10x2x10000-8     585.79      3469.61     5.92x
BenchmarkEncode100x20x10000     32.59       583.40      17.90x
BenchmarkEncode100x20x10000-2   59.52       726.70      12.21x
BenchmarkEncode100x20x10000-4   108.04      1363.25     12.62x
BenchmarkEncode100x20x10000-8   113.76      1274.62     11.20x
BenchmarkEncode17x3x1M          215.28      3141.85     14.59x
BenchmarkEncode17x3x1M-2        398.76      3650.12     9.15x
BenchmarkEncode17x3x1M-4        655.32      6071.11     9.26x
BenchmarkEncode17x3x1M-8        832.16      6616.47     7.95x
BenchmarkEncode10x4x16M         154.48      1357.30     8.79x
BenchmarkEncode10x4x16M-2       295.62      2377.92     8.04x
BenchmarkEncode10x4x16M-4       529.89      3519.49     6.64x
BenchmarkEncode10x4x16M-8       632.11      4521.90     7.15x
BenchmarkEncode5x2x1M           327.87      4879.09     14.88x
BenchmarkEncode5x2x1M-2         576.11      2599.20     4.51x
BenchmarkEncode5x2x1M-4         1043.65     3559.12     3.41x
BenchmarkEncode5x2x1M-8         1227.77     4255.34     3.47x
BenchmarkEncode10x2x1M          321.24      4574.68     14.24x
BenchmarkEncode10x2x1M-2        587.73      3100.28     5.28x
BenchmarkEncode10x2x1M-4        1101.96     4770.32     4.33x
BenchmarkEncode10x2x1M-8        1217.08     5812.17     4.78x
BenchmarkEncode10x4x1M          155.34      2037.27     13.11x
BenchmarkEncode10x4x1M-2        298.38      2470.97     8.28x
BenchmarkEncode10x4x1M-4        548.67      3603.15     6.57x
BenchmarkEncode10x4x1M-8        625.23      4827.42     7.72x
BenchmarkEncode50x20x1M         31.37       347.65      11.08x
BenchmarkEncode50x20x1M-2       59.81       713.28      11.93x
BenchmarkEncode50x20x1M-4       105.34      1175.47     11.16x
BenchmarkEncode50x20x1M-8       123.84      1491.91     12.05x
BenchmarkEncode17x3x16M         209.55      1861.59     8.88x
BenchmarkEncode17x3x16M-2       394.19      3331.73     8.45x
BenchmarkEncode17x3x16M-4       643.30      4942.74     7.68x
BenchmarkEncode17x3x16M-8       839.64      6213.43     7.40x
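
For readers unfamiliar with how PSHUFB helps here, the following is a minimal Go sketch of the usual nibble-table approach to Galois multiplication by a constant. It is not the assembler from this commit. Multiplication by a fixed value c in GF(2^8) is linear over XOR, so c*b can be assembled from two 16-entry tables, one indexed by the low nibble of b and one by the high nibble; PSHUFB can perform sixteen such 4-bit lookups in a single instruction. The reduction polynomial 0x11d and the names gfMul, nibbleTables and mulSlice are assumptions chosen for illustration.

```go
package main

import "fmt"

// gfMul multiplies a and b in GF(2^8) one bit at a time.
// Assumption: reduction polynomial x^8+x^4+x^3+x^2+1 (0x11d).
func gfMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d // low 8 bits of 0x11d
		}
		b >>= 1
	}
	return p
}

// nibbleTables builds the two 16-entry lookup tables for a constant c:
// low[i] = c*i and high[i] = c*(i<<4).
func nibbleTables(c byte) (low, high [16]byte) {
	for i := 0; i < 16; i++ {
		low[i] = gfMul(c, byte(i))
		high[i] = gfMul(c, byte(i)<<4)
	}
	return low, high
}

// mulSlice writes c*in[i] to out[i] using the nibble tables.
// out must be at least as long as in. A SIMD version would process
// 16 bytes per step instead of one.
func mulSlice(c byte, in, out []byte) {
	low, high := nibbleTables(c)
	for i, b := range in {
		out[i] = low[b&0x0f] ^ high[b>>4]
	}
}

func main() {
	in := []byte{0x00, 0x01, 0x53, 0xff}
	out := make([]byte, len(in))
	mulSlice(0x02, in, out)
	fmt.Printf("%x\n", out)
}
```

The scalar loop in mulSlice does one byte per iteration; the gain in a vectorized version comes from doing the same two lookups and XOR on a whole register of bytes at once.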
|
2015-06-21 21:23:22 +02:00 |