Klaus Post
7761c8f7cd
Use Workflows (#169)
...
* Use Workflows
* Go 1.17 build tags
* Run race tests separately.
2021-09-01 18:55:02 +02:00
Klaus Post
653e76aa26
Faster AVX2 encoding (#153)
...
* Remove 50% of bounds checks when copying.
* Use RIP only addressing, free one register.
```
benchmark                               old MB/s   new MB/s   speedup
BenchmarkGalois128K-32                  57663.49   58005.87   1.01x
BenchmarkGalois1M-32                    49479.31   49848.29   1.01x
BenchmarkGaloisXor128K-32               46310.69   46501.88   1.00x
BenchmarkGaloisXor1M-32                 43804.86   43984.39   1.00x
BenchmarkEncode10x2x10000-32            25926.93   27457.75   1.06x
BenchmarkEncode100x20x10000-32          2635.82    2818.95    1.07x
BenchmarkEncode17x3x1M-32               63215.11   61576.76   0.97x
BenchmarkEncode10x4x16M-32              19551.54   19505.07   1.00x
BenchmarkEncode5x2x1M-32                79612.06   81985.14   1.03x
BenchmarkEncode10x2x1M-32               121478.29  127739.41  1.05x
BenchmarkEncode10x4x1M-32               70757.61   74423.67   1.05x
BenchmarkEncode50x20x1M-32              19811.96   20103.32   1.01x
BenchmarkEncode17x3x16M-32              27202.10   27825.34   1.02x
BenchmarkEncode_8x4x8M-32               19029.04   19701.31   1.04x
BenchmarkEncode_12x4x12M-32             22449.87   22480.51   1.00x
BenchmarkEncode_16x4x16M-32             24536.74   24672.24   1.01x
BenchmarkEncode_16x4x32M-32             24381.34   24981.99   1.02x
BenchmarkEncode_16x4x64M-32             24717.69   25086.94   1.01x
BenchmarkEncode_8x5x8M-32               16763.51   17154.04   1.02x
BenchmarkEncode_8x6x8M-32               15067.22   15205.87   1.01x
BenchmarkEncode_8x7x8M-32               13156.38   13589.40   1.03x
BenchmarkEncode_8x9x8M-32               11363.74   11523.70   1.01x
BenchmarkEncode_8x10x8M-32              10359.37   10474.91   1.01x
BenchmarkEncode_8x11x8M-32              9627.07    9463.24    0.98x
BenchmarkEncode_8x8x05M-32              30104.80   32634.89   1.08x
BenchmarkEncode_8x8x1M-32               36497.28   36425.88   1.00x
BenchmarkEncode_8x8x8M-32               12186.19   11602.41   0.95x
BenchmarkEncode_8x8x32M-32              11670.72   11413.71   0.98x
BenchmarkEncode_24x8x24M-32             21709.83   21652.50   1.00x
BenchmarkEncode_24x8x48M-32             22494.40   22280.59   0.99x
BenchmarkVerify10x2x10000-32            10567.56   10483.91   0.99x
BenchmarkVerify50x5x50000-32            28102.84   27923.63   0.99x
BenchmarkVerify10x2x1M-32               30298.33   30106.18   0.99x
BenchmarkVerify5x2x1M-32                16115.91   15847.03   0.98x
BenchmarkVerify10x4x1M-32               15382.13   14852.68   0.97x
BenchmarkVerify50x20x1M-32              8476.02    8466.24    1.00x
BenchmarkVerify10x4x16M-32              15101.03   15434.71   1.02x
BenchmarkReconstruct10x2x10000-32       26228.18   26960.19   1.03x
BenchmarkReconstruct50x5x50000-32       31091.42   30975.82   1.00x
BenchmarkReconstruct10x2x1M-32          58548.87   60281.92   1.03x
BenchmarkReconstruct5x2x1M-32           39499.23   41791.80   1.06x
BenchmarkReconstruct10x4x1M-32          41448.60   43053.15   1.04x
BenchmarkReconstruct50x20x1M-32         17185.99   17354.67   1.01x
BenchmarkReconstruct10x4x16M-32         18798.60   18847.43   1.00x
BenchmarkReconstructData10x2x10000-32   27208.48   27538.38   1.01x
BenchmarkReconstructData50x5x50000-32   32135.65   32078.91   1.00x
BenchmarkReconstructData10x2x1M-32      63180.19   67332.17   1.07x
BenchmarkReconstructData5x2x1M-32       47532.85   49932.17   1.05x
BenchmarkReconstructData10x4x1M-32      50059.14   52323.15   1.05x
BenchmarkReconstructData50x20x1M-32     26679.75   26714.11   1.00x
BenchmarkReconstructData10x4x16M-32     24854.99   24527.23   0.99x
BenchmarkReconstructP10x2x10000-32      115089.87  113229.75  0.98x
BenchmarkReconstructP10x5x20000-32      129838.75  132871.10  1.02x
BenchmarkParallel_8x8x64K-32            69951.43   69980.44   1.00x
BenchmarkParallel_8x8x05M-32            11752.94   11724.35   1.00x
BenchmarkParallel_20x10x05M-32          18553.93   18613.33   1.00x
BenchmarkParallel_8x8x1M-32             11639.19   11746.86   1.01x
BenchmarkParallel_8x8x8M-32             11799.36   11685.63   0.99x
BenchmarkParallel_8x8x32M-32            11510.94   11791.72   1.02x
BenchmarkParallel_8x3x1M-32             20268.92   20678.21   1.02x
BenchmarkParallel_8x4x1M-32             17616.05   17856.17   1.01x
BenchmarkParallel_8x5x1M-32             15590.87   15872.42   1.02x
BenchmarkStreamEncode10x2x10000-32      14917.08   15408.39   1.03x
BenchmarkStreamEncode100x20x10000-32    2014.81    2077.31    1.03x
BenchmarkStreamEncode17x3x1M-32         11839.37   12434.80   1.05x
BenchmarkStreamEncode10x4x16M-32        9151.14    9206.98    1.01x
BenchmarkStreamEncode5x2x1M-32          13598.55   13663.56   1.00x
BenchmarkStreamEncode10x2x1M-32         13192.91   13453.41   1.02x
BenchmarkStreamEncode10x4x1M-32         12109.90   12050.68   1.00x
BenchmarkStreamEncode50x20x1M-32        8640.73    8370.10    0.97x
BenchmarkStreamEncode17x3x16M-32        10473.17   10527.04   1.01x
BenchmarkStreamVerify10x2x10000-32      7032.23    7128.82    1.01x
BenchmarkStreamVerify50x5x50000-32      13023.46   13109.31   1.01x
BenchmarkStreamVerify10x2x1M-32         11941.63   11949.91   1.00x
BenchmarkStreamVerify5x2x1M-32          8029.93    8263.39    1.03x
BenchmarkStreamVerify10x4x1M-32         8137.82    8271.11    1.02x
BenchmarkStreamVerify50x20x1M-32        7378.87    7708.81    1.04x
BenchmarkStreamVerify10x4x16M-32        8973.18    8955.29    1.00x
```
2020-11-10 14:39:23 +01:00
Klaus Post
7daa20bf74
Generate AVX2 code (#141)
...
Replaces the AVX2 code for configurations up to 10x8 with specifically generated functions.
If code size is a concern, `-tags=nogen` can be used.
The biggest speedup occurs when not memory constrained.
```
benchmark                     old MB/s   new MB/s   speedup
BenchmarkEncode_8x5x8M        5895.75    9648.18    1.64x
BenchmarkEncode_8x5x8M-4      16773.41   17220.67   1.03x
BenchmarkEncode_8x5x8M-16     18263.12   17176.28   0.94x
BenchmarkEncode_8x6x8M        5075.89    8548.39    1.68x
BenchmarkEncode_8x6x8M-4      14559.83   15370.95   1.06x
BenchmarkEncode_8x6x8M-16     16183.37   15291.98   0.94x
BenchmarkEncode_8x7x8M        4481.18    7015.60    1.57x
BenchmarkEncode_8x7x8M-4      12835.35   13695.90   1.07x
BenchmarkEncode_8x7x8M-16     14246.94   13737.36   0.96x
BenchmarkEncode_8x8x05M       5569.95    7947.70    1.43x
BenchmarkEncode_8x8x05M-4     17334.91   25271.37   1.46x
BenchmarkEncode_8x8x05M-16    29349.42   35043.36   1.19x
BenchmarkEncode_8x8x1M        4830.58    7891.32    1.63x
BenchmarkEncode_8x8x1M-4      17531.36   27371.42   1.56x
BenchmarkEncode_8x8x1M-16     29593.98   39241.09   1.33x
BenchmarkEncode_8x8x8M        3953.66    6584.26    1.67x
BenchmarkEncode_8x8x8M-4      11527.34   12331.23   1.07x
BenchmarkEncode_8x8x8M-16     12718.89   12173.08   0.96x
BenchmarkEncode_8x8x32M       3927.51    6195.91    1.58x
BenchmarkEncode_8x8x32M-4     11490.85   11424.39   0.99x
BenchmarkEncode_8x8x32M-16    12506.09   11888.55   0.95x

benchmark                     old MB/s   new MB/s   speedup
BenchmarkParallel_8x8x64K     5490.24    6959.57    1.27x
BenchmarkParallel_8x8x64K-4   21078.94   29557.51   1.40x
BenchmarkParallel_8x8x64K-16  57508.45   73672.54   1.28x
BenchmarkParallel_8x8x1M      4755.49    7667.84    1.61x
BenchmarkParallel_8x8x1M-4    11818.66   12013.49   1.02x
BenchmarkParallel_8x8x1M-16   12923.12   12109.42   0.94x
BenchmarkParallel_8x8x8M      3973.94    6525.85    1.64x
BenchmarkParallel_8x8x8M-4    11725.68   11312.46   0.96x
BenchmarkParallel_8x8x8M-16   12608.20   11484.98   0.91x
BenchmarkParallel_8x3x1M      14139.71   17993.04   1.27x
BenchmarkParallel_8x3x1M-4    21805.97   23053.92   1.06x
BenchmarkParallel_8x3x1M-16   24673.05   23596.71   0.96x
BenchmarkParallel_8x4x1M      10617.88   14474.54   1.36x
BenchmarkParallel_8x4x1M-4    18635.82   18965.65   1.02x
BenchmarkParallel_8x4x1M-16   21518.12   20171.47   0.94x
BenchmarkParallel_8x5x1M      8669.88    11833.96   1.36x
BenchmarkParallel_8x5x1M-4    16321.00   17500.30   1.07x
BenchmarkParallel_8x5x1M-16   17267.16   17191.04   1.00x
```
2020-05-20 12:48:34 +02:00
Klaus Post
d2cfcb8065
Add command-line arg to disable asm for tests. (#116)
...
* Add commandline test args
2020-04-22 15:38:21 +02:00
Christian Muehlhaeuser
993c27a5ba
Avoid unnecessary conversion (#107)
...
No need to convert to byte here.
2019-09-27 16:30:54 -07:00
Klaus Post
0883d2f011
Only enable AVX512 on AMD64
...
Fixes #102
2019-05-26 12:12:55 +02:00
Klaus Post
61c22eab55
Cauchy Matrix option (#70)
...
* Experimental Cauchy Matrix
Experimental support for Cauchy style matrix
http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf
All matrices appear to be invertible.
* Remove Go 1.5 and 1.6 from CI tests.
* Fix comment.
* Increase max number of goroutines+docs.
2017-10-01 14:02:11 +02:00
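To make the Cauchy matrix option above more concrete, here is a minimal sketch of one common construction: an identity block for the data shards, followed by parity rows whose entries are `1/(r XOR c)` in GF(2^8). The function names (`cauchyMatrix`, `galMul`, `galInv`), the reducing polynomial 0x11D, and the exact layout are assumptions for illustration and are not taken from the commit itself.

```go
package main

import "fmt"

// galMul multiplies a and b in GF(2^8) bit by bit.
// Assumption: the field is reduced by x^8+x^4+x^3+x^2+1 (0x11D).
func galMul(a, b byte) byte {
	var p byte
	for ; b > 0; b >>= 1 {
		if b&1 != 0 {
			p ^= a
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1D // low byte of 0x11D
		}
	}
	return p
}

// galInv finds the multiplicative inverse by brute force.
// It returns 0 for a == 0, which has no inverse.
func galInv(a byte) byte {
	for b := 1; b < 256; b++ {
		if galMul(a, byte(b)) == 1 {
			return byte(b)
		}
	}
	return 0
}

// cauchyMatrix (hypothetical helper) builds a totalShards x dataShards
// matrix: identity rows on top, then rows with entries 1/(r XOR c).
// Because r >= dataShards > c in the parity rows, r XOR c is never zero.
// totalShards must not exceed 256.
func cauchyMatrix(dataShards, totalShards int) [][]byte {
	m := make([][]byte, totalShards)
	for r := range m {
		m[r] = make([]byte, dataShards)
		for c := range m[r] {
			switch {
			case r >= dataShards:
				m[r][c] = galInv(byte(r ^ c))
			case r == c:
				m[r][c] = 1
			}
		}
	}
	return m
}

func main() {
	for _, row := range cauchyMatrix(4, 6) {
		fmt.Println(row)
	}
}
```

In this sketch the first four printed rows form the identity and the last two are the parity rows.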
Klaus Post
dc6af2dce5
Minor cleanup (#61)
...
* Remove some benchmarks
* Format tables a bit.
* Doc cleanup
2017-08-13 22:38:27 +02:00
Klaus Post
50a83296f4
Restructure so that one operand of the Galois multiplication is constant in the main loop.
2015-06-20 18:46:06 +02:00
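One way to picture the restructuring described above is to select the multiplication-table row for the fixed factor once, before the loop, so the loop body is a single lookup. The sketch below assumes a 256x256 product table and the reducing polynomial 0x11D. The names `mulSlice` and `slowMul` are hypothetical and do not come from the commit.

```go
package main

import "fmt"

// mulTable[a][b] holds a*b in GF(2^8). The reducing polynomial 0x11D is an
// assumption of this sketch; the commit message does not state it.
var mulTable [256][256]byte

func init() {
	for a := 0; a < 256; a++ {
		for b := 0; b < 256; b++ {
			mulTable[a][b] = slowMul(byte(a), byte(b))
		}
	}
}

// slowMul is a bitwise shift-and-xor multiply, used only to fill the table.
func slowMul(a, b byte) byte {
	var p byte
	for ; b > 0; b >>= 1 {
		if b&1 != 0 {
			p ^= a
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1D // low byte of 0x11D
		}
	}
	return p
}

// mulSlice (hypothetical name) writes c*in[i] to out[i]; out must be at
// least as long as in. The row for the constant factor c is selected once,
// outside the loop.
func mulSlice(c byte, in, out []byte) {
	row := &mulTable[c]
	for i, v := range in {
		out[i] = row[v]
	}
}

func main() {
	in := []byte{0, 1, 3, 0x80}
	out := make([]byte, len(in))
	mulSlice(2, in, out)
	fmt.Println(out) // prints [0 2 6 29]
}
```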
Klaus Post
3add2c1c30
Precalculate the results of galMultiply. Overall performance is approximately 30% faster.
2015-06-19 20:07:57 +02:00
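The precalculation mentioned above can be sketched as follows: every product of two field elements is computed once at start-up and stored in a 256x256 table, so later multiplications become a plain array lookup. This is a sketch under assumptions. It treats `galMultiply` as a log/exp-table multiply over GF(2^8) with polynomial 0x11D and generator 2, and the names `mulTable` and `galMul` are illustrative.

```go
package main

import "fmt"

var (
	expTable [510]byte      // expTable[i] = 2^i, doubled so index sums need no modulo
	logTable [256]byte      // logTable[x] = i such that 2^i = x (x != 0)
	mulTable [256][256]byte // mulTable[a][b] = a*b, filled once in init
)

// galMultiply computes a*b via log/exp lookups, branching on zero each call.
func galMultiply(a, b byte) byte {
	if a == 0 || b == 0 {
		return 0
	}
	return expTable[int(logTable[a])+int(logTable[b])]
}

func init() {
	// Build log/exp tables. Assumption: polynomial 0x11D, generator 2.
	x := 1
	for i := 0; i < 255; i++ {
		expTable[i] = byte(x)
		expTable[i+255] = byte(x)
		logTable[x] = byte(i)
		x <<= 1
		if x&0x100 != 0 {
			x ^= 0x11D
		}
	}
	// Precalculate every product once.
	for a := 0; a < 256; a++ {
		for b := 0; b < 256; b++ {
			mulTable[a][b] = galMultiply(byte(a), byte(b))
		}
	}
}

// galMul is the fast path: a single table lookup with no branches.
func galMul(a, b byte) byte { return mulTable[a][b] }

func main() {
	fmt.Println(galMul(2, 3), galMul(0x80, 2)) // prints 6 29
}
```

The table costs 64 KiB of memory in exchange for removing the zero checks and the two log lookups from each multiplication.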
klauspost
2b4171b9e6
Initial version
2015-06-19 16:31:24 +02:00