Klaus Post
1bb4d699e1
avx2: Improve speed when > 10 input or output shards. ( #174 )
...
Speeds are including a limiting the number of goroutines with all AVX2 paths,
Before/after
```
benchmark old ns/op new ns/op delta
BenchmarkGalois128K-32 2240 2240 +0.00%
BenchmarkGalois1M-32 19578 18891 -3.51%
BenchmarkGaloisXor128K-32 2798 2852 +1.93%
BenchmarkGaloisXor1M-32 23334 23345 +0.05%
BenchmarkEncode2x1x1M-32 34357 34370 +0.04%
BenchmarkEncode10x2x10000-32 3210 3093 -3.64%
BenchmarkEncode100x20x10000-32 362925 148214 -59.16%
BenchmarkEncode17x3x1M-32 323767 224157 -30.77%
BenchmarkEncode10x4x16M-32 8376895 8376737 -0.00%
BenchmarkEncode5x2x1M-32 68365 66861 -2.20%
BenchmarkEncode10x2x1M-32 101407 93023 -8.27%
BenchmarkEncode10x4x1M-32 171880 155477 -9.54%
BenchmarkEncode50x20x1M-32 3704691 3015047 -18.62%
BenchmarkEncode17x3x16M-32 10279233 10106658 -1.68%
BenchmarkEncode_8x4x8M-32 3438245 3326479 -3.25%
BenchmarkEncode_12x4x12M-32 6632257 6581637 -0.76%
BenchmarkEncode_16x4x16M-32 10815755 10788377 -0.25%
BenchmarkEncode_16x4x32M-32 21029061 21507995 +2.28%
BenchmarkEncode_16x4x64M-32 42145450 43876850 +4.11%
BenchmarkEncode_8x5x8M-32 4543208 3846378 -15.34%
BenchmarkEncode_8x6x8M-32 5065494 4397218 -13.19%
BenchmarkEncode_8x7x8M-32 5818995 4962884 -14.71%
BenchmarkEncode_8x9x8M-32 6215449 6114898 -1.62%
BenchmarkEncode_8x10x8M-32 6923415 6610501 -4.52%
BenchmarkEncode_8x11x8M-32 7365988 7010473 -4.83%
BenchmarkEncode_8x8x05M-32 150857 136820 -9.30%
BenchmarkEncode_8x8x1M-32 256722 254854 -0.73%
BenchmarkEncode_8x8x8M-32 5547790 5422048 -2.27%
BenchmarkEncode_8x8x32M-32 23038643 22705859 -1.44%
BenchmarkEncode_24x8x24M-32 27729259 30332216 +9.39%
BenchmarkEncode_24x8x48M-32 53865705 61187658 +13.59%
BenchmarkVerify10x2x10000-32 8769 8154 -7.01%
BenchmarkVerify10x2x1M-32 516149 476180 -7.74%
BenchmarkVerify5x2x1M-32 443888 419541 -5.48%
BenchmarkVerify10x4x1M-32 1030299 948021 -7.99%
BenchmarkVerify50x20x1M-32 7209689 6186891 -14.19%
BenchmarkVerify10x4x16M-32 17774456 17681879 -0.52%
BenchmarkReconstruct10x2x10000-32 3352 3256 -2.86%
BenchmarkReconstruct50x5x50000-32 166417 140900 -15.33%
BenchmarkReconstruct10x2x1M-32 189711 174615 -7.96%
BenchmarkReconstruct5x2x1M-32 128080 126520 -1.22%
BenchmarkReconstruct10x4x1M-32 273312 254017 -7.06%
BenchmarkReconstruct50x20x1M-32 3628812 3192474 -12.02%
BenchmarkReconstruct10x4x16M-32 8562186 8781479 +2.56%
BenchmarkReconstructData10x2x10000-32 3241 3116 -3.86%
BenchmarkReconstructData50x5x50000-32 162520 134794 -17.06%
BenchmarkReconstructData10x2x1M-32 171253 161955 -5.43%
BenchmarkReconstructData5x2x1M-32 102215 106942 +4.62%
BenchmarkReconstructData10x4x1M-32 225593 219969 -2.49%
BenchmarkReconstructData50x20x1M-32 2515311 2129721 -15.33%
BenchmarkReconstructData10x4x16M-32 6980308 6698111 -4.04%
BenchmarkReconstructP10x2x10000-32 924 937 +1.35%
BenchmarkReconstructP10x5x20000-32 1639 1703 +3.90%
BenchmarkSplit10x4x160M-32 4984993 4898045 -1.74%
BenchmarkSplit5x2x5M-32 380415 221446 -41.79%
BenchmarkSplit10x2x1M-32 58761 53335 -9.23%
BenchmarkSplit10x4x10M-32 643188 410959 -36.11%
BenchmarkSplit50x20x50M-32 1843879 1647205 -10.67%
BenchmarkSplit17x3x272M-32 3684920 3613951 -1.93%
BenchmarkParallel_8x8x64K-32 7022 6630 -5.58%
BenchmarkParallel_8x8x05M-32 348308 348369 +0.02%
BenchmarkParallel_20x10x05M-32 575672 581028 +0.93%
BenchmarkParallel_8x8x1M-32 716033 697167 -2.63%
BenchmarkParallel_8x8x8M-32 5716048 5616437 -1.74%
BenchmarkParallel_8x8x32M-32 22650878 22098667 -2.44%
BenchmarkParallel_8x3x1M-32 406839 399125 -1.90%
BenchmarkParallel_8x4x1M-32 459107 463890 +1.04%
BenchmarkParallel_8x5x1M-32 527488 520334 -1.36%
BenchmarkStreamEncode10x2x10000-32 6013 5878 -2.25%
BenchmarkStreamEncode100x20x10000-32 503124 267894 -46.75%
BenchmarkStreamEncode17x3x1M-32 1561838 1376618 -11.86%
BenchmarkStreamEncode10x4x16M-32 19124427 17762582 -7.12%
BenchmarkStreamEncode5x2x1M-32 429701 384666 -10.48%
BenchmarkStreamEncode10x2x1M-32 801257 763637 -4.70%
BenchmarkStreamEncode10x4x1M-32 876065 820744 -6.31%
BenchmarkStreamEncode50x20x1M-32 7205112 6081398 -15.60%
BenchmarkStreamEncode17x3x16M-32 27182786 26117143 -3.92%
BenchmarkStreamVerify10x2x10000-32 13767 14026 +1.88%
BenchmarkStreamVerify50x5x50000-32 826983 690453 -16.51%
BenchmarkStreamVerify10x2x1M-32 1238566 1182591 -4.52%
BenchmarkStreamVerify5x2x1M-32 892661 806301 -9.67%
BenchmarkStreamVerify10x4x1M-32 1676394 1631495 -2.68%
BenchmarkStreamVerify50x20x1M-32 10877875 10037678 -7.72%
BenchmarkStreamVerify10x4x16M-32 27599576 30435400 +10.27%
benchmark old MB/s new MB/s speedup
BenchmarkGalois128K-32 58518.53 58510.17 1.00x
BenchmarkGalois1M-32 53558.10 55507.44 1.04x
BenchmarkGaloisXor128K-32 46839.74 45961.09 0.98x
BenchmarkGaloisXor1M-32 44936.98 44917.46 1.00x
BenchmarkEncode2x1x1M-32 91561.27 91524.11 1.00x
BenchmarkEncode10x2x10000-32 37385.54 38792.54 1.04x
BenchmarkEncode100x20x10000-32 3306.47 8096.40 2.45x
BenchmarkEncode17x3x1M-32 64773.49 93557.14 1.44x
BenchmarkEncode10x4x16M-32 28039.15 28039.68 1.00x
BenchmarkEncode5x2x1M-32 107365.88 109781.16 1.02x
BenchmarkEncode10x2x1M-32 124083.62 135266.27 1.09x
BenchmarkEncode10x4x1M-32 85408.99 94419.71 1.11x
BenchmarkEncode50x20x1M-32 19812.81 24344.67 1.23x
BenchmarkEncode17x3x16M-32 32642.93 33200.32 1.02x
BenchmarkEncode_8x4x8M-32 29277.52 30261.21 1.03x
BenchmarkEncode_12x4x12M-32 30355.67 30589.14 1.01x
BenchmarkEncode_16x4x16M-32 31023.66 31102.39 1.00x
BenchmarkEncode_16x4x32M-32 31912.44 31201.82 0.98x
BenchmarkEncode_16x4x64M-32 31846.32 30589.65 0.96x
BenchmarkEncode_8x5x8M-32 24003.28 28351.84 1.18x
BenchmarkEncode_8x6x8M-32 23184.41 26707.91 1.15x
BenchmarkEncode_8x7x8M-32 21623.86 25354.03 1.17x
BenchmarkEncode_8x9x8M-32 22943.85 23321.13 1.02x
BenchmarkEncode_8x10x8M-32 21809.31 22841.68 1.05x
BenchmarkEncode_8x11x8M-32 21637.77 22735.06 1.05x
BenchmarkEncode_8x8x05M-32 55606.22 61311.47 1.10x
BenchmarkEncode_8x8x1M-32 65351.80 65830.73 1.01x
BenchmarkEncode_8x8x8M-32 24193.01 24754.07 1.02x
BenchmarkEncode_8x8x32M-32 23303.06 23644.60 1.01x
BenchmarkEncode_24x8x24M-32 29041.76 26549.54 0.91x
BenchmarkEncode_24x8x48M-32 29900.52 26322.51 0.88x
BenchmarkVerify10x2x10000-32 13685.12 14717.10 1.08x
BenchmarkVerify10x2x1M-32 24378.43 26424.72 1.08x
BenchmarkVerify5x2x1M-32 16535.79 17495.41 1.06x
BenchmarkVerify10x4x1M-32 14248.35 15484.96 1.09x
BenchmarkVerify50x20x1M-32 10180.79 11863.85 1.17x
BenchmarkVerify10x4x16M-32 13214.53 13283.71 1.01x
BenchmarkReconstruct10x2x10000-32 35799.16 36854.89 1.03x
BenchmarkReconstruct50x5x50000-32 33049.47 39034.89 1.18x
BenchmarkReconstruct10x2x1M-32 66326.88 72061.06 1.09x
BenchmarkReconstruct5x2x1M-32 57308.21 58014.92 1.01x
BenchmarkReconstruct10x4x1M-32 53711.74 57791.66 1.08x
BenchmarkReconstruct50x20x1M-32 20227.09 22991.67 1.14x
BenchmarkReconstruct10x4x16M-32 27432.37 26747.32 0.98x
BenchmarkReconstructData10x2x10000-32 37030.86 38511.87 1.04x
BenchmarkReconstructData50x5x50000-32 33842.07 40802.85 1.21x
BenchmarkReconstructData10x2x1M-32 73475.57 77693.87 1.06x
BenchmarkReconstructData5x2x1M-32 71809.58 68635.57 0.96x
BenchmarkReconstructData10x4x1M-32 65073.27 66736.88 1.03x
BenchmarkReconstructData50x20x1M-32 29181.41 34464.76 1.18x
BenchmarkReconstructData10x4x16M-32 33649.09 35066.75 1.04x
BenchmarkReconstructP10x2x10000-32 129819.98 128086.76 0.99x
BenchmarkReconstructP10x5x20000-32 183073.89 176202.21 0.96x
BenchmarkParallel_8x8x64K-32 149327.33 158153.67 1.06x
BenchmarkParallel_8x8x05M-32 24083.89 24079.69 1.00x
BenchmarkParallel_20x10x05M-32 27322.20 27070.35 0.99x
BenchmarkParallel_8x8x1M-32 23430.78 24064.83 1.03x
BenchmarkParallel_8x8x8M-32 23480.86 23897.31 1.02x
BenchmarkParallel_8x8x32M-32 23701.99 24294.27 1.02x
BenchmarkParallel_8x3x1M-32 28351.11 28899.03 1.02x
BenchmarkParallel_8x4x1M-32 27407.34 27124.76 0.99x
BenchmarkParallel_8x5x1M-32 25842.27 26197.58 1.01x
BenchmarkStreamEncode10x2x10000-32 16629.76 17012.26 1.02x
BenchmarkStreamEncode100x20x10000-32 1987.58 3732.83 1.88x
BenchmarkStreamEncode17x3x1M-32 11413.34 12948.97 1.13x
BenchmarkStreamEncode10x4x16M-32 8772.66 9445.26 1.08x
BenchmarkStreamEncode5x2x1M-32 12201.21 13629.70 1.12x
BenchmarkStreamEncode10x2x1M-32 13086.64 13731.34 1.05x
BenchmarkStreamEncode10x4x1M-32 11969.16 12775.92 1.07x
BenchmarkStreamEncode50x20x1M-32 7276.61 8621.18 1.18x
BenchmarkStreamEncode17x3x16M-32 10492.40 10920.52 1.04x
BenchmarkStreamVerify10x2x10000-32 7264.00 7129.49 0.98x
BenchmarkStreamVerify50x5x50000-32 6046.07 7241.62 1.20x
BenchmarkStreamVerify10x2x1M-32 8466.05 8866.77 1.05x
BenchmarkStreamVerify5x2x1M-32 5873.31 6502.39 1.11x
BenchmarkStreamVerify10x4x1M-32 6254.95 6427.09 1.03x
BenchmarkStreamVerify50x20x1M-32 4819.76 5223.20 1.08x
BenchmarkStreamVerify10x4x16M-32 6078.79 5512.40 0.91x
```
2021-12-09 12:28:44 +01:00
jiangfucheng
62053e3cca
fix: avoid the padding of last data shard is not zero ( #173 )
...
Co-authored-by: jiangfucheng <jiangfucheng@yidian-inc.com>
2021-12-01 13:49:05 +01:00
Klaus Post
0eef97bb02
Add progressive erasure shard encoding ( #170 )
...
* Add progressive erasure shard encoding
Allow progressively building erasure shards.
2021-10-26 14:39:44 +02:00
Klaus Post
7bd22796ec
Wider AVX2 loops and less usage. ( #162 )
...
* Experiment with 64 bytes/loop AVX2
* Only reduce when doing 64.
* Use no more than 8 goroutines for avx2 codegen.
```
name old speed new speed delta
Encode10x2x10000-32 33.3GB/s ± 0% 37.5GB/s ± 1% +12.49% (p=0.000 n=9+10)
Encode100x20x10000-32 3.79GB/s ± 5% 3.77GB/s ± 5% ~ (p=0.853 n=10+10)
Encode17x3x1M-32 78.2GB/s ± 1% 76.0GB/s ± 6% ~ (p=0.123 n=10+10)
Encode10x4x16M-32 28.3GB/s ± 0% 27.7GB/s ± 2% -2.32% (p=0.000 n=8+10)
Encode5x2x1M-32 112GB/s ± 1% 113GB/s ± 1% ~ (p=0.796 n=10+10)
Encode10x2x1M-32 149GB/s ± 1% 129GB/s ± 3% -13.24% (p=0.000 n=9+10)
Encode10x4x1M-32 99.1GB/s ± 1% 91.5GB/s ± 3% -7.74% (p=0.000 n=10+10)
Encode50x20x1M-32 19.7GB/s ± 1% 19.8GB/s ± 1% ~ (p=0.447 n=9+10)
Encode17x3x16M-32 33.4GB/s ± 0% 33.3GB/s ± 1% -0.46% (p=0.043 n=10+9)
Encode_8x4x8M-32 30.1GB/s ± 1% 29.4GB/s ± 3% -2.31% (p=0.000 n=10+10)
Encode_12x4x12M-32 30.6GB/s ± 0% 30.5GB/s ± 0% ~ (p=0.720 n=10+9)
Encode_16x4x16M-32 31.5GB/s ± 0% 31.5GB/s ± 0% ~ (p=0.497 n=10+9)
Encode_16x4x32M-32 31.9GB/s ± 0% 31.5GB/s ± 4% ~ (p=0.165 n=10+10)
Encode_16x4x64M-32 32.4GB/s ± 0% 32.3GB/s ± 0% ~ (p=0.321 n=9+8)
Encode_8x5x8M-32 28.4GB/s ± 0% 28.4GB/s ± 1% ~ (p=0.237 n=10+8)
Encode_8x6x8M-32 27.0GB/s ± 0% 27.2GB/s ± 2% ~ (p=0.075 n=10+10)
Encode_8x7x8M-32 26.0GB/s ± 1% 25.8GB/s ± 1% -0.53% (p=0.003 n=9+10)
Encode_8x9x8M-32 24.6GB/s ± 1% 24.4GB/s ± 1% -0.63% (p=0.000 n=10+10)
Encode_8x10x8M-32 23.7GB/s ± 1% 23.7GB/s ± 0% +0.32% (p=0.035 n=10+9)
Encode_8x11x8M-32 23.0GB/s ± 1% 22.8GB/s ± 0% -0.59% (p=0.000 n=9+8)
Encode_8x8x05M-32 66.4GB/s ± 1% 64.2GB/s ± 1% -3.32% (p=0.000 n=10+10)
Encode_8x8x1M-32 56.7GB/s ± 0% 75.7GB/s ± 2% +33.55% (p=0.000 n=9+9)
Encode_8x8x8M-32 24.9GB/s ± 0% 24.9GB/s ± 1% ~ (p=0.146 n=8+10)
Encode_8x8x32M-32 23.8GB/s ± 0% 23.4GB/s ± 0% -1.42% (p=0.000 n=9+10)
Encode_24x8x24M-32 29.9GB/s ± 0% 29.9GB/s ± 0% ~ (p=0.278 n=10+9)
Encode_24x8x48M-32 30.7GB/s ± 1% 30.7GB/s ± 0% ~ (p=0.351 n=9+7)
StreamEncode10x2x10000-32 15.5GB/s ± 1% 16.5GB/s ± 0% +6.53% (p=0.000 n=10+9)
StreamEncode100x20x10000-32 2.09GB/s ± 1% 2.06GB/s ± 2% -1.78% (p=0.000 n=10+10)
StreamEncode17x3x1M-32 12.2GB/s ± 2% 12.3GB/s ± 1% +1.19% (p=0.008 n=10+9)
StreamEncode10x4x16M-32 8.68GB/s ± 0% 9.47GB/s ± 1% +9.05% (p=0.000 n=8+10)
StreamEncode5x2x1M-32 12.3GB/s ± 1% 13.2GB/s ± 1% +7.61% (p=0.000 n=10+10)
StreamEncode10x2x1M-32 11.5GB/s ± 4% 13.3GB/s ± 2% +15.15% (p=0.000 n=10+7)
```
2021-06-21 15:15:23 +02:00
Shawn Zivontsis
0e7f9a6a6f
Allow zero parity shards ( #161 )
2021-03-08 16:13:24 +01:00
Klaus Post
ab26eb4126
Add WithInversionCache and use pointer methods ( #160 )
...
There appears to be writes to value receivers.
Add `WithInversionCache(bool)` to disable cache.
Fixes #159
2021-01-13 10:21:28 +01:00
Klaus Post
519603f6e1
Update packages ( #154 )
...
* Update packages
Update cpuid and clean up generated.
2020-12-09 22:56:01 +01:00
Klaus Post
653e76aa26
Faster AVX2 encoding ( #153 )
...
* Remove 50% of bounds checks when copying.
* Use RIP only addressing, free one register.
```
benchmark old MB/s new MB/s speedup
BenchmarkGalois128K-32 57663.49 58005.87 1.01x
BenchmarkGalois1M-32 49479.31 49848.29 1.01x
BenchmarkGaloisXor128K-32 46310.69 46501.88 1.00x
BenchmarkGaloisXor1M-32 43804.86 43984.39 1.00x
BenchmarkEncode10x2x10000-32 25926.93 27457.75 1.06x
BenchmarkEncode100x20x10000-32 2635.82 2818.95 1.07x
BenchmarkEncode17x3x1M-32 63215.11 61576.76 0.97x
BenchmarkEncode10x4x16M-32 19551.54 19505.07 1.00x
BenchmarkEncode5x2x1M-32 79612.06 81985.14 1.03x
BenchmarkEncode10x2x1M-32 121478.29 127739.41 1.05x
BenchmarkEncode10x4x1M-32 70757.61 74423.67 1.05x
BenchmarkEncode50x20x1M-32 19811.96 20103.32 1.01x
BenchmarkEncode17x3x16M-32 27202.10 27825.34 1.02x
BenchmarkEncode_8x4x8M-32 19029.04 19701.31 1.04x
BenchmarkEncode_12x4x12M-32 22449.87 22480.51 1.00x
BenchmarkEncode_16x4x16M-32 24536.74 24672.24 1.01x
BenchmarkEncode_16x4x32M-32 24381.34 24981.99 1.02x
BenchmarkEncode_16x4x64M-32 24717.69 25086.94 1.01x
BenchmarkEncode_8x5x8M-32 16763.51 17154.04 1.02x
BenchmarkEncode_8x6x8M-32 15067.22 15205.87 1.01x
BenchmarkEncode_8x7x8M-32 13156.38 13589.40 1.03x
BenchmarkEncode_8x9x8M-32 11363.74 11523.70 1.01x
BenchmarkEncode_8x10x8M-32 10359.37 10474.91 1.01x
BenchmarkEncode_8x11x8M-32 9627.07 9463.24 0.98x
BenchmarkEncode_8x8x05M-32 30104.80 32634.89 1.08x
BenchmarkEncode_8x8x1M-32 36497.28 36425.88 1.00x
BenchmarkEncode_8x8x8M-32 12186.19 11602.41 0.95x
BenchmarkEncode_8x8x32M-32 11670.72 11413.71 0.98x
BenchmarkEncode_24x8x24M-32 21709.83 21652.50 1.00x
BenchmarkEncode_24x8x48M-32 22494.40 22280.59 0.99x
BenchmarkVerify10x2x10000-32 10567.56 10483.91 0.99x
BenchmarkVerify50x5x50000-32 28102.84 27923.63 0.99x
BenchmarkVerify10x2x1M-32 30298.33 30106.18 0.99x
BenchmarkVerify5x2x1M-32 16115.91 15847.03 0.98x
BenchmarkVerify10x4x1M-32 15382.13 14852.68 0.97x
BenchmarkVerify50x20x1M-32 8476.02 8466.24 1.00x
BenchmarkVerify10x4x16M-32 15101.03 15434.71 1.02x
BenchmarkReconstruct10x2x10000-32 26228.18 26960.19 1.03x
BenchmarkReconstruct50x5x50000-32 31091.42 30975.82 1.00x
BenchmarkReconstruct10x2x1M-32 58548.87 60281.92 1.03x
BenchmarkReconstruct5x2x1M-32 39499.23 41791.80 1.06x
BenchmarkReconstruct10x4x1M-32 41448.60 43053.15 1.04x
BenchmarkReconstruct50x20x1M-32 17185.99 17354.67 1.01x
BenchmarkReconstruct10x4x16M-32 18798.60 18847.43 1.00x
BenchmarkReconstructData10x2x10000-32 27208.48 27538.38 1.01x
BenchmarkReconstructData50x5x50000-32 32135.65 32078.91 1.00x
BenchmarkReconstructData10x2x1M-32 63180.19 67332.17 1.07x
BenchmarkReconstructData5x2x1M-32 47532.85 49932.17 1.05x
BenchmarkReconstructData10x4x1M-32 50059.14 52323.15 1.05x
BenchmarkReconstructData50x20x1M-32 26679.75 26714.11 1.00x
BenchmarkReconstructData10x4x16M-32 24854.99 24527.23 0.99x
BenchmarkReconstructP10x2x10000-32 115089.87 113229.75 0.98x
BenchmarkReconstructP10x5x20000-32 129838.75 132871.10 1.02x
BenchmarkParallel_8x8x64K-32 69951.43 69980.44 1.00x
BenchmarkParallel_8x8x05M-32 11752.94 11724.35 1.00x
BenchmarkParallel_20x10x05M-32 18553.93 18613.33 1.00x
BenchmarkParallel_8x8x1M-32 11639.19 11746.86 1.01x
BenchmarkParallel_8x8x8M-32 11799.36 11685.63 0.99x
BenchmarkParallel_8x8x32M-32 11510.94 11791.72 1.02x
BenchmarkParallel_8x3x1M-32 20268.92 20678.21 1.02x
BenchmarkParallel_8x4x1M-32 17616.05 17856.17 1.01x
BenchmarkParallel_8x5x1M-32 15590.87 15872.42 1.02x
BenchmarkStreamEncode10x2x10000-32 14917.08 15408.39 1.03x
BenchmarkStreamEncode100x20x10000-32 2014.81 2077.31 1.03x
BenchmarkStreamEncode17x3x1M-32 11839.37 12434.80 1.05x
BenchmarkStreamEncode10x4x16M-32 9151.14 9206.98 1.01x
BenchmarkStreamEncode5x2x1M-32 13598.55 13663.56 1.00x
BenchmarkStreamEncode10x2x1M-32 13192.91 13453.41 1.02x
BenchmarkStreamEncode10x4x1M-32 12109.90 12050.68 1.00x
BenchmarkStreamEncode50x20x1M-32 8640.73 8370.10 0.97x
BenchmarkStreamEncode17x3x16M-32 10473.17 10527.04 1.01x
BenchmarkStreamVerify10x2x10000-32 7032.23 7128.82 1.01x
BenchmarkStreamVerify50x5x50000-32 13023.46 13109.31 1.01x
BenchmarkStreamVerify10x2x1M-32 11941.63 11949.91 1.00x
BenchmarkStreamVerify5x2x1M-32 8029.93 8263.39 1.03x
BenchmarkStreamVerify10x4x1M-32 8137.82 8271.11 1.02x
BenchmarkStreamVerify50x20x1M-32 7378.87 7708.81 1.04x
BenchmarkStreamVerify10x4x16M-32 8973.18 8955.29 1.00x
```
2020-11-10 14:39:23 +01:00
Klaus Post
7daa20bf74
Generate AVX2 code ( #141 )
...
Replaces AVX2 up to 10x8 configurations with specific generated functions.
If code size is a concern `-tags=nogen` can be used.
Biggest speedup when not memory constrained.
```
benchmark old MB/s new MB/s speedup
BenchmarkEncode_8x5x8M 5895.75 9648.18 1.64x
BenchmarkEncode_8x5x8M-4 16773.41 17220.67 1.03x
BenchmarkEncode_8x5x8M-16 18263.12 17176.28 0.94x
BenchmarkEncode_8x6x8M 5075.89 8548.39 1.68x
BenchmarkEncode_8x6x8M-4 14559.83 15370.95 1.06x
BenchmarkEncode_8x6x8M-16 16183.37 15291.98 0.94x
BenchmarkEncode_8x7x8M 4481.18 7015.60 1.57x
BenchmarkEncode_8x7x8M-4 12835.35 13695.90 1.07x
BenchmarkEncode_8x7x8M-16 14246.94 13737.36 0.96x
BenchmarkEncode_8x8x05M 5569.95 7947.70 1.43x
BenchmarkEncode_8x8x05M-4 17334.91 25271.37 1.46x
BenchmarkEncode_8x8x05M-16 29349.42 35043.36 1.19x
BenchmarkEncode_8x8x1M 4830.58 7891.32 1.63x
BenchmarkEncode_8x8x1M-4 17531.36 27371.42 1.56x
BenchmarkEncode_8x8x1M-16 29593.98 39241.09 1.33x
BenchmarkEncode_8x8x8M 3953.66 6584.26 1.67x
BenchmarkEncode_8x8x8M-4 11527.34 12331.23 1.07x
BenchmarkEncode_8x8x8M-16 12718.89 12173.08 0.96x
BenchmarkEncode_8x8x32M 3927.51 6195.91 1.58x
BenchmarkEncode_8x8x32M-4 11490.85 11424.39 0.99x
BenchmarkEncode_8x8x32M-16 12506.09 11888.55 0.95x
benchmark old MB/s new MB/s speedup
BenchmarkParallel_8x8x64K 5490.24 6959.57 1.27x
BenchmarkParallel_8x8x64K-4 21078.94 29557.51 1.40x
BenchmarkParallel_8x8x64K-16 57508.45 73672.54 1.28x
BenchmarkParallel_8x8x1M 4755.49 7667.84 1.61x
BenchmarkParallel_8x8x1M-4 11818.66 12013.49 1.02x
BenchmarkParallel_8x8x1M-16 12923.12 12109.42 0.94x
BenchmarkParallel_8x8x8M 3973.94 6525.85 1.64x
BenchmarkParallel_8x8x8M-4 11725.68 11312.46 0.96x
BenchmarkParallel_8x8x8M-16 12608.20 11484.98 0.91x
BenchmarkParallel_8x3x1M 14139.71 17993.04 1.27x
BenchmarkParallel_8x3x1M-4 21805.97 23053.92 1.06x
BenchmarkParallel_8x3x1M-16 24673.05 23596.71 0.96x
BenchmarkParallel_8x4x1M 10617.88 14474.54 1.36x
BenchmarkParallel_8x4x1M-4 18635.82 18965.65 1.02x
BenchmarkParallel_8x4x1M-16 21518.12 20171.47 0.94x
BenchmarkParallel_8x5x1M 8669.88 11833.96 1.36x
BenchmarkParallel_8x5x1M-4 16321.00 17500.30 1.07x
BenchmarkParallel_8x5x1M-16 17267.16 17191.04 1.00x
```
2020-05-20 12:48:34 +02:00
Klaus Post
cf8495259a
Add pure XOR for 1 parity ( #138 )
...
WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard.
The PAR1 matrix already has this property so it has little effect there.
2020-05-13 11:10:58 +02:00
Klaus Post
151d8c7a05
Tweak concurrency ( #132 )
2020-05-06 15:42:30 +02:00
Klaus Post
cb7a0b5aef
Do fast by one multiplication ( #130 )
...
When multiplying by one we can use faster math.
2020-05-06 11:14:25 +02:00
Klaus Post
0e9e10435f
avx2: Add 64 bytes per loop processing ( #128 )
...
* avx2: Add 64 bytes per loop processing
Not super clean benchmark run, but `BenchmarkGalois` is consistently faster.
```
benchmark old ns/op new ns/op delta
BenchmarkGalois128K-32 2551 2261 -11.37%
BenchmarkGalois1M-32 22492 21107 -6.16%
BenchmarkGaloisXor128K-32 2972 2808 -5.52%
BenchmarkGaloisXor1M-32 25181 23951 -4.88%
BenchmarkEncode10x2x10000-32 5081 4722 -7.07%
BenchmarkEncode100x20x10000-32 383800 346655 -9.68%
BenchmarkEncode17x3x1M-32 264806 263191 -0.61%
BenchmarkEncode10x4x16M-32 8337857 8376910 +0.47%
BenchmarkEncode5x2x1M-32 77119 73598 -4.57%
BenchmarkEncode10x2x1M-32 108424 102423 -5.53%
BenchmarkEncode10x4x1M-32 194427 184301 -5.21%
BenchmarkEncode50x20x1M-32 3870301 3747639 -3.17%
BenchmarkEncode17x3x16M-32 10617586 10602449 -0.14%
BenchmarkEncode_8x4x8M-32 3227254 3229451 +0.07%
BenchmarkEncode_12x4x12M-32 6841898 6847261 +0.08%
BenchmarkEncode_16x4x16M-32 11153469 11048738 -0.94%
BenchmarkEncode_16x4x32M-32 21947506 21826647 -0.55%
BenchmarkEncode_16x4x64M-32 43163608 42971338 -0.45%
BenchmarkEncode_8x5x8M-32 3856675 3780730 -1.97%
BenchmarkEncode_8x6x8M-32 4322023 4437109 +2.66%
BenchmarkEncode_8x7x8M-32 5011434 4959623 -1.03%
BenchmarkEncode_8x9x8M-32 6243694 6098824 -2.32%
BenchmarkEncode_8x10x8M-32 6724456 6657099 -1.00%
BenchmarkEncode_8x11x8M-32 7207693 7340332 +1.84%
BenchmarkEncode_8x8x05M-32 176877 172183 -2.65%
BenchmarkEncode_8x8x1M-32 309716 301743 -2.57%
BenchmarkEncode_8x8x8M-32 5498952 5489078 -0.18%
BenchmarkEncode_8x8x32M-32 22630195 22557074 -0.32%
BenchmarkEncode_24x8x24M-32 28488886 28220702 -0.94%
BenchmarkEncode_24x8x48M-32 56124735 54862495 -2.25%
BenchmarkVerify10x2x10000-32 9874 9356 -5.25%
BenchmarkVerify50x5x50000-32 175610 159735 -9.04%
BenchmarkVerify10x2x1M-32 331276 311726 -5.90%
BenchmarkVerify5x2x1M-32 265466 248075 -6.55%
BenchmarkVerify10x4x1M-32 701627 606420 -13.57%
BenchmarkVerify50x20x1M-32 4338171 4245635 -2.13%
BenchmarkVerify10x4x16M-32 12312830 11932698 -3.09%
BenchmarkReconstruct10x2x10000-32 1594 1504 -5.65%
BenchmarkReconstruct50x5x50000-32 95101 79558 -16.34%
BenchmarkReconstruct10x2x1M-32 38479 37225 -3.26%
BenchmarkReconstruct5x2x1M-32 30968 30013 -3.08%
BenchmarkReconstruct10x4x1M-32 81630 75350 -7.69%
BenchmarkReconstruct50x20x1M-32 1136952 1040156 -8.51%
BenchmarkReconstruct10x4x16M-32 685408 656484 -4.22%
BenchmarkReconstructData10x2x10000-32 1609 1486 -7.64%
BenchmarkReconstructData50x5x50000-32 87090 71512 -17.89%
BenchmarkReconstructData10x2x1M-32 31497 30347 -3.65%
BenchmarkReconstructData5x2x1M-32 23379 22611 -3.28%
BenchmarkReconstructData10x4x1M-32 63853 61035 -4.41%
BenchmarkReconstructData50x20x1M-32 1048807 966201 -7.88%
BenchmarkReconstructData10x4x16M-32 866658 892252 +2.95%
BenchmarkReconstructP10x2x10000-32 544 540 -0.74%
BenchmarkReconstructP10x5x20000-32 1242 1206 -2.90%
BenchmarkSplit10x4x160M-32 2735508 2743214 +0.28%
BenchmarkSplit5x2x5M-32 276232 288523 +4.45%
BenchmarkSplit10x2x1M-32 44389 45517 +2.54%
BenchmarkSplit10x4x10M-32 477282 460888 -3.43%
BenchmarkSplit50x20x50M-32 1608821 1602105 -0.42%
BenchmarkSplit17x3x272M-32 2035932 2034705 -0.06%
BenchmarkParallel_8x8x05M-32 346733 351837 +1.47%
BenchmarkParallel_20x10x05M-32 577127 586232 +1.58%
BenchmarkParallel_8x8x1M-32 722453 729294 +0.95%
BenchmarkParallel_8x8x8M-32 5717650 5817130 +1.74%
BenchmarkParallel_8x8x32M-32 22914260 24132696 +5.32%
BenchmarkStreamEncode10x2x10000-32 6703131 7141021 +6.53%
BenchmarkStreamEncode100x20x10000-32 38175873 39767386 +4.17%
BenchmarkStreamEncode17x3x1M-32 8920549 9218973 +3.35%
BenchmarkStreamEncode10x4x16M-32 21841702 21784898 -0.26%
BenchmarkStreamEncode5x2x1M-32 4088001 3247404 -20.56%
BenchmarkStreamEncode10x2x1M-32 5860652 5932381 +1.22%
BenchmarkStreamEncode10x4x1M-32 7555172 7589960 +0.46%
BenchmarkStreamEncode50x20x1M-32 30006814 30250054 +0.81%
BenchmarkStreamEncode17x3x16M-32 32757489 32818254 +0.19%
BenchmarkStreamVerify10x2x10000-32 6714996 6831093 +1.73%
BenchmarkStreamVerify50x5x50000-32 18525904 18761767 +1.27%
BenchmarkStreamVerify10x2x1M-32 5232278 5444148 +4.05%
BenchmarkStreamVerify5x2x1M-32 3673843 3755283 +2.22%
BenchmarkStreamVerify10x4x1M-32 7184419 7185293 +0.01%
BenchmarkStreamVerify50x20x1M-32 28441187 28574766 +0.47%
BenchmarkStreamVerify10x4x16M-32 8538440 8668614 +1.52%
benchmark old MB/s new MB/s speedup
BenchmarkGalois128K-32 51374.59 57976.36 1.13x
BenchmarkGalois1M-32 46620.03 49679.10 1.07x
BenchmarkGaloisXor128K-32 44106.22 46671.56 1.06x
BenchmarkGaloisXor1M-32 41641.82 43779.89 1.05x
BenchmarkEncode10x2x10000-32 19682.61 21176.81 1.08x
BenchmarkEncode100x20x10000-32 2605.52 2884.71 1.11x
BenchmarkEncode17x3x1M-32 67316.54 67729.50 1.01x
BenchmarkEncode10x4x16M-32 20121.74 20027.93 1.00x
BenchmarkEncode5x2x1M-32 67984.17 71236.47 1.05x
BenchmarkEncode10x2x1M-32 96710.29 102377.00 1.06x
BenchmarkEncode10x4x1M-32 53931.74 56894.82 1.05x
BenchmarkEncode50x20x1M-32 13546.44 13989.82 1.03x
BenchmarkEncode17x3x16M-32 26862.29 26900.64 1.00x
BenchmarkEncode_8x4x8M-32 20794.42 20780.27 1.00x
BenchmarkEncode_12x4x12M-32 22069.16 22051.88 1.00x
BenchmarkEncode_16x4x16M-32 24067.44 24295.58 1.01x
BenchmarkEncode_16x4x32M-32 24461.59 24597.04 1.01x
BenchmarkEncode_16x4x64M-32 24876.09 24987.40 1.00x
BenchmarkEncode_8x5x8M-32 17400.71 17750.24 1.02x
BenchmarkEncode_8x6x8M-32 15527.19 15124.46 0.97x
BenchmarkEncode_8x7x8M-32 13391.15 13531.04 1.01x
BenchmarkEncode_8x9x8M-32 10748.26 11003.58 1.02x
BenchmarkEncode_8x10x8M-32 9979.82 10080.80 1.01x
BenchmarkEncode_8x11x8M-32 9310.73 9142.48 0.98x
BenchmarkEncode_8x8x05M-32 23713.12 24359.50 1.03x
BenchmarkEncode_8x8x1M-32 27084.87 27800.50 1.03x
BenchmarkEncode_8x8x8M-32 12203.94 12225.89 1.00x
BenchmarkEncode_8x8x32M-32 11861.83 11900.28 1.00x
BenchmarkEncode_24x8x24M-32 21200.54 21402.01 1.01x
BenchmarkEncode_24x8x48M-32 21522.77 22017.95 1.02x
BenchmarkVerify10x2x10000-32 10127.24 10688.01 1.06x
BenchmarkVerify50x5x50000-32 28472.25 31301.75 1.10x
BenchmarkVerify10x2x1M-32 31652.63 33637.74 1.06x
BenchmarkVerify5x2x1M-32 19749.74 21134.27 1.07x
BenchmarkVerify10x4x1M-32 14944.92 17291.25 1.16x
BenchmarkVerify50x20x1M-32 12085.46 12348.87 1.02x
BenchmarkVerify10x4x16M-32 13625.80 14059.87 1.03x
BenchmarkReconstruct10x2x10000-32 62723.68 66470.81 1.06x
BenchmarkReconstruct50x5x50000-32 52575.87 62847.32 1.20x
BenchmarkReconstruct10x2x1M-32 272507.04 281685.84 1.03x
BenchmarkReconstruct5x2x1M-32 169299.03 174685.39 1.03x
BenchmarkReconstruct10x4x1M-32 128455.17 139161.42 1.08x
BenchmarkReconstruct50x20x1M-32 46113.48 50404.73 1.09x
BenchmarkReconstruct10x4x16M-32 244777.11 255561.72 1.04x
BenchmarkReconstructData10x2x10000-32 62160.46 67305.98 1.08x
BenchmarkReconstructData50x5x50000-32 57411.81 69917.97 1.22x
BenchmarkReconstructData10x2x1M-32 332909.82 345526.29 1.04x
BenchmarkReconstructData5x2x1M-32 224254.60 231868.74 1.03x
BenchmarkReconstructData10x4x1M-32 164216.61 171799.68 1.05x
BenchmarkReconstructData50x20x1M-32 49988.98 54262.82 1.09x
BenchmarkReconstructData10x4x16M-32 193585.15 188032.29 0.97x
BenchmarkReconstructP10x2x10000-32 183806.57 185284.57 1.01x
BenchmarkReconstructP10x5x20000-32 160985.46 165852.51 1.03x
BenchmarkParallel_8x8x05M-32 12096.63 11921.17 0.99x
BenchmarkParallel_20x10x05M-32 18168.91 17886.72 0.98x
BenchmarkParallel_8x8x1M-32 11611.28 11502.36 0.99x
BenchmarkParallel_8x8x8M-32 11737.14 11536.42 0.98x
BenchmarkParallel_8x8x32M-32 11714.78 11123.31 0.95x
BenchmarkStreamEncode10x2x10000-32 14.92 14.00 0.94x
BenchmarkStreamEncode100x20x10000-32 26.19 25.15 0.96x
BenchmarkStreamEncode17x3x1M-32 1998.28 1933.60 0.97x
BenchmarkStreamEncode10x4x16M-32 7681.28 7701.31 1.00x
BenchmarkStreamEncode5x2x1M-32 1282.50 1614.48 1.26x
BenchmarkStreamEncode10x2x1M-32 1789.18 1767.55 0.99x
BenchmarkStreamEncode10x4x1M-32 1387.89 1381.53 1.00x
BenchmarkStreamEncode50x20x1M-32 1747.23 1733.18 0.99x
BenchmarkStreamEncode17x3x16M-32 8706.79 8690.67 1.00x
BenchmarkStreamVerify10x2x10000-32 14.89 14.64 0.98x
BenchmarkStreamVerify50x5x50000-32 269.89 266.50 0.99x
BenchmarkStreamVerify10x2x1M-32 2004.05 1926.06 0.96x
BenchmarkStreamVerify5x2x1M-32 1427.08 1396.13 0.98x
BenchmarkStreamVerify10x4x1M-32 1459.51 1459.34 1.00x
BenchmarkStreamVerify50x20x1M-32 1843.41 1834.79 1.00x
BenchmarkStreamVerify10x4x16M-32 19649.04 19353.98 0.98x
```
2020-05-05 16:36:01 +02:00
Klaus Post
de70cc155f
AVX512 parallel processing ( #120 )
...
Do concurrent processing in AVX512 mode and split jobs by cache size.
2020-05-04 09:17:40 +02:00
Klaus Post
65df535980
Make single goroutine encodes more efficient ( #122 )
...
Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines.
```
λ benchcmp before.txt after.txt
benchmark old ns/op new ns/op delta
BenchmarkParallel_8x8x05M-16 675225 321053 -52.45%
BenchmarkParallel_20x10x05M-16 3471988 600740 -82.70%
BenchmarkParallel_8x8x1M-16 3948606 728093 -81.56%
BenchmarkParallel_8x8x8M-16 47361588 5976467 -87.38%
BenchmarkParallel_8x8x32M-16 195044200 24365474 -87.51%
benchmark old MB/s new MB/s speedup
BenchmarkParallel_8x8x05M-16 6211.71 13064.22 2.10x
BenchmarkParallel_20x10x05M-16 3020.10 17454.73 5.78x
BenchmarkParallel_8x8x1M-16 2124.45 11521.34 5.42x
BenchmarkParallel_8x8x8M-16 1416.95 11228.85 7.92x
BenchmarkParallel_8x8x32M-16 1376.28 11017.04 8.00x
```
2020-05-03 19:37:22 +02:00
Klaus Post
c3634dce94
Use CPU cache to set minSplitSize ( #117 )
...
Use L1 cache size to set default split size.
2020-04-22 16:12:18 +02:00
Andreas Auernhammer
1f1369aa84
limit capacity of shards to shard size ( #109 )
...
This commit limits the capacity (additionally
to the length) of each shard to the shard size.
Before this change the following code behaves in
an unexpected way:
```
shards := encoder.Split(buffer)
// ...
shards[0] = shards[0][:cap(shards[0])
```
Instead of restoring the length of `shards[0]` to
the shard size, it assigns the entire memory of `buffer`
to `shards[0]`.
2019-09-27 16:30:26 -07:00
dssysolyatin
7890684129
Improve quick check for case when dataOnly is true ( #105 )
2019-06-25 16:30:44 +02:00
dssysolyatin
ec2eb9fb8c
Split: Reduce memory allocation ( #103 )
...
* [Split] Reduce memory allocation in Split function
2019-06-25 16:28:24 +02:00
Klaus Post
a9588190c0
Optimize pure Go version. ( #96 )
...
* Optimize pure Go version.
* Update docs. Add Go 1.12 CI
* Avoid dst bounds check when using noasm ~ 40-50% faster.
* Convert multiply table to a slice whenever used.
* Split on 32 byte boundaries instead of 16 byte.
2019-03-08 10:49:27 +01:00
Klaus Post
09979cdf93
Start documentation with method name.
...
Replaces #92
2019-02-15 15:31:43 +01:00
Frank Wessels
79aee05119
AVX512 accelerated version resulting in a 4x speed improvement over AVX2 ( #91 )
...
The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table:
```
$ benchcmp avx2.txt avx512.txt
benchmark AVX2 MB/s AVX512 MB/s speedup
BenchmarkEncode8x8x1M-72 1681.35 4125.64 2.45x
BenchmarkEncode8x4x8M-72 1529.36 5507.97 3.60x
BenchmarkEncode8x8x8M-72 791.16 2952.29 3.73x
BenchmarkEncode8x8x32M-72 573.26 2168.61 3.78x
BenchmarkEncode12x4x12M-72 1234.41 4912.37 3.98x
BenchmarkEncode16x4x16M-72 1189.59 5138.01 4.32x
BenchmarkEncode24x8x24M-72 690.68 2583.70 3.74x
BenchmarkEncode24x8x48M-72 674.20 2643.31 3.92x
```
2019-02-10 11:17:23 +01:00
Klaus Post
278ba25f43
Pre-slice input.
2018-11-16 00:23:56 +01:00
Klaus Post
f5e73dcfe2
Split blocks into size divisible by 16
...
Older systems (typically without AVX2) are more sensitive to misaligned load+stores.
Add parameter to automatically set the number of goroutines.
name old time/op new time/op delta
Encode10x2x10000-8 18.4µs ± 1% 16.1µs ± 1% -12.43% (p=0.000 n=9+9)
Encode100x20x10000-8 692µs ± 1% 608µs ± 1% -12.10% (p=0.000 n=10+10)
Encode17x3x1M-8 1.78ms ± 5% 1.49ms ± 1% -16.63% (p=0.000 n=10+10)
Encode10x4x16M-8 21.5ms ± 5% 19.6ms ± 4% -8.74% (p=0.000 n=10+9)
Encode5x2x1M-8 343µs ± 2% 267µs ± 2% -22.22% (p=0.000 n=9+10)
Encode10x2x1M-8 858µs ± 5% 701µs ± 5% -18.34% (p=0.000 n=10+10)
Encode10x4x1M-8 1.34ms ± 1% 1.16ms ± 1% -13.19% (p=0.000 n=9+9)
Encode50x20x1M-8 30.3ms ± 4% 25.0ms ± 2% -17.51% (p=0.000 n=10+8)
Encode17x3x16M-8 26.9ms ± 1% 24.5ms ± 4% -9.13% (p=0.000 n=8+10)
name old speed new speed delta
Encode10x2x10000-8 5.45GB/s ± 1% 6.22GB/s ± 1% +14.20% (p=0.000 n=9+9)
Encode100x20x10000-8 1.44GB/s ± 1% 1.64GB/s ± 1% +13.77% (p=0.000 n=10+10)
Encode17x3x1M-8 10.0GB/s ± 5% 12.0GB/s ± 1% +19.88% (p=0.000 n=10+10)
Encode10x4x16M-8 7.81GB/s ± 5% 8.56GB/s ± 5% +9.58% (p=0.000 n=10+9)
Encode5x2x1M-8 15.3GB/s ± 2% 19.6GB/s ± 2% +28.57% (p=0.000 n=9+10)
Encode10x2x1M-8 12.2GB/s ± 5% 15.0GB/s ± 5% +22.45% (p=0.000 n=10+10)
Encode10x4x1M-8 7.84GB/s ± 1% 9.03GB/s ± 1% +15.19% (p=0.000 n=9+9)
Encode50x20x1M-8 1.73GB/s ± 4% 2.09GB/s ± 4% +20.59% (p=0.000 n=10+9)
Encode17x3x16M-8 10.6GB/s ± 1% 11.7GB/s ± 4% +10.12% (p=0.000 n=8+10)
2017-11-18 22:00:55 +01:00
Klaus Post
61c22eab55
Cauchy Matrix option ( #70 )
...
* Experimental Cauchy Matrix
Experimental support for Cauchy style matrix
http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf
All matrices appear reversible.
* Remove Go 1.5 and 1.6 from CI tests.
* Fix comment.
* Increase max number of goroutines+docs.
2017-10-01 14:02:11 +02:00
David Reiss
ddcafc661e
Allow reconstructing into pre-allocated memory. ( #66 )
...
This changes the interface of Reconstruct and ReconstructData to accept
slices of zero length but sufficient capacity for shards to reconstruct,
and reslices them instead of allocating new memory.
2017-09-20 21:08:24 +02:00
chenzhongtao
d78bf472d8
add Update parity function ( #60 )
...
Add Update parity function
2017-08-20 11:42:39 +02:00
Andreas Auernhammer
48a4fd05f1
fix unnecessary memory alloc in Split ( #59 )
...
Split divided the data into `DataShards` blocks and allocates all parity blocks.
This change adds a check whether the capacity of data is large enough to hold all
data and parity blocks. It only allocates parity blocks if necessary.
2017-07-22 16:16:58 +02:00
Frank Wessels
0de37d7697
Add ReconstructData interface method ( #57 )
...
* Add ReconstructData interface method to allow reconstruction of any missing data shards
* Add support for just reconstructing data shards only to SteamEncoder.Reconstruct()
2017-07-20 12:15:46 +02:00
Klaus Post
0dd0a0e50c
Fix error grammar
...
Fixes #56
2017-07-16 17:00:58 +02:00
Fred Akalin
18d548df63
Add support for PAR1 ( #55 )
...
PAR1 is a file format which uses a Reed-Solomon code similar
to the current one, except it uses a different (flawed) coding
matrix.
Add support for it via a WithPAR1Matrix option, so that this code
can be used to encode/decode PAR1 files. Also add the option to
existing tests, and add a test demonstrating the flaw in PAR1's
coding matrix.
Also fix an mistakenly inverted test in testOpts().
Incidentally, PAR1 is obsoleted by PAR2, which uses GF(2^16)
and tries to fix the flaw in the coding matrix; however, PAR2's
coding matrix is still flawed! The real solution is to build the
coding matrix like in this repository.
PAR1 spec:
http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html
Paper describing the (flawed) Reed-Solomon code used by PAR1:
http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.html
2017-06-20 20:24:57 +02:00
Fred Akalin
87c4e5ae75
Allow 256 total shards ( #54 )
...
* Allow 256 total shards
2017-06-19 11:26:52 +02:00
Klaus Post
5abf0ee302
Add options ( #46 )
...
* Add options
Make constants changeable as options.
The API remains backwards compatible.
* Update documentation.
* Fix line endings
* fmt
* fmt
* Use functions for parameters.
Much neater.
2017-02-19 11:13:22 +01:00
Peter C
c54154da9e
Add Inverse Matrix caching in a Thread-Safe Lookup Tree ( #36 )
...
* Add matrix inversion caching
* Benchmark and Parallel Benchmark tests for Reconstruct
2016-09-12 21:31:07 +02:00
Christian Muehlhaeuser
b1c8b4b073
Make Join return an error if a reconstruction is required first
...
If one or more required data shards are still nil and we can't correctly join
them before a reconstruction, return ErrReconstructRequired.
2016-08-05 19:23:08 +02:00
Harshavardhana
ba30981088
Add checks for data and parity to not exceed 255 shards in total.
...
Fixes #16
2016-06-03 01:31:01 -07:00
Klaus Post
4fadad8564
Update reedsolomon.go
...
Fix comment
2016-05-01 12:00:51 +02:00
Klaus Post
ed06f926b9
Merge pull request #20 from harshavardhana/fix
...
ErrShortData shouldn't be returned for data less than dataShards.
2016-05-01 11:58:47 +02:00
Harshavardhana
df175d2921
ErrShortData shouldn't be returned for data less than dataShards.
...
The reasoning behind this is that if we have a data block number
of 10, and parity of 10. Restricting input such that files of
size < 10Bytes should be errored out doesn't seem like the right
approach.
Most erasure subsystems will have static data and parity blocks,
in such case erroring out is not correct since reedsolomon itself
doesn't provide this limitation (please correct me here if i am
wrong :-)).
So removing the check itself is not a problem since most of the
data after the split would be padded with zeros, which is okay
and should be left as application optimization if they wish to
pack small files in this range.
ErrShortData will be still returned in case if the size of data
is empty, or in case of streaming if the size == 0.
2016-04-29 20:38:45 -07:00
Harshavardhana
0b630aea27
use bytes.Equal rather than bytes.Compare
2016-04-29 14:12:03 -07:00
xiaost
4048a541c8
Optimized encoding & decoding goroutines number
...
hardware: E5-2630 v2 (Intel x86-64 with ssse3)
software: linux, go1.6, GOMAXPROCS=2
Performances before after change
BenchmarkEncode10x2x10000-2 2884.95 MB/s 2837.93 MB/s 0.98x
BenchmarkEncode100x20x10000-2 593.93 MB/s 577.17 MB/s 0.97x
BenchmarkEncode17x3x1M-2 2903.74 MB/s 5197.99 MB/s 1.80x
BenchmarkEncode10x4x16M-2 1992.13 MB/s 3689.69 MB/s 1.85x
BenchmarkEncode5x2x1M-2 2883.78 MB/s 7506.19 MB/s 2.60x
BenchmarkEncode10x2x1M-2 3205.63 MB/s 7848.12 MB/s 2.45x
BenchmarkEncode10x4x1M-2 2218.35 MB/s 3998.35 MB/s 1.80x
BenchmarkEncode50x20x1M-2 579.24 MB/s 641.08 MB/s 1.11x
BenchmarkEncode17x3x16M-2 2652.36 MB/s 4775.41 MB/s 1.80x
BenchmarkVerify10x2x10000-2 1327.27 MB/s 1837.41 MB/s 1.38x
BenchmarkVerify50x5x50000-2 1481.89 MB/s 2684.57 MB/s 1.81x
BenchmarkVerify10x2x1M-2 1553.91 MB/s 5704.71 MB/s 3.67x
BenchmarkVerify5x2x1M-2 939.90 MB/s 4949.30 MB/s 5.26x
BenchmarkVerify10x4x1M-2 956.89 MB/s 3191.01 MB/s 3.33x
BenchmarkVerify50x20x1M-2 490.49 MB/s 823.46 MB/s 1.68x
BenchmarkVerify10x4x16M-2 1078.03 MB/s 3196.97 MB/s 2.97x
BenchmarkStreamEncode10x2x10000-2 2.40 MB/s 12.10 MB/s 5.04x
BenchmarkStreamEncode100x20x10000-2 6.72 MB/s 10.72 MB/s 1.60x
BenchmarkStreamEncode17x3x1M-2 390.75 MB/s 845.08 MB/s 2.16x
BenchmarkStreamEncode10x4x16M-2 1175.93 MB/s 1803.71 MB/s 1.53x
BenchmarkStreamEncode5x2x1M-2 207.85 MB/s 790.02 MB/s 3.80x
BenchmarkStreamEncode10x2x1M-2 296.77 MB/s 872.41 MB/s 2.94x
BenchmarkStreamEncode10x4x1M-2 264.43 MB/s 699.25 MB/s 2.64x
BenchmarkStreamEncode50x20x1M-2 284.93 MB/s 414.65 MB/s 1.46x
BenchmarkStreamEncode17x3x16M-2 1439.13 MB/s 1933.42 MB/s 1.34x
BenchmarkStreamVerify10x2x10000-2 2.33 MB/s 12.07 MB/s 5.18x
BenchmarkStreamVerify50x5x50000-2 86.53 MB/s 136.02 MB/s 1.57x
BenchmarkStreamVerify10x2x1M-2 315.65 MB/s 909.44 MB/s 2.88x
BenchmarkStreamVerify5x2x1M-2 180.45 MB/s 772.42 MB/s 4.28x
BenchmarkStreamVerify10x4x1M-2 310.35 MB/s 779.26 MB/s 2.51x
BenchmarkStreamVerify50x20x1M-2 547.23 MB/s 773.74 MB/s 1.41x
BenchmarkStreamVerify10x4x16M-2 4128.01 MB/s 6606.43 MB/s 1.60x
2016-04-12 15:41:22 +08:00
klauspost
180472d98f
Make documentation conform to go vet.
2015-11-03 12:09:36 +01:00
lukechampine
295bf27a3d
fix Split panic
2015-08-08 16:38:55 -04:00
lukechampine
0bd572bc5b
tweak Split/Join functions
2015-08-08 13:51:12 -04:00
lukechampine
64b705bbf6
fully test Reconstruct function
...
Well, I can't figure out how to trigger the Invert error.
It may not be possible; need more domain knowledge to be sure.
2015-08-08 13:50:18 -04:00
lukechampine
cf985d4451
remove unreachable checkShards case
...
this case would be caught by shardSize anyway
2015-08-08 13:50:18 -04:00
lukechampine
5784cfa7ff
remove impossible errors
2015-08-06 22:46:27 -04:00
klauspost
8ebf356efb
The number of data shards must be below 257. Check that and update documentation.
2015-06-23 13:39:57 +02:00
klauspost
5c2ef3ae72
Always check/return errors.
2015-06-23 12:16:26 +02:00
klauspost
7381e0b7b5
- Only run multiple goroutines if size is bigger than splitsize.
...
- Update docs
2015-06-23 11:18:29 +02:00