reedsolomon-go

Commit Graph

Author	SHA1	Message	Date
Vitaliy Filippov	b933ef1add	Implement jerasure algorithm of matrix generation for interoperability	2022-08-15 14:30:30 +03:00
Klaus Post	3a82d28edb	Add GF16 AVX2, AVX512 and SSSE3 (#193 ) * Add GF16 AVX2 * Add SSSE3 fallback. * Fix reconstruction was skipped if first shard was empty. * Combine lookups in pure Go * Faster xor on pure Go. * Add 4way butterfly AVX2. * Add fftDIT4 avx2. Add avx512 version. Add noescape. * Remove +build space. Do size varied 800x200 bench. * Use VPTERNLOGD for avx512. * Remove refMulAdd inner loop bounds checks. ~10-20% faster	2022-07-26 12:37:28 +02:00
Elias Naur	49be604db0	Add an efficient implementation of shard counts up to 65536 (#191 ) This is a O(n*log n) implementation of Reed-Solomon codes, ported from the C++ library https://github.com/catid/leopard. The implementation is based on the paper "Novel Polynomial Basis with Fast Fourier Transform and Its Application to Reed-Solomon Erasure Codes" Several TODOs are left for future commits: - Performance optimizations, in particular SIMD and multiple goroutines - Documentation - Detailed Tests - Merging of reedSolomonFF16 and reedSolomon types - Turn the straight C++ port into more idiomatic Go This change also bumps one race testing timeout, to ensure adequate time on CI.	2022-07-21 14:27:10 +02:00
Vitaliy Filippov	e97f38d38d	TestReconstructData was probably meant to call testReconstructData() (#188 ) Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>	2022-06-17 11:58:38 +02:00
Vitaliy Filippov	19a04effc5	Implement ReconstructSome() to reconstruct only specific data shards (#189 ) Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>	2022-06-17 11:58:26 +02:00
Vitaliy Filippov	10e7890be7	Add custom coding matrix support (#187 ) Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>	2022-06-17 11:43:51 +02:00
Vitaliy Filippov	1ef513248a	Publish withSSE/withAVX options (#186 ) Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>	2022-06-17 11:42:53 +02:00
Klaus Post	8e17d64e52	Report allocs for reconstruct (#181 )	2022-02-09 09:53:36 +01:00
Klaus Post	1bb4d699e1	avx2: Improve speed when > 10 input or output shards. (#174 ) Speeds are including a limiting the number of goroutines with all AVX2 paths, Before/after ``` benchmark old ns/op new ns/op delta BenchmarkGalois128K-32 2240 2240 +0.00% BenchmarkGalois1M-32 19578 18891 -3.51% BenchmarkGaloisXor128K-32 2798 2852 +1.93% BenchmarkGaloisXor1M-32 23334 23345 +0.05% BenchmarkEncode2x1x1M-32 34357 34370 +0.04% BenchmarkEncode10x2x10000-32 3210 3093 -3.64% BenchmarkEncode100x20x10000-32 362925 148214 -59.16% BenchmarkEncode17x3x1M-32 323767 224157 -30.77% BenchmarkEncode10x4x16M-32 8376895 8376737 -0.00% BenchmarkEncode5x2x1M-32 68365 66861 -2.20% BenchmarkEncode10x2x1M-32 101407 93023 -8.27% BenchmarkEncode10x4x1M-32 171880 155477 -9.54% BenchmarkEncode50x20x1M-32 3704691 3015047 -18.62% BenchmarkEncode17x3x16M-32 10279233 10106658 -1.68% BenchmarkEncode_8x4x8M-32 3438245 3326479 -3.25% BenchmarkEncode_12x4x12M-32 6632257 6581637 -0.76% BenchmarkEncode_16x4x16M-32 10815755 10788377 -0.25% BenchmarkEncode_16x4x32M-32 21029061 21507995 +2.28% BenchmarkEncode_16x4x64M-32 42145450 43876850 +4.11% BenchmarkEncode_8x5x8M-32 4543208 3846378 -15.34% BenchmarkEncode_8x6x8M-32 5065494 4397218 -13.19% BenchmarkEncode_8x7x8M-32 5818995 4962884 -14.71% BenchmarkEncode_8x9x8M-32 6215449 6114898 -1.62% BenchmarkEncode_8x10x8M-32 6923415 6610501 -4.52% BenchmarkEncode_8x11x8M-32 7365988 7010473 -4.83% BenchmarkEncode_8x8x05M-32 150857 136820 -9.30% BenchmarkEncode_8x8x1M-32 256722 254854 -0.73% BenchmarkEncode_8x8x8M-32 5547790 5422048 -2.27% BenchmarkEncode_8x8x32M-32 23038643 22705859 -1.44% BenchmarkEncode_24x8x24M-32 27729259 30332216 +9.39% BenchmarkEncode_24x8x48M-32 53865705 61187658 +13.59% BenchmarkVerify10x2x10000-32 8769 8154 -7.01% BenchmarkVerify10x2x1M-32 516149 476180 -7.74% BenchmarkVerify5x2x1M-32 443888 419541 -5.48% BenchmarkVerify10x4x1M-32 1030299 948021 -7.99% BenchmarkVerify50x20x1M-32 7209689 6186891 -14.19% BenchmarkVerify10x4x16M-32 17774456 17681879 -0.52% BenchmarkReconstruct10x2x10000-32 3352 3256 -2.86% BenchmarkReconstruct50x5x50000-32 166417 140900 -15.33% BenchmarkReconstruct10x2x1M-32 189711 174615 -7.96% BenchmarkReconstruct5x2x1M-32 128080 126520 -1.22% BenchmarkReconstruct10x4x1M-32 273312 254017 -7.06% BenchmarkReconstruct50x20x1M-32 3628812 3192474 -12.02% BenchmarkReconstruct10x4x16M-32 8562186 8781479 +2.56% BenchmarkReconstructData10x2x10000-32 3241 3116 -3.86% BenchmarkReconstructData50x5x50000-32 162520 134794 -17.06% BenchmarkReconstructData10x2x1M-32 171253 161955 -5.43% BenchmarkReconstructData5x2x1M-32 102215 106942 +4.62% BenchmarkReconstructData10x4x1M-32 225593 219969 -2.49% BenchmarkReconstructData50x20x1M-32 2515311 2129721 -15.33% BenchmarkReconstructData10x4x16M-32 6980308 6698111 -4.04% BenchmarkReconstructP10x2x10000-32 924 937 +1.35% BenchmarkReconstructP10x5x20000-32 1639 1703 +3.90% BenchmarkSplit10x4x160M-32 4984993 4898045 -1.74% BenchmarkSplit5x2x5M-32 380415 221446 -41.79% BenchmarkSplit10x2x1M-32 58761 53335 -9.23% BenchmarkSplit10x4x10M-32 643188 410959 -36.11% BenchmarkSplit50x20x50M-32 1843879 1647205 -10.67% BenchmarkSplit17x3x272M-32 3684920 3613951 -1.93% BenchmarkParallel_8x8x64K-32 7022 6630 -5.58% BenchmarkParallel_8x8x05M-32 348308 348369 +0.02% BenchmarkParallel_20x10x05M-32 575672 581028 +0.93% BenchmarkParallel_8x8x1M-32 716033 697167 -2.63% BenchmarkParallel_8x8x8M-32 5716048 5616437 -1.74% BenchmarkParallel_8x8x32M-32 22650878 22098667 -2.44% BenchmarkParallel_8x3x1M-32 406839 399125 -1.90% BenchmarkParallel_8x4x1M-32 459107 463890 +1.04% BenchmarkParallel_8x5x1M-32 527488 520334 -1.36% BenchmarkStreamEncode10x2x10000-32 6013 5878 -2.25% BenchmarkStreamEncode100x20x10000-32 503124 267894 -46.75% BenchmarkStreamEncode17x3x1M-32 1561838 1376618 -11.86% BenchmarkStreamEncode10x4x16M-32 19124427 17762582 -7.12% BenchmarkStreamEncode5x2x1M-32 429701 384666 -10.48% BenchmarkStreamEncode10x2x1M-32 801257 763637 -4.70% BenchmarkStreamEncode10x4x1M-32 876065 820744 -6.31% BenchmarkStreamEncode50x20x1M-32 7205112 6081398 -15.60% BenchmarkStreamEncode17x3x16M-32 27182786 26117143 -3.92% BenchmarkStreamVerify10x2x10000-32 13767 14026 +1.88% BenchmarkStreamVerify50x5x50000-32 826983 690453 -16.51% BenchmarkStreamVerify10x2x1M-32 1238566 1182591 -4.52% BenchmarkStreamVerify5x2x1M-32 892661 806301 -9.67% BenchmarkStreamVerify10x4x1M-32 1676394 1631495 -2.68% BenchmarkStreamVerify50x20x1M-32 10877875 10037678 -7.72% BenchmarkStreamVerify10x4x16M-32 27599576 30435400 +10.27% benchmark old MB/s new MB/s speedup BenchmarkGalois128K-32 58518.53 58510.17 1.00x BenchmarkGalois1M-32 53558.10 55507.44 1.04x BenchmarkGaloisXor128K-32 46839.74 45961.09 0.98x BenchmarkGaloisXor1M-32 44936.98 44917.46 1.00x BenchmarkEncode2x1x1M-32 91561.27 91524.11 1.00x BenchmarkEncode10x2x10000-32 37385.54 38792.54 1.04x BenchmarkEncode100x20x10000-32 3306.47 8096.40 2.45x BenchmarkEncode17x3x1M-32 64773.49 93557.14 1.44x BenchmarkEncode10x4x16M-32 28039.15 28039.68 1.00x BenchmarkEncode5x2x1M-32 107365.88 109781.16 1.02x BenchmarkEncode10x2x1M-32 124083.62 135266.27 1.09x BenchmarkEncode10x4x1M-32 85408.99 94419.71 1.11x BenchmarkEncode50x20x1M-32 19812.81 24344.67 1.23x BenchmarkEncode17x3x16M-32 32642.93 33200.32 1.02x BenchmarkEncode_8x4x8M-32 29277.52 30261.21 1.03x BenchmarkEncode_12x4x12M-32 30355.67 30589.14 1.01x BenchmarkEncode_16x4x16M-32 31023.66 31102.39 1.00x BenchmarkEncode_16x4x32M-32 31912.44 31201.82 0.98x BenchmarkEncode_16x4x64M-32 31846.32 30589.65 0.96x BenchmarkEncode_8x5x8M-32 24003.28 28351.84 1.18x BenchmarkEncode_8x6x8M-32 23184.41 26707.91 1.15x BenchmarkEncode_8x7x8M-32 21623.86 25354.03 1.17x BenchmarkEncode_8x9x8M-32 22943.85 23321.13 1.02x BenchmarkEncode_8x10x8M-32 21809.31 22841.68 1.05x BenchmarkEncode_8x11x8M-32 21637.77 22735.06 1.05x BenchmarkEncode_8x8x05M-32 55606.22 61311.47 1.10x BenchmarkEncode_8x8x1M-32 65351.80 65830.73 1.01x BenchmarkEncode_8x8x8M-32 24193.01 24754.07 1.02x BenchmarkEncode_8x8x32M-32 23303.06 23644.60 1.01x BenchmarkEncode_24x8x24M-32 29041.76 26549.54 0.91x BenchmarkEncode_24x8x48M-32 29900.52 26322.51 0.88x BenchmarkVerify10x2x10000-32 13685.12 14717.10 1.08x BenchmarkVerify10x2x1M-32 24378.43 26424.72 1.08x BenchmarkVerify5x2x1M-32 16535.79 17495.41 1.06x BenchmarkVerify10x4x1M-32 14248.35 15484.96 1.09x BenchmarkVerify50x20x1M-32 10180.79 11863.85 1.17x BenchmarkVerify10x4x16M-32 13214.53 13283.71 1.01x BenchmarkReconstruct10x2x10000-32 35799.16 36854.89 1.03x BenchmarkReconstruct50x5x50000-32 33049.47 39034.89 1.18x BenchmarkReconstruct10x2x1M-32 66326.88 72061.06 1.09x BenchmarkReconstruct5x2x1M-32 57308.21 58014.92 1.01x BenchmarkReconstruct10x4x1M-32 53711.74 57791.66 1.08x BenchmarkReconstruct50x20x1M-32 20227.09 22991.67 1.14x BenchmarkReconstruct10x4x16M-32 27432.37 26747.32 0.98x BenchmarkReconstructData10x2x10000-32 37030.86 38511.87 1.04x BenchmarkReconstructData50x5x50000-32 33842.07 40802.85 1.21x BenchmarkReconstructData10x2x1M-32 73475.57 77693.87 1.06x BenchmarkReconstructData5x2x1M-32 71809.58 68635.57 0.96x BenchmarkReconstructData10x4x1M-32 65073.27 66736.88 1.03x BenchmarkReconstructData50x20x1M-32 29181.41 34464.76 1.18x BenchmarkReconstructData10x4x16M-32 33649.09 35066.75 1.04x BenchmarkReconstructP10x2x10000-32 129819.98 128086.76 0.99x BenchmarkReconstructP10x5x20000-32 183073.89 176202.21 0.96x BenchmarkParallel_8x8x64K-32 149327.33 158153.67 1.06x BenchmarkParallel_8x8x05M-32 24083.89 24079.69 1.00x BenchmarkParallel_20x10x05M-32 27322.20 27070.35 0.99x BenchmarkParallel_8x8x1M-32 23430.78 24064.83 1.03x BenchmarkParallel_8x8x8M-32 23480.86 23897.31 1.02x BenchmarkParallel_8x8x32M-32 23701.99 24294.27 1.02x BenchmarkParallel_8x3x1M-32 28351.11 28899.03 1.02x BenchmarkParallel_8x4x1M-32 27407.34 27124.76 0.99x BenchmarkParallel_8x5x1M-32 25842.27 26197.58 1.01x BenchmarkStreamEncode10x2x10000-32 16629.76 17012.26 1.02x BenchmarkStreamEncode100x20x10000-32 1987.58 3732.83 1.88x BenchmarkStreamEncode17x3x1M-32 11413.34 12948.97 1.13x BenchmarkStreamEncode10x4x16M-32 8772.66 9445.26 1.08x BenchmarkStreamEncode5x2x1M-32 12201.21 13629.70 1.12x BenchmarkStreamEncode10x2x1M-32 13086.64 13731.34 1.05x BenchmarkStreamEncode10x4x1M-32 11969.16 12775.92 1.07x BenchmarkStreamEncode50x20x1M-32 7276.61 8621.18 1.18x BenchmarkStreamEncode17x3x16M-32 10492.40 10920.52 1.04x BenchmarkStreamVerify10x2x10000-32 7264.00 7129.49 0.98x BenchmarkStreamVerify50x5x50000-32 6046.07 7241.62 1.20x BenchmarkStreamVerify10x2x1M-32 8466.05 8866.77 1.05x BenchmarkStreamVerify5x2x1M-32 5873.31 6502.39 1.11x BenchmarkStreamVerify10x4x1M-32 6254.95 6427.09 1.03x BenchmarkStreamVerify50x20x1M-32 4819.76 5223.20 1.08x BenchmarkStreamVerify10x4x16M-32 6078.79 5512.40 0.91x ```	2021-12-09 12:28:44 +01:00
Klaus Post	5593e2b2dd	Unroll pure go xor loop (#172 ) * Unroll pure go xor loop Testing with `go test -bench=x1x -tags=noasm -short` ``` before: BenchmarkEncode2x1x1M-32 13658 87980 ns/op 35754.96 MB/s after: BenchmarkEncode2x1x1M-32 21633 55498 ns/op 56682.24 MB/s ```	2021-12-02 16:39:56 +01:00
Klaus Post	0eef97bb02	Add progressive erasure shard encoding (#170 ) * Add progressive erasure shard encoding Allow progressively building erasure shards.	2021-10-26 14:39:44 +02:00
Klaus Post	7bd22796ec	Wider AVX2 loops and less usage. (#162 ) * Experiment with 64 bytes/loop AVX2 * Only reduce when doing 64. * Use no more than 8 goroutines for avx2 codegen. ``` name old speed new speed delta Encode10x2x10000-32 33.3GB/s ± 0% 37.5GB/s ± 1% +12.49% (p=0.000 n=9+10) Encode100x20x10000-32 3.79GB/s ± 5% 3.77GB/s ± 5% ~ (p=0.853 n=10+10) Encode17x3x1M-32 78.2GB/s ± 1% 76.0GB/s ± 6% ~ (p=0.123 n=10+10) Encode10x4x16M-32 28.3GB/s ± 0% 27.7GB/s ± 2% -2.32% (p=0.000 n=8+10) Encode5x2x1M-32 112GB/s ± 1% 113GB/s ± 1% ~ (p=0.796 n=10+10) Encode10x2x1M-32 149GB/s ± 1% 129GB/s ± 3% -13.24% (p=0.000 n=9+10) Encode10x4x1M-32 99.1GB/s ± 1% 91.5GB/s ± 3% -7.74% (p=0.000 n=10+10) Encode50x20x1M-32 19.7GB/s ± 1% 19.8GB/s ± 1% ~ (p=0.447 n=9+10) Encode17x3x16M-32 33.4GB/s ± 0% 33.3GB/s ± 1% -0.46% (p=0.043 n=10+9) Encode_8x4x8M-32 30.1GB/s ± 1% 29.4GB/s ± 3% -2.31% (p=0.000 n=10+10) Encode_12x4x12M-32 30.6GB/s ± 0% 30.5GB/s ± 0% ~ (p=0.720 n=10+9) Encode_16x4x16M-32 31.5GB/s ± 0% 31.5GB/s ± 0% ~ (p=0.497 n=10+9) Encode_16x4x32M-32 31.9GB/s ± 0% 31.5GB/s ± 4% ~ (p=0.165 n=10+10) Encode_16x4x64M-32 32.4GB/s ± 0% 32.3GB/s ± 0% ~ (p=0.321 n=9+8) Encode_8x5x8M-32 28.4GB/s ± 0% 28.4GB/s ± 1% ~ (p=0.237 n=10+8) Encode_8x6x8M-32 27.0GB/s ± 0% 27.2GB/s ± 2% ~ (p=0.075 n=10+10) Encode_8x7x8M-32 26.0GB/s ± 1% 25.8GB/s ± 1% -0.53% (p=0.003 n=9+10) Encode_8x9x8M-32 24.6GB/s ± 1% 24.4GB/s ± 1% -0.63% (p=0.000 n=10+10) Encode_8x10x8M-32 23.7GB/s ± 1% 23.7GB/s ± 0% +0.32% (p=0.035 n=10+9) Encode_8x11x8M-32 23.0GB/s ± 1% 22.8GB/s ± 0% -0.59% (p=0.000 n=9+8) Encode_8x8x05M-32 66.4GB/s ± 1% 64.2GB/s ± 1% -3.32% (p=0.000 n=10+10) Encode_8x8x1M-32 56.7GB/s ± 0% 75.7GB/s ± 2% +33.55% (p=0.000 n=9+9) Encode_8x8x8M-32 24.9GB/s ± 0% 24.9GB/s ± 1% ~ (p=0.146 n=8+10) Encode_8x8x32M-32 23.8GB/s ± 0% 23.4GB/s ± 0% -1.42% (p=0.000 n=9+10) Encode_24x8x24M-32 29.9GB/s ± 0% 29.9GB/s ± 0% ~ (p=0.278 n=10+9) Encode_24x8x48M-32 30.7GB/s ± 1% 30.7GB/s ± 0% ~ (p=0.351 n=9+7) StreamEncode10x2x10000-32 15.5GB/s ± 1% 16.5GB/s ± 0% +6.53% (p=0.000 n=10+9) StreamEncode100x20x10000-32 2.09GB/s ± 1% 2.06GB/s ± 2% -1.78% (p=0.000 n=10+10) StreamEncode17x3x1M-32 12.2GB/s ± 2% 12.3GB/s ± 1% +1.19% (p=0.008 n=10+9) StreamEncode10x4x16M-32 8.68GB/s ± 0% 9.47GB/s ± 1% +9.05% (p=0.000 n=8+10) StreamEncode5x2x1M-32 12.3GB/s ± 1% 13.2GB/s ± 1% +7.61% (p=0.000 n=10+10) StreamEncode10x2x1M-32 11.5GB/s ± 4% 13.3GB/s ± 2% +15.15% (p=0.000 n=10+7) ```	2021-06-21 15:15:23 +02:00
Shawn Zivontsis	0e7f9a6a6f	Allow zero parity shards (#161 )	2021-03-08 16:13:24 +01:00
Klaus Post	ab26eb4126	Add WithInversionCache and use pointer methods (#160 ) There appears to be writes to value receivers. Add `WithInversionCache(bool)` to disable cache. Fixes #159	2021-01-13 10:21:28 +01:00
Klaus Post	7c8682430c	tests: Set full data size as number of bytes (#157 ) * Clean up deps. * tests: Set full data size as number of bytes Use total data size (data+parity) as benchmark sizes for more consistent benchmarks.	2020-12-18 09:09:17 +01:00
Klaus Post	519603f6e1	Update packages (#154 ) * Update packages Update cpuid and clean up generated.	2020-12-09 22:56:01 +01:00
Klaus Post	653e76aa26	Faster AVX2 encoding (#153 ) * Remove 50% of bounds checks when copying. * Use RIP only addressing, free one register. ``` benchmark old MB/s new MB/s speedup BenchmarkGalois128K-32 57663.49 58005.87 1.01x BenchmarkGalois1M-32 49479.31 49848.29 1.01x BenchmarkGaloisXor128K-32 46310.69 46501.88 1.00x BenchmarkGaloisXor1M-32 43804.86 43984.39 1.00x BenchmarkEncode10x2x10000-32 25926.93 27457.75 1.06x BenchmarkEncode100x20x10000-32 2635.82 2818.95 1.07x BenchmarkEncode17x3x1M-32 63215.11 61576.76 0.97x BenchmarkEncode10x4x16M-32 19551.54 19505.07 1.00x BenchmarkEncode5x2x1M-32 79612.06 81985.14 1.03x BenchmarkEncode10x2x1M-32 121478.29 127739.41 1.05x BenchmarkEncode10x4x1M-32 70757.61 74423.67 1.05x BenchmarkEncode50x20x1M-32 19811.96 20103.32 1.01x BenchmarkEncode17x3x16M-32 27202.10 27825.34 1.02x BenchmarkEncode_8x4x8M-32 19029.04 19701.31 1.04x BenchmarkEncode_12x4x12M-32 22449.87 22480.51 1.00x BenchmarkEncode_16x4x16M-32 24536.74 24672.24 1.01x BenchmarkEncode_16x4x32M-32 24381.34 24981.99 1.02x BenchmarkEncode_16x4x64M-32 24717.69 25086.94 1.01x BenchmarkEncode_8x5x8M-32 16763.51 17154.04 1.02x BenchmarkEncode_8x6x8M-32 15067.22 15205.87 1.01x BenchmarkEncode_8x7x8M-32 13156.38 13589.40 1.03x BenchmarkEncode_8x9x8M-32 11363.74 11523.70 1.01x BenchmarkEncode_8x10x8M-32 10359.37 10474.91 1.01x BenchmarkEncode_8x11x8M-32 9627.07 9463.24 0.98x BenchmarkEncode_8x8x05M-32 30104.80 32634.89 1.08x BenchmarkEncode_8x8x1M-32 36497.28 36425.88 1.00x BenchmarkEncode_8x8x8M-32 12186.19 11602.41 0.95x BenchmarkEncode_8x8x32M-32 11670.72 11413.71 0.98x BenchmarkEncode_24x8x24M-32 21709.83 21652.50 1.00x BenchmarkEncode_24x8x48M-32 22494.40 22280.59 0.99x BenchmarkVerify10x2x10000-32 10567.56 10483.91 0.99x BenchmarkVerify50x5x50000-32 28102.84 27923.63 0.99x BenchmarkVerify10x2x1M-32 30298.33 30106.18 0.99x BenchmarkVerify5x2x1M-32 16115.91 15847.03 0.98x BenchmarkVerify10x4x1M-32 15382.13 14852.68 0.97x BenchmarkVerify50x20x1M-32 8476.02 8466.24 1.00x BenchmarkVerify10x4x16M-32 15101.03 15434.71 1.02x BenchmarkReconstruct10x2x10000-32 26228.18 26960.19 1.03x BenchmarkReconstruct50x5x50000-32 31091.42 30975.82 1.00x BenchmarkReconstruct10x2x1M-32 58548.87 60281.92 1.03x BenchmarkReconstruct5x2x1M-32 39499.23 41791.80 1.06x BenchmarkReconstruct10x4x1M-32 41448.60 43053.15 1.04x BenchmarkReconstruct50x20x1M-32 17185.99 17354.67 1.01x BenchmarkReconstruct10x4x16M-32 18798.60 18847.43 1.00x BenchmarkReconstructData10x2x10000-32 27208.48 27538.38 1.01x BenchmarkReconstructData50x5x50000-32 32135.65 32078.91 1.00x BenchmarkReconstructData10x2x1M-32 63180.19 67332.17 1.07x BenchmarkReconstructData5x2x1M-32 47532.85 49932.17 1.05x BenchmarkReconstructData10x4x1M-32 50059.14 52323.15 1.05x BenchmarkReconstructData50x20x1M-32 26679.75 26714.11 1.00x BenchmarkReconstructData10x4x16M-32 24854.99 24527.23 0.99x BenchmarkReconstructP10x2x10000-32 115089.87 113229.75 0.98x BenchmarkReconstructP10x5x20000-32 129838.75 132871.10 1.02x BenchmarkParallel_8x8x64K-32 69951.43 69980.44 1.00x BenchmarkParallel_8x8x05M-32 11752.94 11724.35 1.00x BenchmarkParallel_20x10x05M-32 18553.93 18613.33 1.00x BenchmarkParallel_8x8x1M-32 11639.19 11746.86 1.01x BenchmarkParallel_8x8x8M-32 11799.36 11685.63 0.99x BenchmarkParallel_8x8x32M-32 11510.94 11791.72 1.02x BenchmarkParallel_8x3x1M-32 20268.92 20678.21 1.02x BenchmarkParallel_8x4x1M-32 17616.05 17856.17 1.01x BenchmarkParallel_8x5x1M-32 15590.87 15872.42 1.02x BenchmarkStreamEncode10x2x10000-32 14917.08 15408.39 1.03x BenchmarkStreamEncode100x20x10000-32 2014.81 2077.31 1.03x BenchmarkStreamEncode17x3x1M-32 11839.37 12434.80 1.05x BenchmarkStreamEncode10x4x16M-32 9151.14 9206.98 1.01x BenchmarkStreamEncode5x2x1M-32 13598.55 13663.56 1.00x BenchmarkStreamEncode10x2x1M-32 13192.91 13453.41 1.02x BenchmarkStreamEncode10x4x1M-32 12109.90 12050.68 1.00x BenchmarkStreamEncode50x20x1M-32 8640.73 8370.10 0.97x BenchmarkStreamEncode17x3x16M-32 10473.17 10527.04 1.01x BenchmarkStreamVerify10x2x10000-32 7032.23 7128.82 1.01x BenchmarkStreamVerify50x5x50000-32 13023.46 13109.31 1.01x BenchmarkStreamVerify10x2x1M-32 11941.63 11949.91 1.00x BenchmarkStreamVerify5x2x1M-32 8029.93 8263.39 1.03x BenchmarkStreamVerify10x4x1M-32 8137.82 8271.11 1.02x BenchmarkStreamVerify50x20x1M-32 7378.87 7708.81 1.04x BenchmarkStreamVerify10x4x16M-32 8973.18 8955.29 1.00x ```	2020-11-10 14:39:23 +01:00
Klaus Post	7daa20bf74	Generate AVX2 code (#141 ) Replaces AVX2 up to 10x8 configurations with specific generated functions. If code size is a concern `-tags=nogen` can be used. Biggest speedup when not memory constrained. ``` benchmark old MB/s new MB/s speedup BenchmarkEncode_8x5x8M 5895.75 9648.18 1.64x BenchmarkEncode_8x5x8M-4 16773.41 17220.67 1.03x BenchmarkEncode_8x5x8M-16 18263.12 17176.28 0.94x BenchmarkEncode_8x6x8M 5075.89 8548.39 1.68x BenchmarkEncode_8x6x8M-4 14559.83 15370.95 1.06x BenchmarkEncode_8x6x8M-16 16183.37 15291.98 0.94x BenchmarkEncode_8x7x8M 4481.18 7015.60 1.57x BenchmarkEncode_8x7x8M-4 12835.35 13695.90 1.07x BenchmarkEncode_8x7x8M-16 14246.94 13737.36 0.96x BenchmarkEncode_8x8x05M 5569.95 7947.70 1.43x BenchmarkEncode_8x8x05M-4 17334.91 25271.37 1.46x BenchmarkEncode_8x8x05M-16 29349.42 35043.36 1.19x BenchmarkEncode_8x8x1M 4830.58 7891.32 1.63x BenchmarkEncode_8x8x1M-4 17531.36 27371.42 1.56x BenchmarkEncode_8x8x1M-16 29593.98 39241.09 1.33x BenchmarkEncode_8x8x8M 3953.66 6584.26 1.67x BenchmarkEncode_8x8x8M-4 11527.34 12331.23 1.07x BenchmarkEncode_8x8x8M-16 12718.89 12173.08 0.96x BenchmarkEncode_8x8x32M 3927.51 6195.91 1.58x BenchmarkEncode_8x8x32M-4 11490.85 11424.39 0.99x BenchmarkEncode_8x8x32M-16 12506.09 11888.55 0.95x benchmark old MB/s new MB/s speedup BenchmarkParallel_8x8x64K 5490.24 6959.57 1.27x BenchmarkParallel_8x8x64K-4 21078.94 29557.51 1.40x BenchmarkParallel_8x8x64K-16 57508.45 73672.54 1.28x BenchmarkParallel_8x8x1M 4755.49 7667.84 1.61x BenchmarkParallel_8x8x1M-4 11818.66 12013.49 1.02x BenchmarkParallel_8x8x1M-16 12923.12 12109.42 0.94x BenchmarkParallel_8x8x8M 3973.94 6525.85 1.64x BenchmarkParallel_8x8x8M-4 11725.68 11312.46 0.96x BenchmarkParallel_8x8x8M-16 12608.20 11484.98 0.91x BenchmarkParallel_8x3x1M 14139.71 17993.04 1.27x BenchmarkParallel_8x3x1M-4 21805.97 23053.92 1.06x BenchmarkParallel_8x3x1M-16 24673.05 23596.71 0.96x BenchmarkParallel_8x4x1M 10617.88 14474.54 1.36x BenchmarkParallel_8x4x1M-4 18635.82 18965.65 1.02x BenchmarkParallel_8x4x1M-16 21518.12 20171.47 0.94x BenchmarkParallel_8x5x1M 8669.88 11833.96 1.36x BenchmarkParallel_8x5x1M-4 16321.00 17500.30 1.07x BenchmarkParallel_8x5x1M-16 17267.16 17191.04 1.00x ```	2020-05-20 12:48:34 +02:00
Klaus Post	cf8495259a	Add pure XOR for 1 parity (#138 ) WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard. The PAR1 matrix already has this property so it has little effect there.	2020-05-13 11:10:58 +02:00
Klaus Post	2df03bd4d1	Ci test more archs (#135 ) * ci: test more architectures	2020-05-09 10:35:17 +02:00
Klaus Post	696c4018f8	bench: Fix reconstruct benchmarks (#133 ) Always corrupt at least one shard and don't shuffle shards.	2020-05-06 15:42:49 +02:00
Frank Wessels	1b9e129671	Avx512 parallel81 (#131 ) * AVX512 routine for 8x1 parallel processing (WIP) * Testing and integration of Parallel81 assembly routine	2020-05-06 12:32:31 +02:00
Klaus Post	cb7a0b5aef	Do fast by one multiplication (#130 ) When multiplying by one we can use faster math.	2020-05-06 11:14:25 +02:00
Klaus Post	65df535980	Make single goroutine encodes more efficient (#122 ) Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines. ``` λ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkParallel_8x8x05M-16 675225 321053 -52.45% BenchmarkParallel_20x10x05M-16 3471988 600740 -82.70% BenchmarkParallel_8x8x1M-16 3948606 728093 -81.56% BenchmarkParallel_8x8x8M-16 47361588 5976467 -87.38% BenchmarkParallel_8x8x32M-16 195044200 24365474 -87.51% benchmark old MB/s new MB/s speedup BenchmarkParallel_8x8x05M-16 6211.71 13064.22 2.10x BenchmarkParallel_20x10x05M-16 3020.10 17454.73 5.78x BenchmarkParallel_8x8x1M-16 2124.45 11521.34 5.42x BenchmarkParallel_8x8x8M-16 1416.95 11228.85 7.92x BenchmarkParallel_8x8x32M-16 1376.28 11017.04 8.00x ```	2020-05-03 19:37:22 +02:00
Klaus Post	d2cfcb8065	Add commandline arg to disable asm for tests. (#116 ) * Add commandline test args	2020-04-22 15:38:21 +02:00
Klaus Post	0abe9de20c	Update tests (#115 ) Don't create new slices.	2020-02-21 11:30:44 -08:00
dssysolyatin	ec2eb9fb8c	Split: Reduce memory allocation (#103 ) * [Split] Reduce memory allocation in Split function	2019-06-25 16:28:24 +02:00
Klaus Post	f5e73dcfe2	Split blocks into size divisible by 16 Older systems (typically without AVX2) are more sensitive to misaligned load+stores. Add parameter to automatically set the number of goroutines. name old time/op new time/op delta Encode10x2x10000-8 18.4µs ± 1% 16.1µs ± 1% -12.43% (p=0.000 n=9+9) Encode100x20x10000-8 692µs ± 1% 608µs ± 1% -12.10% (p=0.000 n=10+10) Encode17x3x1M-8 1.78ms ± 5% 1.49ms ± 1% -16.63% (p=0.000 n=10+10) Encode10x4x16M-8 21.5ms ± 5% 19.6ms ± 4% -8.74% (p=0.000 n=10+9) Encode5x2x1M-8 343µs ± 2% 267µs ± 2% -22.22% (p=0.000 n=9+10) Encode10x2x1M-8 858µs ± 5% 701µs ± 5% -18.34% (p=0.000 n=10+10) Encode10x4x1M-8 1.34ms ± 1% 1.16ms ± 1% -13.19% (p=0.000 n=9+9) Encode50x20x1M-8 30.3ms ± 4% 25.0ms ± 2% -17.51% (p=0.000 n=10+8) Encode17x3x16M-8 26.9ms ± 1% 24.5ms ± 4% -9.13% (p=0.000 n=8+10) name old speed new speed delta Encode10x2x10000-8 5.45GB/s ± 1% 6.22GB/s ± 1% +14.20% (p=0.000 n=9+9) Encode100x20x10000-8 1.44GB/s ± 1% 1.64GB/s ± 1% +13.77% (p=0.000 n=10+10) Encode17x3x1M-8 10.0GB/s ± 5% 12.0GB/s ± 1% +19.88% (p=0.000 n=10+10) Encode10x4x16M-8 7.81GB/s ± 5% 8.56GB/s ± 5% +9.58% (p=0.000 n=10+9) Encode5x2x1M-8 15.3GB/s ± 2% 19.6GB/s ± 2% +28.57% (p=0.000 n=9+10) Encode10x2x1M-8 12.2GB/s ± 5% 15.0GB/s ± 5% +22.45% (p=0.000 n=10+10) Encode10x4x1M-8 7.84GB/s ± 1% 9.03GB/s ± 1% +15.19% (p=0.000 n=9+9) Encode50x20x1M-8 1.73GB/s ± 4% 2.09GB/s ± 4% +20.59% (p=0.000 n=10+9) Encode17x3x16M-8 10.6GB/s ± 1% 11.7GB/s ± 4% +10.12% (p=0.000 n=8+10)	2017-11-18 22:00:55 +01:00
Klaus Post	61c22eab55	Cauchy Matrix option (#70 ) * Experimental Cauchy Matrix Experimental support for Cauchy style matrix http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf All matrices appear reversible. * Remove Go 1.5 and 1.6 from CI tests. * Fix comment. * Increase max number of goroutines+docs.	2017-10-01 14:02:11 +02:00
David Reiss	ddcafc661e	Allow reconstructing into pre-allocated memory. (#66 ) This changes the interface of Reconstruct and ReconstructData to accept slices of zero length but sufficient capacity for shards to reconstruct, and reslices them instead of allocating new memory.	2017-09-20 21:08:24 +02:00
chenzhongtao	d78bf472d8	add Update parity function (#60 ) Add Update parity function	2017-08-20 11:42:39 +02:00
Klaus Post	dc6af2dce5	Minor cleanup (#61 ) * Remove some benchmarks * Format tables a bit. * Doc cleanup	2017-08-13 22:38:27 +02:00
Frank Wessels	0de37d7697	Add ReconstructData interface method (#57 ) * Add ReconstructData interface method to allow reconstruction of any missing data shards * Add support for just reconstructing data shards only to SteamEncoder.Reconstruct()	2017-07-20 12:15:46 +02:00
Fred Akalin	18d548df63	Add support for PAR1 (#55 ) PAR1 is a file format which uses a Reed-Solomon code similar to the current one, except it uses a different (flawed) coding matrix. Add support for it via a WithPAR1Matrix option, so that this code can be used to encode/decode PAR1 files. Also add the option to existing tests, and add a test demonstrating the flaw in PAR1's coding matrix. Also fix an mistakenly inverted test in testOpts(). Incidentally, PAR1 is obsoleted by PAR2, which uses GF(2^16) and tries to fix the flaw in the coding matrix; however, PAR2's coding matrix is still flawed! The real solution is to build the coding matrix like in this repository. PAR1 spec: http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html Paper describing the (flawed) Reed-Solomon code used by PAR1: http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.html	2017-06-20 20:24:57 +02:00
Fred Akalin	87c4e5ae75	Allow 256 total shards (#54 ) * Allow 256 total shards	2017-06-19 11:26:52 +02:00
Klaus Post	5abf0ee302	Add options (#46 ) * Add options Make constants changeable as options. The API remains backwards compatible. * Update documentation. * Fix line endings * fmt * fmt * Use functions for parameters. Much neater.	2017-02-19 11:13:22 +01:00
Peter C	c54154da9e	Add Inverse Matrix caching in a Thread-Safe Lookup Tree (#36 ) * Add matrix inversion caching * Benchmark and Parallel Benchmark tests for Reconstruct	2016-09-12 21:31:07 +02:00
Christian Muehlhaeuser	b1c8b4b073	Make Join return an error if a reconstruction is required first If one or more required data shards are still nil and we can't correctly join them before a reconstruction, return ErrReconstructRequired.	2016-08-05 19:23:08 +02:00
Harshavardhana	ba30981088	Add checks for data and parity to not exceed 255 shards in total. Fixes #16	2016-06-03 01:31:01 -07:00
xiaost	9f0bea8a29	Tests: backport go1.6 rand.Read for speedup tests	2016-04-07 18:34:47 +08:00
klauspost	976a24f33b	Move examples to separate file/package This makes the reedsolomon package prefix show up in the documentation examples. + StreamEncoder example.	2015-11-03 12:12:42 +01:00
lukechampine	86bd0f239b	seed RNG in TestSplitJoin	2015-08-08 18:20:40 -04:00
lukechampine	458f451fc2	add codeSomeShardsP test	2015-08-08 13:52:00 -04:00
lukechampine	bb7bd0036a	fully test Split/Join functions	2015-08-08 13:51:11 -04:00
lukechampine	64b705bbf6	fully test Reconstruct function Well, I can't figure out how to trigger the Invert error. It may not be possible; need more domain knowledge to be sure.	2015-08-08 13:50:18 -04:00
lukechampine	f81ea8daaf	fully test Verify function	2015-08-08 13:50:18 -04:00
lukechampine	0238782585	fully test Encode function	2015-08-08 13:50:18 -04:00
lukechampine	10fbe96890	use slice literal	2015-08-06 22:56:32 -04:00
lukechampine	640ab74d9d	fully test the New function	2015-08-06 22:47:11 -04:00
klauspost	d31049df42	Add another example that shows that sets can be xor'ed and still remain valid.	2015-06-23 14:35:16 +02:00

1 2

63 Commits (jerasure-matrix)