reedsolomon-go

Commit Graph

Author	SHA1	Message	Date
Vitaliy Filippov	b933ef1add	Implement jerasure algorithm of matrix generation for interoperability	2022-08-15 14:30:30 +03:00
Vitaliy Filippov	10e7890be7	Add custom coding matrix support (#187) Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>	2022-06-17 11:43:51 +02:00
Vitaliy Filippov	1ef513248a	Publish withSSE/withAVX options (#186) Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>	2022-06-17 11:42:53 +02:00
Klaus Post	ab26eb4126	Add WithInversionCache and use pointer methods (#160) There appear to be writes to value receivers. Add `WithInversionCache(bool)` to disable cache. Fixes #159	2021-01-13 10:21:28 +01:00
Klaus Post	519603f6e1	Update packages (#154) * Update packages Update cpuid and clean up generated.	2020-12-09 22:56:01 +01:00
Klaus Post	cf8495259a	Add pure XOR for 1 parity (#138) WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard. The PAR1 matrix already has this property so it has little effect there.	2020-05-13 11:10:58 +02:00
Klaus Post	abb309aca7	Fix stream allocations (#129) Numbers speak for themselves:	2020-05-05 16:35:35 +02:00
```
benchmark                                old ns/op     new ns/op     delta
BenchmarkStreamEncode10x2x10000-32       4792420       7937          -99.83%
BenchmarkStreamEncode100x20x10000-32     38424066      473285        -98.77%
BenchmarkStreamEncode17x3x1M-32          8195036       1482191       -81.91%
BenchmarkStreamEncode10x4x16M-32         21356715      18051773      -15.47%
BenchmarkStreamEncode5x2x1M-32           3295827       412301        -87.49%
BenchmarkStreamEncode10x2x1M-32          5249011       798828        -84.78%
BenchmarkStreamEncode10x4x1M-32          6392974       904818        -85.85%
BenchmarkStreamEncode50x20x1M-32         29083474      7199282       -75.25%
BenchmarkStreamEncode17x3x16M-32         32451850      28036421      -13.61%
BenchmarkStreamVerify10x2x10000-32       4858416       12988         -99.73%
BenchmarkStreamVerify50x5x50000-32       17047361      377003        -97.79%
BenchmarkStreamVerify10x2x1M-32          4869964       887214        -81.78%
BenchmarkStreamVerify5x2x1M-32           3282999       591669        -81.98%
BenchmarkStreamVerify10x4x1M-32          5824392       1230888       -78.87%
BenchmarkStreamVerify50x20x1M-32         27301648      6204613       -77.27%
BenchmarkStreamVerify10x4x16M-32         8508963       18845695      +121.48%

benchmark                                old MB/s     new MB/s     speedup
BenchmarkStreamEncode10x2x10000-32       20.87        12599.82     603.73x
BenchmarkStreamEncode100x20x10000-32     26.03        2112.89      81.17x
BenchmarkStreamEncode17x3x1M-32          2175.19      12026.65     5.53x
BenchmarkStreamEncode10x4x16M-32         7855.71      9293.94      1.18x
BenchmarkStreamEncode5x2x1M-32           1590.76      12716.14     7.99x
BenchmarkStreamEncode10x2x1M-32          1997.66      13126.43     6.57x
BenchmarkStreamEncode10x4x1M-32          1640.20      11588.81     7.07x
BenchmarkStreamEncode50x20x1M-32         1802.70      7282.50      4.04x
BenchmarkStreamEncode17x3x16M-32         8788.80      10172.93     1.16x
BenchmarkStreamVerify10x2x10000-32       20.58        7699.20      374.11x
BenchmarkStreamVerify50x5x50000-32       293.30       13262.49     45.22x
BenchmarkStreamVerify10x2x1M-32          2153.15      11818.75     5.49x
BenchmarkStreamVerify5x2x1M-32           1596.98      8861.17      5.55x
BenchmarkStreamVerify10x4x1M-32          1800.32      8518.86      4.73x
BenchmarkStreamVerify50x20x1M-32         1920.35      8449.97      4.40x
BenchmarkStreamVerify10x4x16M-32         19717.11     8902.41      0.45x
```
Klaus Post	65df535980	Make single goroutine encodes more efficient (#122) Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines.	2020-05-03 19:37:22 +02:00
```
λ benchcmp before.txt after.txt
benchmark                          old ns/op     new ns/op     delta
BenchmarkParallel_8x8x05M-16       675225        321053        -52.45%
BenchmarkParallel_20x10x05M-16     3471988       600740        -82.70%
BenchmarkParallel_8x8x1M-16        3948606       728093        -81.56%
BenchmarkParallel_8x8x8M-16        47361588      5976467       -87.38%
BenchmarkParallel_8x8x32M-16       195044200     24365474      -87.51%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x05M-16       6211.71      13064.22     2.10x
BenchmarkParallel_20x10x05M-16     3020.10      17454.73     5.78x
BenchmarkParallel_8x8x1M-16        2124.45      11521.34     5.42x
BenchmarkParallel_8x8x8M-16        1416.95      11228.85     7.92x
BenchmarkParallel_8x8x32M-16       1376.28      11017.04     8.00x
```
Klaus Post	c3634dce94	Use CPU cache to set minSplitSize (#117) Use L1 cache size to set default split size.	2020-04-22 16:12:18 +02:00
Klaus Post	d2cfcb8065	Add commandline arg to disable asm for tests. (#116) * Add commandline test args	2020-04-22 15:38:21 +02:00
Klaus Post	0883d2f011	Only enable AVX512 on AMD64 Fixes #102	2019-05-26 12:12:55 +02:00
Frank Wessels	79aee05119	AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91) The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table:	2019-02-10 11:17:23 +01:00
```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72       791.16       2952.29       3.73x
BenchmarkEncode8x8x32M-72      573.26       2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72     690.68       2583.70       3.74x
BenchmarkEncode24x8x48M-72     674.20       2643.31       3.92x
```
Klaus Post	f5e73dcfe2	Split blocks into size divisible by 16 Older systems (typically without AVX2) are more sensitive to misaligned load+stores. Add parameter to automatically set the number of goroutines.	2017-11-18 22:00:55 +01:00
    name                    old time/op    new time/op    delta
    Encode10x2x10000-8      18.4µs ± 1%    16.1µs ± 1%    -12.43%  (p=0.000 n=9+9)
    Encode100x20x10000-8    692µs ± 1%     608µs ± 1%     -12.10%  (p=0.000 n=10+10)
    Encode17x3x1M-8         1.78ms ± 5%    1.49ms ± 1%    -16.63%  (p=0.000 n=10+10)
    Encode10x4x16M-8        21.5ms ± 5%    19.6ms ± 4%    -8.74%   (p=0.000 n=10+9)
    Encode5x2x1M-8          343µs ± 2%     267µs ± 2%     -22.22%  (p=0.000 n=9+10)
    Encode10x2x1M-8         858µs ± 5%     701µs ± 5%     -18.34%  (p=0.000 n=10+10)
    Encode10x4x1M-8         1.34ms ± 1%    1.16ms ± 1%    -13.19%  (p=0.000 n=9+9)
    Encode50x20x1M-8        30.3ms ± 4%    25.0ms ± 2%    -17.51%  (p=0.000 n=10+8)
    Encode17x3x16M-8        26.9ms ± 1%    24.5ms ± 4%    -9.13%   (p=0.000 n=8+10)

    name                    old speed      new speed      delta
    Encode10x2x10000-8      5.45GB/s ± 1%  6.22GB/s ± 1%  +14.20%  (p=0.000 n=9+9)
    Encode100x20x10000-8    1.44GB/s ± 1%  1.64GB/s ± 1%  +13.77%  (p=0.000 n=10+10)
    Encode17x3x1M-8         10.0GB/s ± 5%  12.0GB/s ± 1%  +19.88%  (p=0.000 n=10+10)
    Encode10x4x16M-8        7.81GB/s ± 5%  8.56GB/s ± 5%  +9.58%   (p=0.000 n=10+9)
    Encode5x2x1M-8          15.3GB/s ± 2%  19.6GB/s ± 2%  +28.57%  (p=0.000 n=9+10)
    Encode10x2x1M-8         12.2GB/s ± 5%  15.0GB/s ± 5%  +22.45%  (p=0.000 n=10+10)
    Encode10x4x1M-8         7.84GB/s ± 1%  9.03GB/s ± 1%  +15.19%  (p=0.000 n=9+9)
    Encode50x20x1M-8        1.73GB/s ± 4%  2.09GB/s ± 4%  +20.59%  (p=0.000 n=10+9)
    Encode17x3x16M-8        10.6GB/s ± 1%  11.7GB/s ± 4%  +10.12%  (p=0.000 n=8+10)
Klaus Post	61c22eab55	Cauchy Matrix option (#70) * Experimental Cauchy Matrix Experimental support for Cauchy style matrix http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf All matrices appear reversible. * Remove Go 1.5 and 1.6 from CI tests. * Fix comment. * Increase max number of goroutines+docs.	2017-10-01 14:02:11 +02:00
chenzhongtao	d78bf472d8	add Update parity function (#60) Add Update parity function	2017-08-20 11:42:39 +02:00
Fred Akalin	18d548df63	Add support for PAR1 (#55) PAR1 is a file format which uses a Reed-Solomon code similar to the current one, except it uses a different (flawed) coding matrix. Add support for it via a WithPAR1Matrix option, so that this code can be used to encode/decode PAR1 files. Also add the option to existing tests, and add a test demonstrating the flaw in PAR1's coding matrix. Also fix a mistakenly inverted test in testOpts(). Incidentally, PAR1 is obsoleted by PAR2, which uses GF(2^16) and tries to fix the flaw in the coding matrix; however, PAR2's coding matrix is still flawed! The real solution is to build the coding matrix like in this repository. PAR1 spec: http://parchive.sourceforge.net/docs/specifications/parity-volume-spec-1.0/article-spec.html Paper describing the (flawed) Reed-Solomon code used by PAR1: http://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.html	2017-06-20 20:24:57 +02:00
Klaus Post	dde6ad55c5	Set correct field in WithMinSplitSize Fixes #51	2017-05-28 12:38:06 +02:00
Klaus Post	5abf0ee302	Add options (#46) * Add options Make constants changeable as options. The API remains backwards compatible. * Update documentation. * Fix line endings * fmt * fmt * Use functions for parameters. Much neater.	2017-02-19 11:13:22 +01:00

18 Commits (b933ef1add0b9ebb756b52a03ba5017b68c486e4)
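
The "Add pure XOR for 1 parity" commit above notes that with a single parity shard the coding matrix can reduce to a plain XOR. The following is a minimal, standard-library-only sketch of that idea, not code from this repository: the function names `xorParity` and `recoverShard` are hypothetical, and all shards are assumed to have equal length.

```go
package main

import "fmt"

// xorParity returns a single parity shard: the byte-wise XOR of all data
// shards. Hypothetical helper; assumes every shard has the same length.
func xorParity(data [][]byte) []byte {
	if len(data) == 0 {
		return nil
	}
	parity := make([]byte, len(data[0]))
	for _, shard := range data {
		for i, b := range shard {
			parity[i] ^= b
		}
	}
	return parity
}

// recoverShard rebuilds the one missing data shard (marked nil) by XORing
// the parity shard with every surviving data shard.
func recoverShard(data [][]byte, parity []byte) []byte {
	out := make([]byte, len(parity))
	copy(out, parity)
	for _, shard := range data {
		if shard == nil {
			continue // the lost shard contributes nothing
		}
		for i, b := range shard {
			out[i] ^= b
		}
	}
	return out
}

func main() {
	data := [][]byte{[]byte("abcd"), []byte("efgh"), []byte("ijkl")}
	parity := xorParity(data)

	data[1] = nil // simulate losing one data shard
	fmt.Printf("%s\n", recoverShard(data, parity)) // prints "efgh"
}
```

XOR is its own inverse, so one lost shard can be recovered without any Galois-field multiplication; this only works when exactly one shard is missing.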