package reedsolomon

import (
	"runtime"

	"github.com/klauspost/cpuid/v2"
)

// Option overrides a processing parameter.
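//
// A sketch of how a constructor might apply options, assuming it starts from
// defaultOptions and receives a slice named opts (the loop is illustrative and
// is not taken from this file):
//
//	o := defaultOptions
//	for _, opt := range opts {
//		opt(&o)
//	}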
type Option func(*options)

type options struct {
	maxGoroutines                         int
	minSplitSize                          int
	shardSize                             int
	perRound                              int
	useAVX512, useAVX2, useSSSE3, useSSE2 bool
	usePAR1Matrix                         bool
	useCauchy                             bool
	fastOneParity                         bool
	inversionCache                        bool

	// stream options
	concReads  bool
	concWrites bool
	streamBS   int
}

var defaultOptions = options{
	maxGoroutines:  384,
	minSplitSize:   -1,
	fastOneParity:  false,
	inversionCache: true,

	// Detect CPU capabilities.
	useSSSE3:  cpuid.CPU.Supports(cpuid.SSSE3),
	useSSE2:   cpuid.CPU.Supports(cpuid.SSE2),
	useAVX2:   cpuid.CPU.Supports(cpuid.AVX2),
	useAVX512: cpuid.CPU.Supports(cpuid.AVX512F, cpuid.AVX512BW),
}

func init() {
	// With a single processor available, extra goroutines add no parallelism.
	if runtime.GOMAXPROCS(0) <= 1 {
		defaultOptions.maxGoroutines = 1
	}
}

// WithMaxGoroutines sets the maximum number of goroutines used for encoding and decoding.
// Jobs will be split into this many parts, unless each goroutine would have to process
// less than minSplitSize bytes (set with WithMinSplitSize).
// For the best speed, keep this well above the GOMAXPROCS number for more fine-grained
// scheduling.
// If n <= 0, it is ignored.
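//
// A minimal usage sketch, assuming the package's New constructor accepts
// options as trailing arguments (New is not defined in this file; the shard
// counts and the goroutine limit are arbitrary):
//
//	enc, err := reedsolomon.New(10, 3, reedsolomon.WithMaxGoroutines(16))
//	if err != nil {
//		// handle the error
//	}
//	_ = enc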
func WithMaxGoroutines(n int) Option {
	return func(o *options) {
		if n > 0 {
			o.maxGoroutines = n
		}
	}
}

// WithAutoGoroutines adjusts the number of goroutines for optimal speed with a
// specific shard size.
// Pass the shard size you expect to use. Other shard sizes will work, but may not
// run at the optimal speed.
// Overrides WithMaxGoroutines.
// If shardSize <= 0, it is ignored.
func WithAutoGoroutines(shardSize int) Option {
	return func(o *options) {
		o.shardSize = shardSize
	}
}

// WithMinSplitSize sets the minimum encoding size in bytes per goroutine.
// By default this parameter is determined by CPU cache characteristics.
// See WithMaxGoroutines on how jobs are split.
// If n <= 0, it is ignored.
func WithMinSplitSize(n int) Option {
	return func(o *options) {
		if n > 0 {
			o.minSplitSize = n
		}
	}
}

// WithConcurrentStreams enables concurrent reads and writes on the streams.
// Default: Disabled, meaning only one stream will be read/written at a time.
// Ignored if not used on a stream input.
func WithConcurrentStreams(enabled bool) Option {
	return func(o *options) {
		o.concReads, o.concWrites = enabled, enabled
	}
}

// WithConcurrentStreamReads enables concurrent reads from the input streams.
// Default: Disabled, meaning only one stream will be read at a time.
// Ignored if not used on a stream input.
func WithConcurrentStreamReads(enabled bool) Option {
	return func(o *options) {
		o.concReads = enabled
	}
}

// WithConcurrentStreamWrites enables concurrent writes to the output streams.
// Default: Disabled, meaning only one stream will be written at a time.
// Ignored if not used on a stream input.
func WithConcurrentStreamWrites(enabled bool) Option {
	return func(o *options) {
		o.concWrites = enabled
	}
}

// WithInversionCache controls whether the inversion cache is used.
// When enabled, reconstruction matrices are cached so they can be reused.
// Enabled by default.
func WithInversionCache(enabled bool) Option {
	return func(o *options) {
		o.inversionCache = enabled
	}
}

// WithStreamBlockSize sets a custom block size per round of reads/writes.
// If not set, any shard size set with WithAutoGoroutines will be used.
// If WithAutoGoroutines is also unset, 4MB will be used.
// Ignored if not used on a stream.
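//
// A minimal usage sketch, assuming the package's NewStream constructor accepts
// options as trailing arguments (NewStream is not defined in this file; the
// shard counts and the 1 MiB block size are arbitrary):
//
//	enc, err := reedsolomon.NewStream(10, 3,
//		reedsolomon.WithConcurrentStreams(true),
//		reedsolomon.WithStreamBlockSize(1<<20))
//	if err != nil {
//		// handle the error
//	}
//	_ = enc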
func WithStreamBlockSize(n int) Option {
	return func(o *options) {
		o.streamBS = n
	}
}

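// withSSSE3 overrides the detected SSSE3 support.
func withSSSE3(enabled bool) Option {
	return func(o *options) {
		o.useSSSE3 = enabled
	}
}

// withAVX2 overrides the detected AVX2 support.
func withAVX2(enabled bool) Option {
	return func(o *options) {
		o.useAVX2 = enabled
	}
}

// withSSE2 overrides the detected SSE2 support.
func withSSE2(enabled bool) Option {
	return func(o *options) {
		o.useSSE2 = enabled
	}
}

// withAVX512 overrides the detected AVX512 support.
func withAVX512(enabled bool) Option {
	return func(o *options) {
		o.useAVX512 = enabled
	}
}
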
// WithPAR1Matrix causes the encoder to build the matrix the way PARv1
// does. Note that the method PARv1 uses is buggy, and may lead to cases
// where recovery is impossible, even if there are enough parity
// shards.
// Selecting this option disables the Cauchy matrix.
func WithPAR1Matrix() Option {
	return func(o *options) {
		o.usePAR1Matrix = true
		o.useCauchy = false
	}
}

// WithCauchyMatrix makes the encoder build a Cauchy-style matrix.
// The output of this is not compatible with the standard output.
// A Cauchy matrix is faster to generate. This does not affect data throughput,
// but will result in slightly faster start-up time.
// Selecting this option disables the PAR1 matrix.
func WithCauchyMatrix() Option {
	return func(o *options) {
		o.useCauchy = true
		o.usePAR1Matrix = false
	}
}

// WithFastOneParityMatrix switches the matrix to a simple XOR
// if there is only one parity shard.
// The PAR1 matrix already has this property, so the option has little effect there.
func WithFastOneParityMatrix() Option {
	return func(o *options) {
		o.fastOneParity = true
	}
}