/**
* Reed-Solomon Coding over 8-bit values.
*
* Copyright 2015, Klaus Post
* Copyright 2015, Backblaze, Inc.
*/
// Package reedsolomon enables Erasure Coding in Go
//
// For usage and examples, see https://github.com/klauspost/reedsolomon
//
package reedsolomon
import (
"bytes"
"errors"
"io"
"runtime"
"sync"
"github.com/klauspost/cpuid/v2"
)
// Encoder is an interface to encode Reed-Solomon parity sets for your data.
type Encoder interface {
// Encode parity for a set of data shards.
// Input is 'shards' containing data shards followed by parity shards.
// The number of shards must match the number given to New().
// Each shard is a byte array, and they must all be the same size.
// The parity shards will always be overwritten and the data shards
// will remain the same, so it is safe for you to read from the
// data shards while this is running.
Encode(shards [][]byte) error
// EncodeIdx will add parity for a single data shard.
// Parity shards should start out as 0. The caller must zero them.
// Data shards must be delivered exactly once. There is no check for this.
// The parity shards will always be updated and the data shards will remain the same.
EncodeIdx(dataShard []byte, idx int, parity [][]byte) error
// Verify returns true if the parity shards contain correct data.
// The data is the same format as Encode. No data is modified, so
// you are allowed to read from data while this is running.
Verify(shards [][]byte) (bool, error)
// Reconstruct will recreate the missing shards if possible.
//
// Given a list of shards, some of which contain data, fills in the
// ones that don't have data.
//
// The length of the array must be equal to the total number of shards.
// You indicate that a shard is missing by setting it to nil or zero-length.
// If a shard is zero-length but has sufficient capacity, that memory will
// be used, otherwise a new []byte will be allocated.
//
// If there are too few shards to reconstruct the missing
// ones, ErrTooFewShards will be returned.
//
// The reconstructed shard set is complete, but integrity is not verified.
// Use the Verify function to check if data set is ok.
Reconstruct(shards [][]byte) error
// ReconstructData will recreate any missing data shards, if possible.
//
// Given a list of shards, some of which contain data, fills in the
// data shards that don't have data.
//
// The length of the array must be equal to Shards.
// You indicate that a shard is missing by setting it to nil or zero-length.
// If a shard is zero-length but has sufficient capacity, that memory will
// be used, otherwise a new []byte will be allocated.
//
// If there are too few shards to reconstruct the missing
// ones, ErrTooFewShards will be returned.
//
// As the reconstructed shard set may contain missing parity shards,
// calling the Verify function is likely to fail.
ReconstructData(shards [][]byte) error
// Update is used to change a few data shards and update their parity.
// Input 'newDatashards' contains the changed data shards.
// Input 'shards' contains the old data shards (a data shard that has not changed can be nil) and the old parity shards.
// The new parity shards will be in shards[DataShards:].
// Update is very useful if DataShards is much larger than ParityShards and only a few data shards have changed.
// It will be faster than Encode and does not need to read all data shards to encode.
Update(shards [][]byte, newDatashards [][]byte) error
// Split a data slice into the number of shards given to the encoder,
// and create empty parity shards.
//
// The data will be split into equally sized shards.
// If the data size isn't divisible by the number of shards,
// the last shard will contain extra zeros.
//
// There must be at least 1 byte, otherwise ErrShortData will be
// returned.
//
// The data will not be copied, except for the last shard, so you
// should not modify the data of the input slice afterwards.
Split(data []byte) ([][]byte, error)
// Join the shards and write the data segment to dst.
//
// Only the data shards are considered.
// You must supply the exact output size you want.
// If there are too few shards given, ErrTooFewShards will be returned.
// If the total data size is less than outSize, ErrShortData will be returned.
Join(dst io.Writer, shards [][]byte, outSize int) error
}
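// A minimal usage sketch, not part of the original source. It assumes a
// caller that imports this package as reedsolomon; the shard counts (10 data,
// 3 parity) and the names data, enc, shards and out are illustrative only.
//
//	enc, err := reedsolomon.New(10, 3)
//	if err != nil {
//		return err
//	}
//	shards, err := enc.Split(data) // data shards followed by empty parity shards
//	if err != nil {
//		return err
//	}
//	if err := enc.Encode(shards); err != nil { // fills in the parity shards
//		return err
//	}
//	shards[0], shards[11] = nil, nil // simulate losing one data and one parity shard
//	if err := enc.Reconstruct(shards); err != nil {
//		return err
//	}
//	ok, err := enc.Verify(shards) // Reconstruct does not verify integrity itself
//	if err != nil || !ok {
//		return errors.New("verification failed")
//	}
//	return enc.Join(out, shards, len(data)) // out is any io.Writer
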
const (
avx2CodeGenMinSize = 64
avx2CodeGenMinShards = 3
avx2CodeGenMaxGoroutines = 8
intSize = 32 << (^uint(0) >> 63) // 32 or 64
maxInt = 1<<(intSize-1) - 1
)
// reedSolomon contains a matrix for a specific
// distribution of datashards and parity shards.
// Construct it using New().
type reedSolomon struct {
DataShards int // Number of data shards, should not be modified.
ParityShards int // Number of parity shards, should not be modified.
Shards int // Total number of shards. Calculated, and should not be modified.
m matrix
tree *inversionTree
parity [][]byte
o options
mPool sync.Pool
}
// ErrInvShardNum will be returned by New, if you attempt to create
// an Encoder with less than one data shard or less than zero parity
// shards.
var ErrInvShardNum = errors.New("cannot create Encoder with less than one data shard or less than zero parity shards")
// ErrMaxShardNum will be returned by New, if you attempt to create an
// Encoder where the total number of data and parity shards exceeds
// the order of GF(2^8).
var ErrMaxShardNum = errors.New("cannot create Encoder with more than 256 data+parity shards")
// buildMatrix creates the matrix to use for encoding, given the
// number of data shards and the number of total shards.
//
// The top square of the matrix is guaranteed to be an identity
// matrix, which means that the data shards are unchanged after
// encoding.
func buildMatrix(dataShards, totalShards int) (matrix, error) {
// Start with a Vandermonde matrix. This matrix would work,
// in theory, but doesn't have the property that the data
// shards are unchanged after encoding.
vm, err := vandermonde(totalShards, dataShards)
if err != nil {
return nil, err
}
// Multiply by the inverse of the top square of the matrix.
// This will make the top square be the identity matrix, but
// preserve the property that any square subset of rows is
// invertible.
top, err := vm.SubMatrix(0, 0, dataShards, dataShards)
if err != nil {
return nil, err
}
topInv, err := top.Invert()
if err != nil {
return nil, err
}
return vm.Multiply(topInv)
}
// buildMatrixPAR1 creates the matrix to use for encoding according to
// the PARv1 spec, given the number of data shards and the number of
// total shards. Note that the method they use is buggy, and may lead
// to cases where recovery is impossible, even if there are enough
// parity shards.
//
// The top square of the matrix is guaranteed to be an identity
// matrix, which means that the data shards are unchanged after
// encoding.
func buildMatrixPAR1(dataShards, totalShards int) (matrix, error) {
result, err := newMatrix(totalShards, dataShards)
if err != nil {
return nil, err
}
for r, row := range result {
// The top portion of the matrix is the identity
// matrix, and the bottom is a transposed Vandermonde
// matrix starting at 1 instead of 0.
if r < dataShards {
result[r][r] = 1
} else {
for c := range row {
result[r][c] = galExp(byte(c+1), r-dataShards)
}
}
}
return result, nil
}
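// buildMatrixCauchy creates the matrix to use for encoding, using a
// Cauchy matrix for the parity rows, given the number of data shards
// and the number of total shards.
func buildMatrixCauchy(dataShards, totalShards int) (matrix, error) {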
result, err := newMatrix(totalShards, dataShards)
if err != nil {
return nil, err
}
for r, row := range result {
// The top portion of the matrix is the identity
// matrix, and the bottom is a transposed Cauchy matrix.
if r < dataShards {
result[r][r] = 1
} else {
for c := range row {
result[r][c] = invTable[(byte(r ^ c))]
}
}
}
return result, nil
}
// buildXorMatrix can be used to build a matrix with pure XOR
// operations if there is only one parity shard.
func buildXorMatrix(dataShards, totalShards int) (matrix, error) {
if dataShards+1 != totalShards {
return nil, errors.New("internal error")
}
result, err := newMatrix(totalShards, dataShards)
if err != nil {
return nil, err
}
for r, row := range result {
// The top portion of the matrix is the identity
// matrix.
if r < dataShards {
result[r][r] = 1
} else {
// Set all values to 1 (XOR)
for c := range row {
result[r][c] = 1
}
}
}
return result, nil
}
// New creates a new encoder and initializes it to
// the number of data shards and parity shards that
// you want to use. You can reuse this encoder.
// Note that the maximum number of total shards is 256.
// If no options are supplied, default options are used.
func New(dataShards, parityShards int, opts ...Option) (Encoder, error) {
r := reedSolomon{
DataShards: dataShards,
ParityShards: parityShards,
Shards: dataShards + parityShards,
o: defaultOptions,
}
for _, opt := range opts {
opt(&r.o)
}
if dataShards <= 0 || parityShards < 0 {
return nil, ErrInvShardNum
}
if dataShards+parityShards > 256 {
return nil, ErrMaxShardNum
}
if parityShards == 0 {
return &r, nil
}
var err error
switch {
case r.o.fastOneParity && parityShards == 1:
r.m, err = buildXorMatrix(dataShards, r.Shards)
case r.o.useCauchy:
r.m, err = buildMatrixCauchy(dataShards, r.Shards)
case r.o.usePAR1Matrix:
r.m, err = buildMatrixPAR1(dataShards, r.Shards)
default:
r.m, err = buildMatrix(dataShards, r.Shards)
}
if err != nil {
return nil, err
}
// Calculate what we want per round
r.o.perRound = cpuid.CPU.Cache.L2
BenchmarkStreamEncode10x4x16M-32 8772.66 9445.26 1.08x BenchmarkStreamEncode5x2x1M-32 12201.21 13629.70 1.12x BenchmarkStreamEncode10x2x1M-32 13086.64 13731.34 1.05x BenchmarkStreamEncode10x4x1M-32 11969.16 12775.92 1.07x BenchmarkStreamEncode50x20x1M-32 7276.61 8621.18 1.18x BenchmarkStreamEncode17x3x16M-32 10492.40 10920.52 1.04x BenchmarkStreamVerify10x2x10000-32 7264.00 7129.49 0.98x BenchmarkStreamVerify50x5x50000-32 6046.07 7241.62 1.20x BenchmarkStreamVerify10x2x1M-32 8466.05 8866.77 1.05x BenchmarkStreamVerify5x2x1M-32 5873.31 6502.39 1.11x BenchmarkStreamVerify10x4x1M-32 6254.95 6427.09 1.03x BenchmarkStreamVerify50x20x1M-32 4819.76 5223.20 1.08x BenchmarkStreamVerify10x4x16M-32 6078.79 5512.40 0.91x ```
2021-12-09 14:28:44 +03:00
divide := parityShards + 1
if avx2CodeGen && r.o.useAVX2 && (dataShards > maxAvx2Inputs || parityShards > maxAvx2Outputs) {
// Base perRound on the L1 data cache when there are many inputs or outputs.
r.o.perRound = cpuid.CPU.Cache.L1D
divide = 0
if dataShards > maxAvx2Inputs {
divide += maxAvx2Inputs
} else {
divide += dataShards
}
if parityShards > maxAvx2Outputs {
divide += maxAvx2Outputs
} else {
divide += parityShards
}
}
if r.o.perRound <= 0 {
// Set to 128K if undetectable.
r.o.perRound = 128 << 10
}
if cpuid.CPU.ThreadsPerCore > 1 && r.o.maxGoroutines > cpuid.CPU.PhysicalCores {
// If multiple threads per core, make sure they don't contend for cache.
r.o.perRound /= cpuid.CPU.ThreadsPerCore
}
// 1 input + parity must fit in cache, and we add one more to be safer.
r.o.perRound = r.o.perRound / divide
// Align to 64 bytes.
r.o.perRound = ((r.o.perRound + 63) / 64) * 64
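// Worked example (illustrative values, not taken from this file): assuming
// perRound starts at 262144 bytes, parityShards is 4, the AVX2 many-shard
// branch is not taken, and the threads-per-core adjustment does not apply:
//   divide   = 4 + 1 = 5
//   perRound = 262144 / 5 = 52428 (integer division)
//   aligned  = ((52428 + 63) / 64) * 64 = 52480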
if r.o.minSplitSize <= 0 {
// Set minsplit as high as we can, but still have parity in L1.
cacheSize := cpuid.CPU.Cache.L1D
if cacheSize <= 0 {
cacheSize = 32 << 10
}
r.o.minSplitSize = cacheSize / (parityShards + 1)
// Min 1K
if r.o.minSplitSize < 1024 {
r.o.minSplitSize = 1024
}
}
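// Worked example (assumed values, only when minSplitSize was not set by the
// caller): with a 32768-byte L1 data cache and parityShards = 4,
// minSplitSize = 32768 / 5 = 6553. With parityShards = 40 the result is
// 32768 / 41 = 799, which is raised to the 1024-byte minimum.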
if r.o.shardSize > 0 {
p := runtime.GOMAXPROCS(0)
if p == 1 || r.o.shardSize <= r.o.minSplitSize*2 {
// Not worth splitting: a single CPU, or shards too small.
r.o.maxGoroutines = 1
} else {
g := r.o.shardSize / r.o.perRound
// Overprovision by a factor of 2.
if g < p*2 && r.o.perRound > r.o.minSplitSize*2 {
g = p * 2
r.o.perRound /= 2
}
// Round g up to a multiple of p.
g += p - 1
g -= g % p
r.o.maxGoroutines = g
}
}
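// Worked example (assumed values, for illustration only): with
// shardSize = 1048576, perRound = 52480, minSplitSize = 6553 and p = 8:
//   shardSize > minSplitSize*2 (13106), so splitting is considered.
//   g = 1048576 / 52480 = 19, which is not below p*2 = 16, so perRound
//   is left unchanged.
//   g += p - 1  gives 26; g -= 26 % 8 gives 24.
//   maxGoroutines = 24, a multiple of p.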
// Generated AVX2 does not need data to stay in L1 cache between runs.
// We will be purely limited by RAM speed.
2021-12-09 14:28:44 +03:00
if r.canAVX2C(avx2CodeGenMinSize, maxAvx2Inputs, maxAvx2Outputs) && r.o.maxGoroutines > avx2CodeGenMaxGoroutines {
r.o.maxGoroutines = avx2CodeGenMaxGoroutines
}
// Inverted matrices are cached in a tree keyed by the indices
// of the invalid rows of the data to reconstruct.
// The inversion root node will have the identity matrix as
// its inversion matrix because it implies there are no errors
// with the original data.
if r.o.inversionCache {
r.tree = newInversionTree(dataShards, parityShards)
}
r.parity = make([][]byte, parityShards)
for i := range r.parity {
r.parity[i] = r.m[dataShards+i]
}
if avx2CodeGen && r.o.useAVX2 {
sz := r.DataShards * r.ParityShards * 2 * 32
r.mPool.New = func() interface{} {
return make([]byte, sz)
}
}
return &r, err
}
// ErrTooFewShards is returned if too few shards were given to
// Encode/Verify/Reconstruct/Update. It will also be returned from Reconstruct
// if there were too few shards to reconstruct the missing data.
var ErrTooFewShards = errors.New("too few shards given")
// Encode parity for a set of data shards.
// The input is 'shards', containing data shards followed by parity shards.
// The number of shards must match the number given to New.
// Each shard is a byte array, and they must all be the same size.
// The parity shards will always be overwritten and the data shards
// will remain the same.
func (r *reedSolomon) Encode(shards [][]byte) error {
	if len(shards) != r.Shards {
		return ErrTooFewShards
	}

	err := checkShards(shards, false)
	if err != nil {
		return err
	}

	// Get the slice of output buffers.
	output := shards[r.DataShards:]

	// Do the coding.
	r.codeSomeShards(r.parity, shards[0:r.DataShards], output[:r.ParityShards], len(shards[0]))
	return nil
}

// EncodeIdx will add parity for a single data shard.
// Parity shards should start out zeroed. The caller must zero them before first call.
// Data shards should only be delivered once. There is no check for this.
// The parity shards will always be updated and the data shards will remain unchanged.
func (r *reedSolomon) EncodeIdx(dataShard []byte, idx int, parity [][]byte) error {
	if len(parity) != r.ParityShards {
		return ErrTooFewShards
	}
	if len(parity) == 0 {
		return nil
	}
	if idx < 0 || idx >= r.DataShards {
		return ErrInvShardNum
	}
	err := checkShards(parity, false)
	if err != nil {
		return err
	}
	if len(parity[0]) != len(dataShard) {
		return ErrShardSize
	}

	// Process using no goroutines for now.
	start, end := 0, r.o.perRound
	if end > len(dataShard) {
		end = len(dataShard)
	}

	for start < len(dataShard) {
		in := dataShard[start:end]
		for iRow := 0; iRow < r.ParityShards; iRow++ {
			galMulSliceXor(r.parity[iRow][idx], in, parity[iRow][start:end], &r.o)
		}
		start = end
		end += r.o.perRound
		if end > len(dataShard) {
			end = len(dataShard)
		}
	}
	return nil
}

// ErrInvalidInput is returned if Update is given an invalid input parameter.
var ErrInvalidInput = errors.New("invalid input")
func (r *reedSolomon) Update(shards [][]byte, newDatashards [][]byte) error {
	if len(shards) != r.Shards {
		return ErrTooFewShards
	}

	if len(newDatashards) != r.DataShards {
		return ErrTooFewShards
	}

	err := checkShards(shards, true)
	if err != nil {
		return err
	}

	err = checkShards(newDatashards, true)
	if err != nil {
		return err
	}

	for i := range newDatashards {
		if newDatashards[i] != nil && shards[i] == nil {
			return ErrInvalidInput
		}
	}
	for _, p := range shards[r.DataShards:] {
		if p == nil {
			return ErrInvalidInput
		}
	}

	shardSize := shardSize(shards)

	// Get the slice of output buffers.
	output := shards[r.DataShards:]

	// Do the coding.
	r.updateParityShards(r.parity, shards[0:r.DataShards], newDatashards[0:r.DataShards], output, r.ParityShards, shardSize)
	return nil
}

func (r *reedSolomon) updateParityShards(matrixRows, oldinputs, newinputs, outputs [][]byte, outputCount, byteCount int) {
	if len(outputs) == 0 {
		return
	}

	if r.o.maxGoroutines > 1 && byteCount > r.o.minSplitSize {
		r.updateParityShardsP(matrixRows, oldinputs, newinputs, outputs, outputCount, byteCount)
		return
	}

	for c := 0; c < r.DataShards; c++ {
		in := newinputs[c]
		if in == nil {
			continue
		}
		oldin := oldinputs[c]
		// oldinputs data will be changed
		sliceXor(in, oldin, &r.o)
		for iRow := 0; iRow < outputCount; iRow++ {
			galMulSliceXor(matrixRows[iRow][c], oldin, outputs[iRow], &r.o)
		}
	}
}

func (r *reedSolomon) updateParityShardsP(matrixRows, oldinputs, newinputs, outputs [][]byte, outputCount, byteCount int) {
	var wg sync.WaitGroup
	do := byteCount / r.o.maxGoroutines
	if do < r.o.minSplitSize {
		do = r.o.minSplitSize
	}
	start := 0
	for start < byteCount {
		if start+do > byteCount {
			do = byteCount - start
		}
		wg.Add(1)
		go func(start, stop int) {
			for c := 0; c < r.DataShards; c++ {
				in := newinputs[c]
				if in == nil {
					continue
				}
				oldin := oldinputs[c]
				// oldinputs data will be changed
				sliceXor(in[start:stop], oldin[start:stop], &r.o)
				for iRow := 0; iRow < outputCount; iRow++ {
					galMulSliceXor(matrixRows[iRow][c], oldin[start:stop], outputs[iRow][start:stop], &r.o)
				}
			}
			wg.Done()
		}(start, start+do)
		start += do
	}
	wg.Wait()
}

// Verify returns true if the parity shards contain the right data.
// The data is the same format as Encode. No data is modified.
func (r *reedSolomon) Verify(shards [][]byte) (bool, error) {
	if len(shards) != r.Shards {
		return false, ErrTooFewShards
	}
	err := checkShards(shards, false)
	if err != nil {
		return false, err
	}

	// Slice of buffers being checked.
	toCheck := shards[r.DataShards:]

	// Do the checking.
2021-12-09 14:28:44 +03:00
	return r.checkSomeShards(r.parity, shards[:r.DataShards], toCheck[:r.ParityShards], len(shards[0])), nil
}
func (r *reedSolomon) canAVX2C(byteCount int, inputs, outputs int) bool {
	return avx2CodeGen && r.o.useAVX2 &&
		byteCount >= avx2CodeGenMinSize && inputs+outputs >= avx2CodeGenMinShards &&
		inputs <= maxAvx2Inputs && outputs <= maxAvx2Outputs
}
// Multiplies a subset of rows from a coding matrix by a full set of
// input shards to produce some output shards.
// 'matrixRows' is the rows from the matrix to use.
// 'inputs' is an array of byte arrays, each of which is one input shard.
// The number of inputs used is determined by the length of each matrix row.
// 'outputs' is the byte arrays where the computed shards are stored.
// The number of outputs computed, and the number of matrix rows used,
// is determined by len(outputs), the number of outputs to compute.
func (r *reedSolomon) codeSomeShards(matrixRows, inputs, outputs [][]byte, byteCount int) {
	if len(outputs) == 0 {
		return
	}
	switch {
	case r.o.useAVX512 && r.o.maxGoroutines > 1 && byteCount > r.o.minSplitSize && len(inputs) >= 4 && len(outputs) >= 2:
		r.codeSomeShardsAvx512P(matrixRows, inputs, outputs, byteCount)
		return
	case r.o.useAVX512 && len(inputs) >= 4 && len(outputs) >= 2:
2021-12-09 14:28:44 +03:00
		r.codeSomeShardsAvx512(matrixRows, inputs, outputs, byteCount)
		return
	case byteCount > r.o.minSplitSize:
		r.codeSomeShardsP(matrixRows, inputs, outputs, byteCount)
		return
	}

	// Process using no goroutines
	start, end := 0, r.o.perRound
	if end > len(inputs[0]) {
		end = len(inputs[0])
	}
	if r.canAVX2C(byteCount, len(inputs), len(outputs)) {
		m := genAvx2Matrix(matrixRows, len(inputs), 0, len(outputs), r.mPool.Get().([]byte))
		start += galMulSlicesAvx2(m, inputs, outputs, 0, byteCount)
		r.mPool.Put(m)
		end = len(inputs[0])
	} else if len(inputs)+len(outputs) > avx2CodeGenMinShards && r.canAVX2C(byteCount, maxAvx2Inputs, maxAvx2Outputs) {
		end = len(inputs[0])
		inIdx := 0
		m := r.mPool.Get().([]byte)
		defer r.mPool.Put(m)
		// Process the shards in blocks of at most
		// maxAvx2Inputs inputs by maxAvx2Outputs outputs.
		ins := inputs
		for len(ins) > 0 {
			inPer := ins
			if len(inPer) > maxAvx2Inputs {
				inPer = inPer[:maxAvx2Inputs]
			}
			outs := outputs
			outIdx := 0
			for len(outs) > 0 {
				outPer := outs
				if len(outPer) > maxAvx2Outputs {
					outPer = outPer[:maxAvx2Outputs]
				}
				m = genAvx2Matrix(matrixRows[outIdx:], len(inPer), inIdx, len(outPer), m)
				// The first block of inputs overwrites the outputs;
				// later blocks are XORed into them.
				if inIdx == 0 {
					galMulSlicesAvx2(m, inPer, outPer, 0, byteCount)
				} else {
					galMulSlicesAvx2Xor(m, inPer, outPer, 0, byteCount)
				}
				// Resume the generic loop below after the bytes
				// covered by the AVX2 path.
				start = byteCount & avxSizeMask
				outIdx += len(outPer)
				outs = outs[len(outPer):]
			}
			inIdx += len(inPer)
			ins = ins[len(inPer):]
		}
		if start >= end {
			return
		}
	}
	for start < len(inputs[0]) {
		for c := 0; c < len(inputs); c++ {
			in := inputs[c][start:end]
			for iRow := 0; iRow < len(outputs); iRow++ {
				if c == 0 {
					galMulSlice(matrixRows[iRow][c], in, outputs[iRow][start:end], &r.o)
				} else {
					galMulSliceXor(matrixRows[iRow][c], in, outputs[iRow][start:end], &r.o)
				}
			}
		}
		start = end
		end += r.o.perRound
		if end > len(inputs[0]) {
			end = len(inputs[0])
		}
	}
}
// codeSomeShardsP performs the same work as codeSomeShards, but splits the
// workload across several goroutines.
func (r *reedSolomon) codeSomeShardsP(matrixRows, inputs, outputs [][]byte, byteCount int) {
	var wg sync.WaitGroup
gor := r.o.maxGoroutines
var avx2Matrix []byte
useAvx2 := r.canAVX2C(byteCount, len(inputs), len(outputs))
if useAvx2 {
avx2Matrix = genAvx2Matrix(matrixRows, len(inputs), 0, len(outputs), r.mPool.Get().([]byte))
defer r.mPool.Put(avx2Matrix)
} else if byteCount < 10<<20 && len(inputs)+len(outputs) > avx2CodeGenMinShards &&
r.canAVX2C(byteCount/4, maxAvx2Inputs, maxAvx2Outputs) {
// There appears to be a switchover point at around 10MB,
// above which regular processing is faster.
r.codeSomeShardsAVXP(matrixRows, inputs, outputs, byteCount)
return
}
do := byteCount / gor
if do < r.o.minSplitSize {
do = r.o.minSplitSize
}
exec := func(start, stop int) {
if useAvx2 && stop-start >= 64 {
start += galMulSlicesAvx2(avx2Matrix, inputs, outputs, start, stop)
}
lstart, lstop := start, start+r.o.perRound
if lstop > stop {
lstop = stop
}
for lstart < stop {
for c := 0; c < len(inputs); c++ {
in := inputs[c][lstart:lstop]
for iRow := 0; iRow < len(outputs); iRow++ {
if c == 0 {
galMulSlice(matrixRows[iRow][c], in, outputs[iRow][lstart:lstop], &r.o)
} else {
galMulSliceXor(matrixRows[iRow][c], in, outputs[iRow][lstart:lstop], &r.o)
}
}
}
lstart = lstop
lstop += r.o.perRound
if lstop > stop {
lstop = stop
}
}
wg.Done()
}
if gor <= 1 {
wg.Add(1)
exec(0, byteCount)
return
}
// Make sizes divisible by 64
do = (do + 63) & (^63)
start := 0
for start < byteCount {
if start+do > byteCount {
do = byteCount - start
}
wg.Add(1)
BenchmarkStreamEncode10x4x16M-32 8772.66 9445.26 1.08x BenchmarkStreamEncode5x2x1M-32 12201.21 13629.70 1.12x BenchmarkStreamEncode10x2x1M-32 13086.64 13731.34 1.05x BenchmarkStreamEncode10x4x1M-32 11969.16 12775.92 1.07x BenchmarkStreamEncode50x20x1M-32 7276.61 8621.18 1.18x BenchmarkStreamEncode17x3x16M-32 10492.40 10920.52 1.04x BenchmarkStreamVerify10x2x10000-32 7264.00 7129.49 0.98x BenchmarkStreamVerify50x5x50000-32 6046.07 7241.62 1.20x BenchmarkStreamVerify10x2x1M-32 8466.05 8866.77 1.05x BenchmarkStreamVerify5x2x1M-32 5873.31 6502.39 1.11x BenchmarkStreamVerify10x4x1M-32 6254.95 6427.09 1.03x BenchmarkStreamVerify50x20x1M-32 4819.76 5223.20 1.08x BenchmarkStreamVerify10x4x16M-32 6078.79 5512.40 0.91x ```
2021-12-09 14:28:44 +03:00
go exec(start, start+do)
start += do
}
wg.Wait()
}
// codeSomeShardsAVXP performs the same work as codeSomeShards, but splits
// the workload across several goroutines.
func (r *reedSolomon) codeSomeShardsAVXP(matrixRows, inputs, outputs [][]byte, byteCount int) {
var wg sync.WaitGroup
gor := r.o.maxGoroutines
type state struct {
input [][]byte
output [][]byte
m []byte
first bool
}
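// Build a plan: split inputs and outputs into blocks of at most
// maxAvx2Inputs x maxAvx2Outputs, each with its own local matrix.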
plan := make([]state, 0, ((len(inputs)+maxAvx2Inputs-1)/maxAvx2Inputs)*((len(outputs)+maxAvx2Outputs-1)/maxAvx2Outputs))
tmp := r.mPool.Get().([]byte)
defer func(b []byte) {
r.mPool.Put(b)
}(tmp)
// Switch between input-first and output-first ordering.
// The smaller of the two dimensions goes in the inner loop.
if len(inputs) > len(outputs) {
inIdx := 0
ins := inputs
for len(ins) > 0 {
inPer := ins
if len(inPer) > maxAvx2Inputs {
inPer = inPer[:maxAvx2Inputs]
}
outs := outputs
outIdx := 0
for len(outs) > 0 {
outPer := outs
if len(outPer) > maxAvx2Outputs {
outPer = outPer[:maxAvx2Outputs]
}
// Generate local matrix
m := genAvx2Matrix(matrixRows[outIdx:], len(inPer), inIdx, len(outPer), tmp)
tmp = tmp[len(m):]
plan = append(plan, state{
input: inPer,
output: outPer,
m: m,
first: inIdx == 0,
})
outIdx += len(outPer)
outs = outs[len(outPer):]
}
inIdx += len(inPer)
ins = ins[len(inPer):]
}
} else {
outs := outputs
outIdx := 0
for len(outs) > 0 {
outPer := outs
if len(outPer) > maxAvx2Outputs {
outPer = outPer[:maxAvx2Outputs]
}
inIdx := 0
ins := inputs
for len(ins) > 0 {
inPer := ins
if len(inPer) > maxAvx2Inputs {
inPer = inPer[:maxAvx2Inputs]
}
// Generate local matrix
m := genAvx2Matrix(matrixRows[outIdx:], len(inPer), inIdx, len(outPer), tmp)
tmp = tmp[len(m):]
//fmt.Println("bytes:", len(inPer)*r.o.perRound, "out:", len(outPer)*r.o.perRound)
plan = append(plan, state{
input: inPer,
output: outPer,
m: m,
first: inIdx == 0,
})
inIdx += len(inPer)
ins = ins[len(inPer):]
}
outIdx += len(outPer)
outs = outs[len(outPer):]
}
}
do := byteCount / gor
if do < r.o.minSplitSize {
do = r.o.minSplitSize
}
exec := func(start, stop int) {
lstart, lstop := start, start+r.o.perRound
if lstop > stop {
lstop = stop
}
for lstart < stop {
if lstop-lstart >= minAvx2Size {
// Execute plan...
for _, p := range plan {
if p.first {
galMulSlicesAvx2(p.m, p.input, p.output, lstart, lstop)
} else {
galMulSlicesAvx2Xor(p.m, p.input, p.output, lstart, lstop)
}
}
2021-12-09 14:28:44 +03:00
				lstart += (lstop - lstart) & avxSizeMask
				if lstart == lstop {
					lstop += r.o.perRound
					if lstop > stop {
						lstop = stop
					}
					continue
				}
			}
			for c := range inputs {
				in := inputs[c][lstart:lstop]
				for iRow := 0; iRow < len(outputs); iRow++ {
					if c == 0 {
						galMulSlice(matrixRows[iRow][c], in, outputs[iRow][lstart:lstop], &r.o)
					} else {
						galMulSliceXor(matrixRows[iRow][c], in, outputs[iRow][lstart:lstop], &r.o)
					}
				}
			}
			lstart = lstop
			lstop += r.o.perRound
			if lstop > stop {
				lstop = stop
			}
		}
		wg.Done()
	}
	if gor == 1 {
		wg.Add(1)
		exec(0, byteCount)
		return
	}
	// Make sizes divisible by 64
	do = (do + 63) & (^63)
	start := 0
	for start < byteCount {
		if start+do > byteCount {
			do = byteCount - start
		}
		wg.Add(1)
		go exec(start, start+do)
		start += do
	}
	wg.Wait()
}
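
// The chunking loop above can be illustrated in isolation. The following is
// a minimal standalone sketch, not part of this package: splitRanges is a
// hypothetical helper that rounds the per-goroutine size up to a multiple of
// 64 and then walks the byte range, shrinking the last chunk to fit.

```go
package main

import "fmt"

// splitRanges is a hypothetical helper used only for illustration.
// It divides [0, byteCount) into consecutive [start, stop) ranges of at most
// `do` bytes, after rounding `do` up to the next multiple of 64.
func splitRanges(byteCount, do int) [][2]int {
	if byteCount <= 0 || do <= 0 {
		return nil
	}
	// Round up to a multiple of 64, same effect as (do + 63) & (^63).
	do = (do + 63) &^ 63
	var out [][2]int
	for start := 0; start < byteCount; {
		if start+do > byteCount {
			// The final chunk takes whatever is left.
			do = byteCount - start
		}
		out = append(out, [2]int{start, start + do})
		start += do
	}
	return out
}

func main() {
	for _, r := range splitRanges(1000, 300) {
		fmt.Printf("[%d, %d)\n", r[0], r[1])
	}
}
```

// Every chunk except possibly the last one starts and ends on a 64-byte
// boundary relative to the start of the shard.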
// checkSomeShards is mostly the same as codeSomeShards,
// except this will check values and return
// as soon as a difference is found.
func (r *reedSolomon) checkSomeShards(matrixRows, inputs, toCheck [][]byte, byteCount int) bool {
	if len(toCheck) == 0 {
		return true
	}
	outputs := make([][]byte, len(toCheck))
	for i := range outputs {
		outputs[i] = make([]byte, byteCount)
	}
r.codeSomeShards(matrixRows, inputs, outputs, byteCount)
for i, calc := range outputs {
if !bytes.Equal(calc, toCheck[i]) {
return false
}
}
return true
}
// ErrShardNoData will be returned if there are no shards,
// or if the length of all shards is zero.
var ErrShardNoData = errors.New("no shard data")
// ErrShardSize is returned if shard length isn't the same for all
// shards.
var ErrShardSize = errors.New("shard sizes do not match")
// checkShards will check if shards are the same size
// or 0, if allowed. An error is returned if this fails.
// An error is also returned if all shards are size 0.
func checkShards(shards [][]byte, nilok bool) error {
size := shardSize(shards)
if size == 0 {
return ErrShardNoData
}
for _, shard := range shards {
if len(shard) != size {
if len(shard) != 0 || !nilok {
return ErrShardSize
}
}
}
return nil
}
// shardSize returns the size of a single shard.
// The first non-zero size is returned,
// or 0 if all shards are size 0.
func shardSize(shards [][]byte) int {
for _, shard := range shards {
if len(shard) != 0 {
return len(shard)
}
}
return 0
}
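// The size rules above can be sketched in isolation. This is a minimal,
// standalone illustration, not part of the package: the names validate,
// firstNonEmptySize, errNoData and errSize are hypothetical stand-ins for
// checkShards, shardSize, ErrShardNoData and ErrShardSize.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for ErrShardNoData and ErrShardSize.
var (
	errNoData = errors.New("no shard data")
	errSize   = errors.New("shard sizes do not match")
)

// firstNonEmptySize returns the length of the first non-empty shard,
// or 0 if every shard is empty.
func firstNonEmptySize(shards [][]byte) int {
	for _, s := range shards {
		if len(s) != 0 {
			return len(s)
		}
	}
	return 0
}

// validate accepts shards that all share one length. Empty shards are
// tolerated only when allowEmpty is true.
func validate(shards [][]byte, allowEmpty bool) error {
	size := firstNonEmptySize(shards)
	if size == 0 {
		return errNoData
	}
	for _, s := range shards {
		if len(s) != size && (len(s) != 0 || !allowEmpty) {
			return errSize
		}
	}
	return nil
}

func main() {
	holes := [][]byte{{1, 2}, nil, {5, 6}}
	ragged := [][]byte{{1, 2}, {3}, {5, 6}}

	fmt.Println(validate(holes, true))    // a missing shard is allowed
	fmt.Println(validate(holes, false))   // a missing shard is rejected
	fmt.Println(validate(ragged, true))   // unequal non-empty sizes are rejected
	fmt.Println(validate([][]byte{nil}, true)) // nothing to measure
}
```

// In this sketch an empty shard is never an error by itself; it only
// becomes one when the caller passes allowEmpty as false, which mirrors
// how reconstruct calls checkShards with nilok set to true.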
// Reconstruct will recreate the missing shards, if possible.
//
// Given a list of shards, some of which contain data, fills in the
// ones that don't have data.
//
// The length of the array must be equal to Shards.
// You indicate that a shard is missing by setting it to nil or zero-length.
// If a shard is zero-length but has sufficient capacity, that memory will
// be used, otherwise a new []byte will be allocated.
//
// If there are too few shards to reconstruct the missing
// ones, ErrTooFewShards will be returned.
//
// The reconstructed shard set is complete, but integrity is not verified.
// Use the Verify function to check if data set is ok.
func (r *reedSolomon) Reconstruct(shards [][]byte) error {
return r.reconstruct(shards, false)
}
// ReconstructData will recreate any missing data shards, if possible.
//
// Given a list of shards, some of which contain data, fills in the
// data shards that don't have data.
//
// The length of the array must be equal to Shards.
// You indicate that a shard is missing by setting it to nil or zero-length.
// If a shard is zero-length but has sufficient capacity, that memory will
// be used, otherwise a new []byte will be allocated.
//
// If there are too few shards to reconstruct the missing
// ones, ErrTooFewShards will be returned.
//
// As the reconstructed shard set may contain missing parity shards,
// calling the Verify function is likely to fail.
func (r *reedSolomon) ReconstructData(shards [][]byte) error {
return r.reconstruct(shards, true)
}
// reconstruct will recreate the missing data shards, and unless
// dataOnly is true, also the missing parity shards
//
// The length of the array must be equal to Shards.
// You indicate that a shard is missing by setting it to nil or zero-length.
//
// If there are too few shards to reconstruct the missing
// ones, ErrTooFewShards will be returned.
func (r *reedSolomon) reconstruct(shards [][]byte, dataOnly bool) error {
if len(shards) != r.Shards {
return ErrTooFewShards
}
// Check arguments.
err := checkShards(shards, true)
if err != nil {
return err
}
shardSize := shardSize(shards)
// Quick check: are all of the shards present? If so, there's
// nothing to do.
numberPresent := 0
dataPresent := 0
for i := 0; i < r.Shards; i++ {
if len(shards[i]) != 0 {
numberPresent++
if i < r.DataShards {
dataPresent++
}
}
}
if numberPresent == r.Shards || dataOnly && dataPresent == r.DataShards {
// Cool. All of the shards have data. We don't
// need to do anything.
return nil
}
// More complete sanity check
if numberPresent < r.DataShards {
return ErrTooFewShards
}
// Pull out an array holding just the shards that
// correspond to the rows of the submatrix. These shards
// will be the input to the decoding process that re-creates
// the missing data shards.
//
// Also, create an array of indices of the valid rows we do have
// and the invalid rows we don't have up until we have enough valid rows.
subShards := make([][]byte, r.DataShards)
validIndices := make([]int, r.DataShards)
invalidIndices := make([]int, 0)
subMatrixRow := 0
for matrixRow := 0; matrixRow < r.Shards && subMatrixRow < r.DataShards; matrixRow++ {
if len(shards[matrixRow]) != 0 {
subShards[subMatrixRow] = shards[matrixRow]
validIndices[subMatrixRow] = matrixRow
subMatrixRow++
} else {
invalidIndices = append(invalidIndices, matrixRow)
}
}
// Attempt to get the cached inverted matrix out of the tree
// based on the indices of the invalid rows.
dataDecodeMatrix := r.tree.GetInvertedMatrix(invalidIndices)
// If the inverted matrix isn't cached in the tree yet we must
// construct it ourselves and insert it into the tree for the
// future. In this way the inversion tree is lazily loaded.
if dataDecodeMatrix == nil {
// Pull out the rows of the matrix that correspond to the
// shards that we have and build a square matrix. This
// matrix could be used to generate the shards that we have
// from the original data.
subMatrix, _ := newMatrix(r.DataShards, r.DataShards)
for subMatrixRow, validIndex := range validIndices {
for c := 0; c < r.DataShards; c++ {
subMatrix[subMatrixRow][c] = r.m[validIndex][c]
}
}
// Invert the matrix, so we can go from the encoded shards
// back to the original data. Then pull out the row that
// generates the shard that we want to decode. Note that
// since this matrix maps back to the original data, it can
// be used to create a data shard, but not a parity shard.
dataDecodeMatrix, err = subMatrix.Invert()
if err != nil {
return err
}
// Cache the inverted matrix in the tree for future use keyed on the
// indices of the invalid rows.
err = r.tree.InsertInvertedMatrix(invalidIndices, dataDecodeMatrix, r.Shards)
if err != nil {
return err
}
}
// Re-create any data shards that were missing.
//
// The input to the coding is all of the shards we actually
// have, and the output is the missing data shards. The computation
// is done using the special decode matrix we just built.
outputs := make([][]byte, r.ParityShards)
matrixRows := make([][]byte, r.ParityShards)
outputCount := 0
for iShard := 0; iShard < r.DataShards; iShard++ {
if len(shards[iShard]) == 0 {
if cap(shards[iShard]) >= shardSize {
shards[iShard] = shards[iShard][0:shardSize]
} else {
shards[iShard] = make([]byte, shardSize)
}
outputs[outputCount] = shards[iShard]
matrixRows[outputCount] = dataDecodeMatrix[iShard]
outputCount++
}
}
r.codeSomeShards(matrixRows, subShards, outputs[:outputCount], shardSize)
if dataOnly {
// Exit out early if we are only interested in the data shards
return nil
}
// Now that we have all of the data shards intact, we can
// compute any of the parity that is missing.
//
// The input to the coding is ALL of the data shards, including
// any that we just calculated. The output is whichever of the
// parity shards were missing.
outputCount = 0
for iShard := r.DataShards; iShard < r.Shards; iShard++ {
if len(shards[iShard]) == 0 {
if cap(shards[iShard]) >= shardSize {
shards[iShard] = shards[iShard][0:shardSize]
} else {
shards[iShard] = make([]byte, shardSize)
}
outputs[outputCount] = shards[iShard]
matrixRows[outputCount] = r.parity[iShard-r.DataShards]
outputCount++
}
}
r.codeSomeShards(matrixRows, shards[:r.DataShards], outputs[:outputCount], shardSize)
return nil
}
// ErrShortData will be returned by Split(), if there isn't enough data
// to fill the number of shards.
var ErrShortData = errors.New("not enough data to fill the number of requested shards")
// Split a data slice into the number of shards given to the encoder,
// and create empty parity shards if necessary.
//
// The data will be split into equally sized shards.
// If the data size isn't divisible by the number of shards,
// the last shard will contain extra zeros.
//
// There must be at least 1 byte, otherwise ErrShortData will be
// returned.
//
// The data will not be copied, except for the last shard, so you
// should not modify the data of the input slice afterwards.
func (r *reedSolomon) Split(data []byte) ([][]byte, error) {
	if len(data) == 0 {
		return nil, ErrShortData
	}
	dataLen := len(data)
	// Calculate number of bytes per data shard.
	perShard := (len(data) + r.DataShards - 1) / r.DataShards

	if cap(data) > len(data) {
		data = data[:cap(data)]
	}

	// Only allocate memory if necessary
	var padding []byte
	if len(data) < (r.Shards * perShard) {
		// calculate maximum number of full shards in `data` slice
		fullShards := len(data) / perShard
		padding = make([]byte, r.Shards*perShard-perShard*fullShards)
		copy(padding, data[perShard*fullShards:])
		data = data[0 : perShard*fullShards]
	} else {
		for i := dataLen; i < dataLen+r.DataShards; i++ {
			data[i] = 0
		}
	}

	// Split into equal-length shards.
	dst := make([][]byte, r.Shards)
	i := 0
	for ; i < len(dst) && len(data) >= perShard; i++ {
		dst[i] = data[:perShard:perShard]
		data = data[perShard:]
	}

	for j := 0; i+j < len(dst); j++ {
		dst[i+j] = padding[:perShard:perShard]
		padding = padding[perShard:]
	}

	return dst, nil
}
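
// Illustrative sketch, not part of this package: the shard-size arithmetic
// and zero padding described in the Split comment can be tried in isolation.
// The name splitSketch and its parameters are hypothetical. Unlike Split,
// this simplified version always copies the input into a fresh buffer
// instead of reusing the caller's slice.

```go
package main

import "fmt"

// splitSketch is a hypothetical, simplified stand-in for Split.
// It returns dataShards+parityShards equally sized shards.
func splitSketch(data []byte, dataShards, parityShards int) [][]byte {
	if len(data) == 0 || dataShards <= 0 || parityShards < 0 {
		return nil
	}
	// Round up so that dataShards*perShard >= len(data).
	perShard := (len(data) + dataShards - 1) / dataShards
	total := dataShards + parityShards

	// One zeroed backing buffer: the tail of the last data shard and
	// every parity shard stay zero after the copy.
	buf := make([]byte, total*perShard)
	copy(buf, data)

	shards := make([][]byte, total)
	for i := range shards {
		// The three-index slice caps each shard at perShard bytes, so an
		// append on one shard cannot overwrite its neighbour.
		shards[i] = buf[i*perShard : (i+1)*perShard : (i+1)*perShard]
	}
	return shards
}

func main() {
	// 11 bytes over 3 data shards gives 4 bytes per shard.
	shards := splitSketch([]byte("hello world"), 3, 2)
	for i, s := range shards {
		fmt.Printf("shard %d: %q\n", i, s)
	}
}
```

// The rounding-up division is the same expression Split uses for perShard;
// everything else in the sketch is an assumption made for brevity.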
// ErrReconstructRequired is returned if too few data shards are intact and a
// reconstruction is required before you can successfully join the shards.
var ErrReconstructRequired = errors.New("reconstruction required as one or more required data shards are nil")
// Join the shards and write the data segment to dst.
//
// Only the data shards are considered.
// You must supply the exact output size you want.
//
// If there are too few shards given, ErrTooFewShards will be returned.
// If the total data size is less than outSize, ErrShortData will be returned.
// If one or more required data shards are nil, ErrReconstructRequired will be returned.
func (r *reedSolomon) Join(dst io.Writer, shards [][]byte, outSize int) error {
	// Do we have enough shards?
	if len(shards) < r.DataShards {
		return ErrTooFewShards
	}
	shards = shards[:r.DataShards]

	// Do we have enough data?
	size := 0
	for _, shard := range shards {
		if shard == nil {
			return ErrReconstructRequired
		}
		size += len(shard)

		// Do we have enough data already?
		if size >= outSize {
			break
		}
	}
	if size < outSize {
		return ErrShortData
	}

	// Copy data to dst
	write := outSize
	for _, shard := range shards {
		if write < len(shard) {
			_, err := dst.Write(shard[:write])
			return err
		}
		n, err := dst.Write(shard)
		if err != nil {
			return err
		}
		write -= n
	}
	return nil
}