58 Commits (jerasure-matrix)

Author SHA1 Message Date
Vitaliy Filippov 19a04effc5
Implement ReconstructSome() to reconstruct only specific data shards (#189)
Co-authored-by: Vitaliy Filippov <vitalif@yourcmc.ru>
2022-06-17 11:58:26 +02:00
Klaus Post daf81ef0bd
Use VPTERNLOGD on GOAMD64=v4 (#182)
* Use VPTERNLOGD on GOAMD64=v4
* Bump to Go 1.18
2022-03-16 11:10:29 +01:00
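As a rough illustration of why VPTERNLOGD helps here: the instruction evaluates an arbitrary three-input boolean function, selected by an 8-bit truth table, in a single operation. The sketch below emulates one 32-bit lane in plain Go. The function name `ternlog` is hypothetical, and the choice of `0x96` (three-way XOR) as the relevant immediate is an assumption about how the commit above uses the instruction, not something the log confirms.

```go
package main

import "fmt"

// ternlog emulates one 32-bit lane of VPTERNLOGD (hypothetical helper).
// For every bit position, the bits of a, b and c form a 3-bit index
// (a is the most significant bit) that selects one bit of imm8.
func ternlog(a, b, c uint32, imm8 uint8) uint32 {
	var out uint32
	for i := 0; i < 32; i++ {
		idx := (a>>i&1)<<2 | (b>>i&1)<<1 | (c >> i & 1)
		out |= uint32(imm8>>idx&1) << i
	}
	return out
}

func main() {
	a, b, c := uint32(0xF0F0F0F0), uint32(0xCCCCCCCC), uint32(0xAAAAAAAA)
	// 0x96 is the truth table of a three-way XOR, so one ternary-logic
	// operation can stand in for two separate XOR operations.
	fmt.Println(ternlog(a, b, c, 0x96) == a^b^c)
}
```

Other immediates select other functions; for example `0x80` yields `a & b & c`.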
Klaus Post cb226cd9d6
Update docs 2021-11-16 11:47:50 +01:00
Klaus Post 69e2f21ffd
Create performance baseline (#168) 2021-09-01 18:37:47 +02:00
Klaus Post 6e6fbcd31d
Update README.md 2020-12-17 11:35:52 +01:00
Klaus Post 60143f4a15
Update README.md 2020-12-09 22:56:37 +01:00
Frank Wessels 6fbce20c81
Updated performance numbers for Graviton2 on ARM (#146) 2020-05-15 11:09:11 +02:00
Klaus Post e8fdfd6630
Update readme and re-allow s390x failure. 2020-05-14 14:29:53 +02:00
Frank Wessels 2475ea7519
Use proper NEON assembly instructions for ARM (#144)
* Use proper NEON assembly instructions for ARM

* Updated performance numbers for ARM
2020-05-14 10:18:32 +02:00
Klaus Post cf8495259a
Add pure XOR for 1 parity (#138)
WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard.
The PAR1 matrix already has this property so it has little effect there.
2020-05-13 11:10:58 +02:00
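The single-parity case described in the commit above reduces to plain XOR: the parity shard is the XOR of all data shards, and one lost shard is the XOR of everything that survives. A minimal sketch, assuming equal-length shards; `xorParity` and `recoverMissing` are hypothetical names for illustration and are not the library's API.

```go
package main

import "fmt"

// xorParity returns the byte-wise XOR of all data shards.
// All shards are assumed to have the same length.
func xorParity(data [][]byte) []byte {
	parity := make([]byte, len(data[0]))
	for _, shard := range data {
		for i, b := range shard {
			parity[i] ^= b
		}
	}
	return parity
}

// recoverMissing rebuilds the single missing data shard (marked nil)
// by XORing the parity shard with every surviving data shard.
func recoverMissing(data [][]byte, parity []byte) []byte {
	out := append([]byte(nil), parity...)
	for _, shard := range data {
		if shard == nil {
			continue
		}
		for i, b := range shard {
			out[i] ^= b
		}
	}
	return out
}

func main() {
	data := [][]byte{[]byte("abcd"), []byte("efgh"), []byte("ijkl")}
	parity := xorParity(data)

	lost := data[1]
	data[1] = nil // simulate losing one data shard
	rebuilt := recoverMissing(data, parity)
	fmt.Println(string(rebuilt) == string(lost))
}
```

No Galois-field multiplication is involved, which is why this path can be faster than a general matrix row.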
Klaus Post 96dc2a5aa4
Update README 2020-05-06 13:47:25 +02:00
Klaus Post a9588190c0
Optimize pure Go version. (#96)
* Optimize pure Go version.
* Update docs. Add Go 1.12 CI

* Avoid dst bounds check when using noasm ~ 40-50% faster.
* Convert multiply table to a slice whenever used.
* Split on 32 byte boundaries instead of 16 byte.
2019-03-08 10:49:27 +01:00
Klaus Post 2b210cf086
Update README.md
Remove dead link
2019-02-10 22:49:25 +01:00
Frank Wessels 79aee05119 AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
The performance on AVX512 has been accelerated for Intel CPUs. This gives per-core speedups of roughly 2.5x to 4.3x compared to AVX2, as can be seen in the following table:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00
Frank Wessels 8885f3a1c7 Feature/ppc support (#88)
Add accelerated PPC support.
2018-12-18 20:39:59 +01:00
Darren 3133c51b91 Added link to ocaml-reed-solomon-erasure to README.md (#79) 2018-06-30 10:15:29 +02:00
Klaus Post 7d9453e171
Update README.md 2018-05-04 15:02:00 +02:00
Aleksandr Razumov 19a926a71b
Fix typo in README.md 2017-12-15 19:01:33 +03:00
Klaus Post f5e73dcfe2 Split blocks into size divisible by 16
Older systems (typically without AVX2) are more sensitive to misaligned load+stores.

Add parameter to automatically set the number of goroutines.

name                  old time/op    new time/op    delta
Encode10x2x10000-8      18.4µs ± 1%    16.1µs ± 1%  -12.43%    (p=0.000 n=9+9)
Encode100x20x10000-8     692µs ± 1%     608µs ± 1%  -12.10%  (p=0.000 n=10+10)
Encode17x3x1M-8         1.78ms ± 5%    1.49ms ± 1%  -16.63%  (p=0.000 n=10+10)
Encode10x4x16M-8        21.5ms ± 5%    19.6ms ± 4%   -8.74%   (p=0.000 n=10+9)
Encode5x2x1M-8           343µs ± 2%     267µs ± 2%  -22.22%   (p=0.000 n=9+10)
Encode10x2x1M-8          858µs ± 5%     701µs ± 5%  -18.34%  (p=0.000 n=10+10)
Encode10x4x1M-8         1.34ms ± 1%    1.16ms ± 1%  -13.19%    (p=0.000 n=9+9)
Encode50x20x1M-8        30.3ms ± 4%    25.0ms ± 2%  -17.51%   (p=0.000 n=10+8)
Encode17x3x16M-8        26.9ms ± 1%    24.5ms ± 4%   -9.13%   (p=0.000 n=8+10)

name                  old speed      new speed      delta
Encode10x2x10000-8    5.45GB/s ± 1%  6.22GB/s ± 1%  +14.20%    (p=0.000 n=9+9)
Encode100x20x10000-8  1.44GB/s ± 1%  1.64GB/s ± 1%  +13.77%  (p=0.000 n=10+10)
Encode17x3x1M-8       10.0GB/s ± 5%  12.0GB/s ± 1%  +19.88%  (p=0.000 n=10+10)
Encode10x4x16M-8      7.81GB/s ± 5%  8.56GB/s ± 5%   +9.58%   (p=0.000 n=10+9)
Encode5x2x1M-8        15.3GB/s ± 2%  19.6GB/s ± 2%  +28.57%   (p=0.000 n=9+10)
Encode10x2x1M-8       12.2GB/s ± 5%  15.0GB/s ± 5%  +22.45%  (p=0.000 n=10+10)
Encode10x4x1M-8       7.84GB/s ± 1%  9.03GB/s ± 1%  +15.19%    (p=0.000 n=9+9)
Encode50x20x1M-8      1.73GB/s ± 4%  2.09GB/s ± 4%  +20.59%   (p=0.000 n=10+9)
Encode17x3x16M-8      10.6GB/s ± 1%  11.7GB/s ± 4%  +10.12%   (p=0.000 n=8+10)
2017-11-18 22:00:55 +01:00
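The block-splitting idea in the commit above can be sketched as follows: divide the shard length across the workers, then round each chunk up to a multiple of 16 so that every chunk except possibly the last starts and ends on a 16-byte boundary. The names `chunkSize` and `splitRanges` are hypothetical; the library's actual splitting logic may differ.

```go
package main

import "fmt"

// chunkSize returns the per-worker chunk length for n bytes split across
// the given number of workers, rounded up to a multiple of 16.
func chunkSize(n, workers int) int {
	per := (n + workers - 1) / workers // ceiling division
	return (per + 15) &^ 15            // round up to the next multiple of 16
}

// splitRanges returns half-open [start, end) ranges that cover n bytes.
// Every start offset is a multiple of 16; only the last range may end
// on an unaligned offset.
func splitRanges(n, workers int) [][2]int {
	if n <= 0 || workers <= 0 {
		return nil
	}
	size := chunkSize(n, workers)
	var out [][2]int
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		out = append(out, [2]int{start, end})
	}
	return out
}

func main() {
	for _, r := range splitRanges(1000, 3) {
		fmt.Println(r[0], r[1])
	}
}
```

Rounding up rather than down means fewer ranges than workers can be produced for small inputs, which is harmless for this purpose.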
Nick Heindl e52c150f96 Fix some small typos in README (#71) 2017-11-18 16:17:31 +01:00
Klaus Post 6bb6130ff6 Add latest new feature to doc. 2017-10-01 14:06:06 +02:00
Klaus Post 61c22eab55 Cauchy Matrix option (#70)
* Experimental Cauchy Matrix

Experimental support for Cauchy style matrix

http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf

All matrices appear reversible.

* Remove Go 1.5 and 1.6 from CI tests.

* Fix comment.

* Increase max number of goroutines+docs.
2017-10-01 14:02:11 +02:00
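For readers unfamiliar with the Cauchy construction mentioned in the commit above, here is a minimal sketch of one common way to build such an encoding matrix: an identity block for the data shards, followed by parity rows whose entries are 1/(r XOR c) in GF(2^8). The function names, the field polynomial 0x11d, and the choice of r XOR c as the Cauchy points are assumptions for illustration; the commit itself does not spell out these details.

```go
package main

import "fmt"

// galMul multiplies two elements of GF(2^8) with the shift-and-XOR method.
// ASSUMPTION: the field polynomial is x^8 + x^4 + x^3 + x^2 + 1 (0x11d).
func galMul(a, b byte) byte {
	var p byte
	for b != 0 {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d // low 8 bits of 0x11d
		}
		b >>= 1
	}
	return p
}

// galInv finds the multiplicative inverse by brute force (fine for a sketch).
// It returns 0 for the input 0, which has no inverse.
func galInv(a byte) byte {
	for x := 1; x < 256; x++ {
		if galMul(a, byte(x)) == 1 {
			return byte(x)
		}
	}
	return 0
}

// cauchyMatrix builds a totalShards x dataShards matrix: identity on top,
// Cauchy rows below. Requires dataShards < totalShards <= 256, so r^c is
// never zero in the parity rows.
func cauchyMatrix(dataShards, totalShards int) [][]byte {
	m := make([][]byte, totalShards)
	for r := range m {
		m[r] = make([]byte, dataShards)
		if r < dataShards {
			m[r][r] = 1
			continue
		}
		for c := range m[r] {
			m[r][c] = galInv(byte(r ^ c))
		}
	}
	return m
}

func main() {
	for _, row := range cauchyMatrix(4, 6) {
		fmt.Println(row)
	}
}
```

The property that matters for erasure coding is that every square submatrix of a Cauchy matrix is invertible, so any `dataShards` surviving rows suffice to recover the data.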
Klaus Post c71640765a Update docs before release, when #62 is ready. (#63)
* Update docs before release, when #62 is ready.

* Update README.md
2017-08-26 11:48:42 +02:00
Frank Wessels 7b88f42e61 Add NEON support for ARM64 (#62)
* Add support for arm64 using NEON instructions

Specifically using the PMULL/PMULL2 polynomial multiplication instructions followed by a reduction step (actually two steps).

* Add ARM performance numbers

* Formatting for performance table

* Refactoring of NEON version and 256-bit wide version

* Expand test slice beyond 32 (for AVX2 and NEON) and test galMulSliceXor explicitly.

* Fix ARM code with missing function.

* Fix missing newline
2017-08-26 11:47:42 +02:00
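The multiply-then-reduce structure mentioned in the NEON commit above can be shown with a scalar sketch: first a carry-less (polynomial) multiplication, which is what PMULL computes across vector lanes, then a reduction back into 8 bits. This pure Go version works one byte at a time and reduces bit by bit, which is not how the assembly does it. The reduction polynomial 0x11d and all function names are assumptions for illustration.

```go
package main

import "fmt"

// reducePoly is x^8 + x^4 + x^3 + x^2 + 1.
// ASSUMPTION: this is the field polynomial; the commit log does not state it.
const reducePoly = 0x11d

// clmul8 is a carry-less multiplication of two bytes. Partial products are
// combined with XOR instead of addition, giving a product of up to 15 bits.
func clmul8(a, b byte) uint16 {
	var p uint16
	for i := 0; i < 8; i++ {
		if b&(1<<i) != 0 {
			p ^= uint16(a) << i
		}
	}
	return p
}

// reduce folds a 15-bit polynomial product back into 8 bits modulo reducePoly,
// clearing the high bits from the top down.
func reduce(p uint16) byte {
	for bit := 14; bit >= 8; bit-- {
		if p&(1<<bit) != 0 {
			p ^= reducePoly << (bit - 8)
		}
	}
	return byte(p)
}

// galMul multiplies two GF(2^8) elements: polynomial multiply, then reduce.
func galMul(a, b byte) byte {
	return reduce(clmul8(a, b))
}

func main() {
	fmt.Printf("%#x\n", galMul(2, 0x80)) // x * x^7 = x^8, which must be reduced
	fmt.Printf("%#x\n", galMul(3, 3))    // (x+1)^2 = x^2 + 1, no reduction needed
}
```

Keeping the two stages separate mirrors the hardware approach, where the wide polynomial product is computed first and the reduction is applied afterwards.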
Klaus Post dc6af2dce5 Minor cleanup (#61)
* Remove some benchmarks
* Format tables a bit.
* Doc cleanup
2017-08-13 22:38:27 +02:00
Klaus Post 82ee2d9869 Update README.md 2017-07-20 12:24:02 +02:00
Frank Wessels 0de37d7697 Add ReconstructData interface method (#57)
* Add ReconstructData interface method to allow reconstruction of any missing data shards
* Add support for reconstructing only the data shards to StreamEncoder.Reconstruct()
2017-07-20 12:15:46 +02:00
Klaus Post 5abf0ee302 Add options (#46)
* Add options

Make constants changeable as options.

The API remains backwards compatible.

* Update documentation.

* Fix line endings

* fmt

* fmt

* Use functions for parameters.

Much neater.
2017-02-19 11:13:22 +01:00
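The "functions for parameters" approach in the commit above is commonly known in Go as the functional options pattern. A minimal sketch of how it keeps an existing constructor call compiling while allowing overrides; every name and default value here is hypothetical and chosen only for illustration, not taken from the library.

```go
package main

import "fmt"

// options holds settings that used to be package-level constants.
type options struct {
	maxGoroutines int
	minSplitSize  int
}

// defaultOptions uses placeholder values (not the library's real defaults).
var defaultOptions = options{maxGoroutines: 4, minSplitSize: 512}

// option mutates an options value; callers build them with the helpers below.
type option func(*options)

func withMaxGoroutines(n int) option {
	return func(o *options) { o.maxGoroutines = n }
}

func withMinSplitSize(n int) option {
	return func(o *options) { o.minSplitSize = n }
}

// encoder is a stand-in for the real encoder type.
type encoder struct {
	dataShards, parityShards int
	o                        options
}

// newEncoder accepts options as a variadic parameter, so existing callers
// that pass only the shard counts keep compiling unchanged.
func newEncoder(dataShards, parityShards int, opts ...option) *encoder {
	e := &encoder{dataShards: dataShards, parityShards: parityShards, o: defaultOptions}
	for _, opt := range opts {
		opt(&e.o)
	}
	return e
}

func main() {
	old := newEncoder(10, 3)                        // pre-options call style
	tuned := newEncoder(10, 3, withMaxGoroutines(8)) // overrides one setting
	fmt.Println(old.o.maxGoroutines, tuned.o.maxGoroutines)
}
```

Because each option is just a function, new settings can be added later without changing the constructor's signature again.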
Jesse Lucas ff2f89b6ca Update README.md to fix small typos. 2017-01-05 00:16:24 -05:00
Klaus Post d0a56f72c0 Update README.md 2016-10-28 09:13:20 +02:00
Klaus Post 9998b4cb21 Update README.md 2016-10-28 09:00:26 +02:00
Frank 467733eb9c Add generated byte assembler using asm2plan9s
Add recompilable assembler using asm2plan9s
2016-07-06 21:06:00 +02:00
Klaus Post fab3ee4030 Update README.md 2016-01-11 14:39:25 +01:00
Klaus Post 064a64aeae Update README.md
Add documentation links.
2015-10-27 15:37:09 +01:00
klauspost d2a0b1c12f Show master branch status 2015-10-27 15:14:59 +01:00
klauspost cfc4949ad7 Add Streaming information to the README.md 2015-10-27 14:20:22 +01:00
klauspost dc9cd67c8c PSHUFB is S(upplemental)-SSE3, not plain SSE3. 2015-06-24 16:57:38 +02:00
klauspost 8ebf356efb The number of data shards must be below 257. Check that and update documentation. 2015-06-23 13:39:57 +02:00
Klaus Post 6350190f56 Update README.md 2015-06-22 18:36:27 +02:00
Klaus Post 2e004190cc Update README.md 2015-06-22 15:47:16 +02:00
Klaus Post b52b6b42b1 Update README.md 2015-06-22 15:18:35 +02:00
Klaus Post 9eb924a6c3 Update README.md 2015-06-22 15:16:01 +02:00
Klaus Post 4cd5506bcf Update README.md 2015-06-22 15:15:12 +02:00
Klaus Post 95bed75738 Update README.md 2015-06-22 15:14:22 +02:00
Klaus Post 80b3249f9c Update README.md 2015-06-22 14:56:21 +02:00
Klaus Post 159cea6f93 Update README.md 2015-06-22 14:55:05 +02:00
Klaus Post bab085de71 Update README.md 2015-06-22 12:35:51 +02:00
Klaus Post 1b2f439221 Update README.md 2015-06-22 12:15:33 +02:00
Klaus Post 2ed146b387 Update README.md 2015-06-22 12:03:17 +02:00
Klaus Post c7a0c49be7 Update README.md 2015-06-22 12:00:36 +02:00