8 Commits (jerasure-matrix)

Author SHA1 Message Date
Zhang Boyang 195d6fc1ad
Fix build tags for gccgo (#163) 2021-03-18 13:39:19 +01:00
Klaus Post f338110979
Make sure assembler is formatted (#145)
2020-05-14 12:04:55 +02:00
Frank Wessels 27f8a7b6bf
Small optimization to parallel82 for AVX512 by reducing the number of VSHUFI64X2 instructions in the core loop (#143) 2020-05-14 10:19:23 +02:00
Frank Wessels d6d9fba4f9
Take vshufi64x2 out of main loop and initialize upfront (for parallel 81 only) (#139) 2020-05-13 10:59:26 +02:00
Klaus Post 3067f8aed5
asmfmt 2020-05-06 12:36:43 +02:00
Frank Wessels 1b9e129671
Avx512 parallel81 (#131)
* AVX512 routine for 8x1 parallel processing (WIP)

* Testing and integration of Parallel81 assembly routine
2020-05-06 12:32:31 +02:00
Frank Wessels 0b98f5350a
Refactor AVX512 code to use Go assembly instructions. (#121)
Additionally, there is a small performance improvement from using VPTERNLOGD (instead of two VPXORD instructions).
2020-05-03 13:43:52 +02:00
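The VPTERNLOGD note in the commit above can be illustrated with a small scalar model. This is only a sketch and not code from the repository: `ternlog32` is a hypothetical helper that emulates the instruction's per-bit truth-table lookup for a single 32-bit lane, and the immediate `0x96` is the truth table for a three-input XOR, which is why one VPTERNLOGD can stand in for two chained XOR instructions.

```go
package main

import "fmt"

// ternlog32 is a hypothetical scalar model of one 32-bit lane of VPTERNLOGD:
// each result bit is looked up in the 8-bit truth table imm8, indexed by the
// corresponding bits of a, b and c.
func ternlog32(a, b, c uint32, imm8 uint8) uint32 {
	var out uint32
	for i := uint(0); i < 32; i++ {
		// Build a 3-bit index from the i-th bit of each operand.
		idx := ((a>>i)&1)<<2 | ((b>>i)&1)<<1 | (c>>i)&1
		// Select the matching truth-table bit and place it at position i.
		out |= uint32(imm8>>idx&1) << i
	}
	return out
}

func main() {
	a, b, c := uint32(0xDEADBEEF), uint32(0x01234567), uint32(0x89ABCDEF)
	// 0x96 = 0b10010110 has bits set exactly at the odd-parity indexes
	// (001, 010, 100, 111), so the lookup equals a ^ b ^ c.
	fmt.Println(ternlog32(a, b, c, 0x96) == a^b^c)
}
```

Because XOR is symmetric, the operand order does not matter for this particular immediate; for other truth tables the mapping of operands to index bits would need to be checked against the instruction reference.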
Frank Wessels 79aee05119
AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
Encoding on Intel CPUs with AVX512 is now accelerated. Per-core throughput improves by roughly 2.5x to 4.3x over AVX2, as the following table shows:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00
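The speedup column in the table above is the AVX512 throughput divided by the AVX2 throughput. As a minimal sketch (the `speedup` helper is hypothetical and not part of the repository), the ratio can be recomputed from the two MB/s columns:

```go
package main

import "fmt"

// speedup returns the ratio of the new throughput to the baseline throughput.
// Both arguments are assumed to be in the same unit (here MB/s).
func speedup(baselineMBps, newMBps float64) float64 {
	return newMBps / baselineMBps
}

func main() {
	// Throughput figures copied from the benchmark table (AVX2, AVX512).
	fmt.Printf("BenchmarkEncode8x8x1M:   %.2fx\n", speedup(1681.35, 4125.64))
	fmt.Printf("BenchmarkEncode16x4x16M: %.2fx\n", speedup(1189.59, 5138.01))
}
```

These two rows are the smallest and largest ratios in the table, so they bound the range of improvement reported by the commit.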