Commit Graph

9 Commits (7761c8f7cd99505cfd8b605059dfd13f14243d27)

Author SHA1 Message Date
Klaus Post 7761c8f7cd
Use Workflows (#169)
* Use Workflows
* Go 1.17 build tags
* Do races separately.
2021-09-01 18:55:02 +02:00
Frank Wessels 2f8e50e65c
Better test coverage for AVX512 (parallel version) (#134) 2020-05-07 09:28:23 +02:00
Frank Wessels 1b9e129671
Avx512 parallel81 (#131)
* AVX512 routine for 8x1 parallel processing (WIP)

* Testing and integration of Parallel81 assembly routine
2020-05-06 12:32:31 +02:00
Klaus Post de70cc155f
AVX512 parallel processing (#120)
Do concurrent processing in AVX512 mode and split jobs by cache size.
2020-05-04 09:17:40 +02:00
Klaus Post 65df535980
Make single goroutine encodes more efficient (#122)
Calculate the optimal per round size to keep data in cache when not using WithAutoGoroutines.

```
λ benchcmp before.txt after.txt
benchmark                          old ns/op     new ns/op     delta
BenchmarkParallel_8x8x05M-16       675225        321053        -52.45%
BenchmarkParallel_20x10x05M-16     3471988       600740        -82.70%
BenchmarkParallel_8x8x1M-16        3948606       728093        -81.56%
BenchmarkParallel_8x8x8M-16        47361588      5976467       -87.38%
BenchmarkParallel_8x8x32M-16       195044200     24365474      -87.51%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x05M-16       6211.71      13064.22     2.10x
BenchmarkParallel_20x10x05M-16     3020.10      17454.73     5.78x
BenchmarkParallel_8x8x1M-16        2124.45      11521.34     5.42x
BenchmarkParallel_8x8x8M-16        1416.95      11228.85     7.92x
BenchmarkParallel_8x8x32M-16       1376.28      11017.04     8.00x

```
2020-05-03 19:37:22 +02:00
Frank Wessels 0b98f5350a
Refactor AVX512 code to use Go assembly instructions. (#121)
Additionally there is a small performance improvement using VPTERNLOGD (instead of two VPXORD instructions).
2020-05-03 13:43:52 +02:00
Klaus Post d2cfcb8065
Add commandline arg to disable asm for tests. (#116)
* Add commandline test args
2020-04-22 15:38:21 +02:00
Klaus Post 101092fa3b
Make AVX512 short tests (#114)
Tests are timing out. Use shorter tests for -short.
2020-01-18 14:50:31 +01:00
Frank Wessels 79aee05119 AVX512 accelerated version resulting in a 4x speed improvement over AVX2 (#91)
The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis of up to 4x compared to AVX2 as can be seen in the following table:

```
$ benchcmp avx2.txt avx512.txt
benchmark                      AVX2 MB/s    AVX512 MB/s   speedup
BenchmarkEncode8x8x1M-72       1681.35      4125.64       2.45x
BenchmarkEncode8x4x8M-72       1529.36      5507.97       3.60x
BenchmarkEncode8x8x8M-72        791.16      2952.29       3.73x
BenchmarkEncode8x8x32M-72       573.26      2168.61       3.78x
BenchmarkEncode12x4x12M-72     1234.41      4912.37       3.98x
BenchmarkEncode16x4x16M-72     1189.59      5138.01       4.32x
BenchmarkEncode24x8x24M-72      690.68      2583.70       3.74x
BenchmarkEncode24x8x48M-72      674.20      2643.31       3.92x
```
2019-02-10 11:17:23 +01:00