When running mdtest with create + read, fileperproc is set properly, so
the driver knows it is not a single shared file. But when mdtest runs
read-only (-E) against a pre-existing dataset, fileperproc is never
set, so the driver assumes a single shared file and may apply an
optimization that shares the file handle.
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
* Support random data generation in utilities. On updates, the first 8-byte element in each 4k block is changed to defeat deduplication.
* Incorporate different packet types into mdtest/md-workbench.
* Integrated utilities memory pattern tools into IOR. Now all tools use the same patterns.
* Added IOR long option for compatibility between IOR and other tools.
* Added new tests for random buffers.
* Basic support for memory allocation on GPU using CUDA unified memory. Partially addressing #284. IOR support completed.
* Support for GPU alloc in MDTest and MD-Workbench
* Option: support repeated parsing of same option (allows option sharing across modules).
* Checks for gpuDirect
* Integrate gpuDirect options and basic hooks, more testing to be done.
* POSIX: basic gpuDirect implementation working with fake-gpudirect library.
* CUDA: allow setting the DeviceID for IOR (not yet MDTest).
* CUDA/GPUDirect: support --with-X=<path>.
* Bugfix in option parser for flags that are part of an argument for an option: e.g., with -O=1, if 1 is also a flag, the argument was wrongly treated as that flag.
* Improve and fix the reporting in MDTest. Per-rank reporting now outputs the performance of the individual rank (before the barrier); the iteration throughput includes the barrier time (unless -B=0 is set). Previously, it was the time after the barrier.
* Clarify the computation of the results.
* MDTest: improve CSV output to include aggregated result.
* MDTest: allow storing information per rank for later analysis when using the --saveRankPerformanceDetails=<FILE> option.
* MDTest: refactored calculation of results, added time_before_executing a barrier.
The calculation per iteration first computes the value of the slowest process, i.e., the highest time or the lowest rate. This is then the value for the iteration.
Second, the min/max/mean are calculated across iterations.
For tree operations, the value is identical to the previous behavior, as only rank 0 is involved.
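
The two-step result calculation described above can be sketched as follows. This is a minimal illustration in Python, not the MDTest code itself: the function names and the list-of-lists input are assumptions made for the example, and in the real tool the per-rank values would be gathered across MPI ranks.

```python
from statistics import mean


def iteration_value(per_rank, is_rate):
    """Value of one iteration: the slowest rank.

    For rates the slowest rank has the lowest value,
    for times it has the highest value.
    """
    return min(per_rank) if is_rate else max(per_rank)


def summarize(iterations, is_rate):
    """Return (min, max, mean) over the per-iteration values.

    `iterations` is a list with one entry per iteration; each entry is
    the list of per-rank values (hypothetical layout for this sketch).
    """
    values = [iteration_value(ranks, is_rate) for ranks in iterations]
    return min(values), max(values), mean(values)
```

For a tree operation only one rank contributes, so each per-iteration list has a single entry and the iteration value is simply that entry.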
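
The random-buffer update mentioned earlier in this log (changing the first 8-byte element in each 4k block so that rewritten data does not deduplicate) could look roughly like the sketch below. It is written in Python for brevity and is only an assumption about the approach: the helper names, the `stamp` parameter, and the way the per-block value is derived are hypothetical and not taken from the utilities.

```python
import os
import struct

BLOCK_SIZE = 4096  # the "4k block" from the note
ELEM_SIZE = 8      # the "8 byte element" from the note


def fill_random(size):
    """Return a mutable buffer of `size` random bytes."""
    return bytearray(os.urandom(size))


def touch_blocks(buf, stamp):
    """Overwrite the first 8 bytes of every 4k block in place.

    The value written is derived from `stamp` and the block offset
    (an assumption for this sketch), so each update and each block
    differs from the previous contents.
    """
    for offset in range(0, len(buf) - ELEM_SIZE + 1, BLOCK_SIZE):
        value = (stamp + offset) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", buf, offset, value)
```

Calling `touch_blocks(buf, stamp)` with a different `stamp` before each write changes only 8 bytes per block, which keeps the update cheap while still making every block distinct.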