converting to sphinx code snippets (#343)
* converting to sphinx code snippets
* more rigorous code highlighting

parent f239b74d83
commit 61f36f0c48
|
@ -6,19 +6,19 @@ Install
|
||||||
Building
|
Building
|
||||||
--------
|
--------
|
||||||
|
|
||||||
0. If "configure" is missing from the top level directory, you
|
0. If ``configure`` is missing from the top level directory, you
|
||||||
probably retrieved this code directly from the repository.
|
probably retrieved this code directly from the repository.
|
||||||
Run "./bootstrap".
|
Run ``./bootstrap``.
|
||||||
|
|
||||||
If your versions of the autotools are not new enough to run
|
If your versions of the autotools are not new enough to run
|
||||||
this script, download an official tarball in which the
|
this script, download an official tarball in which the
|
||||||
configure script is already provided.
|
configure script is already provided.
|
||||||
|
|
||||||
1. Run "./configure"
|
1. Run ``./configure``
|
||||||
|
|
||||||
See "./configure --help" for configuration options.
|
See ``./configure --help`` for configuration options.
|
||||||
|
|
||||||
2. Run "make"
|
2. Run ``make``
|
||||||
|
|
||||||
3. Optionally, run "make install". The installation prefix
|
3. Optionally, run ``make install``. The installation prefix
|
||||||
can be changed as an option to the "configure" script.
|
can be changed as an option to the ``configure`` script.
|
||||||
|
|
|
@ -11,10 +11,10 @@ Running IOR
|
||||||
-----------
|
-----------
|
||||||
There are two ways of running IOR:
|
There are two ways of running IOR:
|
||||||
|
|
||||||
1) Command line with arguments -- executable followed by command line
|
1) Command line with arguments -- executable followed by command line options.
|
||||||
options.
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
::
|
|
||||||
$ ./IOR -w -r -o filename
|
$ ./IOR -w -r -o filename
|
||||||
|
|
||||||
This performs a write and a read to the file 'filename'.
|
This performs a write and a read to the file 'filename'.
|
||||||
|
@ -24,7 +24,8 @@ There are two ways of running IOR:
|
||||||
conjunction with this for varying specific tests during an execution of
|
conjunction with this for varying specific tests during an execution of
|
||||||
the code. Only arguments before the script will be used!
|
the code. Only arguments before the script will be used!
|
||||||
|
|
||||||
::
|
.. code-block:: shell
|
||||||
|
|
||||||
$ ./IOR -W -f script
|
$ ./IOR -W -f script
|
||||||
|
|
||||||
This defaults all tests in 'script' to use write data checking.
|
This defaults all tests in 'script' to use write data checking.
|
||||||
|
@ -40,10 +41,10 @@ Getting Started with IOR
|
||||||
|
|
||||||
IOR writes data sequentially with the following parameters:
|
IOR writes data sequentially with the following parameters:
|
||||||
|
|
||||||
* blockSize (-b)
|
* ``blockSize`` (``-b``)
|
||||||
* transferSize (-t)
|
* ``transferSize`` (``-t``)
|
||||||
* segmentCount (-s)
|
* ``segmentCount`` (``-s``)
|
||||||
* numTasks (-n)
|
* ``numTasks`` (``-n``)
|
||||||
|
|
||||||
which are best illustrated with a diagram:
|
which are best illustrated with a diagram:
|
||||||
|
|
||||||
|
@ -52,7 +53,9 @@ which are best illustrated with a diagram:
|
||||||
|
|
||||||
These four parameters are all you need to get started with IOR. However,
|
These four parameters are all you need to get started with IOR. However,
|
||||||
naively running IOR usually gives disappointing results. For example, if we run
|
naively running IOR usually gives disappointing results. For example, if we run
|
||||||
a four-node IOR test that writes a total of 16 GiB::
|
a four-node IOR test that writes a total of 16 GiB:
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16
|
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16
|
||||||
...
|
...
|
||||||
|
@ -67,7 +70,9 @@ we can only get a couple hundred megabytes per second out of a Lustre file
|
||||||
system that should be capable of a lot more.
|
system that should be capable of a lot more.
|
||||||
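As a sanity check on the run above, the 16 GiB total can be derived from the parameters. The sketch below assumes the aggregate size is the product of the task count, ``segmentCount`` and ``blockSize``, and that ``transferSize`` only decides how many transfers fill one block. Both function names are hypothetical helpers and are not part of IOR.

```c
#include <stdint.h>

/* Hypothetical helper, not part of IOR: aggregate bytes written by one run,
 * assuming every task writes segment_count segments of block_size bytes. */
uint64_t ior_total_bytes(uint64_t num_tasks, uint64_t segment_count,
                         uint64_t block_size)
{
    return num_tasks * segment_count * block_size;
}

/* Hypothetical helper: number of transfers needed to fill one block. */
uint64_t ior_transfers_per_block(uint64_t block_size, uint64_t transfer_size)
{
    return block_size / transfer_size;
}
```

With 64 MPI tasks, ``-s 16``, ``-b 16m`` and ``-t 1m`` this gives 64 x 16 x 16 MiB = 16 GiB in total, written as 16 transfers of 1 MiB per block.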
|
|
||||||
Switching from writing to a single-shared file to one file per process using the
|
Switching from writing to a single-shared file to one file per process using the
|
||||||
-F (filePerProcess=1) option changes the performance dramatically::
|
``-F`` (``filePerProcess=1``) option changes the performance dramatically:
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16 -F
|
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16 -F
|
||||||
...
|
...
|
||||||
|
@ -123,7 +128,7 @@ There are a couple of ways to measure the read performance of the underlying
|
||||||
Lustre file system. The most crude way is to simply write more data than will
|
Lustre file system. The most crude way is to simply write more data than will
|
||||||
fit into the total page cache so that by the time the write phase has completed,
|
fit into the total page cache so that by the time the write phase has completed,
|
||||||
the beginning of the file has already been evicted from cache. For example,
|
the beginning of the file has already been evicted from cache. For example,
|
||||||
increasing the number of segments (-s) to write more data reveals the point at
|
increasing the number of segments (``-s``) to write more data reveals the point at
|
||||||
which the nodes' page cache on my test system runs over very clearly:
|
which the nodes' page cache on my test system runs over very clearly:
|
||||||
|
|
||||||
.. image:: tutorial-ior-overflowing-cache.png
|
.. image:: tutorial-ior-overflowing-cache.png
|
||||||
|
@ -142,9 +147,11 @@ written by node N-1.
|
||||||
Since page cache is not shared between compute nodes, shifting tasks this way
|
Since page cache is not shared between compute nodes, shifting tasks this way
|
||||||
ensures that each MPI process is reading data it did not write.
|
ensures that each MPI process is reading data it did not write.
|
||||||
|
|
||||||
IOR provides the -C option (reorderTasks) to do this, and it forces each MPI
|
IOR provides the ``-C`` option (``reorderTasks``) to do this, and it forces each MPI
|
||||||
process to read the data written by its neighboring node. Running IOR with
|
process to read the data written by its neighboring node. Running IOR with
|
||||||
this option gives much more credible read performance::
|
this option gives much more credible read performance:
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16 -F -C
|
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16 -F -C
|
||||||
...
|
...
|
||||||
|
@ -166,8 +173,10 @@ pages we just wrote to flush out to Lustre. Including the time it takes for
|
||||||
fsync() to finish gives us a measure of how long it takes for our data to write
|
fsync() to finish gives us a measure of how long it takes for our data to write
|
||||||
to the page cache and for the page cache to write back to Lustre.
|
to the page cache and for the page cache to write back to Lustre.
|
||||||
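The difference between timing only the ``write()`` calls and timing them together with ``fsync()`` can be sketched in a few lines of C. This is a minimal illustration and not IOR's implementation; the function name ``timed_write_fsync`` is hypothetical, and interrupted system calls are treated as plain errors to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not IOR code: write len bytes to path, then fsync().
 * Returns elapsed seconds including the fsync(), or -1.0 on any error. */
double timed_write_fsync(const char *path, const char *buf, size_t len)
{
    struct timespec t0, t1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t done = 0;
    while (done < len) {
        /* write() may be partial; loop until everything is handed over. */
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1.0;
        }
        done += (size_t)n;
    }
    /* Without this call the timer could stop once the data is in page cache. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Moving the second ``clock_gettime()`` call above the ``fsync()`` would give the page-cache-only timing for comparison.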
|
|
||||||
IOR provides another convenient option, -e (fsync), to do just this. And, once
|
IOR provides another convenient option, ``-e`` (fsync), to do just this. And, once
|
||||||
again, using this option changes our performance measurement quite a bit::
|
again, using this option changes our performance measurement quite a bit:
|
||||||
|
|
||||||
|
.. code-block:: shell
|
||||||
|
|
||||||
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16 -F -C -e
|
$ mpirun -n 64 ./ior -t 1m -b 16m -s 16 -F -C -e
|
||||||
...
|
...
|
||||||
|
@ -192,15 +201,16 @@ the best choice. There are several ways in which we can get clever and defeat
|
||||||
page cache in a more general sense to get meaningful performance numbers.
|
page cache in a more general sense to get meaningful performance numbers.
|
||||||
|
|
||||||
When measuring write performance, bypassing page cache is actually quite simple;
|
When measuring write performance, bypassing page cache is actually quite simple;
|
||||||
opening a file with the O_DIRECT flag makes writes go directly to disk. In addition,
|
opening a file with the ``O_DIRECT`` flag makes writes go directly to disk. In addition,
|
||||||
the fsync() call can be inserted into applications, as is done with IOR's -e
|
the ``fsync()`` call can be inserted into applications, as is done with IOR's ``-e``
|
||||||
option.
|
option.
|
||||||
|
|
||||||
Measuring read performance is a lot trickier. If you are fortunate enough to
|
Measuring read performance is a lot trickier. If you are fortunate enough to
|
||||||
have root access on a test system, you can force the Linux kernel to empty out
|
have root access on a test system, you can force the Linux kernel to empty out
|
||||||
its page cache by doing
|
its page cache by doing
|
||||||
|
|
||||||
::
|
.. code-block:: shell
|
||||||
|
|
||||||
# echo 1 > /proc/sys/vm/drop_caches
|
# echo 1 > /proc/sys/vm/drop_caches
|
||||||
|
|
||||||
and in fact, this is often good practice before running any benchmark
|
and in fact, this is often good practice before running any benchmark
|
||||||
|
@ -210,7 +220,9 @@ memory for its own use.
|
||||||
|
|
||||||
Unfortunately, many of us do not have root on our systems, so we have to get
|
Unfortunately, many of us do not have root on our systems, so we have to get
|
||||||
even more clever. As it turns out, there is a way to pass a hint to the kernel
|
even more clever. As it turns out, there is a way to pass a hint to the kernel
|
||||||
that a file is no longer needed in page cache::
|
that a file is no longer needed in page cache:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
#define _XOPEN_SOURCE 600
|
#define _XOPEN_SOURCE 600
|
||||||
#include <unistd.h>
|
#include <unistd.h>
|
||||||
|
@ -224,7 +236,7 @@ that a file is no longer needed in page cache::
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
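The hunks above show only the first and last lines of the tutorial's C listing. As a standalone illustration of the same idea, and not the tutorial's own code, here is a minimal sketch of passing the hint for a single file. The function name ``drop_file_cache`` is hypothetical. The ``fdatasync()`` call is an addition made here on the assumption that the file may still have dirty pages, which the kernel will not discard.

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: hint that the cached pages of path are not needed.
 * Returns 0 on success, -1 if the file cannot be opened, or the error
 * number reported by posix_fadvise(). */
int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Flush dirty pages first so that they become eligible for eviction. */
    fdatasync(fd);

    /* offset 0 and len 0 together mean "the whole file". */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc;
}
```

A return value of 0 only means the hint was accepted; it does not confirm that the pages have already left page cache.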
|
|
||||||
The effect of passing POSIX_FADV_DONTNEED using posix_fadvise() is usually that
|
The effect of passing POSIX_FADV_DONTNEED using ``posix_fadvise()`` is usually that
|
||||||
all pages belonging to that file are evicted from page cache in Linux. However,
|
all pages belonging to that file are evicted from page cache in Linux. However,
|
||||||
this is just a hint --not a guarantee-- and the kernel evicts these pages
|
this is just a hint --not a guarantee-- and the kernel evicts these pages
|
||||||
asynchronously, so it may take a second or two for pages to actually leave page
|
asynchronously, so it may take a second or two for pages to actually leave page
|
||||||
|
|