inform aiori interface about RADOS backend
stubbed out aiori backend for rados
additions to get RADOS backend compiling/linking
first cut at rados create/open path
make sure to return RADOS oid on open/create
implement rados xfer path for WRITE
refactor + implement getfilesize and close
remember to use read_op interface for stat
implement RADOS delete function
don't error in RADOS_Delete for now
implement RADOS set_version
handle open/create flags appropriately
cleanup RADOS error handling
implement read/readcheck/writecheck for RADOS
rados doesn't support directio
implement unsupported aiori ops for RADOS
implement RADOS access call
define rados types if no rados support
It shares the create/open/delete/set_version/get_file_size
functions with POSIX backend.
The mmap backend also supports fsync and fsyncPerWrite options,
and it will use msync() instead of fsync().
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
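The msync()-for-fsync() substitution described for the mmap backend can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: mmap_write_and_sync() and mmap_sync_demo() are hypothetical names, and the sketch assumes a page-aligned offset and a writable /tmp.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes into a shared mapping of `fd` at
 * `offset` (must be page-aligned), then flush with msync() where a POSIX
 * backend would call fsync(). Returns 0 on success, -1 on error. */
static int mmap_write_and_sync(int fd, off_t offset, const void *src, size_t len)
{
    struct stat st;
    off_t end = offset + (off_t)len;

    assert(len > 0);                      /* mmap() rejects zero-length maps */
    if (fstat(fd, &st) != 0)
        return -1;
    /* The mapped range must be backed by the file; only ever extend it. */
    if (st.st_size < end && ftruncate(fd, end) != 0)
        return -1;

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, src, len);
    int rc = msync(map, len, MS_SYNC);    /* synchronous flush of dirty pages */
    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}

/* Round trip on a temporary file: returns 0 if data written through the
 * mapping can be read back with pread(). */
static int mmap_sync_demo(void)
{
    char path[] = "/tmp/mmap_sketch_XXXXXX";
    const char msg[] = "hello";
    char back[sizeof msg] = {0};
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    int rc = mmap_write_and_sync(fd, 0, msg, sizeof msg);
    if (rc == 0 && pread(fd, back, sizeof back, 0) != (ssize_t)sizeof back)
        rc = -1;
    if (rc == 0 && memcmp(back, msg, sizeof msg) != 0)
        rc = -1;
    close(fd);
    unlink(path);
    return rc;
}
```

Calling mmap_sync_demo() from a main() exercises the whole path. MS_SYNC is used here because it blocks until the write-back completes, which is the closest match to fsync() semantics.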
Once a process hits the stonewall (timelimit), all processes determine the maximum number of pairs read/written.
Each process continues to read/write until the maximum number of pairs is reached; this simulates the wear out.
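The wear-out arithmetic can be sketched without MPI as below. This is only an illustration under assumptions: the array stands in for the per-process pair counts at the stonewall, max_pairs() stands in for whatever collective max-reduction the real code performs, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the reduction step: the largest pair count any process
 * reached when the stonewall hit. */
static long max_pairs(const long *pairs_done, size_t nranks)
{
    assert(nranks > 0);
    long max = pairs_done[0];
    for (size_t i = 1; i < nranks; i++)
        if (pairs_done[i] > max)
            max = pairs_done[i];
    return max;
}

/* Pairs one process must still read/write to catch up with the maximum. */
static long pairs_still_owed(long mine, long max)
{
    return max - mine;
}
```

For example, with counts {10, 14, 7} the maximum is 14, so the third process would continue for 7 more pairs while the second does no extra work.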
This makes it so that the buffers are only allocated once per test instead
of once per transfer. This also removes initial buffer set-up from the
timing window.
Added a new struct, IOR_io_buffers, to ior.h for the buffer, checkbuffer, and
readcheckbuffer, so only one pointer needs to be passed to XferBuffersSetup(),
XferBuffersFree(), and WriteOrRead().
Changed the logic in XferBuffersSetup() and XferBuffersFree() to not be transfer
dependent. If a test includes a write check or read check the checkBuffer
and readcheckBuffer will be created once per test in TestIoSys(). The
argument now taken by both functions has changed from the access type to
a pointer to IOR_param_t.
Changed WriteOrRead to take as an additional parameter
the IOR_io_buffers struct, since it was no longer creating those
buffers.
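A rough sketch of the once-per-test buffer handling described above. The type and function names (io_buffers_sketch, params_sketch, buffers_setup, buffers_free) are hypothetical stand-ins, not the definitions in ior.h, and the sketch assumes both check buffers are created whenever either check is requested.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical analogue of the three-buffer struct. */
typedef struct {
    void *buffer;
    void *checkBuffer;
    void *readCheckBuffer;
} io_buffers_sketch;

/* Hypothetical subset of the test parameters that setup depends on. */
typedef struct {
    size_t transferSize;
    int checkWrite;
    int checkRead;
} params_sketch;

static void buffers_free(io_buffers_sketch *b)
{
    free(b->buffer);
    free(b->checkBuffer);
    free(b->readCheckBuffer);
    b->buffer = b->checkBuffer = b->readCheckBuffer = NULL;
}

/* Allocate once per test; depends on the parameters, not the access type.
 * Returns 0 on success, -1 if any allocation fails. */
static int buffers_setup(io_buffers_sketch *b, const params_sketch *p)
{
    int want_check = p->checkWrite || p->checkRead;

    assert(p->transferSize > 0);
    b->buffer = malloc(p->transferSize);
    b->checkBuffer = want_check ? malloc(p->transferSize) : NULL;
    b->readCheckBuffer = want_check ? malloc(p->transferSize) : NULL;

    if (!b->buffer || (want_check && (!b->checkBuffer || !b->readCheckBuffer))) {
        buffers_free(b);
        return -1;
    }
    return 0;
}
```

A test driver would call buffers_setup() before the timed region, pass the single struct pointer to each transfer call, and call buffers_free() after the test finishes.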
Changed how the -l option works. Now you choose the type of data packet:
-l i incompressible data packets
-l incompressible incompressible data packets
-l timestamp timestamped data packets
-l t timestamped data packets
-l offset offset data packets
-l o offset data packets
-G option now is either the seed for the incompressible random packets
or the timestamp, depending on the input to the -l option.
-G will no longer timestamp packets on its own without the addition of -l timestamp or -l t
I kept shorter versions of the options for the sake of typing sanity.
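The long/short spellings accepted by -l could be mapped to a packet type along these lines. This is a sketch only: the enum and parse_packet_type() are hypothetical names, and how unrecognized values are handled is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical packet-type enum for the three -l choices. */
typedef enum {
    PACKET_UNKNOWN = -1,
    PACKET_INCOMPRESSIBLE,
    PACKET_TIMESTAMP,
    PACKET_OFFSET
} packet_type;

/* Map the -l argument (long or one-letter form) to a packet type. */
static packet_type parse_packet_type(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "i") == 0 || strcmp(arg, "incompressible") == 0)
        return PACKET_INCOMPRESSIBLE;
    if (strcmp(arg, "t") == 0 || strcmp(arg, "timestamp") == 0)
        return PACKET_TIMESTAMP;
    if (strcmp(arg, "o") == 0 || strcmp(arg, "offset") == 0)
        return PACKET_OFFSET;
    return PACKET_UNKNOWN;   /* assumption: caller reports the bad value */
}
```

With such a mapping, the value given to -G would be read as a random seed when the result is PACKET_INCOMPRESSIBLE and as the timestamp when it is PACKET_TIMESTAMP.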
These are variants on S3. S3 uses the "pure" S3 interface, e.g. using
Multi-Part-Upload. The "plus" variant enables EMC-extensions in the aws4c
library. This allows the N:N case to use "append", in the case where
"transfer_size" != "block_size" for IOR. In pure S3, the N:N case will
fail, because the EMC-extensions won't be enabled, and appending (which
attempts to use the EMC byte-range tricks to do this) will throw an error.
In the S3_EMC alg, N:1 uses EMC's other byte-range tricks to write different
parts of an N:1 file, and also uses append to write the parts of an N:N
file. Preliminary tests show these EMC extensions look to improve BW by
~20%.
I put all three algs in aiori-S3.c, because it seemed some code was getting
reused. Not sure if that's still going to make sense after the TBD, below.
TBD: Recently realized that the "pure" S3 shouldn't be trying to use
appends for anything. In the N:N case, it should just use MPU, within each
file. Then, there's no need for S3_plus. We just have S3, which does MPU
for all writes where transfer_size != block_size, and uses (standard)
byte-range reads for reading. Then S3_EMC uses "appends" for N:N writes,
and byte-range writes for N:1 writes. This separates the code for the two
algs a little more, but we might still want them in the same file.
We are testing on our EMC ViPR installation, so we also have available
some EMC extensions. For example, EMC supports a special "byte-range"
header-option ("Range: bytes=-1-") which allows appending to an object.
This is not needed for N:1 (where every write creates an independent part),
but is vital for N:N (where every write is considered an append, unless
"transfer-size" is the same as "block-size").
We also use a LANL-extended implementation of aws4c 0.5, which provides
some special features, and allows greater efficiency. That is included in
this commit as a tarball. Untar it somewhere else and build it, to produce
a library, which is linked with IOR. (configure with --with-S3).
TBD: EMC also supports a simpler alternative to Multi-Part Upload, which
appears to have several advantages. We'll add that in next, but wanted to
capture this as is, before I break it.
Along the way, added a bunch of diagnostic output in the HDFS calls, which
only shows up at verbosity >= 4. I'll probably remove this stuff before
merging with master. Also, there's an #ifdef'ed-out sleep() in place,
which I used to attach gdb to a running MPI task. I'll get rid of that
later, too.
Also, added another hdfs-related parameter to the IOR_param_t structure;
hdfs_user_name gets the value of the USER environment-variable as the
default HDFS user for connections. Does this cause portability problems?
This provides an HDFS back-end, allowing IOR to exercise a Hadoop
Distributed File-System, plus corresponding changes throughout, to
integrate the new module into the build. The commit compiles at LANL, but
hasn't been run yet. We're currently waiting for some configuration on
machines that will eventually provide HDFS. By default, configure ignores
the HDFS module. You have to explicitly add --with-hdfs.
GPFS supports a "gpfs_fcntl" method for hinting various things,
including "I'm about to write this block of data". Let's see if, for
the cost of a few system calls, we can wrangle the GPFS locking system
into allowing concurrent access with less overhead. (new IOR parameter
gpfsHintAccess)
Also, drop all locks on a file immediately after open/creation in the
shared file case, since we know all processes will touch unique regions
of the file. It may or may not be a good idea to release all file locks
after opening. Processes will then have to re-acquire locks already
held. (new IOR parameter gpfsReleaseToken)
Allows every task to allocate a specified amount of memory as
a rough simulation of a real application's memory usage.
Every page of the allocated memory is touched to defeat lazy
memory allocation.
Original patch by Michael Kluge <michael.kluge@tu-dresden.de>
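The page-touching idea can be sketched as below. This is a minimal illustration assuming a POSIX system; alloc_and_touch() is a hypothetical name, not the function added by the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate `bytes` and write one byte in every page so the kernel has to
 * back the whole region with real memory instead of deferring it.
 * Returns the allocation (caller frees), or NULL on failure. */
static void *alloc_and_touch(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char *mem = malloc(bytes);
    if (mem == NULL)
        return NULL;

    for (size_t off = 0; off < bytes; off += (size_t)page)
        mem[off] = 1;        /* a single write is enough to fault the page in */
    return mem;
}
```

Each task would call this once with its requested size and hold the pointer for the duration of the run, releasing it with free() at exit.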
Only print total summary after all tests run.
Put calculated results from each iteration of a test in a separate
IOR_results_t structure. Clean up the allocation and freeing code
for these calculated bits, which allows us to hang onto the results
until the end of all tests. That in turn allows us to perform one
big summary at the end of all of the tests.
Clean up the header files to only contain those things that
need to be shared between .c files.
Functions that are not shared are now declared static to
make their file scope explicit. Functions that ARE shared
are declared in appropriate headers.
I am not going to claim that I caught everything, but at
least it is a good start.