This commit makes changes to the AIORI backends to add support for
abstacting statfs, mkdir, rmdir, stat, and access. These new
abstractions are used by a modified mdtest. Some changes:
- Require C99. Its 2017 and most compilers now support C11. The
benefits of using C99 include subobject naming (for aiori backend
structs), and fixed size integers (uint64_t). There is no reason to
use the non-standard long long type.
- Moved some of the aiori code into aiori.c so it can be used by both
mdtest and ior.
- Code cleanup of mdtest. This is mostly due to the usage of the IOR
backends rather than a mess of #if code.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
All ranks locally capture and accumulate Etags for the parts they are
writing. In the N:1 cases, these are ethen collected by rank 0, via
MPI_Gather. This is effectively an organization matching the "segmented"
layout. If data was written segmented, then rank0 assigns part-numbers to
with appropriate offsets to correspond to what would've been used by each
rank when writing a given etag. If data was written strided, then etags
must also be accessed in strided order, to build the XML that will be sent.
TBD: Once the total volume of etag data exceeds the size of memory at rank
0, we'll need to impose a more-sophisticated technique. One idea is to
thread the MPI comms differently from the libcurl comms, so that multiple
gathers can be staged incrementally, while sending a single stream of XML
data tot he servers. For example, the libcurl write-function could
interact with the MPI prog to allow the appearance of a single stream of
data.
These are variants on S3. S3 uses the "pure" S3 interface, e.g. using
Multi-Part-Upload. The "plus" variant enables EMC-extensions in the aws4c
library. This allows the N:N case to use "append", in the case where
"transfer_size" != "block_size" for IOR. In pure S3, the N:N case will
fail, because the EMC-extensions won't be enabled, and appending (which
attempts to use the EMC byte-range tricks to do this) will throw an error.
In the S3_EMC alg, N:1 uses EMCs other byte-range tricks to write different
parts of an N:1 file, and also uses append to write the parts of an N:N
file. Preliminary tests show these EMC extensions look to improve BW by
~20%.
I put all three algs in aiori-S3.c, because it seemed some code was getting
reused. Not sure if that's still going to make sense after the TBD, below.
TBD: Recently realized that the "pure' S3 shouldn't be trying to use
appends for anything. In the N:N case, it should just use MPU, within each
file. Then, there's no need for S3_plus. We just have S3, which does MPU
for all writes where transfer_size != block_size, and uses (standard)
byte-range reads for reading. Then S3_EMC uses "appends for N:N writes,
and byte-range writes for N:1 writes. This separates the code for the two
algs a little more, but we might still want them in the same file.
Testing on our EMC ViPR installation. Therefore, we also have available
some EMC extensions. For example, EMC supports a special "byte-range"
header-option ("Range: bytes=-1-") which allows appending to an object.
This is not needed for N:1 (where every write creates an independent part),
but is vital for N:N (where every write is considered an append, unless
"transfer-size" is the same as "block-size").
We also use a LANL-extended implementation of aws4c 0.5, which provides
some special features, and allows greater efficiency. That is included in
this commit as a tarball. Untar it somewhere else and build it, to produce
a library, which is linked with IOR. (configure with --with-S3).
TBD: EMC also supports a simpler alternative to Multi-Part Upload, which
appears to have several advantages. We'll add that in next, but wanted to
capture this as is, before I break it.