Commit Graph

4 Commits (37738dab26cf5c28c95a9be881a43d9a27e39c9d)

Author SHA1 Message Date
Jeff Inman 37738dab26 Numerous changes to file-modes, small build-tweaks, and a tweak to aiori-S3.c
(Only rank-0 should create the bucket, if it doesn't already exist.)

Prepping this for a push (as an experimental S3 build) to github.
2015-05-19 09:36:28 -06:00
Jeffrey Thornton Inman 4368cc2dc4 Fixed striding/segmentation in N:N and N:1 cases, for multi-part-upload.
All ranks locally capture and accumulate Etags for the parts they are
writing.  In the N:1 cases, these are ethen collected by rank 0, via
MPI_Gather.  This is effectively an organization matching the "segmented"
layout.  If data was written segmented, then rank0 assigns part-numbers to
with appropriate offsets to correspond to what would've been used by each
rank when writing a given etag.  If data was written strided, then etags
must also be accessed in strided order, to build the XML that will be sent.

TBD: Once the total volume of etag data exceeds the size of memory at rank
0, we'll need to impose a more-sophisticated technique.  One idea is to
thread the MPI comms differently from the libcurl comms, so that multiple
gathers can be staged incrementally, while sending a single stream of XML
data tot he servers.  For example, the libcurl write-function could
interact with the MPI prog to allow the appearance of a single stream of
data.
2014-12-02 09:10:32 -07:00
Jeffrey Thornton Inman b26f308191 Algorithms 'S3', 'S3_plus', and 'S3_EMC' all available.
These are variants on S3.  S3 uses the "pure" S3 interface, e.g. using
Multi-Part-Upload.  The "plus" variant enables EMC-extensions in the aws4c
library.  This allows the N:N case to use "append", in the case where
"transfer_size" != "block_size" for IOR.  In pure S3, the N:N case will
fail, because the EMC-extensions won't be enabled, and appending (which
attempts to use the EMC byte-range tricks to do this) will throw an error.

In the S3_EMC alg, N:1 uses EMCs other byte-range tricks to write different
parts of an N:1 file, and also uses append to write the parts of an N:N
file.  Preliminary tests show these EMC extensions look to improve BW by
~20%.

I put all three algs in aiori-S3.c, because it seemed some code was getting
reused.  Not sure if that's still going to make sense after the TBD, below.

TBD: Recently realized that the "pure' S3 shouldn't be trying to use
appends for anything.  In the N:N case, it should just use MPU, within each
file.  Then, there's no need for S3_plus.  We just have S3, which does MPU
for all writes where transfer_size != block_size, and uses (standard)
byte-range reads for reading.  Then S3_EMC uses "appends for N:N writes,
and byte-range writes for N:1 writes.  This separates the code for the two
algs a little more, but we might still want them in the same file.
2014-10-29 16:04:30 -06:00
Jeffrey Thornton Inman 2f066624f0 S3 with Multi-Part Upload for N:1 is working.
Testing on our EMC ViPR installation.  Therefore, we also have available
some EMC extensions.  For example, EMC supports a special "byte-range"
header-option ("Range: bytes=-1-") which allows appending to an object.
This is not needed for N:1 (where every write creates an independent part),
but is vital for N:N (where every write is considered an append, unless
"transfer-size" is the same as "block-size").

We also use a LANL-extended implementation of aws4c 0.5, which provides
some special features, and allows greater efficiency.  That is included in
this commit as a tarball.  Untar it somewhere else and build it, to produce
a library, which is linked with IOR.  (configure with --with-S3).

TBD: EMC also supports a simpler alternative to Multi-Part Upload, which
appears to have several advantages.  We'll add that in next, but wanted to
capture this as is, before I break it.
2014-10-27 13:29:44 -06:00