All ranks locally capture and accumulate ETags for the parts they are
writing. In the N:1 cases, these are then collected by rank 0 via
MPI_Gather. This is effectively an organization matching the "segmented"
layout. If data was written segmented, rank 0 assigns part numbers with
appropriate offsets, corresponding to those each rank would have used
when writing a given ETag. If data was written strided, then the ETags
must also be accessed in strided order to build the XML that will be sent.
TBD: Once the total volume of ETag data exceeds the memory available at
rank 0, we'll need to impose a more sophisticated technique. One idea is
to run the MPI communication on a different thread from the libcurl
communication, so that multiple gathers can be staged incrementally while
a single stream of XML data is sent to the servers. For example, the
libcurl write-function could interact with the MPI side to present the
appearance of a single stream of data.