Compare commits
2 Commits
developmen...ft/gcp-doc

Author | SHA1 | Date
---|---|---
Alexander Chan | b353640b8e |
Alexander Chan | 7de1841a3e |

@@ -11,7 +11,8 @@ This section of the documentation shows you how to set up our currently
supported public cloud backends:

- `Amazon S3 <#aws-s3-as-a-data-backend>`__ ;
- `Microsoft Azure <#microsoft-azure-as-a-data-backend>`__ ;
- `Google Cloud Storage <#google-cloud-storage-as-a-data-backend>`__ .

For each public cloud backend, you will have to edit your CloudServer
:code:`locationConfig.json` and do a few setup steps on the applicable public

@@ -362,6 +363,199 @@ For more informations, refer to our template `~/.s3cfg <./CLIENTS/#s3cmd>`__ .

Pre-existing objects in your MS Azure container can unfortunately not be
accessed by CloudServer at this time.

Google Cloud Storage as a data backend
--------------------------------------

From the Google Cloud Console
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From the Google Cloud Console, create two buckets for this new location
constraint: one bucket where you will host your data, and the other for
performing multipart uploads.

You will also need to get one of your Interoperability credentials and provide
it to CloudServer. These can be found in the Google Cloud Storage "Settings"
tab, under "Interoperability".

In this example, our buckets will be ``zenkobucket`` and ``zenkompubucket``.
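
If you prefer the command line, the same buckets can be created with the
``gsutil`` tool; this is a minimal sketch, assuming ``gsutil`` is installed and
authenticated against your project, and it reuses the example bucket names:

.. code:: shell

    # Create the data bucket and the multipart upload bucket
    $> gsutil mb gs://zenkobucket
    $> gsutil mb gs://zenkompubucket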

From the CloudServer repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

locationConfig.json
^^^^^^^^^^^^^^^^^^^

Edit this file to add a new location constraint. This location constraint will
contain the information for the Google Cloud Storage bucket to which you will
be writing your data whenever you create a CloudServer bucket in this location.
There are a few configurable options here:

- :code:`type` : set to :code:`gcp` to indicate this location constraint is
  writing data to Google Cloud Storage;
- :code:`legacyAwsBehavior` : set to :code:`true` to indicate this region should
  behave like AWS S3 :code:`us-east-1` region, set to :code:`false` to indicate
  this region should behave like any other AWS S3 region;
- :code:`bucketName` : set to an *existing bucket* in your Google Cloud Storage
  Account; this is the bucket in which your data will be stored for this
  location constraint;
- :code:`mpuBucketName` : set to an *existing bucket* in your Google Cloud
  Storage Account; this is the bucket in which parts for multipart uploads will
  be stored for this location constraint;
- :code:`gcpEndpoint` : set to your bucket's endpoint, usually
  :code:`storage.googleapis.com`;
- :code:`bucketMatch` : set to :code:`true` if you want your object name to be
  the same in your local bucket and your Google Cloud Storage bucket; set to
  :code:`false` if you want your object name to be of the form
  :code:`{{localBucketName}}/{{objectname}}` in your Google Cloud Storage hosted
  bucket;
- :code:`credentialsProfile` and :code:`credentials` are two ways to provide
  your Google Cloud Storage Interoperability credentials for that bucket,
  *use only one of them* :

  - :code:`credentialsProfile` : set to the profile name allowing you to access
    your Google Cloud Storage bucket from your :code:`~/.aws/credentials` file;
  - :code:`credentials` : set the two fields inside the object
    (:code:`accessKey` and :code:`secretKey`) to their respective values from
    your Google Cloud Storage Interoperability credentials.

For example, with a :code:`credentialsProfile`:

.. code:: json

    (...)
    "gcp-test": {
        "type": "gcp",
        "legacyAwsBehavior": true,
        "details": {
            "gcpEndpoint": "storage.googleapis.com",
            "bucketName": "zenkobucket",
            "mpuBucketName": "zenkompubucket",
            "bucketMatch": true,
            "credentialsProfile": "zenko"
        }
    },
    (...)

Or, with inline :code:`credentials`:

.. code:: json

    (...)
    "gcp-test": {
        "type": "gcp",
        "legacyAwsBehavior": true,
        "details": {
            "gcpEndpoint": "storage.googleapis.com",
            "bucketName": "zenkobucket",
            "bucketMatch": true,
            "mpuBucketName": "zenkompubucket",
            "credentials": {
                "accessKey": "WHDBFKILOSDDVF78NPMQ",
                "secretKey": "87hdfGCvDS+YYzefKLnjjZEYstOIuIjs/2X72eET"
            }
        }
    },
    (...)

.. WARNING::
    If you set :code:`bucketMatch` to :code:`true`, we strongly advise that you
    only have one local bucket per Google Cloud Storage location.
    With :code:`bucketMatch` set to :code:`true`, your object names in your
    Google Cloud Storage bucket will not be prefixed with your CloudServer
    bucket name. This means that if you put an object :code:`foo` to your
    CloudServer bucket :code:`zenko1`, you then put a different :code:`foo` to
    your CloudServer bucket :code:`zenko2`, and both :code:`zenko1` and
    :code:`zenko2` point to the same Google Cloud Storage bucket, the second
    :code:`foo` will overwrite the first :code:`foo`.

~/.aws/credentials
^^^^^^^^^^^^^^^^^^

.. TIP::
    If you explicitly set your :code:`accessKey` and :code:`secretKey` in the
    :code:`credentials` object of your :code:`gcp` location in your
    :code:`locationConfig.json` file, you may skip this section.

Make sure your :code:`~/.aws/credentials` file has a profile matching the one
defined in your :code:`locationConfig.json`. Following our previous example, it
would look like:

.. code:: shell

    [zenko]
    aws_access_key_id=WHDBFKILOSDDVF78NPMQ
    aws_secret_access_key=87hdfGCvDS+YYzefKLnjjZEYstOIuIjs/2X72eET

Start the server with the ability to write to Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Inside the repository, once all the files have been edited, you should be able
to start the server and start writing data to Google Cloud Storage through
CloudServer.

.. code:: shell

    # Start the server locally
    $> S3DATA=multiple npm start

Run the server as a docker container with the ability to write to Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. TIP::
    If you set the :code:`credentials` object in your
    :code:`locationConfig.json` file, you don't need to mount your
    :code:`.aws/credentials` file.

Mount all the files that have been edited to override defaults, and do a
standard Docker run; then you can start writing to Google Cloud Storage through
CloudServer.

.. code:: shell

    # Start the server in a Docker container
    $> sudo docker run -d --name CloudServer \
    -v $(pwd)/data:/usr/src/app/localData \
    -v $(pwd)/metadata:/usr/src/app/localMetadata \
    -v $(pwd)/locationConfig.json:/usr/src/app/locationConfig.json \
    -v $(pwd)/conf/authdata.json:/usr/src/app/conf/authdata.json \
    -v ~/.aws/credentials:/root/.aws/credentials \
    -e S3DATA=multiple -e ENDPOINT=http://localhost -p 8000:8000 \
    scality/s3server

Testing: put an object to Google Cloud Storage using CloudServer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to start testing pushing to Google Cloud Storage, you will need to
create a local bucket in the Google Cloud Storage location constraint - this
local bucket will only store the metadata locally, while both the data and any
user metadata (:code:`x-amz-meta` headers sent with a PUT object, and tags)
will be stored on Google Cloud Storage.
This example is based on all our previous steps.

.. code:: shell

    # Create a local bucket storing data in Google Cloud Storage
    $> s3cmd --host=127.0.0.1:8000 mb s3://zenkobucket --region=gcp-test
    # Put an object to Google Cloud Storage, and store the metadata locally
    $> s3cmd --host=127.0.0.1:8000 put /etc/hosts s3://zenkobucket/testput
    upload: '/etc/hosts' -> 's3://zenkobucket/testput' [1 of 1]
    330 of 330 100% in 0s 380.87 B/s done
    # List locally to check you have the metadata
    $> s3cmd --host=127.0.0.1:8000 ls s3://zenkobucket
    2017-10-23 10:26 330 s3://zenkobucket/testput

Then, from the Google Cloud Console, if you go into your bucket, you should see
your newly uploaded object:

.. figure:: ../res/gcp-console-successful-put.png
   :alt: Google Cloud Storage Console upload example
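
You can also check the Google Cloud Storage side from the command line; this is
a minimal sketch, assuming ``gsutil`` is configured for the same project and
:code:`bucketMatch` is set to :code:`true` as in our example:

.. code:: shell

    # List the Google Cloud Storage bucket directly
    $> gsutil ls gs://zenkobucket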

Troubleshooting
~~~~~~~~~~~~~~~

Make sure your :code:`~/.s3cfg` file has credentials matching your local
CloudServer credentials defined in :code:`conf/authdata.json`. By default, the
access key is :code:`accessKey1` and the secret key is :code:`verySecretKey1`.
For more information, refer to our template `~/.s3cfg <./CLIENTS/#s3cmd>`__ .

Pre-existing objects in your Google Cloud Storage hosted bucket can
unfortunately not be accessed by CloudServer at this time.

For any data backend
--------------------

@@ -0,0 +1,95 @@

## Google Cloud Storage Backend

### Overall Design

The Google Cloud Storage backend is implemented using the `aws-sdk` service
class for AWS-compatible methods. The structure of these methods (request
inputs, response outputs, and required parameters) is described in the
`gcp-2017-11-01.api.json` file. For non-compatible methods, helper methods are
implemented to perform the requests; these can be found under the `GcpApis`
directory.

The implemented GCP service is designed to work as closely as possible to the
AWS service.

### Object Tagging

Google Cloud Storage does not have object-level tagging methods.

To be compatible with S3, object tags will be stored as metadata on
Google Cloud Storage.
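
A minimal sketch of what this mapping could look like (the `aws-tag-` metadata
key prefix is an assumption for illustration, not necessarily the scheme
CloudServer uses):

```typescript
interface S3Tag {
    Key: string;
    Value: string;
}

// Flatten an S3-style tag set into custom metadata entries that can be
// attached to the Google Cloud Storage object.
function tagsToGcpMetadata(tags: S3Tag[]): Record<string, string> {
    const metadata: Record<string, string> = {};
    for (const tag of tags) {
        // Hypothetical key scheme: prefix each tag key so it can be
        // recognized and turned back into a tag set on read.
        metadata[`aws-tag-${tag.Key}`] = tag.Value;
    }
    return metadata;
}
```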

### Multipart Upload

Google Cloud Storage does not have AWS S3 multipart upload methods, but there
are methods for merging multiple objects into a single composite object.
Using these methods, GCP is able to perform parallel uploads for large
uploads; however, due to limits set by Google Cloud Storage, the maximum
number of parts possible for a single upload is 1024 (the AWS limit is 10000):
with two rounds of `compose` calls merging at most 32 objects each, at most
32 x 32 = 1024 parts can be combined.

As Google Cloud Storage does not have methods for managing multipart uploads,
each part is uploaded as a single object in a Google Cloud bucket.
Because of this, a secondary bucket for handling MPU parts is required for
a GCP multipart upload. The MPU bucket serves to hide uploaded parts from
being listed as items of the main bucket, as well as to handle parts of
multiple in-progress multipart uploads.

<!--
<p style='font-size: 12'>
** The Google Cloud Storage method used for combining multipart objects into a
single object is the `compose` method.<br/>
** <a>https://cloud.google.com/storage/docs/xml-api/put-object-compose</a>
</p>
-->

#### Multipart Upload Methods Design:

+ **initiateMultipartUpload**:
In `initiateMultipartUpload`, each new multipart upload generates a prefix with
the scheme `${objectKeyName}-${uploadId}`, and each object related to that MPU
is prefixed with it. This method also creates an `init` file that stores the
metadata related to the MPU, for later assignment to the completed object.

+ **uploadPart**:
`uploadPart` prefixes the upload with the MPU prefix, then performs a
`putObject` request to Google Cloud Storage.

+ **uploadPartCopy**:
`uploadPartCopy` prefixes the copy upload with the MPU prefix, then performs a
`copyObject` request to Google Cloud Storage.

+ **abortMultipartUpload**:
`abortMultipartUpload` removes all objects related to a multipart upload from
the MPU bucket. It does this by first making a `listObjectVersions` request to
GCP to list all parts with the related MPU prefix, then performing a
`deleteObject` request on each of the objects received.

+ **completeMultipartUpload**:
`completeMultipartUpload` combines the given parts to create the single
composite object (see the sketch after this list). This method consists of
multiple steps, due to the limitations of the Google Cloud Storage `compose`
method:
  + compose round 1: multiple compose calls to merge, at most, 32 objects into
    a single subpart.
  + compose round 2: multiple compose calls to merge the subparts generated
    in compose round 1 into the final completed object.
  + generate MPU ETag: generate the multipart ETag that will be returned as
    part of the completeMultipartUpload response.
  + copy to main: retrieve the metadata stored in the `init` file created in
    `initiateMultipartUpload`, assign it to the completed object, and copy the
    completed object from the MPU bucket to the main bucket.
  + cleanUp: remove all objects related to the MPU.
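
A minimal sketch of the two compose rounds (the `composeObjects` helper and the
intermediate key names are hypothetical stand-ins, not the actual `GcpApis`
functions):

```typescript
// Hypothetical stand-in for a GcpApis helper: issues one GCP "compose"
// request merging up to 32 source objects into `destination` in the MPU bucket.
async function composeObjects(
    mpuBucket: string,
    sources: string[],
    destination: string,
): Promise<void> {
    // A real implementation would call the GCP XML API compose endpoint here.
    console.log(`compose ${sources.length} objects -> ${mpuBucket}/${destination}`);
}

const MAX_COMPOSE_SOURCES = 32; // GCP limit per compose request

// Merge the uploaded part objects into a single composite object using at
// most two rounds of compose calls (hence the 32 x 32 = 1024 part limit).
async function composeParts(
    mpuBucket: string,
    prefix: string,     // the `${objectKeyName}-${uploadId}` MPU prefix
    partKeys: string[], // keys of the uploaded parts, in part-number order
): Promise<string> {
    // Compose round 1: merge groups of up to 32 parts into subparts.
    const subpartKeys: string[] = [];
    for (let i = 0; i < partKeys.length; i += MAX_COMPOSE_SOURCES) {
        const group = partKeys.slice(i, i + MAX_COMPOSE_SOURCES);
        const subpartKey = `${prefix}-subpart-${subpartKeys.length}`;
        await composeObjects(mpuBucket, group, subpartKey);
        subpartKeys.push(subpartKey);
    }
    // Compose round 2: merge the subparts (at most 32 of them) into the
    // completed object, which is later copied to the main bucket.
    const finalKey = `${prefix}-final`;
    await composeObjects(mpuBucket, subpartKeys, finalKey);
    return finalKey;
}
```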

### Limitations

+ GCP multipart uploads are limited to 1024 parts
+ Each `compose` call can merge up to 32 objects per request
+ As Google Cloud Storage doesn't have AWS-style MPU methods, GCP MPU requires
a secondary bucket to perform multipart uploads
+ GCP doesn't have object-level tagging methods; AWS-style tags are stored
as metadata on Google Cloud Storage

More information can be found at:

+ https://cloud.google.com/storage/docs/xml-api/overview
+ https://cloud.google.com/storage/quotas
+ https://cloud.google.com/storage/docs/xml-api/put-object-compose