Compare commits


2 Commits

Author SHA1 Message Date
Alexander Chan b353640b8e ft: add gcp design doc 2018-04-03 22:55:08 -07:00
Alexander Chan 7de1841a3e ft: add gcp instructions to public cloud doc 2018-04-03 21:23:54 -07:00
2 changed files with 291 additions and 2 deletions


@@ -11,7 +11,8 @@ This section of the documentation shows you how to set up our currently
supported public cloud backends:
- `Amazon S3 <#aws-s3-as-a-data-backend>`__ ;
- `Microsoft Azure <#microsoft-azure-as-a-data-backend>`__ ;
- `Google Cloud Storage <#google-cloud-storage-as-a-data-backend>`__ .
For each public cloud backend, you will have to edit your CloudServer
:code:`locationConfig.json` and do a few setup steps on the applicable public
@@ -362,6 +363,199 @@ For more information, refer to our template `~/.s3cfg <./CLIENTS/#s3cmd>`__.
Pre-existing objects in your MS Azure container can unfortunately not be
accessed by CloudServer at this time.
Google Cloud Storage as a data backend
--------------------------------------
From the Google Cloud Console
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From the Google Cloud Storage console, create two buckets for this new
location constraint: one bucket to host your data and one for performing
multipart uploads.
You will also need to get one of your Interoperability Credentials and provide
it to CloudServer.
These can be found in the Google Cloud Storage "Settings" tab, under
"Interoperability".
In this example, our buckets will be ``zenkobucket`` and ``zenkompubucket``.
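
If you prefer the command line over the console, the same buckets can be
created with :code:`gsutil` (a sketch, assuming the example bucket names above
and that :code:`gsutil` is already authenticated against your project):

.. code:: shell

    # Create the data bucket and the multipart upload bucket for this location
    $> gsutil mb gs://zenkobucket
    $> gsutil mb gs://zenkompubucket
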
From the CloudServer repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
locationConfig.json
^^^^^^^^^^^^^^^^^^^
Edit this file to add a new location constraint. This location constraint will
contain the information for the Google Cloud Storage bucket to which you will
be writing your data whenever you create a CloudServer bucket in this location.
There are a few configurable options here:
- :code:`type` : set to :code:`gcp` to indicate this location constraint is
writing data to Google Cloud Storage;
- :code:`legacyAwsBehavior` : set to :code:`true` to indicate this region should
behave like AWS S3 :code:`us-east-1` region, set to :code:`false` to indicate
this region should behave like any other AWS S3 region;
- :code:`bucketName` : set to an *existing bucket* in your Google Cloud Storage
Account; this is the bucket in which your data will be stored for this
location constraint;
- :code:`mpuBucketName` : set to an *existing bucket* in your Google Cloud
Storage Account; this is the bucket in which parts for multipart uploads will
be stored for this location constraint;
- :code:`gcpEndpoint` : set to your bucket's endpoint, usually :code:`storage.googleapis.com`;
- :code:`bucketMatch` : set to :code:`true` if you want your object name to be the same
in your local bucket and your Google Cloud Storage bucket; set to :code:`false`
if you want your object name to be of the form :code:`{{localBucketName}}/{{objectname}}`
in your Google Cloud Storage hosted bucket;
- :code:`credentialsProfile` and :code:`credentials` are two ways to provide
your Google Cloud Storage Interoperability credentials for that bucket,
*use only one of them* :
- :code:`credentialsProfile` : set to the profile name allowing you to access
your Google Cloud Storage bucket from your :code:`~/.aws/credentials` file;
- :code:`credentials` : set the two fields inside the object (:code:`accessKey`
and :code:`secretKey`) to their respective values from your Google Cloud Storage
Interoperability credentials.
.. code:: json
(...)
"gcp-test": {
"type": "gcp",
"legacyAwsBehavior": true,
"details": {
"awsEndpoint": "storage.googleapis.com",
"bucketName": "zenkobucket",
"mpuBucketName": "zenkompubucket",
"bucketMatch": true,
"credentialsProfile": "zenko"
}
},
(...)
.. code:: json
(...)
"gcp-test": {
"type": "gcp",
"legacyAwsBehavior": true,
"details": {
"awsEndpoint": "storage.googleapis.com",
"bucketName": "zenkobucket",
"bucketMatch": true,
"mpuBucketName": "zenkompubucket",
"credentials": {
"accessKey": "WHDBFKILOSDDVF78NPMQ",
"secretKey": "87hdfGCvDS+YYzefKLnjjZEYstOIuIjs/2X72eET"
}
}
},
(...)
.. WARNING::
If you set :code:`bucketMatch` to :code:`true`, we strongly advise that you
only have one local bucket per Google Cloud Storage location.
With :code:`bucketMatch` set to :code:`true`, your object names in your
Google Cloud Storage bucket will not be prefixed with your CloudServer
bucket name. This means that if you put an object :code:`foo` to your
CloudServer bucket :code:`zenko1` and you then put a different :code:`foo` to
your CloudServer bucket :code:`zenko2` and both :code:`zenko1` and
:code:`zenko2` point to the same Google Cloud Storage bucket, the second
:code:`foo` will overwrite the first :code:`foo`.
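
To illustrate the naming difference described above (a sketch, assuming a
CloudServer bucket :code:`zenko1` and an object :code:`foo`):

.. code:: shell

    # bucketMatch: true  -> key in the Google Cloud Storage bucket: foo
    # bucketMatch: false -> key in the Google Cloud Storage bucket: zenko1/foo
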
~/.aws/credentials
^^^^^^^^^^^^^^^^^^
.. TIP::
If you explicitly set your :code:`accessKey` and :code:`secretKey` in the
:code:`credentials` object of your :code:`gcp` location in your
:code:`locationConfig.json` file, you may skip this section.
Make sure your :code:`~/.aws/credentials` file has a profile matching the one
defined in your :code:`locationConfig.json`. Following our previous example, it
would look like:
.. code:: shell
[zenko]
aws_access_key_id=WHDBFKILOSDDVF78NPMQ
aws_secret_access_key=87hdfGCvDS+YYzefKLnjjZEYstOIuIjs/2X72eET
Start the server with the ability to write to Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inside the repository, once all the files have been edited, you should be able
to start the server and start writing data to Google Cloud Storage through
CloudServer.
.. code:: shell
# Start the server locally
$> S3DATA=multiple npm start
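
As a quick sanity check (a sketch; the exact XML error body may vary with your
configuration), an unauthenticated request to the local endpoint should return
an S3-style error rather than a connection failure:

.. code:: shell

    # The server listens on port 8000 by default
    $> curl -s http://localhost:8000
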
Run the server as a docker container with the ability to write to Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. TIP::
If you set the :code:`credentials` object in your
:code:`locationConfig.json` file, you don't need to mount your
:code:`.aws/credentials` file.
Mount all the files that have been edited to override defaults, and do a
standard Docker run; then you can start writing to Google Cloud Storage through
CloudServer.
.. code:: shell
# Start the server in a Docker container
$> sudo docker run -d --name CloudServer \
-v $(pwd)/data:/usr/src/app/localData \
-v $(pwd)/metadata:/usr/src/app/localMetadata \
-v $(pwd)/locationConfig.json:/usr/src/app/locationConfig.json \
-v $(pwd)/conf/authdata.json:/usr/src/app/conf/authdata.json \
-v ~/.aws/credentials:/root/.aws/credentials \
-e S3DATA=multiple -e ENDPOINT=http://localhost -p 8000:8000 \
scality/s3server
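
To confirm the container started correctly, standard Docker commands can be
used (a sketch, assuming the container name from the command above):

.. code:: shell

    # Check that the container is running and inspect its startup logs
    $> sudo docker ps --filter name=CloudServer
    $> sudo docker logs CloudServer
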
Testing: put an object to Google Cloud Storage using CloudServer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To start testing pushing to Google Cloud Storage, you will need to create a
local bucket in the Google Cloud Storage location constraint - this local
bucket will only store the metadata locally, while both the data and any user
metadata (:code:`x-amz-meta` headers sent with a PUT object, and tags) will be
stored on Google Cloud Storage.
This example is based on all our previous steps.
.. code:: shell
# Create a local bucket storing data in Google Cloud Storage
$> s3cmd --host=127.0.0.1:8000 mb s3://zenkobucket --region=gcp-test
# Put an object to Google Cloud Storage, and store the metadata locally
$> s3cmd --host=127.0.0.1:8000 put /etc/hosts s3://zenkobucket/testput
upload: '/etc/hosts' -> 's3://zenkobucket/testput' [1 of 1]
330 of 330 100% in 0s 380.87 B/s done
# List locally to check you have the metadata
$> s3cmd --host=127.0.0.1:8000 ls s3://zenkobucket
2017-10-23 10:26 330 s3://zenkobucket/testput
Then, from the Google Cloud Console, if you go into your bucket, you should see
your newly uploaded object:
.. figure:: ../res/gcp-console-successful-put.png
:alt: Google Cloud Storage Console upload example
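
The same check can also be done from the command line with :code:`gsutil` (a
sketch, assuming the example bucket name):

.. code:: shell

    # List the hosted bucket directly on Google Cloud Storage
    $> gsutil ls gs://zenkobucket
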
Troubleshooting
~~~~~~~~~~~~~~~
Make sure your :code:`~/.s3cfg` file has credentials matching your local
CloudServer credentials defined in :code:`conf/authdata.json`. By default, the
access key is :code:`accessKey1` and the secret key is :code:`verySecretKey1`.
For more information, refer to our template `~/.s3cfg <./CLIENTS/#s3cmd>`__.
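
A minimal :code:`~/.s3cfg` sketch for this setup, assuming the default
CloudServer credentials and a local endpoint on :code:`127.0.0.1:8000`, could
look like:

.. code:: shell

    [default]
    access_key = accessKey1
    secret_key = verySecretKey1
    host_base = 127.0.0.1:8000
    host_bucket = 127.0.0.1:8000
    use_https = False
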
Pre-existing objects in your Google Cloud Storage hosted bucket can
unfortunately not be accessed by CloudServer at this time.
For any data backend
--------------------

lib/data/external/GCP/DESIGN_GCP.md (new file, +95)

@@ -0,0 +1,95 @@
## Google Cloud Storage Backend
### Overall Design
The Google Cloud Storage backend is implemented using the `aws-sdk` service
class for AWS-compatible methods. The structure of these methods is
described in the `gcp-2017-11-01.api.json` file: request inputs, response
outputs, and required parameters. For non-compatible methods, helper methods are
implemented to perform the requests; these can be found under the `GcpApis`
directory.
The implemented GCP service is designed to work as closely as possible to the
AWS service.
### Object Tagging
Google Cloud Storage does not have object-level tagging methods.
To be compatible with S3, object tags will be stored as metadata on
Google Cloud Storage.
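
As an illustration only (the metadata key naming below is hypothetical, not
necessarily the scheme CloudServer uses internally), an S3 tag such as
`color=blue` could be persisted as custom object metadata:

```shell
# Custom GCS object metadata uses the x-goog-meta- header prefix; the tag key
# name shown here is illustrative.
gsutil setmeta -h "x-goog-meta-tag-color:blue" gs://zenkobucket/testput
```
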
### Multipart Upload
Google Cloud Storage does not have AWS S3 multipart upload methods, but there
are methods for merging multiple objects into a single composite object.
Utilizing these available methods, the GCP backend is able to perform parallel
uploads for large objects; however, due to limits set by Google Cloud Storage,
the maximum
number of parts possible for a single upload is 1024 (AWS limit is 10000).
As Google Cloud Storage does not have methods for managing multipart uploads,
each part is uploaded as a single object in a Google Cloud Bucket.
Because of this, a secondary bucket for handling MPU parts is required for
a GCP multipart upload. The MPU bucket serves to hide uploaded parts from
being listed as items of the main bucket, as well as to handle parts of
multiple in-progress multipart uploads.
<!--
<p style='font-size: 12'>
** The Google Cloud Storage method used for combining multipart objects into a
single object is the `compose` method.<br/>
** <a>https://cloud.google.com/storage/docs/xml-api/put-object-compose</a>
</p>
-->
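
For reference, a hedged sketch of such a `compose` call through the XML API
(`$TOKEN`, the bucket, and the component names are placeholders):

```shell
# Merge two uploaded parts into a single composite object via the XML API.
curl -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/xml" \
  --data-binary '<ComposeRequest>
      <Component><Name>myobject-uploadid/part-1</Name></Component>
      <Component><Name>myobject-uploadid/part-2</Name></Component>
    </ComposeRequest>' \
  "https://storage.googleapis.com/zenkompubucket/myobject-composed?compose"
```
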
#### Multipart Upload Methods Design:
+ **initiateMultipartUpload**:
In `initiateMultipartUpload`, new multipart uploads will generate a prefix with
the scheme `${objectKeyName}-${uploadId}`, and each object related to an MPU
will be prefixed with it (see the listing sketch after this list). This method
also creates an `init` file that stores the metadata related to an MPU for
later assignment to the completed object.
+ **uploadPart**:
`uploadPart` will prefix the uploaded part with the MPU prefix, then perform a
`putObject` request to Google Cloud Storage.
+ **uploadPartCopy**:
`uploadPartCopy` will prefix the copied part with the MPU prefix, then perform
a `copyObject` request to Google Cloud Storage.
+ **abortMultipartUpload**:
`abortMultipartUpload` will perform the action of removing all objects related
to a multipart upload from the MPU bucket. It does this by first making a
`listObjectVersions` request to GCP to list all parts with the
related MPU-prefix then performing a `deleteObject` request on each of the
objects received.
+ **completeMultipartUpload**:
`completeMultipartUpload` will perform the action of combining the given parts
to create the single composite object. This method consists of multiple
steps, due to the limitations of the Google Cloud Storage `compose` method:
+ compose round 1: multiple compose calls to merge, at max, 32 objects into
a single subpart.
+ compose round 2: multiple compose calls to merge the subparts generated
in compose round 1 into the final completed object
+ generate MPU ETag: generate the multipart etag that will be returned as
part of the completeMultipartUpload response
+ copy to main: retrieve the metadata stored in the `init` file created in
`initiateMultipartUpload`, assign it to the completed object, and copy the
completed object from the MPU bucket to the main bucket
+ cleanUp: remove all objects related to an MPU
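
As a hypothetical illustration of the layout described in
`initiateMultipartUpload` above (the separators and part key names shown are
assumptions; the real key scheme may differ):

```shell
# Listing the MPU bucket during one in-progress upload could show keys like:
gsutil ls gs://zenkompubucket
# gs://zenkompubucket/myobject-uploadid/init
# gs://zenkompubucket/myobject-uploadid/part-1
# gs://zenkompubucket/myobject-uploadid/part-2
```
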
### Limitations
+ GCP multipart uploads are limited to 1024 parts, consistent with the
two-round compose design (at most 32 × 32 = 1024 source objects)
+ Each `compose` request can merge up to 32 objects
+ As Google Cloud Storage doesn't have AWS-style MPU methods, the GCP backend
requires a secondary bucket to perform multipart uploads
+ GCP does not have object-level tagging methods; AWS-style tags are stored
as metadata on Google Cloud Storage
More information can be found at:
+ https://cloud.google.com/storage/docs/xml-api/overview;
+ https://cloud.google.com/storage/quotas;
+ https://cloud.google.com/storage/docs/xml-api/put-object-compose;