Compare commits
2 Commits
developmen...ft/gcp-doc

Author | SHA1 | Date
---|---|---
Alexander Chan | b353640b8e |
Alexander Chan | 7de1841a3e |

@@ -11,7 +11,8 @@ This section of the documentation shows you how to set up our currently
supported public cloud backends:

- `Amazon S3 <#aws-s3-as-a-data-backend>`__ ;
- `Microsoft Azure <#microsoft-azure-as-a-data-backend>`__ ;
- `Google Cloud Storage <#google-cloud-storage-as-a-data-backend>`__ .

For each public cloud backend, you will have to edit your CloudServer
:code:`locationConfig.json` and do a few setup steps on the applicable public

@@ -362,6 +363,199 @@ For more informations, refer to our template `~/.s3cfg <./CLIENTS/#s3cmd>`__ .

Pre-existing objects in your MS Azure container can unfortunately not be
accessed by CloudServer at this time.

Google Cloud Storage as a data backend
--------------------------------------

From the Google Cloud Console
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From the Google Cloud Console, create two buckets for this new location
constraint: one bucket where you will host your data, and the other for
performing multipart uploads.

You will also need to get one of your Interoperability credentials and provide
it to CloudServer. These can be found in the Google Cloud Storage "Settings"
tab, under "Interoperability".

In this example, our buckets will be ``zenkobucket`` and ``zenkompubucket``.
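
If you prefer the command line, the same buckets can be created with the
``gsutil`` tool; this is a minimal sketch, assuming ``gsutil`` is installed and
authenticated against your project, and it reuses the example bucket names:

.. code:: shell

    # Create the data bucket and the multipart upload bucket
    $> gsutil mb gs://zenkobucket
    $> gsutil mb gs://zenkompubucket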

From the CloudServer repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

locationConfig.json
^^^^^^^^^^^^^^^^^^^

Edit this file to add a new location constraint. This location constraint will
contain the information for the Google Cloud Storage bucket to which you will
be writing your data whenever you create a CloudServer bucket in this location.
There are a few configurable options here:

- :code:`type` : set to :code:`gcp` to indicate this location constraint is
  writing data to Google Cloud Storage;
- :code:`legacyAwsBehavior` : set to :code:`true` to indicate this region should
  behave like AWS S3 :code:`us-east-1` region, set to :code:`false` to indicate
  this region should behave like any other AWS S3 region;
- :code:`bucketName` : set to an *existing bucket* in your Google Cloud Storage
  Account; this is the bucket in which your data will be stored for this
  location constraint;
- :code:`mpuBucketName` : set to an *existing bucket* in your Google Cloud
  Storage Account; this is the bucket in which parts for multipart uploads will
  be stored for this location constraint;
- :code:`gcpEndpoint` : set to your bucket's endpoint, usually
  :code:`storage.googleapis.com`;
- :code:`bucketMatch` : set to :code:`true` if you want your object name to be
  the same in your local bucket and your Google Cloud Storage bucket; set to
  :code:`false` if you want your object name to be of the form
  :code:`{{localBucketName}}/{{objectname}}` in your Google Cloud Storage hosted
  bucket;
- :code:`credentialsProfile` and :code:`credentials` are two ways to provide
  your Google Cloud Storage Interoperability credentials for that bucket,
  *use only one of them* :

  - :code:`credentialsProfile` : set to the profile name allowing you to access
    your Google Cloud Storage bucket from your :code:`~/.aws/credentials` file;
  - :code:`credentials` : set the two fields inside the object
    (:code:`accessKey` and :code:`secretKey`) to their respective values from
    your Google Cloud Storage Interoperability credentials.

For example, with a :code:`credentialsProfile`:

.. code:: json

    (...)
    "gcp-test": {
        "type": "gcp",
        "legacyAwsBehavior": true,
        "details": {
            "gcpEndpoint": "storage.googleapis.com",
            "bucketName": "zenkobucket",
            "mpuBucketName": "zenkompubucket",
            "bucketMatch": true,
            "credentialsProfile": "zenko"
        }
    },
    (...)

Or, with inline :code:`credentials`:

.. code:: json

    (...)
    "gcp-test": {
        "type": "gcp",
        "legacyAwsBehavior": true,
        "details": {
            "gcpEndpoint": "storage.googleapis.com",
            "bucketName": "zenkobucket",
            "bucketMatch": true,
            "mpuBucketName": "zenkompubucket",
            "credentials": {
                "accessKey": "WHDBFKILOSDDVF78NPMQ",
                "secretKey": "87hdfGCvDS+YYzefKLnjjZEYstOIuIjs/2X72eET"
            }
        }
    },
    (...)

.. WARNING::
    If you set :code:`bucketMatch` to :code:`true`, we strongly advise that you
    only have one local bucket per Google Cloud Storage location.
    With :code:`bucketMatch` set to :code:`true`, your object names in your
    Google Cloud Storage bucket will not be prefixed with your CloudServer
    bucket name. This means that if you put an object :code:`foo` to your
    CloudServer bucket :code:`zenko1`, you then put a different :code:`foo` to
    your CloudServer bucket :code:`zenko2`, and both :code:`zenko1` and
    :code:`zenko2` point to the same Google Cloud Storage bucket, the second
    :code:`foo` will overwrite the first :code:`foo`.

~/.aws/credentials
^^^^^^^^^^^^^^^^^^

.. TIP::
    If you explicitly set your :code:`accessKey` and :code:`secretKey` in the
    :code:`credentials` object of your :code:`gcp` location in your
    :code:`locationConfig.json` file, you may skip this section.

Make sure your :code:`~/.aws/credentials` file has a profile matching the one
defined in your :code:`locationConfig.json`. Following our previous example, it
would look like:

.. code:: shell

    [zenko]
    aws_access_key_id=WHDBFKILOSDDVF78NPMQ
    aws_secret_access_key=87hdfGCvDS+YYzefKLnjjZEYstOIuIjs/2X72eET

Start the server with the ability to write to Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Inside the repository, once all the files have been edited, you should be able
to start the server and start writing data to Google Cloud Storage through
CloudServer.

.. code:: shell

    # Start the server locally
    $> S3DATA=multiple npm start

Run the server as a docker container with the ability to write to Google Cloud Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. TIP::
    If you set the :code:`credentials` object in your
    :code:`locationConfig.json` file, you don't need to mount your
    :code:`.aws/credentials` file.

Mount all the files that have been edited to override defaults, and do a
standard Docker run; then you can start writing to Google Cloud Storage through
CloudServer.

.. code:: shell

    # Start the server in a Docker container
    $> sudo docker run -d --name CloudServer \
    -v $(pwd)/data:/usr/src/app/localData \
    -v $(pwd)/metadata:/usr/src/app/localMetadata \
    -v $(pwd)/locationConfig.json:/usr/src/app/locationConfig.json \
    -v $(pwd)/conf/authdata.json:/usr/src/app/conf/authdata.json \
    -v ~/.aws/credentials:/root/.aws/credentials \
    -e S3DATA=multiple -e ENDPOINT=http://localhost -p 8000:8000 \
    scality/s3server

Testing: put an object to Google Cloud Storage using CloudServer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to start testing pushing to Google Cloud Storage, you will need to
create a local bucket in the Google Cloud Storage location constraint - this
local bucket will only store the metadata locally, while both the data and any
user metadata (:code:`x-amz-meta` headers sent with a PUT object, and tags)
will be stored on Google Cloud Storage.
This example is based on all our previous steps.

.. code:: shell

    # Create a local bucket storing data in Google Cloud Storage
    $> s3cmd --host=127.0.0.1:8000 mb s3://zenkobucket --region=gcp-test
    # Put an object to Google Cloud Storage, and store the metadata locally
    $> s3cmd --host=127.0.0.1:8000 put /etc/hosts s3://zenkobucket/testput
    upload: '/etc/hosts' -> 's3://zenkobucket/testput' [1 of 1]
    330 of 330 100% in 0s 380.87 B/s done
    # List locally to check you have the metadata
    $> s3cmd --host=127.0.0.1:8000 ls s3://zenkobucket
    2017-10-23 10:26 330 s3://zenkobucket/testput

Then, from the Google Cloud Console, if you go into your bucket, you should see
your newly uploaded object:

.. figure:: ../res/gcp-console-successful-put.png
   :alt: Google Cloud Storage Console upload example
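
You can also check the Google Cloud Storage side from the command line; this is
a minimal sketch, assuming ``gsutil`` is configured for the same project and
:code:`bucketMatch` is set to :code:`true` as in our example:

.. code:: shell

    # List the Google Cloud Storage bucket directly
    $> gsutil ls gs://zenkobucket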

Troubleshooting
~~~~~~~~~~~~~~~

Make sure your :code:`~/.s3cfg` file has credentials matching your local
CloudServer credentials defined in :code:`conf/authdata.json`. By default, the
access key is :code:`accessKey1` and the secret key is :code:`verySecretKey1`.
For more information, refer to our template `~/.s3cfg <./CLIENTS/#s3cmd>`__ .

Pre-existing objects in your Google Cloud Storage hosted bucket can
unfortunately not be accessed by CloudServer at this time.

For any data backend
--------------------

@@ -0,0 +1,95 @@

## Google Cloud Storage Backend

### Overall Design

The Google Cloud Storage backend is implemented using the `aws-sdk` service
class for AWS-compatible methods. The structure of these methods (request
inputs, response outputs, and required parameters) is described in the
`gcp-2017-11-01.api.json` file. For non-compatible methods, helper methods are
implemented to perform the requests; these can be found under the `GcpApis`
directory.

The implemented GCP service is designed to work as closely as possible to the
AWS service.

### Object Tagging

Google Cloud Storage does not have object-level tagging methods.

To be compatible with S3, object tags will be stored as metadata on
Google Cloud Storage.
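
A minimal sketch of what this mapping could look like (the `aws-tag-` metadata
key prefix is an assumption for illustration, not necessarily the scheme
CloudServer uses):

```typescript
interface S3Tag {
    Key: string;
    Value: string;
}

// Flatten an S3-style tag set into custom metadata entries that can be
// attached to the Google Cloud Storage object.
function tagsToGcpMetadata(tags: S3Tag[]): Record<string, string> {
    const metadata: Record<string, string> = {};
    for (const tag of tags) {
        // Hypothetical key scheme: prefix each tag key so it can be
        // recognized and turned back into a tag set on read.
        metadata[`aws-tag-${tag.Key}`] = tag.Value;
    }
    return metadata;
}
```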

### Multipart Upload

Google Cloud Storage does not have AWS S3 multipart upload methods, but there
are methods for merging multiple objects into a single composite object.
Using these methods, GCP is able to perform parallel uploads for large
uploads; however, due to limits set by Google Cloud Storage, the maximum
number of parts possible for a single upload is 1024 (the AWS limit is 10000):
with two rounds of `compose` calls merging at most 32 objects each, at most
32 x 32 = 1024 parts can be combined.

As Google Cloud Storage does not have methods for managing multipart uploads,
each part is uploaded as a single object in a Google Cloud bucket.
Because of this, a secondary bucket for handling MPU parts is required for
a GCP multipart upload. The MPU bucket serves to hide uploaded parts from
being listed as items of the main bucket, as well as to handle parts of
multiple in-progress multipart uploads.

<!--
<p style='font-size: 12'>
** The Google Cloud Storage method used for combining multipart objects into a
single object is the `compose` method.<br/>
** <a>https://cloud.google.com/storage/docs/xml-api/put-object-compose</a>
</p>
-->

#### Multipart Upload Methods Design:

+ **initiateMultipartUpload**:
In `initiateMultipartUpload`, each new multipart upload generates a prefix with
the scheme `${objectKeyName}-${uploadId}`, and each object related to that MPU
is prefixed with it. This method also creates an `init` file that stores the
metadata related to the MPU, for later assignment to the completed object.

+ **uploadPart**:
`uploadPart` prefixes the upload with the MPU prefix, then performs a
`putObject` request to Google Cloud Storage.

+ **uploadPartCopy**:
`uploadPartCopy` prefixes the copy upload with the MPU prefix, then performs a
`copyObject` request to Google Cloud Storage.

+ **abortMultipartUpload**:
`abortMultipartUpload` removes all objects related to a multipart upload from
the MPU bucket. It does this by first making a `listObjectVersions` request to
GCP to list all parts with the related MPU prefix, then performing a
`deleteObject` request on each of the objects received.

+ **completeMultipartUpload**:
`completeMultipartUpload` combines the given parts to create the single
composite object (see the sketch after this list). This method consists of
multiple steps, due to the limitations of the Google Cloud Storage `compose`
method:
  + compose round 1: multiple compose calls to merge, at most, 32 objects into
    a single subpart.
  + compose round 2: multiple compose calls to merge the subparts generated
    in compose round 1 into the final completed object.
  + generate MPU ETag: generate the multipart ETag that will be returned as
    part of the completeMultipartUpload response.
  + copy to main: retrieve the metadata stored in the `init` file created in
    `initiateMultipartUpload`, assign it to the completed object, and copy the
    completed object from the MPU bucket to the main bucket.
  + cleanUp: remove all objects related to the MPU.
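
A minimal sketch of the two compose rounds (the `composeObjects` helper and the
intermediate key names are hypothetical stand-ins, not the actual `GcpApis`
functions):

```typescript
// Hypothetical stand-in for a GcpApis helper: issues one GCP "compose"
// request merging up to 32 source objects into `destination` in the MPU bucket.
async function composeObjects(
    mpuBucket: string,
    sources: string[],
    destination: string,
): Promise<void> {
    // A real implementation would call the GCP XML API compose endpoint here.
    console.log(`compose ${sources.length} objects -> ${mpuBucket}/${destination}`);
}

const MAX_COMPOSE_SOURCES = 32; // GCP limit per compose request

// Merge the uploaded part objects into a single composite object using at
// most two rounds of compose calls (hence the 32 x 32 = 1024 part limit).
async function composeParts(
    mpuBucket: string,
    prefix: string,     // the `${objectKeyName}-${uploadId}` MPU prefix
    partKeys: string[], // keys of the uploaded parts, in part-number order
): Promise<string> {
    // Compose round 1: merge groups of up to 32 parts into subparts.
    const subpartKeys: string[] = [];
    for (let i = 0; i < partKeys.length; i += MAX_COMPOSE_SOURCES) {
        const group = partKeys.slice(i, i + MAX_COMPOSE_SOURCES);
        const subpartKey = `${prefix}-subpart-${subpartKeys.length}`;
        await composeObjects(mpuBucket, group, subpartKey);
        subpartKeys.push(subpartKey);
    }
    // Compose round 2: merge the subparts (at most 32 of them) into the
    // completed object, which is later copied to the main bucket.
    const finalKey = `${prefix}-final`;
    await composeObjects(mpuBucket, subpartKeys, finalKey);
    return finalKey;
}
```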

### Limitations

+ GCP multipart uploads are limited to 1024 parts
+ Each `compose` call can merge up to 32 objects per request
+ As Google Cloud Storage doesn't have AWS-style MPU methods, GCP MPU requires
a secondary bucket to perform multipart uploads
+ GCP doesn't have object-level tagging methods; AWS-style tags are stored
as metadata on Google Cloud Storage

More information can be found at:

+ https://cloud.google.com/storage/docs/xml-api/overview
+ https://cloud.google.com/storage/quotas
+ https://cloud.google.com/storage/docs/xml-api/put-object-compose