S3 for Vitastor

The moment has come: the Vitastor S3 implementation, based on Zenko CloudServer, is released.

Highlights

  • Zenko CloudServer is implemented in Node.js.
  • Object metadata is stored in MongoDB.
  • A modified version of Zenko CloudServer is used for Vitastor. It differs slightly from the original: the build is optimised and unneeded dependencies are stripped out.
  • Object data is stored in Vitastor block volumes, but the volume metadata is stored in the same MongoDB, not in Vitastor etcd.
  • Objects are written to volumes sequentially, one after another. Space is allocated with rounding up to the sector size (4 KB), so each object takes at least 4 KB.
  • An important property of this storage scheme is that small objects aren't chunked into parts in Vitastor EC N+K pools and thus don't require reads from all N disks when downloading.
  • Deleted objects are marked as deleted, but the space is only actually freed by an asynchronously executed "defragmentation" process. Defragmentation runs automatically in the background when a volume reaches a configured share of "garbage" (20% by default), copies the live objects to new volume(s) and then removes the old volume. Defragmentation can be configured in locationConfig.json (see the sketch after this list).
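
For illustration only, defragmentation tuning inside a locationConfig.json storage class entry could look like the sketch below. Both the structure and the option names here (including defrag_garbage_ratio) are hypothetical placeholders, since the complete list of Vitastor backend settings is currently only available in the code (see the Setup Vitastor S3 section); the "_comment" key repeats that warning inside the JSON itself.

{
    "STANDARD": {
        "type": "vitastor",
        "details": {
            "_comment": "structure and option names are hypothetical placeholders - verify against the code",
            "config_path": "/etc/vitastor/vitastor.conf",
            "pool_id": 1,
            "defrag_garbage_ratio": 0.2
        }
    }
}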

Plans for future development

  • User account storage in the DB instead of a static file. Original Zenko uses a separate closed-source "Scality Vault" service for it; that's why we use a static file for now.
  • More detailed documentation.
  • Support for other (and faster) key-value DBMS for object metadata storage.
  • Other performance optimisations, for example, related to the hash function: MD5, used for Amazon compatibility purposes, is relatively slow.
  • Object Lifecycle support. There is a Lifecycle implementation for Zenko called Backbeat, but it's not adapted for Vitastor yet.
  • Quota support. Original Zenko uses a separate "SCUBA" service for quotas, but it's also proprietary and not available publicly.

Installation

In a few words:

  • Install MongoDB and create a user for the S3 metadata DB.
  • Create a Vitastor pool for S3 data.
  • Download and set up the Docker container vitalif/vitastor-zenko.

Setup MongoDB

You can set up MongoDB yourself by following the MongoDB manual.

Or you can follow the instructions below; they describe a simple example of a 3-replica MongoDB setup in Docker (via docker-compose).

  1. On each host, create a file docker-compose.yml with the content listed below. Replace <YOUR_PASSWORD> with your future MongoDB administrator password, and optionally replace 0.0.0.0 with localhost,<server_IP>. It's recommended to either use a private IP or set up TLS afterwards.
version: '3.1'

services:

  mongo:
    container_name: mongo
    image: mongo:7-jammy
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: <YOUR_PASSWORD>
    network_mode: host
    volumes:
      - ./keyfile:/opt/keyfile
      - ./mongo-data/db:/data/db
      - ./mongo-data/configdb:/data/configdb
    entrypoint: /bin/bash -c
    command: [ "chown mongodb /opt/keyfile && chmod 600 /opt/keyfile && . /usr/local/bin/docker-entrypoint.sh mongod --replSet rs0 --keyFile /opt/keyfile --bind_ip 0.0.0.0" ]
  2. Generate a shared cluster key using openssl rand -base64 756 > ./keyfile and copy that keyfile to all hosts.

  3. Start MongoDB on all hosts with docker compose up -d mongo.

  4. Enter Mongo Shell with docker exec -it mongo mongosh -u root -p <YOUR_PASSWORD> localhost/admin and execute the following command (replace IP addresses 10.10.10.{1,2,3} with your host IPs):

rs.initiate({ _id: 'rs0', members: [ { _id: 1, host: '10.10.10.1:27017' }, { _id: 2, host: '10.10.10.2:27017' }, { _id: 3, host: '10.10.10.3:27017' } ] })

  5. Stay in Mongo Shell and create a user for the future S3 database:

db.createUser({ user: 's3', pwd: '<YOUR_S3_PASSWORD>', roles: [ { role: 'readWrite', db: 's3' }, { role: 'dbAdmin', db: 's3' }, { role: 'readWrite', db: 'vitastor' }, { role: 'dbAdmin', db: 'vitastor' } ] })
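
To verify the setup, you can stay in Mongo Shell and run two standard mongosh commands: rs.status() should list all three members with one of them PRIMARY, and db.getUsers() (in the admin database you are connected to) should show the newly created user.

rs.status()   // expect 3 members, one in PRIMARY state
db.getUsers() // should include the 's3' user created above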

Setup Vitastor

Create a pool in Vitastor for S3 object data, for example:

vitastor-cli create-pool --ec 2+1 -n 512 s3-data --used_for_app s3:standard

The --used_for_app option works as fool-proofing and prevents you from accidentally creating a regular block volume in the S3 pool and overwriting S3 data. It also hides inode space statistics from Vitastor etcd.
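
If you prefer replication to erasure coding, the same pool can be created in replicated form instead. A sketch, assuming vitastor-cli's -s flag sets the replica count (pg_size):

vitastor-cli create-pool -s 3 -n 512 s3-data --used_for_app s3:standard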

Retrieve the ID of your pool with vitastor-cli ls-pools s3-data --detail.

Setup Vitastor S3

  1. Add the following lines to docker-compose.yml (instead of network_mode: host, you can use ports: [ "8000:8000", "8002:8002" ]):
  zenko:
    container_name: zenko
    image: vitalif/vitastor-zenko
    restart: always
    security_opt:
      - seccomp:unconfined
    ulimits:
      memlock: -1
    network_mode: host
    volumes:
      - /etc/vitastor:/etc/vitastor
      - /etc/vitastor/s3:/conf
  2. Download Docker image: docker pull vitalif/vitastor-zenko

  3. Extract configuration file examples from the Docker image:

    docker run --rm -it -v /etc/vitastor:/etc/vitastor -v /etc/vitastor/s3:/conf vitalif/vitastor-zenko configure.sh
    
  4. Edit configuration files in /etc/vitastor/s3/:

    • config.json - common settings.
    • authdata.json - user accounts and access keys.
    • locationConfig.json - S3 storage class list with placement settings. Note: it actually contains storage classes (like STANDARD, COLD, etc) instead of "locations" (zones like us-east-1) as in the original Zenko CloudServer.
    • Put your MongoDB connection data into config.json and locationConfig.json (see the sketch after this list).
    • Put your Vitastor pool ID into locationConfig.json.
    • For now, the complete list of Vitastor backend settings is only available in the code.
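
As an illustration, the MongoDB connection block in config.json could look like the sketch below, modelled on upstream Zenko CloudServer conventions. Treat the exact key names (replicaSetHosts, replicaSet, database, authCredentials) as assumptions to check against the examples extracted by configure.sh:

{
    "mongodb": {
        "replicaSetHosts": "10.10.10.1:27017,10.10.10.2:27017,10.10.10.3:27017",
        "replicaSet": "rs0",
        "database": "s3",
        "authCredentials": {
            "username": "s3",
            "password": "<YOUR_S3_PASSWORD>"
        }
    }
}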

Start Zenko

Start the S3 server with:

docker run --restart always --security-opt seccomp:unconfined --ulimit memlock=-1 --network=host \
    -v /etc/vitastor:/etc/vitastor -v /etc/vitastor/s3:/conf --name zenko vitalif/vitastor-zenko
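
To check that the server came up, you can tail the container logs with the standard Docker CLI:

docker logs -f zenko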

If you use default settings, Zenko CloudServer starts on port 8000. The default access key is accessKey1 with a secret key of verySecretKey1.

Now you can access your S3 with, for example, s3cmd:

s3cmd --access_key=accessKey1 --secret_key=verySecretKey1 --host=http://localhost:8000 mb s3://testbucket
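
Once the bucket exists, uploads and listings work the same way (put and ls are standard s3cmd commands; some-file.txt is just a placeholder):

s3cmd --access_key=accessKey1 --secret_key=verySecretKey1 --host=http://localhost:8000 \
    put some-file.txt s3://testbucket/
s3cmd --access_key=accessKey1 --secret_key=verySecretKey1 --host=http://localhost:8000 \
    ls s3://testbucket/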

Or even mount it with GeeseFS:

AWS_ACCESS_KEY_ID=accessKey1 \
    AWS_SECRET_ACCESS_KEY=verySecretKey1 \
    geesefs --endpoint http://localhost:8000 testbucket mountdir
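
To detach the GeeseFS mount afterwards, use the standard FUSE unmount command:

fusermount -u mountdir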

Author & License