Commit Graph

241 Commits (ec44cb09e73cdc811824ae39518908258b4f14d2)

Author SHA1 Message Date
Vitaliy Filippov 7eabc364bf Release 0.6.7
- Implement CLI commands for listing, viewing I/O statistics, creating,
  snapshotting, cloning, resizing and modifying images. All these operations
  are covered by 3 commands: ls, create, modify
- Implement an important fix to prior OSD set tracking for PGs. The previous
  version had an issue which could lead to data loss due to an OSD with older
  copy of the data thinking it has the newest copy
- Fix I/O statistics aggregation in the monitor
- Several minor fixes for Cinder driver
- Fix QEMU driver to be compatible with QEMU 2.x > 2.0
- Fix stalls sometimes possible in configurations without immediate_commit due
  to insufficient amount of automatic internal fsync operations
- Add `vita` alias for `vitastor-cli`
2021-11-13 23:23:55 +03:00
Vitaliy Filippov a346f84c69 Allow to show only specific images in listing 2021-11-13 23:23:55 +03:00
Vitaliy Filippov 71a0c1a7b9 Fix list sorting 2021-11-13 23:23:55 +03:00
Vitaliy Filippov 110b39900b Rename the new "set" command to "modify" 2021-11-13 22:39:17 +03:00
Vitaliy Filippov 42479b4590 Fix vitastor-nbd list, add ls alias 2021-11-13 22:39:17 +03:00
Vitaliy Filippov 6e82044e84 Add `vita` symlink 2021-11-13 22:39:17 +03:00
Vitaliy Filippov 2cb3e84882 Implement CLI set (resize, change readonly status) command 2021-11-13 22:39:17 +03:00
Vitaliy Filippov aa436027c8 Report pg/history from OSD on every degraded activation
Required to prevent data loss due to activation of an OSD with older data
when PG OSD set change doesn't occur. I.e. fixes the simplest case:
- Run 2 OSDs with 1 PG
- Start writing into the PG
- Stop OSD 2
- Stop OSD 1
- Start OSD 2

After this change the PG will refuse to start after the last step.
2021-11-13 22:39:17 +03:00
Vitaliy Filippov 577a563b91 Allow to disable colored output 2021-11-11 01:41:58 +03:00
Vitaliy Filippov e4efa2c08a Improve vitastor-cli ls - show I/O statistics, allow to sort & limit output 2021-11-11 01:41:58 +03:00
Vitaliy Filippov d528cd77f1 Fix install_symlink 2021-11-09 16:42:29 +03:00
Vitaliy Filippov 4d43774cbb Use 5s etcd_report_interval by default 2021-11-09 01:27:12 +03:00
Vitaliy Filippov a1488f7217 Fix qemu_driver to build with QEMU 2.x (previously it was only correct for QEMU 2.0) 2021-11-08 23:07:31 +03:00
Vitaliy Filippov 404e07d365 Implement image/snapshot/clone creation and listing by pool 2021-11-07 01:01:07 +03:00
Vitaliy Filippov b3dcee0d43 Also print "bare" inodes with missing config if they occupy space 2021-11-06 14:56:41 +03:00
Vitaliy Filippov 609bd4eb59 Remove naggy RDMA messages when log level is zero 2021-11-06 14:36:23 +03:00
Vitaliy Filippov 8e445ddc9a Begin to implement CLI: implement listing, add help, add create stub 2021-11-06 14:32:19 +03:00
Tân Lê e889ac4209
Fix building QEMU 3.1 2021-11-05 13:45:51 +07:00
Vitaliy Filippov cfe8de9b84 Autosync based on number of unstable ops to prevent journal stalls 2021-10-30 14:26:48 +03:00
Vitaliy Filippov fb2f7a0d3c Release 0.6.6
- New command-line tool: vitastor-cli
- Implement layer (snapshot/clone) merge and delete
- Remove 'bool' from the C header
- Fix a very rare flusher stall
- More diagnostics now printed for slow ops in the log
2021-10-19 02:26:37 +03:00
Vitaliy Filippov 38d85da19a Fix build for older gcc 2021-10-19 02:26:37 +03:00
Vitaliy Filippov 89dcda1fed Remove "bool" from the C header 2021-10-18 01:49:07 +03:00
Vitaliy Filippov 1526e2055e Do not crash with RDMA when receiving garbage, free RDMA buffers when connection is closed 2021-10-15 23:56:22 +03:00
Vitaliy Filippov 74cb3911db Rebase children of the "inverse" child when it is removed, change /index/image/%s keys during metadata ops 2021-09-26 13:41:48 +03:00
Vitaliy Filippov d5efbbb6b9 Rename commands and add CLI help 2021-09-26 13:14:36 +03:00
Vitaliy Filippov 4319091bd3 Implement "inverse merge" optimisation 2021-09-26 12:59:04 +03:00
Vitaliy Filippov 6d307d5391 Ignore "readonly" flag when merging snapshots 2021-09-26 11:32:42 +03:00
Vitaliy Filippov 065dfef683 Rename vitastor-cmd to vitastor-cli 2021-09-26 00:52:05 +03:00
Vitaliy Filippov 4d6b85fe67 Split one big cmd.cpp into multiple files 2021-09-26 00:48:08 +03:00
Vitaliy Filippov 2dd2f29f46 Move get_inode_cfg to cli_tool_t 2021-09-25 23:36:45 +03:00
Vitaliy Filippov fc3a1e076a Fix minor bugs in snapshot removal, check it in tests 2021-09-25 19:30:29 +03:00
Vitaliy Filippov 3a3e168c42 Implement high-level snapshot flatten and remove commands 2021-09-25 01:36:44 +03:00
Vitaliy Filippov 95c55da0ad Implement merge with CAS 2021-08-01 20:06:05 +03:00
Vitaliy Filippov 5cf1157f16 Return real version on CAS failure 2021-08-01 20:05:19 +03:00
Vitaliy Filippov acf637950c Implement layer merge
A new command merges multiple snapshot/clone layers into one of them,
so merged layers can be deleted after this procedure
2021-07-31 00:23:30 +03:00
Vitaliy Filippov a02b02eb04 Use new listing methods in rm_inode 2021-07-20 00:19:34 +03:00
Vitaliy Filippov 7d3d696110 Implement object listing with controllable parallelism in cluster_client 2021-07-20 00:19:34 +03:00
Vitaliy Filippov 712576ca75
Merge pull request #13 from lnsyyj/wip-vitastor-debug
fix BLOCKSTORE_DEBUG, error: ‘dirty_it’ was not declared in this scope
2021-07-18 01:25:05 +03:00
Vitaliy Filippov 28bd94d2c2 Make diagnostics slightly better 2021-07-18 01:24:38 +03:00
Vitaliy Filippov 148ff04aa8 Do not lose flusher queue entries when an "older object rescan" happens in parallel with flushing of an older version of another object 2021-07-18 01:20:54 +03:00
JiangYu e86df4a2a2 fix BLOCKSTORE_DEBUG, error: ‘dirty_it’ was not declared in this scope
Signed-off-by: JiangYu <lnsyyj@hotmail.com>
2021-07-18 00:46:05 +08:00
Vitaliy Filippov e74af9745e Print journal flusher diagnostics on slow ops 2021-07-17 16:13:41 +03:00
Vitaliy Filippov 0e0509e3da Dump op states in slow operation log 2021-07-16 01:58:50 +03:00
Vitaliy Filippov cb282d25e0 Release 0.6.5
- Basic support for OpenStack: Cinder driver, patches for Nova and libvirt
- Add missing "image" and "config_path" QEMU options
- Calculate aggregate per-pool statistics in monitor
- Implement writes with Check-And-Set semantics
- Add a C wrapper library with public header
2021-07-10 11:01:21 +03:00
Vitaliy Filippov b52dd6843a Rename qemu_rbd_unescape and qemu_rbd_next_tok to *_vitastor_* 2021-07-03 23:14:44 +03:00
Vitaliy Filippov b66160a7ad Aggregate per-pool statistics in mon 2021-07-03 23:14:44 +03:00
Vitaliy Filippov aad7792d3f Check for loops in parent inode chains 2021-06-20 00:23:03 +03:00
Vitaliy Filippov 6ca8afffe5 Add CAS version parameter to the C wrapper 2021-06-19 01:00:52 +03:00
Vitaliy Filippov 511a89948b Rework qemu_proxy into a C wrapper library with public header 2021-06-19 00:39:11 +03:00
Vitaliy Filippov 3de553ecd7 Add a test for CAS write operation 2021-06-15 00:12:35 +03:00
Vitaliy Filippov 891250d355 Implement CAS writes
From now on, reads will return the server-side object version numbers
and writes and deletes will have an additional "version" parameter
which, if set to a non-zero value, will be atomically compared with
the current version of the object plus 1 and the modification will
fail if it doesn't match.

This feature opens the road to correct online flattening of snapshot
layers and other interesting things.
2021-06-15 00:12:35 +03:00
Vitaliy Filippov f9fe72d40a Release 0.6.4
- Implement a basic Kubernetes CSI driver
- Minor fixes for vitastor-nbd
- Fix build without RDMA broken in 0.6.3
2021-05-16 01:38:01 +03:00
Vitaliy Filippov eaac1fc5d1 Log to stderr in etcd_state_client, too 2021-05-16 01:09:25 +03:00
Vitaliy Filippov 57be1923d3 Daemonize NBD_DO_IT process, correctly cleanup unmounted NBD clients 2021-05-16 01:09:25 +03:00
Vitaliy Filippov c467acc388 Fix /v3 appendage to etcd URLs without /v3 2021-05-15 19:22:24 +03:00
Vitaliy Filippov bf591ba3ee Fix nbd module load check 2021-05-15 19:22:24 +03:00
Vitaliy Filippov 699a0fbbc7 Log to stderr instead of stdout in client 2021-05-15 19:22:24 +03:00
Vitaliy Filippov 6b2dd50f27 Fix build without RDMA 2021-05-08 18:20:43 +03:00
Vitaliy Filippov caf2f3c56f Release 0.6.3
- RDMA support
- Client performance optimisations (4k randread ~120k -> ~180k on 1 core)
- JSON configuration file (/etc/vitastor/vitastor.conf) support
- Bug fixes
2021-05-02 17:47:43 +03:00
Vitaliy Filippov 9174f188b1 Build packages with libibverbs
For CentOS 7 it also requires newer rdma-core as CentOS 7's native version doesn't have
implicit ODP support. The updated version is already uploaded into the vitastor repo.
2021-05-02 17:47:16 +03:00
Vitaliy Filippov d3978c6d0e Do not print RDMA connection messages when log_level=0
By the way, it's 1 by default in the OSD, so these messages will still be there in OSD logs
2021-05-01 00:26:09 +03:00
Vitaliy Filippov 4a7365660d Do not wait for down OSDs during sync
Fixes a hang introduced in 0.5.11 in the non-immediate_commit mode
2021-05-01 00:26:07 +03:00
Vitaliy Filippov 818ae5d61d Some config parsing fixes 2021-05-01 00:20:01 +03:00
Vitaliy Filippov f6f35f4127 Pass options correctly to not override /etc/vitastor/vitastor.conf 2021-04-30 01:17:44 +03:00
Vitaliy Filippov 72aa2fd819 Make OSD and client read common configuration from /etc/vitastor/vitastor.conf 2021-04-30 01:11:27 +03:00
Vitaliy Filippov 5010b0dd75 Use json11 instead of blockstore_config_t 2021-04-30 00:52:46 +03:00
Vitaliy Filippov 483c5ab380 Negotiate max_msg instead of max_sge, make buffer settings more conservative :-) 2021-04-29 11:10:35 +03:00
Vitaliy Filippov 6a6fd6544d Add RDMA options to the QEMU driver 2021-04-29 11:02:49 +03:00
Vitaliy Filippov 971aa4ae4f Implement RDMA receive with memory copying (send remains zero-copy)
This is the simplest and, as usual, the best implementation :)

100% zero-copy implementation is also possible (see rdma-zerocopy branch),
but it requires to create A LOT of queues (~128 per client) to use QPN as a 'tag'
because of the lack of receive tags and the server may simply run out of queues.
Hardware limit is 262144 on Mellanox ConnectX-4 which amounts to only 2048
'connections' per host. And even with that amount of queues it's still less optimal
than the non-zerocopy one.

In fact, newest hardware like Mellanox ConnectX-5 does have Tag Matching
support, but it's still unsuitable for us because it doesn't support scatter/gather
(tm_caps.max_sge=1).
2021-04-29 02:34:45 +03:00
Vitaliy Filippov 9e6cbc6ebc Negotiate max_sge between RDMA client & server 2021-04-29 02:15:20 +03:00
Vitaliy Filippov ce777319c3 WIP RDMA support
Basic naive implementation works, but it's highly non-optimal as
RNR retransmissions occur all the time. RDMA expects the receiver
to always have place for incoming WRs...
2021-04-29 02:03:54 +03:00
Vitaliy Filippov f8ff39b0ab Rework continue_ops() to remove a CPU hot spot
This rework increases fio -rw=randread -iodepth=128 result from ~120k to ~180k iops :)
2021-04-29 01:50:13 +03:00
Vitaliy Filippov d749159585 Linked list experiment
Rework client operation queue from a vector to a linked list.
This is required to rework continue_ops() as its current implementation
consumes ~25% of client process CPU.
2021-04-29 01:47:33 +03:00
Vitaliy Filippov 9703773a63 Fix has_flushes setting 2021-04-28 23:40:44 +03:00
Vitaliy Filippov 5d8d486f7c Add SOVERSION 2021-04-20 01:01:32 +03:00
Vitaliy Filippov 2b546cdd55 Link vitastor_blk with vitastor_common for timerfd_manager_t
Not really required to operate, but fixes a verify-elf error
2021-04-20 00:51:53 +03:00
Vitaliy Filippov bd7b177707 Report sensitive configuration values instead of the configuration source 2021-04-17 23:11:16 +03:00
Vitaliy Filippov 82e6aff17b Support mapping NBD by the image name 2021-04-17 17:39:55 +03:00
Vitaliy Filippov 57e2c503f7 Rename osd_t::c_cli to msgr 2021-04-17 16:32:09 +03:00
Vitaliy Filippov 715bc8d53d Release 0.6.2
- Fix a possible crash during SYNC when journal fsyncs are enabled
- Fix a memory leak in the chained read implementation
2021-04-15 23:40:06 +03:00
Vitaliy Filippov 0af077701c Fix a possible crash during SYNC when journal fsyncs are enabled 2021-04-15 02:01:50 +03:00
Vitaliy Filippov cac976ce25 Fix a memory leak in the chained read implementation 2021-04-15 01:42:18 +03:00
Vitaliy Filippov acf0646542 Build common sources once 2021-04-15 01:13:34 +03:00
Vitaliy Filippov ede1c1d667 Release 0.6.1
A bugfix for the new "chained read from snapshot" feature
2021-04-14 22:32:23 +03:00
Vitaliy Filippov 38bd51c97f Remove aio_context assertion, it seems it is unneeded 2021-04-14 22:32:15 +03:00
Vitaliy Filippov 966fb763ca Oooops, fix chained reads 2021-04-13 16:19:21 +03:00
Vitaliy Filippov 0b41ffc08d Release 0.6.0
Warning: upgrading from 0.5.x is currently not supported!
Please create an issue if you really need upgrade capability.

New features:
- Snapshots and Copy-on-Write clones
- Inode (image) names
- Inode I/O and space statistics
- Write throttling for smoothing random write workloads in SSD+HDD configurations
2021-04-11 00:49:18 +03:00
Vitaliy Filippov 64eeb79051 Prevent 0.6.x OSDs from talking to 0.5.x
The new protocol is almost compatible - it has bitmaps, but also it has
a "bitmap_length" field. It's not hard to make 0.5-0.6 OSDs and clients
compatible, but for now I just assume nobody needs it.

If I'm wrong and anybody requests to upgrade their production 0.5.x system
to 0.6.x I'll fix it.
2021-04-10 22:26:17 +03:00
Vitaliy Filippov 2a02f3c4c7 Add metadata superblock and check it on start
Refuse to start if the superblock is missing or bad version;
zero out the metadata area when initializing superblock.
2021-04-10 22:26:17 +03:00
Vitaliy Filippov f684d9101a Refuse to start with old journal version 2021-04-10 17:44:12 +03:00
Vitaliy Filippov a1f2f19489 Do not increment inode statistics if the object already exists 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 82c1a7ec67 Fix statistics reporting, split inode number into pool & inode 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 2ab423d4ef Implement journaled write throttling for the SSD+HDD case 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 4694811eab Add microsecond accuracy to set_timer 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 6b988de17d Remove timerfd_interval 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 37efdc2a83 Fix bitmap_set for replicated pools 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 591cad09c9 Fix bitmaps for objects larger than 128K 2021-04-10 17:44:12 +03:00
Vitaliy Filippov b907ad50aa Oops, forgot to add external bitmaps to blockstore in some places 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 5f5b6ef150 Enable chained reads in the client 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 38a3df4a0e Implement chained (optimized) read in the primary OSD code 2021-04-10 17:44:12 +03:00