Vitaliy Filippov
4319091bd3
Implement "inverse merge" optimisation
2021-09-26 12:59:04 +03:00
Vitaliy Filippov
6d307d5391
Ignore "readonly" flag when merging snapshots
2021-09-26 11:32:42 +03:00
Vitaliy Filippov
065dfef683
Rename vitastor-cmd to vitastor-cli
2021-09-26 00:52:05 +03:00
Vitaliy Filippov
4d6b85fe67
Split one big cmd.cpp into multiple files
2021-09-26 00:48:08 +03:00
Vitaliy Filippov
2dd2f29f46
Move get_inode_cfg to cli_tool_t
2021-09-25 23:36:45 +03:00
Vitaliy Filippov
fc3a1e076a
Fix minor bugs in snapshot removal, check it in tests
2021-09-25 19:30:29 +03:00
Vitaliy Filippov
3a3e168c42
Implement high-level snapshot flatten and remove commands
2021-09-25 01:36:44 +03:00
Vitaliy Filippov
95c55da0ad
Implement merge with CAS
2021-08-01 20:06:05 +03:00
Vitaliy Filippov
5cf1157f16
Return real version on CAS failure
2021-08-01 20:05:19 +03:00
Vitaliy Filippov
acf637950c
Implement layer merge
A new command merges multiple snapshot/clone layers into one of them,
so that the merged layers can be deleted after this procedure
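A minimal sketch of the merge semantics described above, assuming a hypothetical in-memory model in which each layer maps a block index to its data and the chain is ordered from the oldest layer to the newest. The name merge_layers and the model itself are illustrative only, not the vitastor-cli implementation; the sketch ignores re-parenting of child layers and concurrent writes.

```python
from typing import Dict, List

Layer = Dict[int, bytes]  # hypothetical model: block index -> block data


def merge_layers(chain: List[Layer], first: int, last: int, target: int) -> List[int]:
    """Merge chain[first..last] (oldest..newest, inclusive) into chain[target].

    For every block, the newest layer in the range that contains it wins.
    Returns the indices of the layers that become redundant after the merge.
    """
    if not first <= target <= last:
        raise ValueError("target must be one of the merged layers")
    merged: Layer = {}
    # Walk from the oldest layer to the newest so newer data overwrites older data.
    for layer in chain[first:last + 1]:
        merged.update(layer)
    chain[target] = merged
    return [i for i in range(first, last + 1) if i != target]
```

For example, merging a three-layer chain into its top layer leaves the top layer with the full, flattened view, and the two lower layers become candidates for deletion.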
2021-07-31 00:23:30 +03:00
Vitaliy Filippov
a02b02eb04
Use new listing methods in rm_inode
2021-07-20 00:19:34 +03:00
Vitaliy Filippov
7d3d696110
Implement object listing with controllable parallelism in cluster_client
2021-07-20 00:19:34 +03:00
Vitaliy Filippov
712576ca75
Merge pull request #13 from lnsyyj/wip-vitastor-debug
fix BLOCKSTORE_DEBUG, error: ‘dirty_it’ was not declared in this scope
2021-07-18 01:25:05 +03:00
Vitaliy Filippov
28bd94d2c2
Make diagnostics slightly better
2021-07-18 01:24:38 +03:00
Vitaliy Filippov
148ff04aa8
Do not lose flusher queue entries when an "older object rescan" happens in parallel with flushing of an older version of another object
2021-07-18 01:20:54 +03:00
JiangYu
e86df4a2a2
fix BLOCKSTORE_DEBUG, error: ‘dirty_it’ was not declared in this scope
Signed-off-by: JiangYu <lnsyyj@hotmail.com>
2021-07-18 00:46:05 +08:00
Vitaliy Filippov
e74af9745e
Print journal flusher diagnostics on slow ops
2021-07-17 16:13:41 +03:00
Vitaliy Filippov
0e0509e3da
Dump op states in slow operation log
2021-07-16 01:58:50 +03:00
Vitaliy Filippov
cb282d25e0
Release 0.6.5
- Basic support for OpenStack: Cinder driver, patches for Nova and libvirt
- Add missing "image" and "config_path" QEMU options
- Calculate aggregate per-pool statistics in monitor
- Implement writes with Check-And-Set semantics
- Add a C wrapper library with public header
2021-07-10 11:01:21 +03:00
Vitaliy Filippov
b52dd6843a
Rename qemu_rbd_unescape and qemu_rbd_next_tok to *_vitastor_*
2021-07-03 23:14:44 +03:00
Vitaliy Filippov
b66160a7ad
Aggregate per-pool statistics in mon
2021-07-03 23:14:44 +03:00
Vitaliy Filippov
aad7792d3f
Check for loops in parent inode chains
2021-06-20 00:23:03 +03:00
Vitaliy Filippov
6ca8afffe5
Add CAS version parameter to the C wrapper
2021-06-19 01:00:52 +03:00
Vitaliy Filippov
511a89948b
Rework qemu_proxy into a C wrapper library with public header
2021-06-19 00:39:11 +03:00
Vitaliy Filippov
3de553ecd7
Add a test for CAS write operation
2021-06-15 00:12:35 +03:00
Vitaliy Filippov
891250d355
Implement CAS writes
From now on, reads return the server-side object version number, and
writes and deletes take an additional "version" parameter. If it is set
to a non-zero value, it is atomically compared with the current version
of the object plus 1, and the modification fails if they don't match.
This feature paves the way for correct online flattening of snapshot
layers and other interesting things.
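A minimal model of these CAS rules, assuming a single in-process store where a lock stands in for the server-side atomic check. VersionedStore, CasError and cas_update are hypothetical names, and the assumption that a delete also bumps the version is mine, not stated above. The retry loop shows the usual optimistic read-modify-write pattern that such versions make possible.

```python
import threading
from typing import Callable, Dict, Optional, Tuple


class CasError(Exception):
    """Raised on a version mismatch; carries the object's real current version."""

    def __init__(self, current_version: int) -> None:
        super().__init__(f"CAS failed, current version is {current_version}")
        self.current_version = current_version


class VersionedStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # makes check-and-modify atomic in this model
        self._versions: Dict[str, int] = {}
        self._data: Dict[str, bytes] = {}

    def read(self, key: str) -> Tuple[int, Optional[bytes]]:
        with self._lock:
            return self._versions.get(key, 0), self._data.get(key)

    def _next_version(self, key: str, version: int) -> int:
        cur = self._versions.get(key, 0)
        # version == 0 means "unconditional"; otherwise it must equal current + 1.
        if version != 0 and version != cur + 1:
            raise CasError(cur)
        return cur + 1

    def write(self, key: str, value: bytes, version: int = 0) -> int:
        with self._lock:
            new_version = self._next_version(key, version)
            self._versions[key] = new_version
            self._data[key] = value
            return new_version

    def delete(self, key: str, version: int = 0) -> int:
        with self._lock:
            new_version = self._next_version(key, version)
            self._versions[key] = new_version  # assumption: deletes bump the version too
            self._data.pop(key, None)
            return new_version


def cas_update(store: VersionedStore, key: str,
               fn: Callable[[Optional[bytes]], bytes]) -> int:
    """Optimistic read-modify-write: retry until no concurrent writer interferes."""
    while True:
        version, old = store.read(key)
        try:
            return store.write(key, fn(old), version=version + 1)
        except CasError:
            continue  # someone else wrote in between; re-read and retry
```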
2021-06-15 00:12:35 +03:00
Vitaliy Filippov
f9fe72d40a
Release 0.6.4
- Implement a basic Kubernetes CSI driver
- Minor fixes for vitastor-nbd
- Fix build without RDMA broken in 0.6.3
2021-05-16 01:38:01 +03:00
Vitaliy Filippov
eaac1fc5d1
Log to stderr in etcd_state_client, too
2021-05-16 01:09:25 +03:00
Vitaliy Filippov
57be1923d3
Daemonize NBD_DO_IT process, correctly cleanup unmounted NBD clients
2021-05-16 01:09:25 +03:00
Vitaliy Filippov
c467acc388
Fix /v3 appendage to etcd URLs without /v3
2021-05-15 19:22:24 +03:00
Vitaliy Filippov
bf591ba3ee
Fix nbd module load check
2021-05-15 19:22:24 +03:00
Vitaliy Filippov
699a0fbbc7
Log to stderr instead of stdout in client
2021-05-15 19:22:24 +03:00
Vitaliy Filippov
6b2dd50f27
Fix build without RDMA
2021-05-08 18:20:43 +03:00
Vitaliy Filippov
caf2f3c56f
Release 0.6.3
- RDMA support
- Client performance optimisations (4k randread ~120k -> ~180k on 1 core)
- JSON configuration file (/etc/vitastor/vitastor.conf) support
- Bug fixes
2021-05-02 17:47:43 +03:00
Vitaliy Filippov
9174f188b1
Build packages with libibverbs
For CentOS 7 it also requires a newer rdma-core, as CentOS 7's native version doesn't have
implicit ODP support. The updated version is already uploaded to the vitastor repo.
2021-05-02 17:47:16 +03:00
Vitaliy Filippov
d3978c6d0e
Do not print RDMA connection messages when log_level=0
By the way, log_level is 1 by default in the OSD, so these messages will still appear in OSD logs
2021-05-01 00:26:09 +03:00
Vitaliy Filippov
4a7365660d
Do not wait for down OSDs during sync
Fixes a hang introduced in 0.5.11 in the non-immediate_commit mode
2021-05-01 00:26:07 +03:00
Vitaliy Filippov
818ae5d61d
Some config parsing fixes
2021-05-01 00:20:01 +03:00
Vitaliy Filippov
f6f35f4127
Pass options correctly so as not to override /etc/vitastor/vitastor.conf
2021-04-30 01:17:44 +03:00
Vitaliy Filippov
72aa2fd819
Make OSD and client read common configuration from /etc/vitastor/vitastor.conf
2021-04-30 01:11:27 +03:00
Vitaliy Filippov
5010b0dd75
Use json11 instead of blockstore_config_t
2021-04-30 00:52:46 +03:00
Vitaliy Filippov
483c5ab380
Negotiate max_msg instead of max_sge, make buffer settings more conservative :-)
2021-04-29 11:10:35 +03:00
Vitaliy Filippov
6a6fd6544d
Add RDMA options to the QEMU driver
2021-04-29 11:02:49 +03:00
Vitaliy Filippov
971aa4ae4f
Implement RDMA receive with memory copying (send remains zero-copy)
This is the simplest and, as usual, the best implementation :)
A 100% zero-copy implementation is also possible (see the rdma-zerocopy branch),
but it requires creating A LOT of queues (~128 per client) to use the QPN as a 'tag'
because of the lack of receive tags, and the server may simply run out of queues.
The hardware limit is 262144 queues on Mellanox ConnectX-4, which amounts to only 2048
'connections' per host. And even with that many queues it's still less optimal
than the non-zero-copy one.
In fact, the newest hardware like Mellanox ConnectX-5 does have Tag Matching
support, but it's still unsuitable for us because it doesn't support scatter/gather
(tm_caps.max_sge=1).
2021-04-29 02:34:45 +03:00
Vitaliy Filippov
9e6cbc6ebc
Negotiate max_sge between RDMA client & server
2021-04-29 02:15:20 +03:00
Vitaliy Filippov
ce777319c3
WIP RDMA support
The basic naive implementation works, but it's highly non-optimal as
RNR retransmissions occur all the time. RDMA expects the receiver
to always have room for incoming WRs...
2021-04-29 02:03:54 +03:00
Vitaliy Filippov
f8ff39b0ab
Rework continue_ops() to remove a CPU hot spot
This rework increases fio -rw=randread -iodepth=128 result from ~120k to ~180k iops :)
2021-04-29 01:50:13 +03:00
Vitaliy Filippov
d749159585
Linked list experiment
Rework client operation queue from a vector to a linked list.
This is required to rework continue_ops() as its current implementation
consumes ~25% of the client process CPU.
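A sketch of why a linked list helps here, using hypothetical Op, OpQueue and continue_ops names: finished operations can be unlinked in O(1) during a single pass, whereas erasing from the middle of a vector shifts all following elements. This illustrates the data-structure choice only; it is not the actual cluster_client code.

```python
from typing import Callable, Iterator, Optional


class Op:
    """A queued client operation (hypothetical, with intrusive list links)."""

    def __init__(self, op_id: int) -> None:
        self.op_id = op_id
        self.prev: Optional["Op"] = None
        self.next: Optional["Op"] = None


class OpQueue:
    """Doubly linked list: O(1) append and O(1) removal of any queued op."""

    def __init__(self) -> None:
        self.head: Optional[Op] = None
        self.tail: Optional[Op] = None

    def append(self, op: Op) -> None:
        op.prev, op.next = self.tail, None
        if self.tail is not None:
            self.tail.next = op
        else:
            self.head = op
        self.tail = op

    def remove(self, op: Op) -> None:
        if op.prev is not None:
            op.prev.next = op.next
        else:
            self.head = op.next
        if op.next is not None:
            op.next.prev = op.prev
        else:
            self.tail = op.prev
        op.prev = op.next = None

    def __iter__(self) -> Iterator[Op]:
        op = self.head
        while op is not None:
            nxt = op.next  # saved first, so the caller may remove the current op
            yield op
            op = nxt


def continue_ops(queue: OpQueue, try_continue: Callable[[Op], bool]) -> None:
    """One pass over the queue; finished ops are unlinked without shifting others."""
    for op in queue:
        if try_continue(op):
            queue.remove(op)
```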
2021-04-29 01:47:33 +03:00
Vitaliy Filippov
9703773a63
Fix has_flushes setting
2021-04-28 23:40:44 +03:00
Vitaliy Filippov
5d8d486f7c
Add SOVERSION
2021-04-20 01:01:32 +03:00
Vitaliy Filippov
2b546cdd55
Link vitastor_blk with vitastor_common for timerfd_manager_t
Not strictly required at runtime, but fixes a verify-elf error
2021-04-20 00:51:53 +03:00
Vitaliy Filippov
bd7b177707
Report sensitive configuration values instead of the configuration source
2021-04-17 23:11:16 +03:00
Vitaliy Filippov
82e6aff17b
Support mapping NBD by the image name
2021-04-17 17:39:55 +03:00
Vitaliy Filippov
57e2c503f7
Rename osd_t::c_cli to msgr
2021-04-17 16:32:09 +03:00
Vitaliy Filippov
715bc8d53d
Release 0.6.2
- Fix a possible crash during SYNC when journal fsyncs are enabled
- Fix a memory leak in the chained read implementation
2021-04-15 23:40:06 +03:00
Vitaliy Filippov
0af077701c
Fix a possible crash during SYNC when journal fsyncs are enabled
2021-04-15 02:01:50 +03:00
Vitaliy Filippov
cac976ce25
Fix a memory leak in the chained read implementation
2021-04-15 01:42:18 +03:00
Vitaliy Filippov
acf0646542
Build common sources once
2021-04-15 01:13:34 +03:00
Vitaliy Filippov
ede1c1d667
Release 0.6.1
A bugfix for the new "chained read from snapshot" feature
2021-04-14 22:32:23 +03:00
Vitaliy Filippov
38bd51c97f
Remove aio_context assertion, as it seems to be unneeded
2021-04-14 22:32:15 +03:00
Vitaliy Filippov
966fb763ca
Oooops, fix chained reads
2021-04-13 16:19:21 +03:00
Vitaliy Filippov
0b41ffc08d
Release 0.6.0
Warning: upgrading from 0.5.x is currently not supported!
Please create an issue if you really need upgrade capability.
New features:
- Snapshots and Copy-on-Write clones
- Inode (image) names
- Inode I/O and space statistics
- Write throttling for smoothing random write workloads in SSD+HDD configurations
2021-04-11 00:49:18 +03:00
Vitaliy Filippov
64eeb79051
Prevent 0.6.x OSDs from talking to 0.5.x
The new protocol is almost compatible: it has bitmaps, but it also has
a "bitmap_length" field. It's not hard to make 0.5 and 0.6 OSDs and clients
compatible, but for now I just assume nobody needs it.
If I'm wrong and anybody asks to upgrade their production 0.5.x system
to 0.6.x, I'll fix it.
2021-04-10 22:26:17 +03:00
Vitaliy Filippov
2a02f3c4c7
Add metadata superblock and check it on start
Refuse to start if the superblock is missing or has a bad version;
zero out the metadata area when initializing the superblock.
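A sketch of the superblock check, assuming a made-up on-disk layout: an 8-byte magic followed by a little-endian 32-bit version. SB_MAGIC, SB_VERSION, init_metadata and check_superblock are hypothetical and do not reflect the real Vitastor metadata format.

```python
import struct

# Hypothetical layout, NOT the real Vitastor format: 8-byte magic + u32 version.
SB_MAGIC = b"VITAMETA"
SB_VERSION = 1
SB_FORMAT = "<8sI"


def init_metadata(area: bytearray) -> None:
    """Zero the whole metadata area, then write a fresh superblock at its start."""
    area[:] = bytes(len(area))
    struct.pack_into(SB_FORMAT, area, 0, SB_MAGIC, SB_VERSION)


def check_superblock(area: bytes) -> None:
    """Raise instead of starting if the superblock is missing or has a bad version."""
    magic, version = struct.unpack_from(SB_FORMAT, area, 0)
    if magic != SB_MAGIC:
        raise RuntimeError("metadata superblock is missing, refusing to start")
    if version != SB_VERSION:
        raise RuntimeError(f"unsupported metadata version {version}, refusing to start")
```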
2021-04-10 22:26:17 +03:00
Vitaliy Filippov
f684d9101a
Refuse to start with old journal version
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
a1f2f19489
Do not increment inode statistics if the object already exists
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
82c1a7ec67
Fix statistics reporting, split inode number into pool & inode
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
2ab423d4ef
Implement journaled write throttling for the SSD+HDD case
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
4694811eab
Add microsecond accuracy to set_timer
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6b988de17d
Remove timerfd_interval
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
37efdc2a83
Fix bitmap_set for replicated pools
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
591cad09c9
Fix bitmaps for objects larger than 128K
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
b907ad50aa
Oops, forgot to add external bitmaps to blockstore in some places
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
5f5b6ef150
Enable chained reads in the client
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
38a3df4a0e
Implement chained (optimized) read in the primary OSD code
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6950b8e3a0
Watch inode metadata revisions
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
0cea3576fb
Add "read bitmaps" operation to secondary OSD protocol
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
f01eea07d3
Add simplified interface to read blockstore bitmaps synchronously
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
2c2f08aca2
Shorten some structure names
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
d6524670e1
Introduce data distribution locality
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
7aeb2cbac7
Capture all by value in qemu_proxy
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
2612d3198a
Introduce image names and metadata storage in etcd
Each inode has: image name, parent inode number & pool, size and readonly flag
Snapshots are created by switching the image name to a different inode number
while using the older inode as the parent.
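A toy model of this scheme, assuming the old inode is renamed to the snapshot name and marked read-only (the message above only says the image name is switched to a new inode). Reads fall back along the parent chain, with a guard against parent loops. Images, InodeMeta and their methods are hypothetical in-memory stand-ins for the metadata stored in etcd.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

InodeId = Tuple[int, int]  # (pool_id, inode_number)


@dataclass
class InodeMeta:
    name: str
    size: int
    parent: Optional[InodeId] = None
    readonly: bool = False


class Images:
    """Hypothetical in-memory stand-in for inode metadata plus object data."""

    def __init__(self) -> None:
        self.meta: Dict[InodeId, InodeMeta] = {}
        self.by_name: Dict[str, InodeId] = {}
        self.blocks: Dict[InodeId, Dict[int, bytes]] = {}

    def create(self, inode: InodeId, name: str, size: int) -> None:
        self.meta[inode] = InodeMeta(name, size)
        self.by_name[name] = inode
        self.blocks[inode] = {}

    def snapshot(self, name: str, snap_name: str, new_inode: InodeId) -> None:
        old = self.by_name[name]
        # Assumption: the old inode keeps its data as a read-only snapshot...
        old_meta = self.meta[old]
        old_meta.name, old_meta.readonly = snap_name, True
        self.by_name[snap_name] = old
        # ...and the image name now points to a fresh inode whose parent is the old one.
        self.meta[new_inode] = InodeMeta(name, old_meta.size, parent=old)
        self.by_name[name] = new_inode
        self.blocks[new_inode] = {}

    def write(self, name: str, block: int, data: bytes) -> None:
        inode = self.by_name[name]
        if self.meta[inode].readonly:
            raise PermissionError(f"{name} is read-only")
        self.blocks[inode][block] = data

    def read(self, name: str, block: int) -> Optional[bytes]:
        inode: Optional[InodeId] = self.by_name[name]
        seen: Set[InodeId] = set()
        while inode is not None:
            if inode in seen:
                raise RuntimeError("loop in parent inode chain")
            seen.add(inode)
            if block in self.blocks[inode]:
                return self.blocks[inode][block]
            inode = self.meta[inode].parent  # fall back to the parent layer
        return None  # block was never written in any layer
```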
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ab39ce2bbb
Switch back to clean_entry_bitmap_size instead of entry_attr_size because of changed bitmap handling
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
d0c2e31312
Add a test for snapshots, fix bugs. Now the test passes
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
9038d42327
Fix several snapshot I/O bugs
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
691f066055
Actual snapshot support (untested)
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ffe1cd4c79
Report inode I/O statistics, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
4ae1b84c67
Report inode space usage statistics to etcd, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
c35963967f
Add inode space usage statistics tracking to blockstore
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
0aa2dd2890
Send bitmaps with primary-reads, actually read bitmaps for READ ops
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6bf88883ac
Allocate bitmaps along with stripes to avoid memory fragmentation
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
004f265393
Remove cryptic bitmap inlining from bs_op_t and osd_op_t, use bitmap in primary OSD code
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
860ac24762
Add "external" bitmap support to the secondary OSD protocol
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6107a4d07b
Add "external" bitmap support to blockstore
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
95c29b9dc3
Add "external" bitmap support to osd_rmw
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6909807068
Allow starting the OSD just to flush the journal completely
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
18c72f4835
Correct reentrancy fix (now verified with a test)
It's rather funny, but 0.5.12 has to be re-published once more
2021-04-09 12:10:16 +03:00
Vitaliy Filippov
40b7c21fb1
Followup to 307c1731c1
- fix mark_stable
2021-04-08 15:47:18 +03:00
Vitaliy Filippov
efb3678606
Fix qemu-img broken in 0.5.11
...
Caused by the lack of reentrancy in the main cluster_client function
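One common way to make such a function reentrant, sketched with hypothetical Client and continue_ops names: a nested call from a callback only sets a flag, and the outermost call runs another pass instead of recursing. This illustrates the general pattern; the message above does not say how the fix was actually implemented.

```python
from collections import deque
from typing import Callable, Deque


class Client:
    """Hypothetical client whose continue_ops() may be re-entered from callbacks."""

    def __init__(self) -> None:
        self.queue: Deque[Callable[[], None]] = deque()
        self._running = False
        self._rerun = False

    def continue_ops(self) -> None:
        if self._running:
            # Nested call from a callback: do not recurse, let the outer call loop.
            self._rerun = True
            return
        self._running = True
        try:
            while True:
                self._rerun = False
                batch = list(self.queue)  # one pass over the ops queued right now
                self.queue.clear()
                for callback in batch:
                    callback()  # may enqueue ops and call continue_ops() again
                if not self._rerun:
                    break
        finally:
            self._running = False
```

Without the guard, the nested call would run the newly queued op in the middle of the first callback; with it, the first callback finishes before the next pass starts.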
2021-04-08 14:59:20 +03:00
Vitaliy Filippov
8d87e32175
Fix msgr_op.h includes
2021-04-08 01:18:46 +03:00