vitastor

Author	SHA1	Message	Date
Vitaliy Filippov	5d3317e4f2	Followup to 1.4.2 write stall fix - sadly, the previous version was not working correctly :)	2024-02-08 19:34:29 +03:00
Vitaliy Filippov	016115c0d4	Release 1.4.2 - Log to systemd by default - Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0) - Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes - Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit - Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled - Fix OSDs ignoring syncs & autosyncs for delete operations - Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools) - Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds - Speed up operation retries - change default up_wait_retry_interval to 50 ms - Add patch for libvirt 9.10	2024-02-04 02:23:49 +03:00
Vitaliy Filippov	77c10fd1f8	In fact, do not autosync blockstore when autosync_writes=0	2024-02-03 20:37:36 +03:00
Vitaliy Filippov	581d02e581	Mark secondary OSDs with deletions as dirty to not forget to sync & autosync them	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	f03a9db4d9	Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	cb9c30bc31	Sync after sending all deletes to each PG in cli rm-data	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	a86a380d20	Fix invalid parsing of autosync_writes in blockstore leading to autosyncs after every operation with disabled immediate_commit :D	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	1cec62d25d	Sync only completed writes. Should be a final remaining fix to EC + non-capacitor (non-immediate-commit) write hangs :). First it was breaking non-EC ("instantly stable") writes because they sometimes complete out of order, which was leading to the following error: "terminate called after throwing an instance of 'std::runtime_error' what(): BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151". But it is easily fixed by scanning previous and next dirty_entries in mark_stable.	2024-01-27 15:17:22 +03:00
Vitaliy Filippov	1c322b33ed	Change default up_wait_retry_interval to 50 ms	2024-01-26 01:51:08 +03:00
Vitaliy Filippov	ba55f91409	Release 1.4.1 - Fix a monitor crash on primary OSD switching introduced in 1.4.0 - Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree - Fix a realloc memory leak in theory possible with too large listings (OSD_OP_LIST)	2024-01-18 02:31:42 +03:00
Vitaliy Filippov	d00d4dbac0	Initialize mod_revision field in etcd_state_client	2024-01-13 01:30:28 +03:00
Vitaliy Filippov	5d9d6f32a0	Fix common realloc memory leak mistakes found by cppcheck	2024-01-13 01:30:28 +03:00
Vitaliy Filippov	5280d1d561	Release 1.4.0 New features: - Intelligent recovery/rebalance speed auto-tuning to reduce its impact on clients (see README -> Features) - Auto-restoration of dead VDUSE daemons in CSI plugin - Add vitastor-disk update-sb command - Update QEMU for Debian Bookworm to 8.1 and use it for CSI plugin Bug fixes: - Fix pools SOMETIMES staying inactive after stopping a node due to OSDs not reacting to PG state changes caused by incorrect full reload of state from etcd on reconnection - Make monitors retry pool configuration changes more quickly which fixes them being unable to apply changes when an ongoing rebalance is quickly making a lot of PGs clean - Fix CSI plugin not accepting array of strings as etcd address in /etc/vitastor/vitastor.conf - Allow multiple interfaces with the same IP address, for "simple routed" full mesh network - Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible) - Fix a rare client crash during OSD reconnections - Only treat data partitions as existing OSDs in vitastor-disk prepare - Remove etcd parameter from default command examples - Fix reported free space sometimes changing non-immediately after deletion of data from OSDs - Fix a possible OSD crash on print_slow when bs_op is NULL - Use the same etcd_ws_keepalive_interval in mon as in OSD - Fix mon not using values from config when /config/global is not present - Remove pve-storage-portal-dns-list format for vitastor_etcd_address - Parse log_level in cluster_client - Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields - Do not warn on EPIPE in client unless log_level is raised explicitly - Fix incorrect error in CSI when searching for the device in /sys - Remove 2 last prints to stdout in etcd_state_client - Fix a possible OSD crash when checking corrupted journal entries	2024-01-12 01:28:33 +03:00
Vitaliy Filippov	2f228fa96a	Only treat data partitions as existing OSDs in vitastor-disk prepare	2023-12-31 11:46:47 +03:00
Vitaliy Filippov	a6ab54b1ba	Do not allow negative util_low/high	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	99ee8596ea	Rename min/max_util to util_low/high	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	c4928e6ecd	Protect from try_send completing the operation immediately. Fixes a possible use-after-free in case of continue_ops() calling try_send(), then connect_peer() -> set_timer() -> trigger_nearest() -> handle_op_part() -> continue_ops() again	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	8b8c1179a7	Use a separate used_blocks counter for free space stats to hide possibly delayed on-flush deallocation	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	d5a6fa6dd7	Fix possible crash on print_slow when bs_op is NULL	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	1edf86ed26	Aggregate recovery delay using simple mean over last 10 observations (EWMA is shit)	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	5ca7cde612	Experiment/WIP: Try to track "secondary" recovery ops separately	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	751935ddd8	WIP Auto-tune recovery speed	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	d84dee7098	Track recovery op latencies + refactor into a structure	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	f72f14e6a7	Clear old PG states, history, and OSD states on etcd state reload. Also add protection from etcd watcher messages being split into multiple websocket messages - I'm not sure if etcd actually does that, but it's better to have extra protection anyway. Also check that all etcd watchers are started in the keepalive routine, otherwise it sometimes tries to revive etcd watchers starting with revision=1 which obviously always fails because this revision is nearly always compacted. All these changes should fix an old rarely reproduced bug where SOMETIMES OSDs didn't react to PG config changes which was leading to offline pools on node reboot. It happened on the full reload of state from etcd.	2023-12-24 02:02:13 +03:00
Vitaliy Filippov	178bb0e701	Prevent re-entry into timerfd set_nearest	2023-12-22 02:32:40 +03:00
Vitaliy Filippov	7239cfb91a	Parse log_level in cluster_client	2023-12-20 02:21:23 +03:00
Vitaliy Filippov	7cea642f4a	Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields	2023-12-19 01:11:37 +03:00
Vitaliy Filippov	dc615403d9	Do not warn on EPIPE in client unless log_level is raised explicitly	2023-12-17 13:42:26 +03:00
Vitaliy Filippov	1a704e06ab	Allow multiple interfaces with the same IP address, for "simple routed" full mesh network	2023-12-17 13:25:56 +03:00
Vitaliy Filippov	575475de71	Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible)	2023-12-17 11:55:13 +03:00
Vitaliy Filippov	aca2bef15f	Add vitastor-disk update-sb command	2023-12-14 01:11:42 +03:00
Vitaliy Filippov	691ebd991a	Move 2 last log printfs to stderr from stdout in etcd_state_client	2023-12-08 00:01:52 +03:00
Vitaliy Filippov	6d5df908a3	Fix possible out of bounds when checking invalid journal entries	2023-12-08 00:01:07 +03:00
Vitaliy Filippov	a1c7cc3d8d	Release 1.3.1 Hotfix to 1.3.0 - new "journal space reservation" had a bug which caused OSDs to crash with EC and without immediate_commit.	2023-12-04 18:35:09 +03:00
Vitaliy Filippov	a5e3dfbc5a	Oops, 1.3.0 needs a hotfix	2023-12-04 13:45:54 +03:00
Vitaliy Filippov	7972502eaf	Release 1.3.0 New features: - RDMA without ODP - much faster and all cards are now supported, not just Mellanox - VDUSE in CSI - faster, more stable and can even recover after CSI pod restart! - Reserve journal space for stabilize requests dynamically to prevent stalls under load with EC - Raise default NBD timeout from 30 to 300 seconds and allow to take it from /etc/vitastor/vitastor.conf - Remove explicit etcdUrl/etcdPrefix K8S storage class parameter support to prevent etcd migration issues for volumes created with these parameters - Support QEMU 8.1 and pve-qemu 8.1 Bug fixes: - Fix RDMA connection (and thus memory) leak - Fix rare crashes under load due to incorrect io_uring queue size tracking - Fix monitor statistics aggregation in case of empty /osd/stats keys - Fix crash on unknown long argument to vitastor-disk - Allow trailing comma in JSONs again - Fix crash on attempts to dump a long listing of objects "to stabilize" or "to rollback" in a slow op	2023-12-04 02:36:43 +03:00
Vitaliy Filippov	845454742d	Fix warning with QEMU 8.1	2023-12-04 01:59:07 +03:00
Vitaliy Filippov	628aa59574	Raise default NBD timeout from 30 to 300 seconds and allow to take it from /etc/vitastor/vitastor.conf	2023-12-02 14:11:14 +03:00
Vitaliy Filippov	19e2d9d6fa	Fix crash on unknown long argument to vitastor-disk	2023-12-01 00:55:51 +03:00
Vitaliy Filippov	b5c020ce0b	Use io_uring SQ size for ringloop capacity - otherwise get_sqe could return NULL when space_left() was > 0 under load. Raise default io_uring size to 1024 for the same effective capacity as previously	2023-11-20 03:04:06 +03:00
Vitaliy Filippov	6b33ae973d	%d -> %lu	2023-11-20 03:02:26 +03:00
Vitaliy Filippov	cf36445359	Reserve journal space for stabilize requests dynamically to prevent stalls	2023-11-20 03:01:57 +03:00
Vitaliy Filippov	3fd873d263	Add -fno-omit-frame-pointer by default	2023-11-20 02:59:54 +03:00
Vitaliy Filippov	a00e8ae9ed	Fix mismatch journal pos format in vitastor-disk	2023-11-19 15:19:54 +03:00
Vitaliy Filippov	75674545dc	Limit the number of printed object versions in slow op dump (otherwise it may overflow the fixed buffer)	2023-11-13 01:10:28 +03:00
Vitaliy Filippov	225eb2fe3d	Support RDMA without ODP by stupidly copying memory. Disable ODP by default. ODP is slower than regular RDMA even with memory copy overhead. Example numbers: - 3950000 random read iops without ODP vs 240000 iops with ODP - 1447000 random write iops without ODP vs 101000 iops with ODP. Reference: https://tkygtr6.github.io/pub/ISPASS21_slides.pdf	2023-11-12 15:03:47 +03:00
Vitaliy Filippov	7e82573ed0	Fix RDMA connection leak which was preventing stable functioning of RDMA :)	2023-11-11 23:40:47 +03:00
Vitaliy Filippov	5524dbdab7	Release 1.2.0 New features: - Implement CSI volume expansion - Implement CSI volume snapshots - CSI driver now requires Kubernetes >= 1.20 Bug fixes: - Important bug fix for EC: fix EC n+k, k>=2 read recovery in ISA-L version returning incorrect data when reading at least the second chunk out of multiple missing chunks without reading the first one. All users of EC n+k, k>=2 should upgrade as soon as possible, and upgrade should be conducted with downtime: first stop all clients (VMs/containers), then all OSDs, then upgrade and restart everything. - Fix unstable statistics aggregation in monitor (affecting vitastor-cli status and df) - Make udev not wait for OSDs to start during boot - Do not report negative numbers of offline PGs in vitastor-cli status when changing PG count - Report both old and new PG counts in vitastor-cli df when changing it - Fix OSDs sometimes not starting with "The code only supports journal versions 1 and 2, but it is 2 on disk" error after upgrading from pre-1.0 versions and letting OSDs run for some time - Fix monitors sometimes returning old PG count back after OSD configuration changes - Make monitor PG changes more stable and timeout errors less probable	2023-11-05 01:48:57 +03:00
Vitaliy Filippov	cd3dec06ac	Remove spaces from old->new PG count in df	2023-11-05 01:45:45 +03:00
Vitaliy Filippov	e15b6e7805	Fix "cannot be narrowed" in clang	2023-11-04 18:14:44 +03:00

716 Commits (8e25a28a08e7265c9d30e3dec7077bc51bfc3de2)