Commit Graph

328 Commits (master)

Author SHA1 Message Date
Vitaliy Filippov d4bc10542c Fix compatibility with liburing >= 2.1 where it only has __pad2[2] 2022-01-31 22:49:40 +03:00
Vitaliy Filippov 140309620a Free recv_buf in nbd_proxy 2022-01-31 20:37:58 +03:00
Vitaliy Filippov 0a610ee943 Destroy the client after completing CLI command 2022-01-31 18:27:04 +03:00
Vitaliy Filippov f3ce166064 Do not print nan% in df when a pool has no available OSDs 2022-01-31 18:23:57 +03:00
Vitaliy Filippov 717d303370 Handle get_sqe failures, don't die with "will fall out of sync" in epoll_manager
Problem is that in recent kernels io_uring may return completions BEFORE
clearing the submission queue. I.e. for example its capacity is 512, there
were 512 requests, one of them completed, so when the request completion is
processed the queue "should have" 1 free slot. But sometimes it doesn't because
io_uring doesn't always clear the submission queue before sending CQE :-/
2022-01-31 02:52:20 +03:00
Vitaliy Filippov d9857a5340 Check for SQEs, not for completions
Should finally fix Assertion `sqe != NULL' failed introduced after journaling
refactor in 0.6.11...
2022-01-31 02:19:10 +03:00
Vitaliy Filippov eb5d9153e8 Fix build under centos 7 2022-01-30 20:29:44 +03:00
Vitaliy Filippov 4047ca606f Add missing cancel_op(currently being read op) when stopping a client
Fixes client hangs possible after stopping & restarting an osd.
Hangs happened when a connection was closed in the middle of reading a READ
operation reply from the network. In this case the operation being read was
in read_op and the client didn't free it when closing the connection.

Test case for msgr_read.cpp:
- Partially read reply for a READ operation
- stop_client()
- Check that the READ operation returns EPIPE

The bug was actually introduced in 0.5.11.
2022-01-28 01:53:52 +03:00
Vitaliy Filippov 218e294e9c > 0, of course 2022-01-24 13:36:09 +03:00
Vitaliy Filippov c1929cabe0 Release 0.6.12
etcd connection stability, clang & elbrus support

- Fix build under CLang and Elbrus LCC compilers, making Vitastor compatible
  with Elbrus CPUs :)
- Completely fix the bug where OSDs didn't connect to peers and incorrectly marked
  PGs as incomplete
- Limit I/O depth for deletes the same way as for small writes. Makes OSD crashes
  with "Assertion failed: sqe != NULL" during image deletion go away
- Fix a very old, but rare, journaling bug (credits to https://github.com/mirrorll)
- Fix flushing of unclean journaled objects leading to OSDs sometimes hanging
  after failover in EC setups (bug was introduced in 0.6.7)
- Fix several problems that could prevent smooth operation of a Vitastor cluster
  under the condition of partial etcd failure:
  - OSDs could randomly fail due to too strict error handling
  - New clients and OSDs could be unable to start because of the lack of retries
  - CLI could fail some commands because of the lack of retries
  - Monitor could stop receiving state updates because of the lack of websocket pings
- Fix monitor being unable to rebalance PGs after a downscale of pool pg_size (3->2)
- Exit with failure when trying to nbd map or benchmark a non-existing image
- Use HTTP keep-alive for etcd connections
- Allow to configure etcd request timeouts and retries
- Allow to configure NBD timeout, max devices and partitions, and set default to
  up to 64 devices with up to 3 partitions each
2022-01-24 01:15:25 +03:00
Vitaliy Filippov cc6b24e03a Allow to configure NBD timeout, max devices and partitions
Also set default NBD devices/partitions to 64/3, Linux default is 16/16 which is way too low
2022-01-24 01:15:19 +03:00
Vitaliy Filippov 0757ba630a Do not happily NBD "map" non-existing images, do not try to benchmark them too 2022-01-23 23:03:42 +03:00
Vitaliy Filippov 2a0b881685 Respect max_write_iodepth for deletes 2022-01-23 22:05:23 +03:00
Vitaliy Filippov 8dc1ffb13b Try to connect with PG peers before deciding it's incomplete :)
I already attempted to fix it in 0.6.11, but it happened so that the fix was
only partial :)
2022-01-23 19:19:26 +03:00
Vitaliy Filippov ba63af49b4 Add etcd retries everywhere (they were missing in some places) 2022-01-23 17:21:48 +03:00
Vitaliy Filippov 31b9c683ee Fix flushing of unclean objects
This was preventing OSD failover when there were some unclean objects.
Bug was introduced in aa436027c8
2022-01-23 00:45:11 +03:00
Vitaliy Filippov 3abcac058f Check for double response_callback call more 2022-01-23 00:26:20 +03:00
Vitaliy Filippov e01c4db702 Add paranoic if()s to prevent accidental double free of etcd_watch_ws 2022-01-23 00:16:09 +03:00
Vitaliy Filippov a5cf06acd0 Remove etcd timeout and keepalive interval hardcode 2022-01-23 00:00:00 +03:00
Vitaliy Filippov 9c3653b1e1 Handle EINTR 2022-01-22 23:59:37 +03:00
Vitaliy Filippov 7920414bee Fix build under older gcc (debian buster) 2022-01-20 10:34:52 +03:00
Vitaliy Filippov 098e369a3b Fix rand initialization, add etcd connection/disconnection logging 2022-01-20 00:45:49 +03:00
Vitaliy Filippov a43ef525a2 Remove two last end()s from http_client (should have been removed in the keepalive patch) 2022-01-20 00:44:18 +03:00
Mikhail Koshel d798e0821e #1 fix deps 2022-01-18 13:30:53 +06:00
Vitaliy Filippov e591a3e9f7 Include sys/stat.h in messenger.cpp
No idea why, but it builds without it on x86 and does not build on e2k
2022-01-17 13:43:29 +03:00
Vitaliy Filippov 77cc18420a Fix leaks detected by clang scan-build (only 1 of 4 may be important though) 2022-01-16 00:11:59 +03:00
Vitaliy Filippov 7bdd92ca4f Fix build under clang and some warnings
Build problems fixed:
- void* pointer arithmetic which is a GNU extension (works as byte*)
- "variable size object may not be initialized" which is OK under GCC
- nullptr_t related error in json11 (it lacks 'operator <' in clang)

Warnings fixed:
- empty nested struct initializer { 0 } replaced by {}
- removed several unused lambda captures
2022-01-16 00:02:54 +03:00
Vitaliy Filippov 515a2e6e33 Only die when detecting a real race condition, not just a CAS failure 2022-01-05 17:05:25 +03:00
Vitaliy Filippov 9c6168bf17 Remove fill_parsed_response 2022-01-03 20:08:26 +03:00
Vitaliy Filippov 5473d5b4a2 Rework HTTP client to use keepalive, move getifaddr_list to addr_util 2022-01-03 14:52:01 +03:00
Vitaliy Filippov c3304bce27
Merge pull request #38 from mirrorll/master
journal check_available error
2021-12-31 12:45:16 +03:00
Vitaliy Filippov b9f5c2a823 Support zero-copy send in fio_sec_osd to allow testing it
Prelimilary results:
- CPU usage drops significantly. For example, in T1Q8 128K write test against
  stub_uring_osd with 10G network and Athlon X4 860k CPU it drops from 100% to 30%
- Latency becomes slightly worse. In T1Q1 4K write test in the same environment
  latency increases from 56 to 63 us.
- Small write throughput also becomes slightly worse. In T1Q128 4K write test
  against stub iops decreases from 138k to ~110k (unstable, fluctuates 100k..120k).
  Note that this is without io_uring, of course.
2021-12-27 02:12:44 +03:00
Vitaliy Filippov e9d2f79aa7 Support reading bitmaps in fio_sec_osd 2021-12-27 02:12:44 +03:00
Vitaliy Filippov 0785bdf8b3 Release 0.6.11
- Slightly reduce journaling write amplification (requires no_same_sector_overwrites=false)
- Fix listen_backlog (it was 0) because it could more than halve OSD socket send speed
- Support IPv6 OSD addresses
- Do not try to initialize client in simple-offsets
- Fix OSDs sometimes marking PGs incomplete instead of trying to connect with peers
- Allow to configure OSD placement in node_placement
- Allow to run with 4k sector size block devices. Natural, but it was forbidden
2021-12-26 21:11:24 +03:00
Vitaliy Filippov b57e44748b Send 4 byte bitmap in stub_uring_osd 2021-12-25 11:38:13 +03:00
Vitaliy Filippov 1bbe62f29c Fix uninitialized listen_backlog which was leading to REALLY SLOW send speeds!!! 2021-12-25 11:38:13 +03:00
lihai 3061c30132 journal check_available error 2021-12-21 09:39:58 +08:00
Vitaliy Filippov 20a4406acc Support IPv6 OSD addresses 2021-12-19 10:42:17 +03:00
Vitaliy Filippov f93491bc6c Implement journal write batching and slightly refactor journal writes
Slightly reduces WA. For example, in 4K T1Q128 replicated randwrite tests
WA is reduced from ~3.6 to ~3.1, in T1Q64 from ~3.8 to ~3.4.

Only effective without no_same_sector_overwrites.
2021-12-16 00:27:17 +03:00
Vitaliy Filippov 999bed8514 Fix opening regular files as blockstore 2021-12-15 02:08:58 +03:00
Vitaliy Filippov 3f33095fd7 Do not try to initialize client in simple-offsets 2021-12-15 02:07:27 +03:00
Vitaliy Filippov dd74c5ce1b Fix OSDs marking PGs incomplete instead of trying to connect with peers 2021-12-14 01:57:51 +03:00
Vitaliy Filippov c6d104ecd6 Print object version on fatal overwrite 2021-12-14 01:57:04 +03:00
Vitaliy Filippov e544aef7d0 Fix test rw_blocking 2021-12-12 23:24:50 +03:00
Vitaliy Filippov 616c18c786 Fix stub_uring_osd 2021-12-12 23:06:11 +03:00
Vitaliy Filippov 2c7556e536 Allow to run with 4k sector size. Natural, but it was forbidden 2021-12-11 22:03:16 +00:00
Vitaliy Filippov 2020608a39 Release 0.6.10
- Implement a storage plugin for Proxmox. Now you can use Vitastor with Proxmox!
- Implement `vitastor-cli df` (pool space usage statistics) command
- Add glob pattern support for `vitastor-cli ls`
- Fix several bugs in other CLI commands (resize, create --parent, modify --readonly)
- Use 512 byte logical block size in QEMU driver by default (and thus don't require to set it in QEMU options)
2021-12-10 21:40:12 +03:00
Vitaliy Filippov f54ff6ad5d Do not crash in simple-offsets when some options are empty, too 2021-12-10 12:27:25 +03:00
Vitaliy Filippov b376ef2ed9 Do not crash on empty matched_addrs 2021-12-10 11:40:59 +03:00
Vitaliy Filippov 5a234588b9 Do not die when invoked via `vita` symlink 2021-12-10 02:45:16 +03:00
Vitaliy Filippov 0ee5e0a7fe Implement vitastor-cli df command 2021-12-10 02:37:02 +03:00
Vitaliy Filippov 3482bb0860 Fix readonly/readwrite option parsing 2021-12-10 00:52:59 +03:00
Vitaliy Filippov 526995f486 Do not skip empty iops in listings 2021-12-10 00:52:59 +03:00
Vitaliy Filippov 8dfbd7943c Use logical block size = 512 bytes by default 2021-12-08 23:43:40 +03:00
Vitaliy Filippov 3a83a32cb7 Aaand now fix create --parent :D 2021-12-08 23:00:34 +03:00
Vitaliy Filippov 20d5ed799a Add glob pattern matching for ls 2021-12-08 23:00:34 +03:00
Vitaliy Filippov b262938bca Fix naggy "Failed to get RDMA device list: Unknown error -38" 2021-12-08 02:02:30 +03:00
Vitaliy Filippov c3c2e68cc1 Now fix resize command :D 2021-12-05 01:38:08 +03:00
Vitaliy Filippov aa1e21dd99 Release 0.6.9
New features:
- Build Vitastor driver as part of QEMU
- Implement renaming images in CLI (vitastor-cli modify --rename)
- Add vitastor-cli alloc-osd and simple-offsets commands and use them in make-osd,
  thus removing the dependency on etcdctl
- Make monitor remove stale deleted inode statistics from etcd automatically
- Implement OSD address selection from a subnet, thus removing the need to specify
  OSD addresses in startup scripts explicitly

Bug fixes:
- Fix client failover in case of etcd shutdown or crash (make client survive etcd failures)
- Stick to the last live etcd in OSD and mon to prevent random failures when one of etcds is down
- Fix incorrect copying of data from journal to the data device which could lead to data corruption
- Prefer local etcd IPs in OSD
- Remove the total PG count restriction in optimize_change which was sometimes leading
  to inability to redistribute PGs over OSDs
- Fix error response parsing on a failed pg state report
- Fix slow linear writes with RDMA by changing default buffer settings
- Fix possible 'TypeError' in openstack nova when using Vitastor cinder driver
- Fix bugs in vitastor-cli create, ls, rm, modify commands

Patch changes:
- Add a patch for libvirt 7.6
- Add patches for QEMU 6.0 and 6.1
- Fix config file path XML location parsing in libvirt patches
- Replace _ with - in QEMU options
- Fix possible 'TypeError' in openstack nova when using Vitastor cinder driver
- Fix possible crashes of QEMU block driver in case of incorrect options
2021-12-03 10:58:54 +03:00
Vitaliy Filippov 9fca01dc62 Add a forgotten return statement 2021-12-03 00:41:49 +03:00
Vitaliy Filippov 0bd3a94efd Use qdict_get_try_int because qdict_get_int may segfault on a missing key 2021-12-03 00:22:17 +03:00
Vitaliy Filippov 5fe3a40416 More fixes for QEMU 2.x :) 2021-12-02 02:25:50 +03:00
Vitaliy Filippov 15957b7d13 Update QEMU 4.2 patch and CentOS 7 QEMU 4.2 spec patch 2021-12-02 01:03:19 +03:00
Vitaliy Filippov 5859f913fc Fix client failover in case of etcd shutdown or crash 2021-12-01 00:33:02 +03:00
Vitaliy Filippov 92362027a8 Build vitastor driver as part of the QEMU package by default
Old behaviour can be restored with cmake var WITH_QEMU=true
2021-11-29 02:05:26 +03:00
Vitaliy Filippov c4aeeda143 Fix index removal in vitastor-cli rm 2021-11-29 02:00:05 +03:00
Vitaliy Filippov 24f0f8278a Fix modify --readwrite 2021-11-29 01:52:21 +03:00
Vitaliy Filippov 95496d0845 Implement renaming images in CLI (vitastor-cli modify --rename) 2021-11-28 22:38:57 +03:00
Vitaliy Filippov 94b1f09ef2 Create snapshots in the same pool by default 2021-11-28 21:50:42 +03:00
Vitaliy Filippov 7a0b5212fe Exit if unable to restart watches
FIXME: It's probably not OK for the client to exit in this case
2021-11-28 01:43:31 +03:00
Vitaliy Filippov ce5b6253ab Make OSDs stick to the last successful etcd address
Previously OSDs were selecting a new random etcd from the cluster
on every request so they were failing randomly when part of etcds was down
2021-11-27 23:48:56 +03:00
Vitaliy Filippov 8398ad0117 Fix #36 - Fix old version data sometimes overriding new version data
Reproduction case:
- v3 = (offset 4kb, length 16kb)
- v2 = (offset 24kb, length 16kb)
- v1 = (offset 16kb, length 16kb)
- At the third step it was inserting 16..24kb instead of 20..24kb
2021-11-27 01:17:45 +03:00
Vitaliy Filippov fea451b4db Prefer local etcd in OSD 2021-11-27 00:36:53 +03:00
Vitaliy Filippov 7b7f20fb89
Merge pull request #34 from mirrorll/master
report pg state failed
2021-11-25 10:26:42 +03:00
Vitaliy Filippov 300d507026 Fix capture of out in alloc_osd 2021-11-25 10:20:01 +03:00
harley 6886171289
report pg state failed
after report pg state failed parse response error
2021-11-25 09:34:34 +08:00
Vitaliy Filippov 43f8ea47a0 Ok, something is not allowed somewhere in C99 2021-11-24 11:28:10 +03:00
Vitaliy Filippov 6e0e172e15 Implement OSD address selection from a specified subnet 2021-11-23 21:59:26 +03:00
Vitaliy Filippov 879fe9b2b4 Add a patch for qemu 6.1 and replace _ with - in qemu options 2021-11-21 16:24:30 +03:00
Vitaliy Filippov 660c3f7b0d Change default RDMA settings to 128x 129K buffers
129K to leave extra space for the header

The problem with 8x 1M buffers is that the following happens with,
for example, 2 OSDs and 4M T1Q1 write:
- Server posts 8 receives
- Client posts 8 sends
- WRs are processed by the RDMA stack, but the OSD doesn't have the time
  to handle them and doesn't refill buffers
- Client posts 1 more send
- RNR retransmission happens and performance drops to zero

Overall it seems that RDMA support should be reworked to use real 'RDMA'
operations i.e. operations writing into remote memory. This has an
additional advantage of avoiding a copy at the receive side of the OSD.
2021-11-21 12:05:52 +03:00
Vitaliy Filippov f0ebfae3b8 Fix vitastor-cli alloc-osd, use vitastor-cli in make-osd.sh 2021-11-21 00:01:03 +03:00
Vitaliy Filippov eb7ad2c114 Fix empty size syntax, use C version of simple-offsets in tests 2021-11-20 23:51:26 +03:00
Vitaliy Filippov cd21ff0b6a Rewrite simple-offsets.js in C/C++ 2021-11-19 02:39:56 +03:00
Vitaliy Filippov d3903f039c Implement alloc-osd (allocate a new OSD number) command 2021-11-19 02:39:37 +03:00
Vitaliy Filippov c5029961ea Oops. Fix vitastor-cli ls 2021-11-16 12:39:41 +03:00
Vitaliy Filippov 920345f7b6 Release 0.6.8
- Build separate packages for OSD, monitor, client, C header, fio and QEMU drivers
  instead of one package which included everything
2021-11-15 00:49:21 +03:00
Vitaliy Filippov 75b47a6298 Generate pkg-config file 2021-11-15 00:49:21 +03:00
Vitaliy Filippov 7eabc364bf Release 0.6.7
- Implement CLI commands for listing, viewing I/O statistics, creating,
  snapshotting, cloning, resizing and modifying images. All these operations
  are covered by 3 commands: ls, create, modify
- Implement an important fix to prior OSD set tracking for PGs. The previous
  version had an issue which could lead to data loss due to an OSD with older
  copy of the data thinking it has the newest copy
- Fix I/O statistics aggregation in the monitor
- Several minor fixes for Cinder driver
- Fix QEMU driver to be compatible with QEMU 2.x > 2.0
- Fix stalls sometimes possible in configurations without immediate_commit due
  to insufficient amount of automatic internal fsync operations
- Add `vita` alias for `vitastor-cli`
2021-11-13 23:23:55 +03:00
Vitaliy Filippov a346f84c69 Allow to show only specific images in listing 2021-11-13 23:23:55 +03:00
Vitaliy Filippov 71a0c1a7b9 Fix list sorting 2021-11-13 23:23:55 +03:00
Vitaliy Filippov 110b39900b Rename the new "set" command to "modify" 2021-11-13 22:39:17 +03:00
Vitaliy Filippov 42479b4590 Fix vitastor-nbd list, add ls alias 2021-11-13 22:39:17 +03:00
Vitaliy Filippov 6e82044e84 Add `vita` symlink 2021-11-13 22:39:17 +03:00
Vitaliy Filippov 2cb3e84882 Implement CLI set (resize, change readonly status) command 2021-11-13 22:39:17 +03:00
Vitaliy Filippov aa436027c8 Report pg/history from OSD on every degraded activation
Required to prevent data loss due to activation of an OSD with older data
when PG OSD set change doesn't occur. I.e. fixes the simplest case:
- Run 2 OSDs with 1 PG
- Start writing into the PG
- Stop OSD 2
- Stop OSD 1
- Start OSD 2

After this change the PG will refuse to start after the last step.
2021-11-13 22:39:17 +03:00
Vitaliy Filippov 577a563b91 Allow to disable colored output 2021-11-11 01:41:58 +03:00
Vitaliy Filippov e4efa2c08a Improve vitastor-cli ls - show I/O statistics, allow to sort & limit output 2021-11-11 01:41:58 +03:00
Vitaliy Filippov d528cd77f1 Fix install_symlink 2021-11-09 16:42:29 +03:00
Vitaliy Filippov 4d43774cbb Use 5s etcd_report_interval by default 2021-11-09 01:27:12 +03:00
Vitaliy Filippov a1488f7217 Fix qemu_driver to build with QEMU 2.x (previously it was only correct for QEMU 2.0) 2021-11-08 23:07:31 +03:00