Commit Graph

972 Commits (73ae57898151a02afa408ae3d840b9fdae80a17d)

Author SHA1 Message Date
Vitaliy Filippov 73ae578981 Add osd_memlock option 2022-02-02 01:40:22 +03:00
Vitaliy Filippov 20ee4ed758 Update some parameter docs 2022-02-01 22:46:13 +03:00
Vitaliy Filippov 63de79d1b2 Change > to | to preserve newlines 2022-02-01 22:45:12 +03:00
Vitaliy Filippov f712967079 And one more sqe starvation fix 2022-02-01 02:50:16 +03:00
Vitaliy Filippov df0cd85352 Fix another part of the "async sqe clear" bug (followup to d9857a5340) 2022-02-01 01:14:56 +03:00
Vitaliy Filippov ebaf4d7a72 Fix compatibility with fio 3.28+ 2022-01-31 23:39:14 +03:00
Vitaliy Filippov d4bc10542c Fix compatibility with liburing >= 2.1 where it only has __pad2[2] 2022-01-31 22:49:40 +03:00
Vitaliy Filippov 140309620a Free recv_buf in nbd_proxy 2022-01-31 20:37:58 +03:00
Vitaliy Filippov 0a610ee943 Destroy the client after completing CLI command 2022-01-31 18:27:04 +03:00
Vitaliy Filippov f3ce166064 Do not print nan% in df when a pool has no available OSDs 2022-01-31 18:23:57 +03:00
Vitaliy Filippov 717d303370 Handle get_sqe failures, don't die with "will fall out of sync" in epoll_manager
Problem is that in recent kernels io_uring may return completions BEFORE
clearing the submission queue. I.e. for example its capacity is 512, there
were 512 requests, one of them completed, so when the request completion is
processed the queue "should have" 1 free slot. But sometimes it doesn't because
io_uring doesn't always clear the submission queue before sending CQE :-/
2022-01-31 02:52:20 +03:00
Vitaliy Filippov d9857a5340 Check for SQEs, not for completions
Should finally fix Assertion `sqe != NULL' failed introduced after journaling
refactor in 0.6.11...
2022-01-31 02:19:10 +03:00
Vitaliy Filippov eb5d9153e8 Fix build under centos 7 2022-01-30 20:29:44 +03:00
Vitaliy Filippov ae6d1ed1d5 Remove completed items 2022-01-30 20:20:06 +03:00
Vitaliy Filippov d123e58ea3 Fix yaml syntax - remove ` in default 2022-01-29 02:08:48 +03:00
Vitaliy Filippov d9869d8116 Add parameter documentation 2022-01-28 02:45:54 +03:00
Vitaliy Filippov 4047ca606f Add missing cancel_op(currently being read op) when stopping a client
Fixes client hangs possible after stopping & restarting an osd.
Hangs happened when a connection was closed in the middle of reading a READ
operation reply from the network. In this case the operation being read was
in read_op and the client didn't free it when closing the connection.

Test case for msgr_read.cpp:
- Partially read reply for a READ operation
- stop_client()
- Check that the READ operation returns EPIPE

The bug was actually introduced in 0.5.11.
2022-01-28 01:53:52 +03:00
Vitaliy Filippov 218e294e9c > 0, of course 2022-01-24 13:36:09 +03:00
Vitaliy Filippov c1929cabe0 Release 0.6.12
etcd connection stability, clang & elbrus support

- Fix build under CLang and Elbrus LCC compilers, making Vitastor compatible
  with Elbrus CPUs :)
- Completely fix the bug where OSDs didn't connect to peers and incorrectly marked
  PGs as incomplete
- Limit I/O depth for deletes the same way as for small writes. Makes OSD crashes
  with "Assertion failed: sqe != NULL" during image deletion go away
- Fix a very old, but rare, journaling bug (credits to https://github.com/mirrorll)
- Fix flushing of unclean journaled objects leading to OSDs sometimes hanging
  after failover in EC setups (bug was introduced in 0.6.7)
- Fix several problems that could prevent smooth operation of a Vitastor cluster
  under the condition of partial etcd failure:
  - OSDs could randomly fail due to too strict error handling
  - New clients and OSDs could be unable to start because of the lack of retries
  - CLI could fail some commands because of the lack of retries
  - Monitor could stop receiving state updates because of the lack of websocket pings
- Fix monitor being unable to rebalance PGs after a downscale of pool pg_size (3->2)
- Exit with failure when trying to nbd map or benchmark a non-existing image
- Use HTTP keep-alive for etcd connections
- Allow to configure etcd request timeouts and retries
- Allow to configure NBD timeout, max devices and partitions, and set default to
  up to 64 devices with up to 3 partitions each
2022-01-24 01:15:25 +03:00
Vitaliy Filippov cc6b24e03a Allow to configure NBD timeout, max devices and partitions
Also set default NBD devices/partitions to 64/3, Linux default is 16/16 which is way too low
2022-01-24 01:15:19 +03:00
Vitaliy Filippov 0757ba630a Do not happily NBD "map" non-existing images, do not try to benchmark them too 2022-01-23 23:03:42 +03:00
Vitaliy Filippov 2a0b881685 Respect max_write_iodepth for deletes 2022-01-23 22:05:23 +03:00
Vitaliy Filippov 9a15b843ff Do not set pg_real_size to 0 2022-01-23 20:15:04 +03:00
Vitaliy Filippov 8dc1ffb13b Try to connect with PG peers before deciding it's incomplete :)
I already attempted to fix it in 0.6.11, but it happened so that the fix was
only partial :)
2022-01-23 19:19:26 +03:00
Vitaliy Filippov ba63af49b4 Add etcd retries everywhere (they were missing in some places) 2022-01-23 17:21:48 +03:00
Vitaliy Filippov 31b9c683ee Fix flushing of unclean objects
This was preventing OSD failover when there were some unclean objects.
Bug was introduced in aa436027c8
2022-01-23 00:45:11 +03:00
Vitaliy Filippov 3abcac058f Check for double response_callback call more 2022-01-23 00:26:20 +03:00
Vitaliy Filippov e01c4db702 Add paranoic if()s to prevent accidental double free of etcd_watch_ws 2022-01-23 00:16:09 +03:00
Vitaliy Filippov a5cf06acd0 Remove etcd timeout and keepalive interval hardcode 2022-01-23 00:00:00 +03:00
Vitaliy Filippov 9c3653b1e1 Handle EINTR 2022-01-22 23:59:37 +03:00
Vitaliy Filippov 23e578b6a2 Fix common.sh 2022-01-21 01:51:25 +03:00
Vitaliy Filippov 7920414bee Fix build under older gcc (debian buster) 2022-01-20 10:34:52 +03:00
Vitaliy Filippov 098e369a3b Fix rand initialization, add etcd connection/disconnection logging 2022-01-20 00:45:49 +03:00
Vitaliy Filippov a43ef525a2 Remove two last end()s from http_client (should have been removed in the keepalive patch) 2022-01-20 00:44:18 +03:00
Vitaliy Filippov 8a6b07d8f7 Add a 2/5 etcd failure test 2022-01-20 00:43:22 +03:00
Vitaliy Filippov 2c930d55fb
Merge pull request #41 from promobit-bitblaze/1-small-fix
#1 fix deps
2022-01-18 11:19:08 +03:00
Mikhail Koshel d798e0821e #1 fix deps 2022-01-18 13:30:53 +06:00
Vitaliy Filippov e591a3e9f7 Include sys/stat.h in messenger.cpp
No idea why, but it builds without it on x86 and does not build on e2k
2022-01-17 13:43:29 +03:00
Vitaliy Filippov 77cc18420a Fix leaks detected by clang scan-build (only 1 of 4 may be important though) 2022-01-16 00:11:59 +03:00
Vitaliy Filippov 7bdd92ca4f Fix build under clang and some warnings
Build problems fixed:
- void* pointer arithmetic which is a GNU extension (works as byte*)
- "variable size object may not be initialized" which is OK under GCC
- nullptr_t related error in json11 (it lacks 'operator <' in clang)

Warnings fixed:
- empty nested struct initializer { 0 } replaced by {}
- removed several unused lambda captures
2022-01-16 00:02:54 +03:00
Vitaliy Filippov 8f64fc61e7 Ignore empty events in mon 2022-01-08 11:41:00 +03:00
Vitaliy Filippov 4a9f001d9e Make mon also ping etcd websockets regularly 2022-01-05 17:28:51 +03:00
Vitaliy Filippov 8c908316d9 Add a test with an OSD being added 2022-01-05 17:06:24 +03:00
Vitaliy Filippov 515a2e6e33 Only die when detecting a real race condition, not just a CAS failure 2022-01-05 17:05:25 +03:00
Vitaliy Filippov 68b6763ebe Add asserts for lp-optimizer tests, pass `ordered` from the monitor 2022-01-03 20:37:07 +03:00
Vitaliy Filippov 9c6168bf17 Remove fill_parsed_response 2022-01-03 20:08:26 +03:00
Vitaliy Filippov 08e467270a Fix pg_size changing from 3 to 2 2022-01-03 17:56:54 +03:00
Vitaliy Filippov 5473d5b4a2 Rework HTTP client to use keepalive, move getifaddr_list to addr_util 2022-01-03 14:52:01 +03:00
Vitaliy Filippov c3304bce27
Merge pull request #38 from mirrorll/master
journal check_available error
2021-12-31 12:45:16 +03:00
Vitaliy Filippov ec2852c598 Add minsize_1 test 2021-12-28 10:54:36 +03:00