If the callback does anything fishy that modifies the linked list,
libnfs may crash after returning. So doing any pending list removals
before invoking the callbacks is safer.
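A minimal sketch of that ordering, using a hypothetical singly linked list. The names `struct node`, `list_remove` and `complete_node` are illustrative and are not libnfs identifiers. The node is unlinked first, so the callback is free to modify the list.

```c
#include <stddef.h>

struct node;
typedef void (*node_cb)(struct node **head, struct node *n);

struct node {
    struct node *next;
    node_cb cb;
    int id;
};

/* Unlink n from the list rooted at *head. Returns 1 if it was found. */
static int list_remove(struct node **head, struct node *n)
{
    struct node **pp;

    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Finish the removal first, then hand control to the callback. */
static void complete_node(struct node **head, struct node *n)
{
    list_remove(head, n);
    if (n->cb != NULL) {
        n->cb(head, n);
    }
    /* Nothing after this point touches the list: the callback may have changed it. */
}

/* Example of a "fishy" callback: it empties the whole list. */
static void clear_list_cb(struct node **head, struct node *n)
{
    (void)n;
    *head = NULL;
}
```

Because the removal is complete before the callback runs, nothing in `complete_node` depends on the list still having the shape it had on entry.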
Enables callers to pass any opaque data chunk without having to cast
it explicitly.
A write never modifies the source buffer, and thus the pointer should
be const.
Signed-off-by: Max Kellermann <max.kellermann@gmail.com>
rpc_read_from_socket can currently only read one PDU in each rpc_service invocation even
if there is more data available on the socket. This patch reads all PDUs until the socket
would block.
Signed-off-by: Peter Lieven <pl@kamp.de>
The ioctl version breaks Qemu. I will post an updated version once we have found
a good solution in libiscsi and then adapt it to libnfs.
This reverts commit 003b3c7ce2.
rpc_read_from_socket can currently only read one PDU in each rpc_service invocation even
if there is more data available on the socket. This patch reads all PDUs available on
the socket when rpc_read_from_socket is entered.
Signed-off-by: Peter Lieven <pl@kamp.de>
we always read 4 bytes to get the PDU size and then realloc
these 4 bytes to the full size of the PDU. Avoid this by
using a static buf for the record marker.
Signed-off-by: Peter Lieven <pl@kamp.de>
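The record marker is a fixed four-byte header, so it fits in a small fixed array and the PDU buffer can then be allocated once at its final size. A hedged sketch of decoding it, assuming the standard ONC RPC record marking layout from RFC 5531. `decode_record_marker` is an illustrative name, not a libnfs function.

```c
#include <stdint.h>

/*
 * ONC RPC record marking: the top bit flags the last fragment,
 * the low 31 bits carry the fragment length in bytes.
 */
static void decode_record_marker(const uint8_t rm[4], uint32_t *len, int *last)
{
    uint32_t v = ((uint32_t)rm[0] << 24) | ((uint32_t)rm[1] << 16) |
                 ((uint32_t)rm[2] << 8)  |  (uint32_t)rm[3];

    *last = (v & 0x80000000u) != 0;
    *len  = v & 0x7fffffffu;
}
```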
There is no need to allocate and deallocate this structure every time
we update the udp destination.
For the client side, where we set the destination just once per lifetime
of the context it might not matter too much but once we add udp server support
we will need to update the sockaddr for every rpc we receive.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Add a flags field to rpc_pdu and add a flag that indicates that the PDU
should be discarded as soon as it has been written to the socket.
We do not put it on the waitpdu queue nor do we wait for a reply.
This will later be used for when we are sending replies back to a client
when operating in a server context.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
This allows us to use the NULL function for any arbitrary
program/version from rpc_connect_program() instead of the hardcoded support
for mount v3 and nfs v3
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
In zdr_array we can not use the check that num_elements * element_size
will fit inside the remaining bytes in the ZDR buffer.
The reason for this is that IF it is an array of unions, then
element_size will have the size of the largest arm in that union.
If the array consists of union items that are smaller than the largest arm,
then it becomes likely that this will pack in less than num_elements *
element_size and thus it is possible that the array WILL fit in the remaining
bytes.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
- Reduce the number of memory allocations in the ZDR layer.
- Check both seconds and nanoseconds field when validating dir cache.
- Invalidate the dir cache immediately if we do something that would cause
it to become stale, such as adding/removing objects from the cache.
- Add options to enable/disable dir caching.
- Discard readahead cache on [p]write and truncate.
- Android fixes
- Windows fixes
- Support timeouts for sync functions
- Add an internal pagecache
- Add nfs_rewinddir(), nfs_seekdir() and nfs_telldir()
- Fix crash in nfs_truncate()
- Fix segfault that can trigger if we rpc_disconnect() during the mount.
- Add support to bind to a specific interface (linux only)
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
`nfs_set_interface` and `rpc_set_interface` APIs, or via the
NFS URL `if=<interface>` parameter. This feature requires
`root` permissions.
NOTE: This has only been compiled and tested on Ubuntu 14.04. It's
unlikely that it'll work on other platforms without modification,
particularly around the inclusion of <net/if.h> and IFNAMSIZ define
in `libnfs-private.h`.
This addresses a bug causing a segfault if we destroy the nfs context/
disconnect the session while the mount_8_cb callbacks for checking the
filehandle for nested mountpoints are still in flight.
Issue found and reported by doktorstick
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
the only call that really needs a big encodebuf is WRITE. So give each
PDU its private encodebuf. This avoids the need to memcpy the data from
the static rpc->encodebuf to the pdu->outdata.data.
Signed-off-by: Peter Lieven <pl@kamp.de>
before this, setting readahead would always modify the pagecache,
but it might be desirable to have a greater pagecache and only
a reasonably small readahead.
Signed-off-by: Peter Lieven <pl@kamp.de>
in commit b319b97 the check for count == 0 was introduced, but
it was accidentally reverted in commit f681a2c if pdu->inpos < 4.
This patch fixes this issue resulting in deadlocks and removes
the somewhat redundant receive code.
Signed-off-by: Peter Lieven <pl@kamp.de>
this adds support for a simple read cache to avoid unnecessary requests
to the NFS storage. libnfs by design cannot benefit from the kernel page
cache and suffers from performance penalties in some cases when compared
with a file accessed via kernel NFS.
This patch exposes 3 new API calls:
void nfs_set_pagecache(struct nfs_context *nfs, uint32_t v);
void nfs_set_pagecache_ttl(struct nfs_context *nfs, uint32_t v);
void nfs_pagecache_invalidate(struct nfs_context *nfs, struct nfsfh *nfsfh);
As well as the two new URL parameters pagecache and pagecache_ttl.
pagecache is defined in number of pages where a page is always NFS_BLKSIZE (4kB).
pagecache_ttl takes the page timeout in seconds where 0 means infinite.
Signed-off-by: Peter Lieven <pl@kamp.de>
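Since pagecache is given in pages of NFS_BLKSIZE (4kB), the memory a setting implies is simple arithmetic. A small sketch, where `pagecache_bytes` is a hypothetical helper and not part of the libnfs API:

```c
#include <stdint.h>

#define NFS_BLKSIZE 4096u   /* page size as stated in the commit message */

/* Memory used by a page cache of the given number of pages, in bytes. */
static uint64_t pagecache_bytes(uint32_t pages)
{
    return (uint64_t)pages * NFS_BLKSIZE;
}
```

For example, pagecache=1024 corresponds to 4 MiB of cached data per context.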
If we are decoding a zdr string and the caller did not provide
a pointer/buffer for us, we can just point the string at the rx buffer
immediately and avoid calling libnfs_zdr_opaque().
This avoids wasting cpu cycles on running a memcpy where src and dst buffers
are the same.
Add support to timeout sync functions.
Add a field to the rpc context to specify the timeout for functions.
Currently only sync functions support a timeout.
Link and Rename are special since they will process two (often) different
directories. We need to drop both directories from the cache and also do so
BEFORE we clear/steal the data->fh.data.data_val pointer.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Add calls to explicitly drop the directory cache every time we do things
that might make the data stale.
Such as adding / removing objects to a directory, changing metadata for
objects etc.
Instead of just dropping the cache, we could use the wcc data.
IF wcc->before is the same timestamp as what we have in the cache
then we can just perform the same mutate on what we have in the cache
to reflect the new state and bump the timestamp of the cache to be wcc->after.
That would be a lot of work though and I am not convinced it is worth it.
After all, the cache is just an optimization to make directory operation
faster.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Compare BOTH the seconds field and the nanoseconds field when checking if
the cached directory structure is valid or not.
Linux knfsd and other modern servers actually do set the nanosecond field
so why not check it.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
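A minimal sketch of the stricter validity check. `struct ts` and `same_mtime` are illustrative names and do not mirror the actual libnfs structures.

```c
#include <stdint.h>

struct ts {
    uint32_t seconds;
    uint32_t nseconds;
};

/* The cached directory is only considered valid if both fields match. */
static int same_mtime(const struct ts *cached, const struct ts *current)
{
    return cached->seconds == current->seconds &&
           cached->nseconds == current->nseconds;
}
```

Two modifications within the same second now produce different timestamps, so the second one invalidates the cache.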
We can save one malloc by storing both the rpc_pdu and decoding buffer
in the same memory block.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
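The single-allocation idea can be sketched like this. The `struct pdu` layout and `pdu_alloc` are assumptions made for illustration, not the real rpc_pdu definition.

```c
#include <stdint.h>
#include <stdlib.h>

struct pdu {
    size_t buf_size;
    char  *buf;       /* points into the same allocation, right after the struct */
};

/* Allocate the header and its decode buffer with one call. */
static struct pdu *pdu_alloc(size_t buf_size)
{
    struct pdu *p;

    if (buf_size > SIZE_MAX - sizeof(*p)) {
        return NULL;                    /* size computation would overflow */
    }
    p = calloc(1, sizeof(*p) + buf_size);
    if (p == NULL) {
        return NULL;
    }
    p->buf_size = buf_size;
    p->buf = (char *)(p + 1);
    return p;
}
```

A single `free(p)` releases both the header and the buffer.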
The biggest changes in this release are:
- Fix a leak where we leaked one rdpe_cb_data structure on each open_dir()
- Make building the utils optional
- Android: the correct define is __ANDROID__ not ANDROID
- Win32: Use _U_ instead of ATTRIBUTE((unused))
- Win32: Fix nfs_stat declaration for Win32
- Various fixes for mingw builds
- Make rpc->connect_cb a one shot callback and improve documentation
- Remove the FUSE module. It now lives in its own repo
- Fix POLLERR/POLLHUP handling to properly handle session failures and to
try to auto-reconnect
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
POLLERR and POLLHUP handling in rpc_service() could not deal with
session failures or auto reconnect.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
It makes no sense to have socket.c keep invoking this callback over and over.
Just change it to become one-shot.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
The linux kernel does not check the UDP checksum until the application tries
to read it from the socket.
This means that the socket might be readable, but when we try to read
the data, or inspect how much data is available, the packets will be discarded
by the kernel.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
If we are trying to read (part of?) the RM, we can not assume that as long
as recv() returned non-error that we have the full RM.
We must check before we proceed to try to read the actual PDU data.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
We can not have a static rpc->inbuf buffer since that will no longer guarantee
that the received buffer is valid for the duration of callbacks.
One of the problems is that if we issue new (sync) RPCs from within a
callback, that will overwrite and invalidate the receive buffer that
we passed to the callback.
Revert "init: do not leak rpc->inbuf"
This reverts commit f7bc4c8bb1.
Revert "socket: we have to use memmove in rpc_read_from_socket"
This reverts commit 24429e95b8.
Revert "socket: make rpc->inbuf static and simplify receive logic"
This reverts commit 7000a0aa04.
This function is called from rpc_service when it has detected that
a socket has errored out during reading/writing.
However, since this function returns 0 (==success) for the case where
autoreconnect is not enabled, this means that for an errored socket we
will return 0 (==success) from rpc_service() back to the application.
Change rpc_reconnect_requeue to return -1 when invoked and autoreconnect
is disabled so that applications will receive an error back from rpc_service.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
There is no guarantee that we get the same fd again when
reestablishing a session. But if the fd changes during a
reconnect we might end up with a client application busy polling
on the old fd.
Qemu registers a read handler on the current fd, but does
not notice fd changes. So we busy poll on the old fd for good.
Things are working (except for the busy polling) until
a drain all is issued. At this point Qemu deadlocks.
Signed-off-by: Peter Lieven <pl@kamp.de>
otherwise we end up eating up all socket errors in rpc_service and then
believe we are connected, but the next call to rpc_read_from_socket
fails because the socket is closed. we then reconnect anyway.
Signed-off-by: Peter Lieven <pl@kamp.de>
the requeueing code is broken because we access pdu->next
after we mangled it in rpc_return_to_queue.
This leads to the loss of waitqueue elements and, more severely,
a deadlock as soon as more than one waitpdu queue has elements.
Reason for that is that the first elements of the first
two queues are linked to each other.
Example:
waitpdu[0]->head = pduA ; pduA->next = pduB; pduB->next = NULL;
waitpdu[1]->head = pduC ; pduC->next = NULL;
outqueue->head = NULL;
After the for loop for waitpdu[0] queue the outqueue looks like
outqueue->head = pduA; pduA->next = NULL;
At this point pduB is lost!
In the for loop for waitpdu[1] queue the outqueue looks like this
after the first iteration:
outqueue->head = pduC; pduC->next = pduA; pduA->next = NULL;
We now fetch pdu->next of pduC which is pduA.
In the next iteration we put pduA in front of pduC. pduA->next
is then pduC and pduC->next is pduA. => Deadlock.
Signed-off-by: Peter Lieven <pl@kamp.de>
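The usual cure for this class of bug is to save the next pointer before the element is moved. A minimal sketch with hypothetical `struct pdu`, `push_front` and `requeue_all`, which are simplified stand-ins and not the libnfs code:

```c
#include <stddef.h>

struct pdu {
    struct pdu *next;
};

struct queue {
    struct pdu *head;
};

/* Push onto the front of q. Note that this overwrites p->next. */
static void push_front(struct queue *q, struct pdu *p)
{
    p->next = q->head;
    q->head = p;
}

/* Move every element of src to dst, reading next before it is overwritten. */
static void requeue_all(struct queue *src, struct queue *dst)
{
    struct pdu *p, *next;

    for (p = src->head; p != NULL; p = next) {
        next = p->next;             /* must be read before push_front() */
        push_front(dst, p);
    }
    src->head = NULL;
}
```

Replaying the example above with this loop, pduB is no longer lost and the outqueue ends up as pduC, pduB, pduA with a NULL terminator, so no cycle is formed.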
An EOF is signalled through a POLLIN event and subsequent recvs always
return 0. Handle this condition and reconnect. Otherwise we might
deadlock here.
Signed-off-by: Peter Lieven <pl@kamp.de>
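One way to keep these cases apart is to classify the result of each recv()/read() explicitly. The enum and `classify_read` below are an illustrative sketch, not libnfs code.

```c
#include <errno.h>
#include <sys/types.h>

enum read_state { READ_DATA, READ_EOF, READ_AGAIN, READ_ERROR };

/* n is the return value of recv()/read(), err the errno observed after it. */
static enum read_state classify_read(ssize_t n, int err)
{
    if (n > 0) {
        return READ_DATA;
    }
    if (n == 0) {
        return READ_EOF;            /* orderly shutdown: reconnect, do not retry */
    }
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR) {
        return READ_AGAIN;          /* no data right now, the session is still alive */
    }
    return READ_ERROR;
}
```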
- Disable multithreading in fuse_nfs
- Add -Wall and -Werror compiler flags (and fix issues found by it)
- Add nfs-cat utility
- Switch to using nfs_[f]stat64 instead of the deprecated nfs_[f]stat call
in all examples
- If the server does not return any attributes for entries in READDIRPLUS
then try to fetch them using lookup instead.
- Reconnection fixes
- Enforce the max pdu size and add sanity checks when reading PDUs from
the socket.
- Stop using ioctl(FIONREAD) to find out how many bytes to read, and treat
0 as an indication of a problem. Some applications call their POLLIN handlers
spuriously even when there is no data to read, which breaks this check in
libnfs.
- Add basic support to do logging.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
only logging to stderr is supported at the moment. By default
there is no output. It's possible to set the log level via
the debug url parameter.
Example:
nfs-ls nfs://127.0.0.1/export?debug=2
Signed-off-by: Peter Lieven <pl@kamp.de>
the write limit of libnfs has been 1M for a long time.
Restrict rtmax and wrmax to 1M and error out otherwise.
Limit the PDU size when reading from socket to rule out
malicious servers forcing us to allocate a lot of memory.
Signed-off-by: Peter Lieven <pl@kamp.de>
under Linux poll might return POLLIN even if there are no bytes available for read.
See select(2) manpage for spurious readiness under BUGS.
As a consequence we start dropping TCP connections which are still alive.
Signed-off-by: Peter Lieven <pl@kamp.de>
Some servers sometimes do not return attributes for files in the RDP
replies. So we need to fallback to using LOOKUPs for these entries
just like we always have to do in the READDIR case.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Update the configure to add some sanity -W arguments.
A good start is probably :
-Wall -Werror -Wshadow -Wno-write-strings -Wstrict-prototypes
-Wpointer-arith -Wcast-align -Wno-strict-aliasing
Fix up the places in the code that trigger warnings.
(one of which is readahead code which is perhaps broken?)
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
- Auto-traverse mounts. With this option (default to on) libnfs will
autodiscover and handle any nested submounts.
- Remove nfs_get_current_offset. Applications should use seek instead of this function.
- Add umask() support.
- Change set_tcp_sockopt() to be static.
- Android fix for nfs-ls
- Make S_IFLNK available on windows.
- Fix a use after free.
- Fix a bug where truncate() treated offset as 32bit.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Make the continue_int field we use through the internal callbacks a uint64_t
instead of an int.
This fixes a bug for the truncate function where we pass the offset to truncate
to via this field.
The bug is that otherwise we first truncate the field to an int and then
in the callback we cast this int back to a uint64_t again.
If the user called truncate with an offset that is >= 2^31
then :
IF the 1<<31 bit is cleared then we would truncate to (offset & 0xffffffff)
IF the 1<<31 bit is set, we would instead truncate to (offset | 0xffffffff00000000)
Reported-by: doktorstick at github
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
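The effect of squeezing a 64-bit offset through an int can be reproduced in a few lines. This sketch uses int32_t as a stand-in for a 32-bit int and `through_int` is a made-up name. Converting an out-of-range value to a signed type is implementation-defined in C, and the comments describe what common compilers do.

```c
#include <stdint.h>

/* Round-trip a 64-bit offset through a 32-bit signed integer. */
static uint64_t through_int(uint64_t offset)
{
    int32_t narrowed = (int32_t)offset;   /* keeps only the low 32 bits */

    return (uint64_t)narrowed;            /* sign-extends if bit 31 was set */
}
```

With bit 31 clear the upper half is silently dropped, and with bit 31 set the upper half becomes all ones, which matches the two cases listed above.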
Fix retrieving the attributes for a submount in nfs_opendir when
the submount is an entry of '/'.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Add a URL argument to enable/disable the use of automatic traversal of nested
mounts for a libnfs context. Default it to enabled.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
When opendir_cb encounters an entry that refers to a nested mount, then
replace the attributes with those attributes we collected for this export
during the mount and return those instead.
This makes traversing a directory that crosses into a different filesystem
on the server transparent to the client.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
During nfs_lookuppath_async, check if the requested path traverses into
a nested mountpoint and if so, skip resolving that part of the path and just
use the filehandle for the nested mount.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
During the mount process, once we have connected to NFSd we need to collect
the file attributes for all the filehandles that are associated with
nested mounts.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
During the mount process, call MOUNT3 EXPORT and collect a list of all
exports on the server. For any export that is a nested mount to the current
directory we are trying to mount call out to MOUNT3 MNT and collect the
filehandle for those mounts.
Track all nested mounts and their filehandles in a list from the nfs_context.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Rename these callback functions to make space for two new functions
nfs_mount_7/8_cb which we will be using to collect information about
nested mounts in a later patch.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
NFS reports the type in a field separate from mode, where mode only contains
the protection bits. This makes it inconvenient to use when porting programs
since under posix the type is supposed to be part of the mode bits (S_IFMT).
When unmarshalling the directory entries into a nfsdirent structure, bake
the S_IF* file type into where the S_IFMT bits would be.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
- Add O_TRUNC support for nfs_create
- Handle OOM during create
- Return more stats fields as part of readdir since we get these for "free"
when we use READDIRPLUS
- Follow symlinks during path resolution
- Add lchown, lstat and lutimes
- Replace all [u_]quad types with [u]int types in our RPC layer
- Solaris build fixes
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Add nfs_access2(), like nfs_access() but it returns the individual
statuses of R_OK, W_OK and X_OK rather than a single success or failure
status. This saves the latency and overhead of multiple lookups if an
application tries to determine the status of each of R_OK, W_OK and
X_OK.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Map ACCESS3_{MODIFY,EXTEND,DELETE} to W_OK and ACCESS3_{LOOKUP,EXECUTE}
to X_OK so that nfs_access() gives sensible results for directories.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
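The mapping can be sketched as below. The ACCESS3_* bit values are taken from RFC 1813. Treating any one of the listed bits as sufficient is an assumption of this sketch, and `access3_to_posix` is a hypothetical name.

```c
#include <stdint.h>
#include <unistd.h>   /* R_OK, W_OK, X_OK */

#define ACCESS3_READ    0x0001
#define ACCESS3_LOOKUP  0x0002
#define ACCESS3_MODIFY  0x0004
#define ACCESS3_EXTEND  0x0008
#define ACCESS3_DELETE  0x0010
#define ACCESS3_EXECUTE 0x0020

/* Translate the access bits granted by the server into POSIX access() bits. */
static int access3_to_posix(uint32_t granted)
{
    int mode = 0;

    if (granted & ACCESS3_READ) {
        mode |= R_OK;
    }
    if (granted & (ACCESS3_MODIFY | ACCESS3_EXTEND | ACCESS3_DELETE)) {
        mode |= W_OK;
    }
    if (granted & (ACCESS3_LOOKUP | ACCESS3_EXECUTE)) {
        mode |= X_OK;
    }
    return mode;
}
```

A directory for which the server grants only READ and LOOKUP therefore reports R_OK and X_OK.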
All current platforms have a quad type that maps to a 64bit scalar.
But there are platforms where quad maps to a 64bit non-scalar.
Replace quad with int64 in the protocol definitions and the ZDR layer
so that these fields will map to a 64 bit scalar also on those platforms
where quad can not be used.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Include the proper headers to fix warnings like:
libnfs-sync.c:1529:3: warning: implicit declaration of function 'gettimeofday' [-Wimplicit-function-declaration]
libnfs-zdr.c:506:2: warning: implicit declaration of function 'getuid' [-Wimplicit-function-declaration]
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
sys/time.h needs to be protected with an ifdef
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Since the user callback may perform operations on the nfsfh (e.g. it
might close it), all updates should be done before the user callback is
called.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Add lchmod which is like chmod but operates on the symbolic link itself
if the destination is a symbolic link.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Add lutimes which is like utimes but operates on the symbolic link
itself if the destination is a symbolic link.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Add lstat which is like stat but operates on the symbolic link itself if
the destination is a symbolic link.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Add lchown which is like chown but operates on the symbolic link itself
if the destination is a symbolic link.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Follow symlinks during path resolution. If the symlink points outside
the mount, -ENOENT is returned. This is slightly different behavior
from the in-kernel NFS client where symlinks pointing outside the mount
get resolved to local paths.
The algorithm for symlink resolution is simple and stupid. If a symlink
is encountered, the path is rewritten and path resolution begins again
from the root filehandle. A count is kept to prevent loops. This is
not particularly efficient but it is good enough for now.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
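The restart-from-root strategy can be illustrated with a toy resolver over an in-memory table of symlinks. Everything here is hypothetical (`struct symlink`, `lookup_link`, `resolve_path`, the hop limit of 8). It handles absolute targets only and does not model the -ENOENT case for links that leave the mount.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN  256
#define MAX_LINK_HOPS 8

struct symlink {
    const char *path;     /* absolute path of the link itself */
    const char *target;   /* absolute path it points to */
};

/* Return the target if the first len bytes of path name a known symlink. */
static const char *lookup_link(const struct symlink *links, size_t nlinks,
                               const char *path, size_t len)
{
    size_t i;

    for (i = 0; i < nlinks; i++) {
        if (strlen(links[i].path) == len &&
            strncmp(links[i].path, path, len) == 0) {
            return links[i].target;
        }
    }
    return NULL;
}

/*
 * Resolve an absolute path in place. Whenever a prefix turns out to be a
 * symlink, the path is rewritten and the walk restarts from the root.
 * Returns 0 on success, -ELOOP after too many hops, -ENAMETOOLONG on overflow.
 */
static int resolve_path(const struct symlink *links, size_t nlinks,
                        char path[MAX_PATH_LEN])
{
    char tmp[MAX_PATH_LEN];
    size_t end = 0;
    int hops = 0;

    while (path[end] != '\0') {
        const char *target;
        int n;

        end++;                                  /* step over the '/' */
        while (path[end] != '\0' && path[end] != '/') {
            end++;                              /* advance to the component end */
        }
        target = lookup_link(links, nlinks, path, end);
        if (target == NULL) {
            continue;
        }
        if (++hops > MAX_LINK_HOPS) {
            return -ELOOP;
        }
        n = snprintf(tmp, sizeof(tmp), "%s%s", target, path + end);
        if (n < 0 || (size_t)n >= sizeof(tmp)) {
            return -ENAMETOOLONG;
        }
        memcpy(path, tmp, (size_t)n + 1);
        end = 0;                                /* begin again from the root */
    }
    return 0;
}
```

Restarting from the root repeats work for every link encountered, which is the inefficiency the message accepts in exchange for simplicity.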
Set as much stat information as possible for stat, stat64, fstat and
readdir.
Fill in dev to the given fsid.
Fill in rdev to the given major and minor numbers.
Set the file type bits in the mode from the type returned by the server.
Set the number of blocks used based on the number of bytes used in
blocks of size 512 (which is what stat(2) uses), rounded up.
Fill in the nanosecond timestamps.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
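The block count described above is a round-up division by 512. A small sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* Number of 512-byte blocks needed to hold used_bytes, rounded up. */
static uint64_t blocks_512(uint64_t used_bytes)
{
    return used_bytes / 512 + (used_bytes % 512 != 0);
}
```

Splitting the quotient and the remainder avoids the overflow that `(used_bytes + 511) / 512` would hit for values near the top of the 64-bit range.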
This version removes old ONC-RPC symbols and automatically includes the
RPC/ZDR layer include from the raw low level headers.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>