Recently I read the article "How (not) to use readdir_r(3)" (2012). It describes an issue with the `readdir_r` function on file systems that allow file names longer than 255 bytes. From the article:
> Here's an AOSP patch: libc: readdir_r smashed the stack. Some engineers point out that FAT32 lets you store 255 UTF-16 characters, Android's `struct dirent` has a 256-byte `d_name` field, so if you have the right kind of filename on a FAT32 file system, it won't fit in a `struct dirent`.
Since `std::fs::read_dir` uses `readdir64_r` (code), I was curious to see what happens in Rust. As shown below, `fs::read_dir` yields an error with `code: 0`.
I tested the problem on Linux, but I think it may also happen on other Unix-like systems.
Steps to reproduce
- Create a FAT32 image.

  ```sh
  dd if=/dev/zero of=/tmp/vfat bs=1M count=32
  mkfs.vfat -F 32 /tmp/vfat
  sudo mount -o loop,uid=$UID /tmp/vfat /mnt/disk
  ```
- Create three files; the names of the last two need more than 255 bytes to be stored.

  ```sh
  create_file() { ruby -e 'File.open("/mnt/disk/" + ARGV[0] * ARGV[1].to_i, "a")' -- "$@"; }

  create_file á 127 # 254 bytes
  create_file é 128 # 256 bytes
  create_file í 129 # 258 bytes
  ```

  `create_file S N` creates a file whose name consists of the string `S` repeated `N` times.

  Only the first file (`á` × 127) can be copied to a non-FAT32 file system:

  ```
  $ cp -a /mnt/disk/ $(mktemp -d)
  cp: cannot create regular file '/tmp/tmp.ZBXQO63Syn/disk/éééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééé': File name too long
  cp: cannot create regular file '/tmp/tmp.ZBXQO63Syn/disk/ííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííí': File name too long
  ```
  Other programs, like `ls` and `find`, do not have any issue.

- Read the contents of the directory with a Rust program.

  ```rust
  // ls.rs
  fn main() {
      for arg in std::env::args_os().skip(1) {
          for f in std::fs::read_dir(arg).unwrap() {
              println!("{:?}", f);
          }
      }
  }
  ```

  Execute it against the FAT32 image:

  ```
  $ rustc -O ls.rs
  $ ./ls /mnt/disk/
  Ok(DirEntry("/mnt/disk/ááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááááá"))
  Err(Os { code: 0, kind: Uncategorized, message: "Success" })
  ```
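A caller that does not want the spurious `Err` mixed into its listing can separate entries from errors and log each error's raw OS code. The sketch below does that; `list_dir` and `describe` are hypothetical helpers (not std APIs), and the `Some(0)` comment is inferred from the `Os { code: 0, .. }` Debug output above rather than from a separate run on the FAT32 image.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not part of std): reads `dir` and splits the items
/// yielded by `fs::read_dir` into successfully read file names and errors,
/// instead of printing every `Result` as `ls.rs` does.
fn list_dir(dir: &Path) -> io::Result<(Vec<OsString>, Vec<io::Error>)> {
    let mut names = Vec::new();
    let mut errors = Vec::new();
    for entry in fs::read_dir(dir)? {
        match entry {
            Ok(e) => names.push(e.file_name()),
            Err(e) => errors.push(e),
        }
    }
    Ok((names, errors))
}

/// Hypothetical helper: formats an error together with its raw OS code.
/// For the error shown above this should start with `raw_os_error=Some(0)`,
/// because its Debug output is `Os { code: 0, .. }`.
fn describe(e: &io::Error) -> String {
    format!("raw_os_error={:?} kind={:?}", e.raw_os_error(), e.kind())
}
```

With these helpers, `ls.rs` could call `list_dir` for each argument, print the names, and pass each error to `describe`, so an error with code 0 is reported separately instead of appearing as if it were a directory entry.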
Cause
The problem happens because `readdir64_r` stores the file name in a fixed-size buffer. In the definition of `struct dirent64`, the field `d_name` is a 256-byte array.
The `struct dirent` for macOS uses a 1024-byte `d_name` array, so I assume that macOS is not affected by this issue.
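The arithmetic behind this can be checked with a small sketch. It uses the two `d_name` sizes mentioned above (256 bytes for `struct dirent64` on Linux, 1024 bytes on macOS) as constants; the constant and function names are made up for illustration. A name fits only if its UTF-8 bytes plus the terminating NUL fit in `d_name`, whereas FAT32 counts UTF-16 code units.

```rust
/// Size of `d_name` in `struct dirent64` on Linux, as described above.
const LINUX_D_NAME_LEN: usize = 256;
/// Size of `d_name` in `struct dirent` on macOS, as described above.
const MACOS_D_NAME_LEN: usize = 1024;

/// Returns true if `name`, encoded as UTF-8 plus a terminating NUL byte,
/// fits in a `d_name` buffer of `buf_len` bytes.
fn fits_in_d_name(name: &str, buf_len: usize) -> bool {
    name.len() + 1 <= buf_len
}

/// Length of `name` in UTF-16 code units, the unit FAT32 long names are counted in.
fn utf16_len(name: &str) -> usize {
    name.encode_utf16().count()
}
```

For the three files from the reproduction steps, all names are at most 129 UTF-16 units (well under FAT32's 255), but they take 254, 256 and 258 bytes in UTF-8. Only the first leaves room for the NUL in a 256-byte `d_name`; all three fit in 1024 bytes.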
Other File Systems
The post mentioned above is about FAT32, but according to Wikipedia there are other file systems that allow more than 255 bytes in a file name:
| File system | Maximum file name length |
|---|---|
| APFS | 255 UTF-8 characters |
| GPFS | 255 UTF-8 characters |
| HFS Plus | 255 UTF-16 characters |
| NSS | 256 characters |
| NTFS | 255 UTF-16 characters |
| ReFS | 255 UTF-16 characters |
| Reiser4 | 3976 bytes |
| ReiserFS | 4032 bytes |
| SquashFS | 256 bytes |
| exFAT | 255 UTF-16 characters |
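For the UTF-16-based limits in the table, the worst case in bytes can be estimated: each UTF-16 code unit encodes to at most 3 bytes of UTF-8 (a character outside the BMP uses two units for 4 bytes, i.e. only 2 bytes per unit), so a name at the 255-unit limit can need up to 765 bytes. This sketch assumes the kernel hands the name to user space as a plain UTF-8 transcoding of the stored UTF-16 (which depends on mount options such as the I/O charset); the function name is hypothetical.

```rust
/// Hypothetical helper: upper bound on the UTF-8 size of a name limited to
/// `units` UTF-16 code units. BMP characters (1 unit, up to 3 bytes) give
/// the maximum; surrogate pairs (2 units, 4 bytes) give only 2 bytes per unit.
fn max_utf8_bytes_for_utf16_units(units: usize) -> usize {
    units * 3
}
```

For example, `"\u{20ac}"` (€) repeated 255 times is exactly 255 UTF-16 units and 765 UTF-8 bytes, far beyond what a 256-byte `d_name` can hold.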
Conclusion
I posted the issue here because I'm not sure if this is actually a bug. The conditions to trigger it are very rare.
However, the current manpage for `readdir_r(3)` recommends using `readdir` instead:
> It is recommended that applications use `readdir(3)` instead of `readdir_r()`. Furthermore, since version 2.24, glibc deprecates `readdir_r()`. The reasons are as follows:
>
> - On systems where `NAME_MAX` is undefined, calling `readdir_r()` may be unsafe because the interface does not allow the caller to specify the length of the buffer used for the returned directory entry.
> - On some systems, `readdir_r()` can't read directory entries with very long names. When the glibc implementation encounters such a name, `readdir_r()` fails with the error `ENAMETOOLONG` after the final directory entry has been read. On some other systems, `readdir_r()` may return a success status, but the returned `d_name` field may not be null terminated or may be truncated.
> - In the current POSIX.1 specification (POSIX.1-2008), `readdir(3)` is not required to be thread-safe. However, in modern implementations (including the glibc implementation), concurrent calls to `readdir(3)` that specify different directory streams are thread-safe. Therefore, the use of `readdir_r()` is generally unnecessary in multithreaded programs. In cases where multiple threads must read from the same directory stream, using `readdir(3)` with external synchronization is still preferable to the use of `readdir_r()`, for the reasons given in the points above.
> - It is expected that a future version of POSIX.1 will make `readdir_r()` obsolete, and require that `readdir(3)` be thread-safe when concurrently employed on different directory streams.