High memory usage on random file reads using `seek_read` (Windows)

Here is a small app:

```rust
//! ```cargo
//! [dependencies]
//! libc = "0.2.151"
//! rand = "0.8.5"
//!
//! [target.'cfg(windows)'.dependencies]
//! winapi = { version = "0.3.9", features = ["winbase"] }
//! ```

use rand::prelude::*;
use std::fs::{File, OpenOptions};
use std::io::{Seek, SeekFrom};
use std::{env, io};

const SECTOR_SIZE: usize = 1024 * 1024 * 1024;
const CHUNK_SIZE: usize = 32;

fn main() -> io::Result<()> {
    let file = env::args().nth(1).unwrap();

    let sectors = (File::open(&file)?.seek(SeekFrom::End(0))? / SECTOR_SIZE as u64) as usize;

    let file = OpenOptions::new()
        .read(true)
        .advise_random_access()
        .open(&file)?;
    file.advise_random_access()?;

    let mut result = vec![[0u8; CHUNK_SIZE]; sectors];

    for i in 0.. {
        (0..sectors)
            .into_iter()
            .zip(&mut result)
            .try_for_each(|(offset, result)| {
                let sector_offset = offset * SECTOR_SIZE;
                let offset_within_sector =
                    thread_rng().gen_range(0..SECTOR_SIZE / CHUNK_SIZE) * CHUNK_SIZE;

                file.read_at(result, sector_offset + offset_within_sector)
            })?;

        if i > 0 && i % 10_000 == 0 {
            println!("{i} iterations");
        }
    }

    Ok(())
}

trait ReadAtSync: Send + Sync {
    /// Fill the buffer by reading bytes at a specific offset
    fn read_at(&self, buf: &mut [u8], offset: usize) -> io::Result<()>;
}

impl ReadAtSync for File {
    fn read_at(&self, buf: &mut [u8], offset: usize) -> io::Result<()> {
        self.read_exact_at(buf, offset as u64)
    }
}

/// Extension convenience trait that allows setting some file opening options in cross-platform way
trait OpenOptionsExt {
    /// Advise OS/file system that file will use random access and read-ahead behavior is
    /// undesirable, only has impact on Windows, for other operating systems see [`FileExt`]
    fn advise_random_access(&mut self) -> &mut Self;
}

impl OpenOptionsExt for OpenOptions {
    #[cfg(target_os = "linux")]
    fn advise_random_access(&mut self) -> &mut Self {
        // Not supported
        self
    }

    #[cfg(target_os = "macos")]
    fn advise_random_access(&mut self) -> &mut Self {
        // Not supported
        self
    }

    #[cfg(windows)]
    fn advise_random_access(&mut self) -> &mut Self {
        use std::os::windows::fs::OpenOptionsExt;
        self.custom_flags(winapi::um::winbase::FILE_FLAG_RANDOM_ACCESS)
    }
}

/// Extension convenience trait that allows pre-allocating files, suggesting random access pattern
/// and doing cross-platform exact reads/writes
trait FileExt {
    /// Advise OS/file system that file will use random access and read-ahead behavior is
    /// undesirable, on Windows this can only be set when file is opened, see [`OpenOptionsExt`]
    fn advise_random_access(&self) -> io::Result<()>;

    /// Read exact number of bytes at a specific offset
    fn read_exact_at(&self, buf: &mut [u8], offset: u64) -> io::Result<()>;
}

impl FileExt for File {
    #[cfg(target_os = "linux")]
    fn advise_random_access(&self) -> io::Result<()> {
        use std::os::unix::io::AsRawFd;
        let err = unsafe { libc::posix_fadvise(self.as_raw_fd(), 0, 0, libc::POSIX_FADV_RANDOM) };
        if err != 0 {
            Err(std::io::Error::from_raw_os_error(err))
        } else {
            Ok(())
        }
    }

    #[cfg(target_os = "macos")]
    fn advise_random_access(&self) -> io::Result<()> {
        use std::os::unix::io::AsRawFd;
        if unsafe { libc::fcntl(self.as_raw_fd(), libc::F_RDAHEAD, 0) } != 0 {
            Err(std::io::Error::last_os_error())
        } else {
            Ok(())
        }
    }

    #[cfg(windows)]
    fn advise_random_access(&self) -> io::Result<()> {
        // Not supported
        Ok(())
    }

    #[cfg(unix)]
    fn read_exact_at(&self, buf: &mut [u8], offset: u64) -> io::Result<()> {
        std::os::unix::fs::FileExt::read_exact_at(self, buf, offset)
    }

    #[cfg(windows)]
    fn read_exact_at(&self, mut buf: &mut [u8], mut offset: u64) -> io::Result<()> {
        while !buf.is_empty() {
            match std::os::windows::fs::FileExt::seek_read(self, buf, offset) {
                Ok(0) => {
                    break;
                }
                Ok(n) => {
                    buf = &mut buf[n..];
                    offset += n as u64;
                }
                Err(ref e) if e.kind() == std::io::ErrorKind::Interrupted => {
                    // Try again
                }
                Err(e) => {
                    return Err(e);
                }
            }
        }

        if !buf.is_empty() {
            Err(std::io::Error::new(
                std::io::ErrorKind::UnexpectedEof,
                "failed to fill whole buffer",
            ))
        } else {
            Ok(())
        }
    }
}
```

What it does is read small chunks of a large file at random offsets in a loop.

It can be stored in `file.rs` and then run as `cargo +nightly -Zscript file.rs D:\large-file.bin`, where `large-file.bin` in my case is 500 GiB.

Afterwards, Windows reports that the application uses 0.5 MiB of RAM:

Screenshot of Task Manager and Sysinternals Process Explorer

However, total Windows memory usage grows by ~1 GiB (sometimes less, sometimes more) within a minute or two, and I can't find where it goes. This looks like either a bug in Windows or something fishy happening in the Rust standard library, but either way it seems to be outside of my knowledge to figure out.

Screenshot of task manager while application is running and after application is stopped:

Screenshots

The example above is cross-platform and doesn't have this issue on Linux. It happens on both Windows 10 and Windows 11.

This is a reduced example of a real-world app in which a single process sometimes manages tens of terabytes of large files. I'd appreciate any hints.

(Normally I'd say this should go on users, and maybe it still should, but it does seem somewhat borderline, to be fair.)

If the memory usage isn't being attributed to the program, it's probably consumed by OS-managed buffers. The next two possible steps would probably be:

  • use a `#[global_allocator]` that just tracks the current total requested allocation size and forwards to `System`, to validate that the test program's heap usage is <1 MB
  • port the test program to C++ with `<Windows.h>` to see whether the problem replicates there

If it's a Windows thing, the Windows version will probably matter. Configuration, too, since it'll probably have something to do with read caching/optimization. It'd also be interesting to test without the random access flag to see whether the memory usage still happens; the flag might be doing more harm than good.

Is this reading from an HDD or an SSD (or network, or…)? FAT32, ReFS, or another file system? I vaguely remember seeing that Windows I/O drivers for different drive types and different file systems can do different things by default and in response to various flags.

(Also, a minor thing: if you're going to use it repeatedly, it's better to call `thread_rng()` once and reuse the handle it gives you, to avoid the repeated thread-local lookups.)

If nobody else has, I might test this on my machine tomorrow, as it is somewhat interesting, although with a more medium-sized file; I don't think putting that wear on my SSD for little reason is a particularly great idea. (Is there a good, reasonably quick way to make a big test file?)

The program doesn't do dynamic allocations (not explicitly, anyway), so I don't see how that would make a difference.

That would be my last resort; I try not to touch C++ unless I absolutely have to.

These are literally stock Windows 10 and Windows 11 installations, both VMs and physical machines, and it is 100% reproducible in all cases I have seen so far. And the app above does set the random access flag already (not setting it uses about the same amount of memory).

It is NTFS on an SSD (when running in a VM the guest sees the disk as an HDD, so I don't think it matters either way).

I would really appreciate that! I know how trivial it is to do on Linux, though I'm not sure about Windows. My app creates these files itself, but it is VERY compute-intensive, so I won't recommend that route. Writing a few hundred gigabytes will hardly make a dent in SSD life, even for the cheapest QLC drives.

My (admittedly vague) understanding is that the disk cache is a thing: when you access a file, the filesystem driver assumes you might access it again and keeps at least some of what it's read around if there's available memory. This memory is easily evicted if you open other files or need it for running programs, but if you have a lot of memory, it's more useful for a modern OS to use it as disk cache than to leave it idle.

Am I sure that’s what’s happening? No, it could easily be something else. But I am saying it’s not necessarily something to worry about.


I agree, but the file is explicitly opened with a random access hint, suggesting to the OS that there is no point in caching anything. I also tried reopening the file from time to time without restarting the process, and it didn't make a difference, which is even more disturbing.

What you're saying is plausible, but it definitely goes against my expectations. Also, the memory usage is not shown as "cache" in Task Manager :confused:

Ah, okay. I’m used to macOS, where “random access mode” means there’s no lookahead caching, but not necessarily that there’s no caching at all. Unless I’m wrong about that too. :sweat_smile:

I just googled FILE_FLAG_RANDOM_ACCESS to try to find exactly what Microsoft says the flag does, and found Issue when you access files with FILE_FLAG_RANDOM_ACCESS - Windows Server | Microsoft Learn, which seems somewhat likely to be relevant…

Operating system performance may degrade when … large files using the CreateFile() API and the FILE_FLAG_RANDOM_ACCESS flag … system cache consuming available memory (visible in the performance counter Memory\Cache Bytes).

The FILE_FLAG_RANDOM_ACCESS … prevents automatic unmapping of views … keeps previously read views in the Cache Manager working set … may be detrimental … can consume a large amount of physical RAM.

Remove the FILE_FLAG_RANDOM_ACCESS flag … allows the views to be unmapped and moved to the standby list after page reads are completed.

This sounds like exactly what you're encountering. Unfortunately for you, disabling read-ahead also seems to be tied to preventing unmapping, presumably because unmapping relies on the non-random-access prediction of which read offsets you are (or aren't) likely to be interested in.

See also the caching behavior docs for CreateFile; additionally adding FILE_FLAG_NO_BUFFERING may get the desired behavior (at the cost of alignment requirements I don't quite understand), though the docs don't specify what the result of combining those flags is.


It does sound relevant, and I did try removing FILE_FLAG_RANDOM_ACCESS but got the exact same memory usage pattern. So while it might be part of the issue in theory, it is at the very least not the only issue here.

Yeah, alignment requirements are something I'd really like to avoid dealing with; they potentially have far-reaching consequences.

It's beside the point here, but I would just note that I'm not aware of any situation in which ErrorKind::Interrupted is returned on Windows.

On topic, I'm not sure that Rust itself is doing anything particularly interesting here. All the read operations end up in synchronous_read. It is slightly weird because it needs to act defensively against a potentially asynchronous file handle, but at the end of the day it is basically just a wrapper around NtReadFile (the lower-level read function).


If I recall correctly (I don't have a Windows machine at my fingertips), Sysinternals Process Explorer also has a system memory statistics view when you click on the memory bar. Maybe that shows something of interest.

Unix has fadvise, which allows some manual cache hinting, including the POSIX_FADV_NOREUSE (will only read once, don't keep in caches) and POSIX_FADV_DONTNEED (won't be needed any longer, can be dropped from caches) flags.

Perhaps something similar exists for Windows.

In terms of influencing the built-in caching behaviour, there are really only FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS. These are just hints to the cache manager, which does seem quite eager to use free memory when it's available.

For large files with custom needs, disabling caching entirely using FILE_FLAG_NO_BUFFERING (with or without FILE_FLAG_WRITE_THROUGH) is the only real solution. But in that case the idea is that your application (probably via a library) takes on the job of caching rather than directly issuing system read/write operations. So you don't have to worry about alignment and everything, but somebody does.

I'd say this is completely normal and nothing to worry about. Windows will automatically buffer files and release the memory if it is needed for something else.

It does mean that when you are testing, you will typically see that reading a file is much quicker if it is already in operating system memory.

This is not normal and is definitely a big thing to worry about. The reason is that this is not normal cache that can be freed at any time like on Linux; it is actively used memory that leads to excessive swapping to disk, slowing down to a crawl, and even running out of memory in some cases.

As I mentioned from the very beginning, this is not applicable. The file is many times larger than available RAM and reads are completely random, so there isn't much the OS could cache that would bring any value. That is exactly why the code hints to the OS that reads are random, a hint that Linux and macOS respect but Windows does not.

You can have a look at it with RAMMap.

There is a long video, Defrag Tools: #6 - RAMMap | Microsoft Learn, that goes with it, explaining the numerous categories of memory, the kinds (soft/hard) of page faults, etc.

That is exactly how I found out the details shared above. Screenshots and some discussion can be found here: Windows is leaking memory when reading random chunks of a large file - Microsoft Q&A