[Pre-RFC] XRay instrumentation support

XRay instrumentation support has been inquired about in the past but apparently that never went anywhere. I want XRay to be in.

As noted, it's not technically complex: most of the heavy lifting has already been done by LLVM and client runtime libraries. The language only needs to provide sufficient controls, and a few sheds need to be painted to decide what these controls will look like.

Proof of concept is available:

  1. rustc patches

    git remote add -t xray-basic ilammy https://git.sr.ht/~ilammy/rust
    git fetch ilammy
    git checkout xray-basic
    
  2. clang-rt.xray bindings

    git clone --branch pre-rfc-poc https://git.sr.ht/~ilammy/clang-rt-xray
    cd clang-rt-xray
    

    You will need to have clang installed.

  3. instrumented "Hello, world!"

    RUSTFLAGS="-Z instrument-xray=always" \
    XRAY_OPTIONS="xray_mode=xray-basic patch_premain=true verbosity=1" \
    cargo +stage1 run --example hello
    
    llvm-xray graph xray-log.hello.* \
      --instr_map=target/debug/examples/hello \
      | dot -Tsvg > call-graph.svg
    open call-graph.svg
    

I write this post in first person, since this is not even a draft of RFC, more like a prompt for initial discussion and feedback.

Summary

Add support for XRay instrumentation, along with associated #[instrument::xray] attributes for fine control over instrumented code.

Motivation

XRay provides a number of advantages over traditional automatic instrumentation used for profiling and coverage analysis, as well as over (semi-)manual application-level instrumentation.

My main inspiration for applied use of this option is ftrace, which is an enormously successful tracing framework in the Linux kernel. I believe that making something like that possible and easily available to Rust applications will greatly improve the user experience of performance optimization.

New attributes are added to fine-tune which functions should or should not be instrumented. This is essential for performance, debugging, and/or correctness in some contexts.

Guide-level explanation

XRay provides dynamic instrumentation which can be enabled and disabled at runtime. Tracing applications using XRay can be split into two parts:

  1. Building instrumented binaries – that's the part done by the compiler.
  2. Using instrumentation – that remains completely up to applications.

rustc deals with the first part, facilitating use of arbitrary runtime libraries designed to work with XRay instrumentation. For example, XRay Runtime Library or uftrace can be chosen by the application.

How to instrument a binary

  1. Enable instrumentation.

    XRay instrumentation is enabled using an unstable compiler flag: -Z instrument-xray. (To be stabilized later, maybe? possibly? TBD: harmonize with other instrumentations.)

    RUSTFLAGS="-Z instrument-xray" cargo +nightly build ...
    

    A number of options are available to tweak the default behavior of instrumentation, passed like this: -Z instrument-xray=skip-exit,instruction-threshold=100.

    • always – always instrument all functions (after inlining)
    • never – do not instrument any functions, unless opted in with attributes
    • instruction-threshold=N – set the minimum number of instructions a function must have to be instrumented by default
    • ignore-loops – do not use presence of loops to guide instrumentation
    • skip-entry – do not instrument function prologue
    • skip-exit – do not instrument function epilogue

    Multiple options can be combined with commas.

  2. Fine-tune instrumentation using attributes.

    The default settings set by compiler flag are intended to be suitable for typical code. However, some specific pieces of code need exceptions which are provided by attributes mirroring capabilities of the compiler flags at more granular level:

    • #[instrument::xray(always)]
    • #[instrument::xray(never)]
    • #[instrument::xray(instruction_threshold = 100)]
    • #[instrument::xray(ignore_loops)]
    • #[instrument::xray(skip_entry)]
    • #[instrument::xray(skip_exit)]

    These attributes can be applied to

    • individual functions:

      #[instrument::xray(always)]
      fn interesting_function_for_debugging() {
          // ...
      }
      
    • modules – applying to all functions defined in the module:

      #[instrument::xray(instruction_threshold = 400, ignore_loops)]
      mod not_so_interesting_functions {
          // ...
      }
      
    • crates – applying to all modules of the crate:

      // lib.rs
      #![instrument::xray(never)]
      
      //! My Tracing Framework
      //! ====================
      //!
      //! ...
      

    Attributes on a narrower scope override ones from a wider scope. E.g., an attribute on a function overrides the one provided by its module, just as attributes on a crate override the settings from compiler flags.

    If the -Z instrument-xray flag is not enabled, attributes are linted but otherwise ignored.

  3. Link with a suitable runtime library.

    This is performed by Cargo (or any other build system) in the usual manner. Linkage of the library is out of scope for this RFC.

    [dependencies]
    clang-rt-xray = "1"
    

    The library should know how to interface with instrumentation in a platform-specific manner, by using the embedded instruction maps to safely inject instrumentation into the control flow.

    Applications may choose to not link with any library, in which case nothing gets linked by default and instrumentation remains deadweight in the binary.

  4. Run the application.

    What exactly happens afterwards depends on the library and is out of scope for this RFC.

    For example, Clang's library accepts options via environment variables:

    XRAY_OPTIONS="xray_mode=xray-basic patch_premain=true" cargo run ...
    

    and logs data to xray-log.XXXXXX files.

    Typically, traces are collected and made accessible by some means, such as being written into a file or served over the network from memory. Instrumentation may be controlled at runtime, using the library API.
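
To make the option list from step 1 and the scoping rule from step 2 concrete, here is a rough sketch of how the settings could be modelled. Everything in it is hypothetical: the XRayOverrides type, its field names, and the parse and overridden_by helpers are made up for illustration and are not taken from the proof-of-concept patches. It also assumes that overrides are resolved per option (a narrower scope only replaces the options it mentions), which this post does not actually pin down.

```rust
/// Per-scope XRay settings; `None` means "not specified at this scope".
/// Hypothetical model for illustration, not rustc's internal representation.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct XRayOverrides {
    always: Option<bool>, // Some(true) = always, Some(false) = never
    instruction_threshold: Option<u32>,
    ignore_loops: Option<bool>,
    skip_entry: Option<bool>,
    skip_exit: Option<bool>,
}

impl XRayOverrides {
    /// Parse a comma-separated option list such as
    /// "skip-exit,instruction-threshold=100".
    fn parse(spec: &str) -> Result<Self, String> {
        let mut out = Self::default();
        for opt in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
            match opt.split_once('=') {
                None => match opt {
                    "always" => out.always = Some(true),
                    "never" => out.always = Some(false),
                    "ignore-loops" => out.ignore_loops = Some(true),
                    "skip-entry" => out.skip_entry = Some(true),
                    "skip-exit" => out.skip_exit = Some(true),
                    other => return Err(format!("unknown option: {other}")),
                },
                Some(("instruction-threshold", n)) => {
                    let n = n
                        .parse::<u32>()
                        .map_err(|e| format!("bad threshold {n:?}: {e}"))?;
                    out.instruction_threshold = Some(n);
                }
                Some((key, _)) => return Err(format!("unknown option: {key}")),
            }
        }
        Ok(out)
    }

    /// Layer a narrower scope on top of `self`: anything the narrower scope
    /// specifies wins, everything else is inherited (assumed semantics).
    fn overridden_by(self, narrower: Self) -> Self {
        Self {
            always: narrower.always.or(self.always),
            instruction_threshold: narrower
                .instruction_threshold
                .or(self.instruction_threshold),
            ignore_loops: narrower.ignore_loops.or(self.ignore_loops),
            skip_entry: narrower.skip_entry.or(self.skip_entry),
            skip_exit: narrower.skip_exit.or(self.skip_exit),
        }
    }
}

fn main() {
    // Compiler flags -> module attribute -> function attribute.
    let flags = XRayOverrides::parse("skip-exit,instruction-threshold=100").unwrap();
    let module = XRayOverrides {
        instruction_threshold: Some(400),
        ignore_loops: Some(true),
        ..Default::default()
    };
    let function = XRayOverrides { always: Some(true), ..Default::default() };

    let effective = flags.overridden_by(module).overridden_by(function);
    println!("{effective:?}");
}
```

Under the alternative reading, where a narrower attribute replaces the wider settings wholesale, overridden_by would simply return the narrower value. Which of the two is preferable is one of the things an actual RFC would have to decide.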

Drawbacks

  • Increased tie-in with LLVM.

    While XRay is not specific to LLVM, adding this feature increases dependency on LLVM.

    If this feature is stabilized, alternative implementations of Rust – or ports of existing compiler to a different backend – will have to implement this functionality, which may not be readily available.

  • Stomping on #[instrument] attribute.

    This attribute is provided by some libraries, most prominently tracing. Having a built-in #[instrument] introduces a bunch of hairiness that did not exist before.

Rationale and alternatives

What about -Z instrument-mcount option?

There currently exists an unstable option to instrument binaries for traditional prof profiling.

The venerable mcount instrumentation is still relevant, allowing integration with existing profiling infrastructure with minimal effort. However, it is somewhat limited in capabilities, so alternatives still have their place.

The main limitation of mcount-based instrumentation is that it is inserted only into the function prologue. This is not convenient if you want to trace both entry and exit. It's not impossible to do, but having tools that allow it properly is better.

What about adding support for fentry, __cyg_profile_func_{enter,exit}?

Aside from prof support via -pg flag, GCC & Clang offer some others:

  • -finstrument-functions
  • -pg -mfentry

These options also insert calls to special functions in defined places, such as either before the function prologue, or after the prologue and after epilogue.

These features are similar to XRay instrumentation in intent and location of the calls. However, XRay has a slightly different ABI providing more flexibility, and as such it is distinct.

Nothing in this RFC precludes adding support for fentry-based or other instrumentation in the future.

Why retain XRay branding?

Because this type of instrumentation needs a name for reference and there is an established one.

Applications could be instrumented with multiple instruments simultaneously, such as:

  • sanitizers
  • PGO & code coverage
  • embedded profilers

Sanitizers are typically very intimately coupled with the compiler (or rather, the backend). While it should be technically possible to use an alternative implementation with existing framework, I am not aware of such endeavors in practice and production use. Essentially, one can assume that there is one blessed implementation of sanitizers provided by the toolchain.

Branch profiling and code coverage are less tightly coupled with the toolchain. However, in practice these instrumentations serve a rather limited use case, and as such users typically do not have a need for alternative implementations, preferring to use the one bundled with the toolchain which produces output in standard formats for coverage maps. Multiple options might be available though; for example, LLVM supports both its own format and a GCOV-based one.

Embedded profiling, on the other hand, is mostly an application matter. The compiler is expected to provide hooks into the control flow: usually around function execution, sometimes near loops as well, with additional manual trace points inserted by application developers themselves. How exactly these hooks are used or whether they are used at all is up to the application to decide.

However, there are multiple choices for embedded profiler hook location and form:

  • mcount
  • fentry
  • __cyg_profile_func_enter
  • __xray_FunctionEntryStub

XRay instrumentation has a distinct ABI, and as such it needs a distinct name for identification.

Why no default runtime library?

Clang automatically links its XRay Runtime Library when the -fxray-instrument option is used.

I believe Rust should not behave like that for a number of reasons:

  • Avoid making a decision which library is going to be the default one.

    Clang has a natural choice, but Rust is not inherently tied to LLVM, so the same choice is not automatically valid for Rust.

  • Avoid shipping an extra component from LLVM's compiler-rt library.

    No issues with logistics, licensing, dealing with upstream bugs, etc.

  • Avoid compatibility issues in the future.

    No problems arise if Rust wants to change the default library, since there isn't any.

  • Promote explicit choice and opt-in.

    Profiling instrumentation might have significant impact on application performance. Making the choice explicit promotes responsible use. Even if the compiler option is enabled by accident, the impact should stay minimal unless application explicitly links with a suitable library.

    Not having a default choice made by the toolchain avoids making an impression that the bundled library is somehow "better" than the alternatives. It also makes it obvious that you indeed have a choice of the alternative in the first place.

Why are #[instrument::xray] attributes necessary?

First of all, manual overrides are essential. It's better to have them and not need them than to need them and not have them. Sometimes there is a critical piece of code that should not be affected, sometimes you're debugging something and want to profile specific functions.

Another problem is writing instrumentation libraries in Rust. In the absence of an opt-out, the library code will be instrumented as well – which is an excellent way to swiftly crash with a stack overflow due to recursive invocation of the instrumentation from within itself.

Existing libraries often have to bend over backwards and employ insane hacks to avoid undesired recursion: paranoid setting and checking of global atomic guard flags, using C to write (non-instrumented) trampolines calling into Rust code, compiling Rust code separately into a static library with different compiler options, or all of the above.

Having a built-in way to avoid possible recursion is nice. It does not prevent footguns of course, but it saves a lot of nerve cells.
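
For illustration, here is a minimal sketch of the "guard flag" workaround mentioned above, using a thread-local flag where some libraries use global atomics. All names (on_function_entry, record_event, IN_HANDLER, EVENTS) are hypothetical and do not correspond to any real runtime's API, and the re-entrant call is simulated by hand, since stable Rust cannot instrument the helper for real.

```rust
use std::cell::Cell;

thread_local! {
    // Set while this thread is already inside the tracing handler.
    static IN_HANDLER: Cell<bool> = Cell::new(false);
    // Number of recorded events; stands in for a real trace buffer.
    static EVENTS: Cell<u32> = Cell::new(0);
}

/// Hypothetical handler that a runtime would call on every function entry.
fn on_function_entry(func_id: u32) {
    // Mark the handler as active; if it already was, bail out instead of
    // recursing until the stack overflows.
    let reentered = IN_HANDLER.with(|flag| flag.replace(true));
    if reentered {
        return;
    }
    record_event(func_id);
    IN_HANDLER.with(|flag| flag.set(false));
}

/// Helper that lives in the tracing library itself. If it were instrumented,
/// entering it would call the handler again; that call is simulated here.
fn record_event(func_id: u32) {
    on_function_entry(func_id); // simulated instrumentation of this helper
    EVENTS.with(|n| n.set(n.get() + 1));
}

fn events_recorded() -> u32 {
    EVENTS.with(|n| n.get())
}

fn main() {
    on_function_entry(1);
    on_function_entry(2);
    println!("events recorded: {}", events_recorded());
}
```

The guard works, but every handler invocation pays for the check, and the flag stays set if the handler panics. With something like #[instrument::xray(never)] on the library's own functions, the nested call would not be generated in the first place.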

Prior art

  • Naturally, GCC & Clang with their profiling options are prior art. Clang has native support for XRay as well.

    Both have a variation of __attribute__((no_instrument_function)) or [[clang::xray_always_instrument]] attributes as well.

  • There are several projects bringing ftrace capabilities to userspace, for example

    which can be readily used by Rust applications right now, with mcount-based instrumentation.

  • The tracing crate may be a "native" example of a consumer of this compiler capability.

Unresolved questions

Here are some for starters, in no particular order. More to be identified later when I actually get to writing a proper RFC.

  • Instrumenting function groups.

    Clang allows instrumenting a random subset of functions, say 1/4 of them.

  • Instrumenting dependencies.

    What if I want to instrument a particular dependency of mine or avoid doing so? What about a specific function or a module in that dependency crate?

    Should the answer be, "edit the crate code locally", since it requires recompilation anyway?

    Clang can accept a list of functions on command line, or a file with appropriate instructions, which effectively adds extra attributes like you would by editing code.

    While this is mostly Cargo territory, how would Cargo implement that?

  • How does profiling instrumentation interact with (stabilized) sanitizers?

    Command-line-option- and syntax-wise.

  • Interaction of #[instrument::xray] attribute with generics, traits, impls.

    If that were supported, what would be the semantics?

    Can you instrument based on the type (specialization)? Can a trait add instrumentation to implementations? What about default implementations?

  • Is it okay to extend and appropriate the #[instrument] attribute?

    Should it be in the initial RFC right away, instead of a separate one?

  • Heated bikeshedding about naming of all the things.

    • Say, #[instrument], #[instrumented], #[instrumentation]?
    • Or, #[instrument(never)] vs #[no_instrument]?
  • While we're here, how will this extend to other instrumentation options in the future?

    • Think, if -Z instrument-mcount were to be stabilized.
  • Should there be some #[cfg] available for the code to know it's being instrumented?

  • Should this instrumentation require additional unsafe opt-in?

    While XRay instrumentation by itself is just a bunch of nops which are obviously safe, things are not so nice once the instrumentation is patched in.

    Users adding #[instrument(always)] to their applications might not realize that they opt into using unsafe code.

    At the same time, adding pervasive unsafe for instrumentation does not meaningfully change anything about the code, except being an annoyance.
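
On the function groups question above, one way to pick "1/4 of the functions" reproducibly is to hash each function name into a fixed number of groups and instrument only the selected group. The sketch below assumes that scheme; function_group and is_instrumented are hypothetical names, and std's DefaultHasher is only a stand-in for whatever stable hash a compiler would really use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assign a function to one of `groups` buckets based on its name.
fn function_group(name: &str, groups: u64) -> u64 {
    assert!(groups > 0, "need at least one group");
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    hasher.finish() % groups
}

/// Decide whether `name` is instrumented when only group `selected`
/// out of `groups` is enabled.
fn is_instrumented(name: &str, groups: u64, selected: u64) -> bool {
    function_group(name, groups) == selected
}

fn main() {
    // Split functions into 4 groups and instrument only group 0.
    for name in ["parse", "resolve", "emit", "link"] {
        println!(
            "{name}: group {}, instrumented: {}",
            function_group(name, 4),
            is_instrumented(name, 4, 0)
        );
    }
}
```

Because the group depends only on the name, rebuilding with a different selected group covers a different slice of the program, and running through all groups in turn covers everything. DefaultHasher's algorithm is not guaranteed to stay the same across Rust releases, so a real implementation would need a hash with a fixed definition.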

Future possibilities

  • Prompts addition of features to Cargo for controlling instrumentation.

  • Extract common bits from #[instrument::xray] attribute into a new #[instrument] attribute, applying to all instrumentations.

  • Increased adoption in Rust might drive improvements in LLVM around XRay.


I don't see it like this. Other backends (and other compilers, like gcc and mrustc) may just not support #[instrument], and I think that's okay, since it's not needed for normal operation.