Making struct construction unsafe, or warn in safe context

TL;DR: I sometimes want to make Foo to be unsafe where pub struct Foo(Inner);, or want to be warned for use of Foo(_) in safe context.

Consider the case you want to define newtypes for slice.

/// A string slice type that consists of one or more ASCII digits.
#[repr(transparent)]
pub struct AsciiDigitStr(str);

/// An owned string type that consists of one or more ASCII digits.
pub struct AsciiDigitString(String);

(Another examples would be Utf7Str(str), RelativePath(std::path::Path), AsciiStr(str), ... this pattern would be useful and be used practically in many situations.)

Then, you'll want to define constructors. Consider you don't want to expose unchecked version for some reason.

impl AsciiDigitStr {
    /// Creates an `&AsciiDigitStr` from the given string slice, without validation.
    ///
    /// # Safety
    /// The given string should not be empty, and should consist of only ASCII digits.

    unsafe fn new_unchecked(s: &str) -> &Self {
        unsafe { &*(s as *const str as *const Self) }
    }

    fn new(s: &str) -> Result<&Self, AsciiDigitError> {
        validate_ascii_digits(s)?;
        unsafe { Self::new_unchecked(s) }
    }
}

impl AsciiDigitString {
    /// Creates an `AsciiDigitString` from the given string, without validation.
    ///
    /// # Safety
    /// The given string should not be empty, and should consist of only ASCII digits.
    unsafe fn new_unchecked(s: String) -> Self {
        // `Self` (`AsciiDigitString`) is safe `fn(String) -> Self` function, though it shouldn't.
        Self(s)
    }

    fn new(s: String) -> Result<Self, FromStringError> {
        if let Err(e) = validate_ascii_digits(&s) {
            return Err(FromStringError::new(e, s);
        }
        Ok(unsafe { Self::new_unchecked(s) })
        // It is possible to use `Ok(Self(s))`, but I think it is not desirable.
    }
}

In this example, I made AsciiDigitStr::new_unchecked and AsciiDigitString::new_unchecked unsafe functions, even when they are private. This is because they have unchecked preconditions, and they'll result in UB when the conditions are violated. Making private methods unsafe is still useful to prevent writing UB by accident.

However, I cannot make AsciiDigitString function (whose type is fn(String) -> Self) unsafe. If I write Self(some_unchecked_string) in safe context by mistake, it compiles without any warnings or errors even if it causes UB.

So what I wanted is:

  • a way to make tuple struct typename unsafe function, or
  • a way to forbid or warn the use of tuple struct typename as a function in safe context.

By this, developers can prevent silent UB by being notified about AsciiDigitString(some_unchecked_string) in safe context.

What do you think?

1 Like

Privacy?

  • AsciiDigitString is "exported" from its own module
  • you're extra careful inside the mod not to step on this land-mine

No fn outside the mod can invoke AsciiDigitString(..)

1 Like

Yes, AsciiDigitString function itself is private, but I think it should be possible to make it unsafe even when it is private, to prevent more bugs. AsciiDigitString::new_unchecked() in the example is private but still it should be unsafe (rather than being safe), and I want AsciiDigitString() to be unsafe for the same reason.

To be clear: your suggestion helps prevent bugs (only and precisely)
in the module where AsciiDigitString is defined, isn't that right?

Yes, within my own module. The idea is effective only for the developers implementing the type.

You can make this field effectively private this way:

mod make_it_private {
   pub struct AsciiDigitString(String);
}
use make_it_private::AsciiDigitString;
1 Like

Putting the type to the private module makes the function AsciiDigitString private, but also makes trait impl impossible (outside the private module). I want the function to be unsafe or unavailable, during implementing methods and traits.

More detailed example:

pub struct AsciiDigitError(());

pub struct AsciiDigitString(String);
impl AsciiDigitString {
    unsafe fn new_unchecked(s: String) -> Self {
        Self(s)
    }
    fn validate(s: &str) -> Result<(), AsciiDigitError> { todo!() }
}

impl std::convert::TryFrom<&str> for AsciiDigitString {
    type Error = AsciiDigitError;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        validate(s)?;
        Ok(unsafe {
            // SAFETY: The string is successfully validated.
            Self::new_unchecked(s.to_owned())
        })
    }
}

This is the example of the correct implementation using unsafe new_unchecked(). In TryFrom<&str> for AsciiDigitString impl, unsafe block and SAFETY comment makes it clear that some precondition should be satisfied by the caller.

What if the developer forgot to validate?

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        Ok(unsafe {
            // SAFETY: The string is successfully validated. // <- This is a lie...
            Self::new_unchecked(s.to_owned())
        })
    }

This would be (relatively) easy to find, especially if the reviewers pay attention to unsafe stuff. Also, when the developer forget to validate, SAFETY comment cannot be written correctly (i.e. they cannot justify the use of unsafe function), so this mistake would be found earlier.


However, what happens when the developer uses Self (or typename of tuple struct) by mistake?

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        validate(s)?;
        Ok(Self(s.to_owned())
    }

This compiles and works as expeced, but...

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        Ok(Self(s.to_owned())
    }

this wrong version also compiles, and unexpected panic or UB may happen at the use of other methods.

This is the motivation of making Self() unsafe, or at least making it possible to be warned in safe context.

lo48576, thank you for elaborating

Your desire has been expressed pretty clearly now, I think
You would like to avoid bugs in your own module

That is a reasonable ask
On the other hand the status quo isn't too bad either

Let us wait and see if you gain support from people whose voices matter
That is indeed all we can do by posting to this forum..

I from my side just tried to help you articulate your suggestion better

2 Likes

I think construction isn't the only problem here. It would be just as bad to self.0.push('‽') on an AsciiDigitString.

So that brings things back to the general idea of "unsafe fields" that has been floating around for a while: pub struct AsciiDigitString(unsafe String); would mean that the type would require unsafe to be constructed and the field would require unsafe to access mutably.

That would help solve things like the "why is Vec::set_len unsafe when it doesn't call anything that needs unsafe" problem that even published papers seem to run into.

8 Likes

Yes, modification to field(s) may unsafe in some cases, but I think bad unsafety usually comes from inconsistency between fields, rather than from a single field. And, for now, I don't think about controlling unsafety of fields mutations because I don't have clear image of how it should be like.

Idea of unsafe Self() function would be a good starting point, as it could be thought as subset of unsafe field mutations --- unsafe Self() treats all fields as the source of unsafety, only at the value creation.

I've generally thought of an unsafe field as "there's an invariant that includes this field that's critical for soundness" -- which, yes, would usually be because of a combination of fields.

I suppose it's possible that it's rare for there to be a struct that has some unsafe fields and some safe fields -- it could also just be an "unsafe struct" instead of marking all the fields unsafe.

2 Likes

That's not true. Your example:

pub struct AsciiDigitError(());

mod inner {
    pub struct AsciiDigitString(String);
    impl AsciiDigitString {
        pub(super) unsafe fn new_unchecked(s: String) -> Self {
            Self(s)
        }
    }
}

pub use self::inner::AsciiDigitString;

impl AsciiDigitString {
    fn validate(s: &str) -> Result<(), AsciiDigitError> {
        todo!()
    }
}

impl std::convert::TryFrom<&str> for AsciiDigitString {
    type Error = AsciiDigitError;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        AsciiDigitString::validate(s)?;
        Ok(unsafe {
            // SAFETY: The string is successfully validated.
            Self::new_unchecked(s.to_owned())
        })
    }
}

Only a crate barrier impacts coherence rules. Module barriers impact privacy but not coherence; a (trait) impl can be anywhere[2] in the crate.

I agree that it would be better if we could make field access (to construct or &mut) unsafe, but privacy can accomplish (almost[1]) the same effect.

[1][3] (*muffled screaming*) partial borrows, oh no!

[2] Well, anywhere that allows items, anyway. Yes, this includes function bodies (even for unrelated types). Yes, this is a nightmare for IDEs' intellisence. They handle impls in separate modules fine, though, because that's a super common thing, unlike "weird" locations for impls.

[3] yes I know these are out of order shut up

4 Likes

Thank you! I've misunderstood the technique. It is important that I should split method impls by use of Self(..), and should put impls using Self(..) into the private module (and put impls without Self(..) outside the private module).

This seems to be the best workaround available for now.

I don't think we need another syntax/language feature for it. Only thing we need is just another library type - let's call it UnsafeMut<T>.

pub struct UnsafeMut<T>(T);

impl<T> UnsafeMut<T> {
    pub unsafe fn new(value: T) -> Self {
        UnsafeMut(value)
    }

    pub unsafe fn get_mut(this: &mut Self) -> &mut T {
        &mut this.0
    }
}

impl<T> Deref for UnsafeMut<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
1 Like

That doesn't work because you can swap the UnsafeMut itself. (I've seen this exact discussion before, but I can't seem to find the IRLO thread)

3 Likes