Programming language vulnerability prevention recommendations from ISO WG23


#1

ISO WG23, the programming language vulnerabilities group, recently released a whitepaper detailing the recommended features to prevent vulnerabilities in programming languages. It’s a pretty extensive list and worth looking through. You’ll definitely see a number that Rust does, though I thought it might give us ideas for possible new Rust features.

http://www.open-std.org/jtc1/sc22/wg23/docs/ISO-IECJTC1-SC22-WG23_N0727-language-designer-advice-after-meeting-49-20170127.pdf


#2

How many of those points is Rust “missing”?


#3

So I just finished reading through the whole thing. There are a lot of vague items, a lot of duplicate items, some that I’m pretty sure don’t apply to or aren’t an issue for Rust, and some that didn’t really make sense, but there were also a lot that Rust definitely “passes” and quite a few that I think Rust is “missing” currently. I’m not sure how many of these we actually want to add, but some of them we obviously do. All the bullet points below are copied directly from items in the whitepaper except for the “group headings” I added.

  • Language Specification:

    • A language should provide a list of undefined, unspecified and implementation-defined behaviours.
    • Language specification should include the definition of a common versioning method
    • Clearly state whether translators can extend the set of intrinsic procedures or not.
    • Language designers should provide mechanisms that permit the disabling or diagnosing of constructs that may produce undefined behaviour.
    • Portability guidelines for a specific language should provide a list of common implementation-defined behaviours.
  • Expressing and validating preconditions, postconditions and argument ranges:

    • Provide language mechanisms to formally specify preconditions and postconditions.
    • Ensure that all library functions defined operate as intended over the specified range of input values and react in a defined manner to values that are outside the specified range.
    • Languages should define libraries that provide the capability to validate parameters during compilation, during execution or by static analysis
  • Allocation:

    • Implementations of the free function could tolerate multiple frees on the same reference/pointer or frees of memory that was never allocated.
    • A storage allocation interface should be provided that will allow the called function to set the pointer used to NULL after the referenced storage is deallocated.
  • Programming language specifications could provide labels—such as in, out, and inout—that control the subprogram’s access to its formal parameters, and enforce the access

  • Languages can provide syntax and semantics to guarantee program-wide that dynamic memory is not used (such as the configuration pragmas feature offered by some programming languages)

  • Compilers should provide an option to report the class in which a resolved method resides

  • Runtime environments should provide a trace of all runtime method resolutions.

  • Provide a means so that a program can either automatically or manually check that the digital signature of a library matches the one in the compile/test environment.

  • Provide correct linkage even in the absence of correctly specified procedure signatures. (Note that this may be very difficult where the original source code is unavailable.) (Look at Part 1 and rework)

  • Concurrency and threads:

    • Provide a mechanism (either a language mechanism or a service call) to signal either another thread or an entity that can be queried by other threads when a thread terminates.
    • Languages that do not presently consider concurrency should consider creating primitives that let applications specify regions of sequential access to data. Mechanisms such as protected regions, Hoare monitors or synchronous message passing between threads result in significantly fewer resource access mistakes in a program.
    • Provide a mechanism that, within critical pieces of code, defers the delivery of asynchronous exceptions or asynchronous transfers of control

My overall impression is that everything on this list we probably want to do in Rust we’re already working on in some form.


#4

Some points from your list:

Provide language mechanisms to formally specify preconditions and postconditions.

In theory, having Design by Contract in Rust is nice; in practice, I’ve seen only limited usage of it in the D language… Perhaps it could see more usage in Rust.

Programming language specifications could provide labels – such as in, out, and inout – that control the subprogram’s access to its formal parameters, and enforce the access

This too exists in the D language and Ada. Ada also allows putting the same labels on each function’s access to global variables. This makes it possible both to have global variables and to control their usage (the flow of data).

Languages can provide syntax and semantics to guarantee program-wide that dynamic memory is not used (such as the configuration pragmas feature offered by some programming languages)

In D language there are various ways to do this, like the @nogc annotation for functions, and more.


#5

Out of curiosity, I decided to go through every single point in the recommendations to see how Rust meets them. Sorry for the long post; feel free to skip the rather long “Rust implements this” section. I’m not saying that all of these should be implemented. For instance, having a syntax for explicit termination of conditional statements sounds like a bad idea to me; is there even a programming language with such a syntax?

I think that 39 is an interesting idea about requiring parentheses for confusing operator ordering, and it may be worth considering as a warning. Other than that, there are allocators (I believe this is what 60 is about??) and design by contract (67). Design by contract is implemented by the hoare crate, which may be sufficient?

Along with that, there is strict floating point standards conformance. What is not implemented from the standards is not really useful in most situations, LLVM doesn’t provide a way of implementing those, and it doesn’t seem like many people care, what with gcc explicitly not implementing it (see “IEC 60559 (also known as IEC 559 or IEEE arithmetic) support” in this table) as it’s a tricky feature to implement without providing anything particularly useful.

Not implemented

6. Languages that do not already adhere to or only adhere to a subset of IEC 60559 [7] should consider adhering completely to the standard. Examples of standardization that should be considered: Languages should consider providing a means to generate diagnostics for code that attempts to test equality of two floating point values.

  • Implemented in Clippy.

7. Languages should consider standardizing their data type to ISO/IEC 10967-1:1994 and ISO/IEC 10967-2:2001.

  • Rust does implement a lot from that specification, but not everything. For instance, rounding modes and ldexp (known as scale in ISO/IEC 10967-1:1994) aren’t implemented. Additionally, the LLVM optimizer assumes that there will never be a floating-point value representing a signaling NaN.

28. Languages should not provide logical shifting on arithmetic values or should consider flagging such usage for reviewers.

  • I’m not entirely sure what’s the issue with logical shift right, but Rust does provide it, and doesn’t warn for its usage.

31. Languages should consider requiring mandatory diagnostics for unused variables.

  • It’s possible to use #[allow(unused_variables)] to quiet such warnings, so strictly speaking they aren’t mandatory.

32. Languages should require mandatory diagnostics for variables with the same name in nested scopes.

  • Considered to be idiomatic Rust. Clippy does provide lints to prevent it, however this is allowed by default.
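To make the shadowing point concrete, here is a minimal sketch of a name being rebound in a nested scope. The function name `shadow_demo` is made up for illustration.

```rust
// Returns the outer binding and the value produced by the inner, shadowing binding.
fn shadow_demo() -> (i32, i32) {
    let value = 1;
    let inner = {
        // This `value` shadows the outer one only inside the block.
        let value = value * 10;
        value
    };
    // The outer `value` is untouched once the block ends.
    (value, inner)
}
```

Calling `shadow_demo()` gives `(1, 10)`: the compiler accepts the inner rebinding without any diagnostic by default, which is exactly what item 32 asks languages to flag.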

39. Language definitions should avoid providing precedence or a particular associativity for operators that are not typically ordered with respect to one another in arithmetic, and instead require full parenthesization to avoid misinterpretation.

45. Syntax for explicit termination of loops and conditional statements.

  • Provided for loops, not for conditional statements.

46. Features to terminate named loops and conditionals and determine if the structure as named matches the structure as inferred.

  • Provided for loops, not for conditional statements.

50. Programming language specifications could provide labels—such as in, out, and inout—that control the subprogram’s access to its formal parameters, and enforce the access.

  • in is called &, inout is called &mut, there is no out. However, it’s not as needed as in C, because functions can return multiple values and because of move semantics.
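A small sketch of how the three parameter modes map onto Rust, assuming the usual reading of in/inout/out. All function names here are hypothetical.

```rust
// "in": a shared reference gives read-only access.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// "inout": a mutable reference lets the callee read and modify the caller's data.
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

// "out": no dedicated mode; results are simply returned, here as a tuple.
fn min_max(values: &[i32]) -> Option<(i32, i32)> {
    let min = *values.iter().min()?;
    let max = *values.iter().max()?;
    Some((min, max))
}
```

With `let mut v = vec![3, 1, 2];`, `total(&v)` is 6, `double_all(&mut v)` turns it into `[6, 2, 4]`, and `min_max(&v)` then returns `Some((2, 6))`.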

60. Languages can document or specify that implementations must document choices for dynamic memory management algorithms, to hope designers decide on appropriate usage patterns and recovery techniques as necessary

  • Not sure what this is about. Custom allocators? Rust doesn’t provide custom ones as of now.

67. Provide language mechanisms to formally specify preconditions and postconditions.

81. Provide a mechanism to determine which exceptions might be thrown by a called library routine.

  • Panics shouldn’t be used for error handling. However, Rust doesn’t provide a way for a function to say that it may panic.

Cannot determine (not understanding the issue)

19. Languages should consider providing compiler switches or other tools to check the size and bounds of arrays and their extents that are statically determinable.

24. Languages should consider creating a mode that provides a runtime check of the validity of all accessed objects before the object is read, written or executed.

61. Language specifiers should standardize on a common, uniform terminology to describe generics/templates so that programmers experienced in one language can reliably learn and refer to the type system of another language that has the same concept, but with a different name.

68. Find a solution to the problem. (remove from Part 1)

  • Solution to… what exactly.

69. Do not allow unchecked casts.

  • Point 56 literally asks for this feature, I don’t understand this.

78. Provide correct linkage even in the absence of correctly specified procedure signatures. (Note that this may be very difficult where the original source code is unavailable.) (Look at Part 1 and rework)

  • I believe it is impossible when C is involved, not very difficult. It may be interesting to be able to load .h files for C libraries instead of manually writing extern function declarations which is error-prone.

Specification issues (no Rust specification yet)

1. Language specifiers should standardize on a common, uniform terminology to describe their type systems so that programmers experienced in other languages can reliably learn the type system of a language that is new to them

70. Clearly state whether translators can extend the set of intrinsic procedures or not.

Rust implements this

2. Provide a mechanism for selecting data types with sufficient capability for the problem at hand.

  • signed, unsigned types of varying sizes

3. Provide a way for the computation to determine the limits of the data types actually selected.

  • std::*::{MIN, MAX}
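As an illustration, a sketch that uses the limits to decide whether a value fits a narrower type. It uses the associated constants such as `i16::MAX`, which current Rust provides alongside the module-level constants mentioned above; the helper name `fits_in_i16` is made up.

```rust
// Hypothetical helper: checks a wide value against the limits of i16.
fn fits_in_i16(value: i64) -> bool {
    value >= i64::from(i16::MIN) && value <= i64::from(i16::MAX)
}

// The limits are plain constants, so they can be returned or compared freely.
fn u8_range() -> (u8, u8) {
    (u8::MIN, u8::MAX)
}
```

Here `fits_in_i16(32_767)` is true, `fits_in_i16(32_768)` is false, and `u8_range()` is `(0, 255)`.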

4. Language implementers should consider providing compiler switches or other tools to provide the highest possible degree of checking for type errors.

  • Rust has many warnings by default, and there are more that can be enabled warning for stuff like missing documentation. The full list is available with rustc -W help command.

5. For languages that are commonly used for bit manipulations, an API (Application Programming Interface) for bit manipulations that is independent of word size and machine instruction set should be defined and standardized.

8. Languages that currently permit arithmetic and logical operations on enumeration types could provide a mechanism to ban such operations program-wide.

  • Enumeration types aren’t integers in Rust without explicit cast.
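A minimal sketch of what that means in practice; the `Color` enum and `discriminant_sum` are invented for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 4,
}

// `a + b` on two `Color` values does not compile, since `Color` has no `Add` impl.
// Arithmetic only becomes possible after an explicit `as` cast to an integer type.
fn discriminant_sum(a: Color, b: Color) -> i32 {
    a as i32 + b as i32
}
```

`discriminant_sum(Color::Red, Color::Blue)` evaluates to 5, and the cast is visible at the point where the enum stops being treated as an enum.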

9. Languages that provide automatic defaults or that do not enforce static matching between enumerator definitions and initialization expressions could provide a mechanism to enforce such matching.

  • Rust doesn’t provide defaults for types.

10. Languages should provide mechanisms to prevent programming errors due to conversions

  • Rust doesn’t provide implicit coercions that cause data to be lost, which avoids this class of programming errors.

11. Languages should consider making all type-conversions explicit or at least generating warnings for implicit conversions where loss of data might occur.

  • Strictly speaking Rust does have implicit Deref coercions, coercions of &Object to &Trait, coercion of [T; N] to [T], &T to *const T, &mut T to *mut T, and probably more. However, those do not cause implicit loss of data when they occur, so they meet the requirement set by the second part of the sentence.
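A short sketch of two of these lossless implicit coercions next to an explicit numeric conversion. The function names are hypothetical.

```rust
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn coercion_demo() -> (i64, String) {
    let array = [1, 2, 3];
    let owned = String::from("rust");
    // &[i32; 3] coerces to &[i32]; no data is lost.
    let total = sum(&array);
    // &String coerces to &str through Deref; again lossless.
    let loud = shout(&owned);
    // Numeric conversions are never implicit: `let wide: i64 = total;` does not compile.
    let wide = i64::from(total);
    (wide, loud)
}
```

`coercion_demo()` returns `(6, "RUST")`. Even the widening from `i32` to `i64`, which cannot lose data, has to be written out.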

12. Eliminating library calls that make assumptions about string termination characters. (C Annex, SVP)

  • The only Rust APIs making such assumptions are the unsafe CStr::from_ptr and CStr::from_bytes_with_nul_unchecked, designed for interaction with C code.

13. Checking bounds when an array or string is accessed, see C Bounds Checking Library. (C Annex, SVP)

14. Specifying a string construct that does not need a string termination character. (C Annex, SVP)

15. Languages should provide safe copying of arrays as built-in operation.

16. Languages should consider only providing array copy routines in libraries that perform checks on the parameters to ensure that no buffer overrun can occur.

17. Languages should perform automatic bounds checking on accesses to array elements, unless the compiler can statically determine that the check is unnecessary. This capability may need to be optional for performance reasons. (Fix as in top 10, and in Part 1)

18. Languages that use pointer types should consider specifying a standardized feature for a pointer type that would enable array bounds checking. (Remove in Part 1)

20. Languages should consider providing whole array operations that may obviate the need to access individual elements.

21. Languages should consider the capability to generate exceptions or automatically extend the bounds of an array to accommodate accesses that might otherwise have been beyond the bounds.

22. Language-defined libraries should perform checks on the parameters to ensure that no buffer overrun can occur. (make corresponding change in Part 1)

23. Languages should consider providing full array assignment.

25. A language feature that would check a pointer value for NULL before performing an access should be considered.

  • Strictly there is no such feature because you cannot implicitly unwrap Option<T> value, it’s always required to specify what happens when the value is None.
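For illustration, a sketch of how the "null" case has to be spelled out; `name_len` is a made-up function.

```rust
// A possibly-absent value is an Option; there is no way to use it
// without saying what happens in the None case.
fn name_len(name: Option<&str>) -> usize {
    match name {
        Some(n) => n.len(),
        None => 0,
    }
}
```

`name_len(Some("rust"))` is 4 and `name_len(None)` is 0. Leaving out the `None` arm is a compile error rather than a latent null dereference.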

25a. Implementations of the free function could tolerate multiple frees on the same reference/pointer or frees of memory that was never allocated.

  • The language makes it impossible to call the free function multiple times (although I’m curious how you would possibly go about implementing this in a language that has a free function).

25b. Language specifiers should design generics in such a way that any attempt to instantiate a generic with constructs that do not provide the required capabilities results in a compile-time error.

26. For properties that cannot be checked at compile time, language specifiers should provide an assertion mechanism for checking properties at run-time. It should be possible to inhibit assertion checking if efficiency is a concern.

  • unwrap, RefCell, probably more. There are ways to force language to assume that the requirements are met like intentionally causing undefined behaviour on None branch and using UnsafeCell instead of RefCell.
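Besides those, the standard `assert!` and `debug_assert!` macros fit this item fairly directly. A minimal sketch, with a made-up `average` function, assuming the default build profiles where `debug_assert!` is compiled out of release builds:

```rust
fn average(values: &[f64]) -> f64 {
    // Always checked, in every build profile.
    assert!(!values.is_empty(), "precondition: values must not be empty");
    let mean = values.iter().sum::<f64>() / values.len() as f64;
    // Checked only when debug assertions are enabled, so it can be
    // inhibited for efficiency as the item suggests.
    debug_assert!(mean.is_finite(), "postcondition: mean must be finite");
    mean
}
```

`average(&[1.0, 2.0, 3.0])` returns 2.0, while `average(&[])` panics with the precondition message. This is only a hand-rolled approximation of pre- and postconditions, not a contract system.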

26a. A storage allocation interface should be provided that will allow the called function to set the pointer used to NULL after the referenced storage is deallocated.

  • Borrow checker prevents access to a reference after deallocating its contents.

27. Language standards developers should consider providing facilities to specify either an error, a saturated value, or a modulo result when numeric overflow occurs. Ideally, the selection among these alternatives could be made by the programmer.
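A sketch of how the programmer picks among those alternatives with the standard integer methods; the wrapper `add_variants` is hypothetical.

```rust
// Returns the same addition under three overflow policies:
// error (None on overflow), saturation, and modulo (wrapping).
fn add_variants(a: u8, b: u8) -> (Option<u8>, u8, u8) {
    (a.checked_add(b), a.saturating_add(b), a.wrapping_add(b))
}
```

`add_variants(200, 100)` yields `(None, 255, 44)`, and `add_variants(1, 2)` yields `(Some(3), 3, 3)`.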

29. Languages that do not require declarations of names should consider providing an option that does impose that requirement.

30. Languages should consider providing optional warning messages for dead store.

33. Languages should require mandatory diagnostics for variable names that exceed the length that the implementation considers unique.

  • Is there even such a limit?

34. Languages should consider requiring mandatory diagnostics for overloading or overriding of keywords or standard library function identifiers.

  • Not possible, I think?

35. Languages should not have preference rules among mutable namespaces. Ambiguities should be invalid and avoidable by the user, for example, by using names qualified by their originating namespace.

36. Some languages have ways to determine if modules and regions are elaborated and initialized and to raise exceptions if this does not occur. Languages that do not, could consider adding such capabilities.

37. Languages could consider setting aside fields in all objects to identify if initialization has occurred, especially for security and safety domains.

38. Languages that do not support whole-object initialization, could consider adding this capability.

40. Languages should consider providing warnings for statements that are unlikely to be right such as statements without side effects. A null (no-op) statement may need to be added to the language for those rare instances where an intentional null statement is needed. Having a null statement as part of the language will reduce confusion as to why a statement with no side effects is present in the code.

41. Languages should consider not allowing assignments used as function parameters.

  • Assignment is allowed as a function parameter; however, it is always of type (), so it is not really useful and very likely to cause a type error.
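A small sketch of that behaviour, with invented function names. The only parameter type an assignment can satisfy is `()`.

```rust
fn takes_unit(_unit: ()) -> &'static str {
    "received ()"
}

fn assign_as_argument() -> (i32, &'static str) {
    let mut x = 1;
    let before = x;
    // The assignment expression has type (), so this call type-checks.
    // Passing `x = before + 4` where an i32 or bool is expected would not compile,
    // which is also why `if x = 5 { ... }` is rejected.
    let message = takes_unit(x = before + 4);
    (x, message)
}
```

`assign_as_argument()` returns `(5, "received ()")`.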

42. Languages should consider not allowing assignments within a Boolean expression.

43. Language definitions should avoid situations where easily confused symbols (such as = and ==, or ; and :, or != and /=) are valid in the same context. For example, = is not generally valid in an if statement in Java because it does not normally return a Boolean value.

44. Adding a mode that strictly enforces compound conditional and looping constructs with explicit termination, such as “end if” or a closing bracket.

47. Language designers should consider the addition of an identifier type for loop control that cannot be modified by anything other than the loop control construct.

48a. Languages should provide encapsulations for arrays that prevent the need for the developer to be concerned with explicit bounds values.

48b. Languages should provide encapsulations for arrays that provide the developer with symbolic access to the array start, end and iterators.

49. Languages should support and favor structured programming through their constructs to the extent possible.

51. Do not provide means to obtain the address of a locally declared entity as a storable value; or

  • This rule allows implementing rule 52 instead of it. Rust does provide means of getting addresses of local variables.

52. Define implicit checks to implement the assurance of enclosed lifetime expressed in sub-clause 5 of this vulnerability. Note that, in many cases, the check is statically decidable, for example, when the address of a local entity is taken as part of a return statement or expression.

  • Pretty much borrowck here.

53. Language specifiers could ensure that the signatures of subprograms match within a single compilation unit and could provide features for asserting and checking the match with externally compiled subprograms.

54. A standardized set of mechanisms for detecting and treating error conditions should be developed so that all languages to the extent possible could use them. This does not mean that all languages should use the same mechanisms as there should be a variety, but each of the mechanisms should be standardized.

55. Languages should consider providing a means to perform fault handling. Terminology and the means should be coordinated with other languages.

56. Because the ability to perform reinterpretation is sometimes necessary, but the need for it is rare, programming language designers might consider putting caution labels on operations that permit reinterpretation. For example, the operation in Ada that permits unconstrained reinterpretation is called Unchecked_Conversion.

  • unsafe

57. Because of the difficulties with undiscriminated unions, programming language designers might consider offering union types that include distinct discriminants with appropriate enforcement of access to objects.
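Rust enums are the discriminated unions this item describes. A minimal sketch, with a made-up `Shape` type:

```rust
#[derive(Debug)]
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// The fields of a variant are only reachable after matching on the discriminant,
// and the match must cover every variant.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}
```

`area(&Shape::Rect { width: 2.0, height: 3.0 })` is 6.0. There is no way to read `radius` out of a `Rect` by accident.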

58. Provide means to create abstractions that guarantee deep copying where needed.

59. Languages can provide syntax and semantics to guarantee program-wide that dynamic memory is not used (such as the configuration pragmas feature offered by some programming languages).

  • It’s called no_std in Rust.

62. Language specifiers should design generics in such a way that any attempt to instantiate a generic with constructs that do not provide the required capabilities results in a compile-time error.

63. Language specifiers should provide an assertion mechanism for checking properties at run-time, for those properties that cannot be checked at compile time. It should be possible to inhibit assertion checking if efficiency is a concern.

64. Language specification should include the definition of a common versioning method.

65. Compilers should provide an option to report the class in which a resolved method resides.

66. Runtime environments should provide a trace of all runtime method resolutions.

  • Not a runtime environment.

71. Clearly state what the precedence is for resolving collisions.

  • They are simply not allowed other than glob imports.

72. Clearly provide ways to mark a procedure signature as being the intrinsic or an application provided procedure.

73. Require that a diagnostic is issued when an application procedure matches the signature of an intrinsic procedure.

  • No global namespace, collisions between multiple items aren’t allowed.

74. Ensure that all library functions defined operate as intended over the specified range of input values and react in a defined manner to values that are outside the specified range.

75. Languages should define libraries that provide the capability to validate parameters during compilation, during execution or by static analysis.

76. Develop standard provisions for inter-language calling with languages most often used with their programming language.

77. Provide a means so that a program can either automatically or manually check that the digital signature of a library matches the one in the compile/test environment.

  • I believe that checksums in Cargo.lock do that?

79. Provide specified means to describe the signatures of subprograms.

80. For languages that provide exceptions, provide a mechanism for catching all possible exceptions (for example, a ‘catch-all’ handler). The behaviour of the program when encountering an unhandled exception should be fully defined.

  • When unwind is enabled (doesn’t have to be), std::panic::catch_unwind can be used to do so.
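A sketch of such a catch-all handler, assuming the default unwinding panic strategy (with `panic = "abort"` nothing can be caught). The function `run_guarded` is hypothetical.

```rust
use std::panic;

// Runs a computation that may panic and turns the panic into an Err.
fn run_guarded(divisor: i32) -> Result<i32, String> {
    let result = panic::catch_unwind(move || {
        if divisor == 0 {
            panic!("division by zero requested");
        }
        100 / divisor
    });
    // The payload is a Box<dyn Any + Send>; panic! with a message
    // usually carries either a &str or a String.
    result.map_err(|payload| {
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown panic payload".to_string())
    })
}
```

`run_guarded(4)` returns `Ok(25)`, and `run_guarded(0)` returns an `Err` carrying the panic message. The default panic hook still prints the message to stderr.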

82. Reduce or eliminate dependence on lexical-level pre-processors for essential functionality (such as conditional compilation).

83. Provide capabilities to inline functions and procedure calls, to reduce the need for pre-processor macros.

84. Language designers should consider removing or deprecating obscure, difficult to understand, or difficult to use features.

85. Language designers should provide language directives that optionally disable obscure language features.

  • Are there even obscure language features in Rust that you may want to disable?

86. Languages should minimize the amount of unspecified behaviours, minimize the number of possible behaviours for any given “unspecified” choice, and document what might be the difference in external effect associated with different choices.

87. Language designers should minimize the amount of undefined behaviour to the extent possible and practical.

88. Language designers should enumerate all the cases of undefined behaviour.

89. Language designers should provide mechanisms that permit the disabling or diagnosing of constructs that may produce undefined behaviour.

  • #[deny(unsafe_code)]

89a. Portability guidelines for a specific language should provide a list of common implementation-defined behaviours.

89b. Language specifiers should enumerate all the cases of implementation-defined behaviour.

90. Language designers should provide language directives that optionally disable obscure language features.

  • This is the same as 85.

91. Obscure language features for which there are commonly used alternatives should be considered for removal from the language standard.

92. Obscure language features that have routinely been found to be the root cause of safety or security vulnerabilities, or that are routinely disallowed in software guidance documents should be considered for removal from the language standard.

93. Language designers should provide language mechanisms that optionally disable deprecated language features.

  • #[deny(deprecated)]

94. Consider including automatic synchronization of thread initiation as part of the concurrency model.

95. Provide a mechanism permitting query of activation success.

96. Provide a mechanism (either a language mechanism or a service call) to signal either another thread or an entity that can be queried by other threads when a thread terminates.

97. Languages that do not presently consider concurrency should consider creating primitives that let applications specify regions of sequential access to data. Mechanisms such as protected regions, Hoare monitors or synchronous message passing between threads result in significantly fewer resource access mistakes in a program.
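In Rust the standard library's `Mutex` plays the role of such a protected region. A minimal sketch, with a made-up `parallel_count` function:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Several threads increment one counter; the Mutex serializes access to it.
fn parallel_count(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The data is only reachable through the lock guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    // join() doubles as the "signal when a thread terminates" mechanism.
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

`parallel_count(4, 1000)` returns 4000. Sharing the counter without the `Mutex` (or an atomic) is rejected at compile time.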

98. Provide the possibility of selecting alternative concurrency models that support static analysis, such as one of the models that are known to have safe properties. For examples, see [9], [10], and [17].

99. Provide a mechanism to preclude the abort of a thread from another thread during critical pieces of code. Some languages (for example, Ada or Real-Time Java) provide a notion of an abort-deferred region.

  • Threads cannot be aborted from outside of that thread in Rust.

100. Provide a mechanism to signal another thread (or an entity that can be queried by other threads) when a thread terminates.

101. Provide a mechanism that, within critical pieces of code, defers the delivery of asynchronous exceptions or asynchronous transfers of control.

102. Raise the level of abstraction for concurrency services.

103. Provide services or mechanisms to detect and recover from protocol lock failures.

104. Design concurrency services that help to avoid typical failures such as deadlock.

105. Ensure all format strings are verified to be correct in regard to the associated argument or parameter.
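For this last item, the formatting macros check the format string against the arguments at compile time. A tiny sketch with a hypothetical `describe` function:

```rust
fn describe(name: &str, value: i32) -> String {
    // The number and kind of placeholders are verified by the compiler.
    // `format!("{} = {}", name)` would fail to compile (missing argument),
    // as would passing an extra, unused argument.
    format!("{} = {}", name, value)
}
```

`describe("x", 3)` produces `"x = 3"`.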


#6

No, Rust does not provide it.

Logical means unsigned and arithmetic means signed (the terms come from the names of the corresponding assembly instructions). Since Rust has just one right shift operator and picks the behaviour from the operand type, it does not provide a logical (= unsigned) right shift for arithmetic (= signed) values.
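A quick sketch of the difference; the function name `shifts` is invented. Getting a logical shift on a signed value requires an explicit cast to the unsigned type first.

```rust
fn shifts(x: i8) -> (i8, u8) {
    // On a signed type, >> is an arithmetic shift: the sign bit is replicated.
    let arithmetic = x >> 1;
    // On an unsigned type, >> is a logical shift: zeros are shifted in.
    let logical = (x as u8) >> 1;
    (arithmetic, logical)
}
```

`shifts(-8)` gives `(-4, 124)`, while for a non-negative input such as `shifts(8)` both results agree: `(4, 4)`.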

I believe this is mainly about managed runtimes, i.e. whether the language may use reference-counting, tracing collector, or either, whether and when it may be copying, whether timely destruction shall be provided and in which cases, what guarantees are there for the finalizers.

Rust only uses explicit management with implicitly inserted Drop::drop calls, so it is easy to document and should be mostly done.


I think this points to the fact that in C you can declare int x[5] and a bit later use x[12] and nothing will notice. Rust always checks bounds, both static and dynamic.
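To illustrate with the same numbers, a sketch using a hypothetical `read_at` helper:

```rust
// Checked access: out-of-range indices produce None instead of reading past the array.
fn read_at(data: &[i32], index: usize) -> Option<i32> {
    data.get(index).copied()
}

fn demo() -> (Option<i32>, Option<i32>) {
    let x = [10, 20, 30, 40, 50];
    // Writing `x[12]` directly with a constant index is rejected by current rustc
    // at compile time; with a runtime index, `x[i]` panics instead of
    // silently reading out of bounds.
    (read_at(&x, 2), read_at(&x, 12))
}
```

`demo()` returns `(Some(30), None)`.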

This I think refers to address sanitizer. Rust avoids these problems statically in safe code and for unsafe code LLVM asan and valgrind should both work IIRC.

This has its limits. I think the basic concepts mostly do have universally understood names. But for the language-specific instances, there is a problem that while similar, they are often slightly different between the languages. And then I think it’s clearer to use a different term, because when you use the same term, people will tend to assume they understand it when they really don’t. E.g., traits are similar to Java interfaces and to Haskell type classes, but renaming them to either would add to the confusion, not clarify anything.

I think Rust is generally pretty sensible in choosing names for its constructs and concepts.

I think this is aimed at dynamic libraries. Rust does have the interface hash to pick the correct version of the library, but it does not have cryptographic signatures.

.NET does have signed assemblies and can verify them at load time and I think Java also supports signed JARs, though I am not sure it verifies them at load time or only servers verify them at install time (which makes more sense).

Now I don’t actually think this has much practical use at the language level. When the integrity of the package needs to be ensured, the integrity of the whole system must be ensured too, and that can only be done at the level of the operating system (and only with the help of a sealed hardware module, because with physical access, everything can be compromised).

Anyway, Rust, using system-native shared libraries, does not really have the option to do this on its own.


#7
24. Languages should consider creating a mode that provides a runtime check of the validity of all accessed objects before the object is read, written or executed.

This sounds related to @RalfJung’s work with miri to detect and specify UB. There’s some discussion about it over here.

To quote his blog post (quoting his other blog post):

I also think that tooling to detect UB is of paramount importance […]. In fact, specifying a dynamic UB checker is a very good way to specify UB! Such a specification would describe the additional state that is needed at run-time to then check at every operation whether we are running into UB.


#8

We can address many issues here, like trait tests, part of design by contract, using custom testing frameworks: https://github.com/rust-lang/rfcs/pull/2318