While looking into this, I found the talk page for the Wikipedia article on side effects, Talk:Side effect (computer science) - Wikipedia, where they explicitly mention that diverging (by looping forever) is not a side-effect. And I couldn't find any reference that mentions looping forever as a side-effect. So to me, it looks like looping forever is pure, but not total.
It differs from unwinding the stack, since unwinding is non-local and can have different results each time it is done. But looping forever is always the same, regardless of where it's done.
Do you have some references for looping indefinitely being impure? Maybe I just didn't look in the right places.
I think you word it a little more strongly than what I read in the link; they explicitly say there is no consensus and give the reasoning behind the not-an-effect argument.
Another good place to look is the Koka docs. The Koka Programming Language even mentions two types of purity, "mathematical purity" and "Haskell's notion of purity", where their definition of total corresponds to mathematical purity:
> The absence of any effect is denoted as total (or <>) and corresponds to pure mathematical functions. If a function can raise an exception the effect is exn, and if a function may not terminate the effect is div (for divergence). The combination of exn and div is pure and corresponds directly to Haskell's notion of purity.
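For intuition, here are rough Rust-flavored analogues of those effect rows. This is purely illustrative; Rust does not track any of this in its type system, and modeling exn with Result is my own stand-in:

```rust
fn total(x: u32) -> u32 {
    // <> : total — always returns a value, no effects
    x.wrapping_add(1)
}

fn exn(x: u32) -> Result<u32, String> {
    // exn : may "raise" (modeled here as a Result)
    x.checked_add(1).ok_or_else(|| "overflow".to_string())
}

fn div(mut x: u64) -> u64 {
    // div : may not terminate — nobody has proven the Collatz
    // iteration terminates for all inputs (ignoring overflow)
    while x > 1 {
        x = if x % 2 == 0 { x / 2 } else { 3 * x + 1 };
    }
    x
}
```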
I concede that the common terminology includes non-termination and exceptions in "pure" functions, and uses the term "total" for functions that are pure and always return with a value. I don't like that terminology; I think "pure" should mean "mathematically pure", but there's no point in tilting at windmills -- Haskell apparently successfully (re-)defined this term.
However, when it comes to effects (I assume you are using "effect" and "side-effect" as synonyms):
We seem to all agree that raising an exception is an effect. Even Wikipedia says that. That means even pure functions (in the sense that Haskell and Koka use the word) can have the effect of raising an exception. It is therefore already not correct to say that "pure functions have no effects".
When it comes to non-termination, I think it must be treated as an effect if you want to have a consistent definition of "effect". An effect is everything a function can do that is not "return a value"; equivalently: an effect is everything a function can do that a mathematical (total) function cannot do. By both definitions, non-termination is an effect. This also makes sense if you think of it from the perspective of program analysis: what could possibly be the reason that (f(), g()) might be different from (g(), f()).swap()? The only reason can be some sort of side-effect. Non-termination suffices to make these two snippets behave differently (in a non-lazy language), therefore, non-termination is an effect.
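A concrete Rust sketch of that observation, with a diverging f and a g whose output makes the evaluation order visible (the function bodies are my own illustration):

```rust
fn f() -> u32 {
    loop {} // diverges: never returns
}

fn g() -> u32 {
    println!("g was called");
    1
}

fn main() {
    // (f(), g()): f is evaluated first and hangs, so nothing is printed.
    // let _pair = (f(), g());

    // The (g(), f()).swap()-style variant: g runs and prints before f
    // hangs, so the two variants are observably different, even though
    // neither one writes to memory or raises an exception.
    let (b, a) = (g(), f());
    let _pair = (a, b);
}
```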
You can of course define side-effect as a list of things and then manually exclude non-termination, but I would argue that is an awkward and entirely arbitrary definition. There should be some underlying principle to what is and is not an effect, it shouldn't be just an arbitrary list. And I am not aware of any principle by which non-determinism and exceptions are effects, but non-termination is not.
You linked to the discussion page of Wikipedia; note that it says "many scholars and programmers consider divergence (nontermination) not to be an effect" (without giving a source). In my experience, it is equally correct to say that "many scholars do consider nontermination to be an effect" (I don't have a source either, other than "I am one of those scholars and regularly talk with my colleagues").
Koka agrees with me: div is the effect of non-termination. Here are some academic references taking a similar stance:
FWIW that Wikipedia page also has some really odd verbiage in the "referential transparency" section which seems to imply that non-determinism is not an effect. That's definitely wrong; non-determinism is clearly an effect in typical PL usage (see e.g. here). I don't know who wrote that Wikipedia page, but it seems to not even try to carve out an underlying principle of what an effect is, when such a principle definitely exists and is widely explored by modeling effects monadically or with algebraic effects.
One other thing: even though Koka defines the "pure" effect using the Haskell notion of purity, it seems to also differ from Haskell's notion of purity in that it considers allocation an effect, alloc.
Presumably Rust, which considers allocation failure a form of panic, also triggers the non-termination aspect of this thread. Allocation is also a notion typically absent from the mathematical definitions of purity. One might even consider successful allocations as having an effect: even a simple bump allocator bumps a pointer (sketched below).
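A minimal sketch of that bump-allocator point: even a successful allocation mutates state (the bump pointer), and a failed one surfaces as a value the caller must handle. The Bump type here is my own toy illustration:

```rust
struct Bump {
    buf: [u8; 1024],
    next: usize, // the bump pointer
}

impl Bump {
    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        let start = self.next;
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None; // out of memory: the failure is observable
        }
        self.next = end; // even success has an effect: the pointer moves
        Some(&mut self.buf[start..end])
    }
}
```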
Allocations are entirely un-observable in Haskell (except for performance), so I don't think there is an actual difference here. Effects you can't see don't happen.
My main point is that a function which is pure in Haskell can allocate, while a function which is pure in Koka cannot, since pure in Koka only allows exceptions or divergence, and alloc is neither of those. Thus I don't think Koka and Haskell use exactly the same definition.
Haskell has no notion of "allocation", so I don't know what you mean by that. Koka has an explicit ref for allocations and mutable state, and an effect to track it, but there is no Haskell equivalent. (Well, there is the ST monad, but that's not a pure function any more; the monad marks it as impure. Same for the IO monad.)
The fact that the language runtime has such a thing as a heap with things allocated on it is an implementation detail and not part of the language semantics; the programmer can't even tell that it is happening.
Well, I can see this argument is going nowhere, so I'm going to duck out. But the point is that Haskell runs on machines of finite space. At some point it runs out of memory and requires more to complete its computation. When Rust hits this physical limit, it produces a panic. Whatever Haskell does when it runs into physical limits, it really seems to have two choices of how to handle it: one being termination, and the other being divergence, potentially waiting forever for space.
That the theoretical model doesn't capture the limits of the hardware doesn't matter much to me.
I am still not entirely sure which "allocations" in Haskell you are referring to, unfortunately you have not clarified that. It could be ST/IO, or it could be the hidden allocations inside the runtime that the programmer can never "see".
But even if you go down to the implementation level, as far as I can see there is no difference between Koka's and Haskell's idea of pure functions. total and pure Koka functions can equally run out of space by constructing sufficiently big objects or by overflowing the stack. So if you want the theoretical model to account for the inherent finiteness of real executions, you have the same problem either way.
Yes, the hidden GC allocations. Sure, both can run out of space; the problem doesn't go away. But a system which models allocation as an effect has the choice of how to handle the effect, i.e. the program could either terminate or diverge in response to reaching the limit, whereas a built-in effect handler, such as the GC or Rust's allocation panic, prescribes such a choice.
I agree that it should be treated as an effect, but I can make the case for not treating it as one.
It's possible to design languages where functions have a bound on the number of steps they take to compute, and this bound is part of the function's signature. If your language doesn't have this, then your class of "pure" functions includes functions which can take arbitrarily long to compute. And diverging functions are indistinguishable from functions which take an arbitrarily long time to compute. (And that can include not just divergence-via-loop {} but also divergence-via-panic!(), depending on where you draw the boundary of your semantic model.)
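One way to picture such a signature-level bound is as a "fuel" parameter. This is a hypothetical encoding for illustration only (FUEL and OutOfFuel are made up, not an existing language feature):

```rust
struct OutOfFuel;

// The const parameter FUEL is an upper bound on the number of steps.
// With it, the function provably cannot diverge: it either finishes
// within the bound or reports that the bound was exceeded.
fn countdown<const FUEL: usize>(mut n: usize) -> Result<usize, OutOfFuel> {
    for _ in 0..FUEL {
        if n == 0 {
            return Ok(0);
        }
        n -= 1;
    }
    Err(OutOfFuel) // ran past the declared bound instead of looping forever
}
```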
Can anyone tell me the practical benefit of considering non-termination an effect? Yes, for some safety reasons you might want to cancel a branch in a proof by diverging, but in a practical sense I don't see a lot of difference between loop {} and for _ in 0..i64::MAX {}. At this point, to bring some benefit into Rust, we might start talking about async and its "poll should return in a timely manner" requirement, which is a real pain point real developers are facing daily.
One way it matters is for which code is reachable (even if that is after i64::MAX steps). If some code isn't reachable, it isn't sound for the compiler to "optimize" it and make it reachable (for example, to enable some other optimizations).
Also, for _ in 0..i64::MAX can be optimized to a nop, and loop {} cannot. (Sure, if you do something non-trivial in the loops, then it won't be, but the point still stands).
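Sketched in Rust (this describes what the optimizer is and isn't allowed to do, not what any particular compiler version actually emits):

```rust
pub fn bounded() {
    // Finite and with no observable behavior, so the compiler may
    // reduce this to an immediate return.
    for _ in 0..i64::MAX {}
}

pub fn unbounded() -> ! {
    // "Runs forever" IS this function's observable behavior;
    // Rust guarantees the loop is not optimized away.
    loop {}
}
```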
Yes, point taken; it wasn't a really good argument, but I couldn't find anything more concrete.
Thanks for this! This will be interesting to look through.
The problem is that that property is not composable, so it isn't really an effect at all.
It's like the problem with trying to make a QuickToCopy trait, where clearly u8: QuickToCopy and [u8; 4]: QuickToCopy, but then trying to extend that to [impl QuickToCopy; 4] doesn't work because of [[[[[u8; 4]; 4]; 4]; 4]; 4].
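A sketch of that hypothetical trait and where the composition breaks down (QuickToCopy is the made-up trait from the analogy, not anything in std):

```rust
trait QuickToCopy: Copy {}

impl QuickToCopy for u8 {}

// The natural blanket impl: an array of quick things is quick.
impl<T: QuickToCopy, const N: usize> QuickToCopy for [T; N] {}

// But the impl composes with itself, so [[[[[u8; 4]; 4]; 4]; 4]; 4]
// (1024 bytes) is also "quick". A binary property can't express that
// the cost grows with nesting.
fn assert_quick<T: QuickToCopy>() {}

fn main() {
    assert_quick::<[[[[[u8; 4]; 4]; 4]; 4]; 4]>(); // compiles, unfortunately
}
```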
You need more than just a binary effect; you need a full refinement type system so that you can attribute a non-binary cost to things, and track that all the way through stuff. And while that's cool and has its place, I can't imagine it'll ever be a good fit for Rust.
> One way it matters is for which code is reachable (even if that is after i64::MAX steps). If some code isn't reachable, it isn't sound for the compiler to "optimize" it and make it reachable (for example, to enable some other optimizations).
We are talking about the type system, not optimizations, so I can't consider it an answer or a partial answer to my question.
> Also, for _ in 0..i64::MAX can be optimized to a nop, and loop {} cannot. (Sure, if you do something non-trivial in the loops, then it won't be, but the point still stands).
Thank you for the technicality; I should've expressed myself better. I meant that I don't see the practical difference between loop {} and long-running pure code (like some extremely long computation that doesn't involve the OS etc.) that is proven to be terminating.
I don't see how there would be a distinction for a real developer doing real work. I think we should not overcomplicate things and should let loop {} be no different: for every real use case, loop {} is as bad as two days in a pure loop with an upper bound. So there is no difference for any practical purposes; the difference is in the formalities instead.
The only way I see it change something is when you are writing unsafe code and want to rule out some case, but even there I'm not sure this is a productive way to put it.
And again, optimizations are not up to the type system itself.
Enumerating 2^63 values is quite feasible on modern hardware, especially if you parallelize. Supercomputers are measured in exaflops now, i.e. 10^18 ops per second (let's ignore the float part), and 2^63 is about 9.2 × 10^18, so ten seconds if it were parallelized perfectly.
So this isn't some "number doesn't fit in the universe" kind of unobservable outcome. That's why 128-bit is "merely" boiling-the-oceans territory and you need something around 256 bits to make it physically unobservable.
One point of effects is to give the compiler more information about what happens "inside" a function, so it can do more optimizations. "It loops forever" is one of the things that could happen. If a compiler wants to reorder code by moving it up across a function call, it can be crucial to know whether the function will terminate or not.
By making non-termination an effect, we ensure that everything that could happen "inside" a function is tracked as an effect. Conversely, by making it not-an-effect, we are blind to one of the key things a function can do.
Really I see no justification for saying that e.g. mutating global state would be an effect but non-termination is not. They are both just examples of "things a function can do that must be taken into account when reasoning about it, including during optimizations", or equivalently "things that make a function behave differently from a mathematical function".
For optimizations, the compiler can track this information outside of the type system. I can see the difference with global variables, because you cannot cache such functions, for example. As a developer, I can't understand the benefit of making functions with a "maybe" loop {} special (not talking about the never type, which is very useful), while at the same time accepting (because we can't really do anything else about them) functions that run for a really long time (with "really long" depending on the kind of application).
I think that the two snippets behave differently only if at least one of f and g (say f) has side effects other than non-termination, and the other (say g) diverges. So if we adopt @RustyYato's definitions, we can use the following rules:
- Pure total function calls can be freely reordered with anything.
- Pure non-total function calls can be freely reordered with other pure non-total function calls.
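A small Rust illustration of those two rules (the function names and bodies are my own; what makes the second rule safe is that pure non-total functions diverge deterministically, on exactly the same inputs, regardless of order):

```rust
fn pure_total(x: u32) -> u32 {
    x + 1
}

fn pure_non_total(x: u32) -> u32 {
    if x == 0 {
        loop {} // diverges, but deterministically: only for x == 0
    }
    x
}

fn demo(x: u32, y: u32) -> (u32, u32, u32) {
    // Rule 1: this call can be moved anywhere; it can neither observe
    // nor mask anything.
    let a = pure_total(x);
    // Rule 2: these two calls can swap with each other; either order
    // diverges on the same inputs and returns the same values.
    let b = pure_non_total(x);
    let c = pure_non_total(y);
    (a, b, c)
}
```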
This is like saying the compiler can track aliasing information outside the type system. Sure, it can, to some extent, but by comparing this with a compiler that tracks it in the type system you are comparing apples and oranges. If you have it inside the type system like Rust does, you get much better optimizations.
And it's not just compilers that care, the exact same arguments apply when a programmer reasons about their code: what do I have to take into account when refactoring/optimizing this?
This boils down to "is it actually useful to show liveness properties when what we really want is bounded liveness (which is a safety property)". It's a great question, and the answer is longer than what I have time to spell out now.
But one point I do want to mention is that we want to be able to state program correctness and compiler correctness in a principled way. "Takes a really long time" is not a principled statement, but "takes forever" is. A function that does loop {}; unsafe { unreachable_unchecked() } is sound, but a function that counts to 2^256 and then calls unreachable_unchecked() is not. (The compiler may optimize away that counting loop and then you have insta-UB! But the compiler promises to never optimize away an infinite loop.) So when thinking about what a program does and whether it is correct, we do make a fundamental distinction between "takes very long" and "never finishes". You may say that that is a case of the formalism losing touch with reality, but experience shows that the formalism is still useful, and in the end that is all that matters. Maybe one day someone finds a useful formalism that equates "takes longer than the heat death of the universe" and "takes forever", but so far, nobody was able to come up with such a formalism, so we use what we have.
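The two cases from that paragraph, sketched in Rust (the counting bound is shrunk from 2^256 to u64::MAX for brevity; the soundness argument is the same):

```rust
use std::hint::unreachable_unchecked;

// Sound: control can never reach the hint, and the compiler promises
// never to optimize away an infinite loop.
#[allow(unreachable_code)]
fn sound() -> ! {
    loop {}
    unsafe { unreachable_unchecked() }
}

// Unsound: the counting loop is finite and has no observable effect,
// so the compiler may delete it, after which the hint is reachable:
// instant undefined behavior.
fn unsound() -> ! {
    for _ in 0..u64::MAX {} // "very long" is not "forever"
    unsafe { unreachable_unchecked() }
}
```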
This is not true if "pure non-total" includes "functions that can throw an exception", and if there is more than one kind of exception -- reordering the functions can change which exception you get.
But in a language without exceptions, I agree this is correct.
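A Rust sketch of that caveat, with panics standing in for the two kinds of exceptions (f, g, and the messages are my own illustration; the default panic hook will also print to stderr, which doesn't matter here):

```rust
use std::panic::catch_unwind;

fn f() -> u32 {
    panic!("exception from f")
}

fn g() -> u32 {
    panic!("exception from g")
}

fn main() {
    // Tuple fields are evaluated left to right, so f's panic wins here...
    let err = catch_unwind(|| (f(), g())).unwrap_err();
    println!("{}", err.downcast_ref::<&str>().unwrap()); // "exception from f"

    // ...and g's panic wins here: reordering the two pure non-total
    // calls changed the observable outcome.
    let err = catch_unwind(|| {
        let (b, a) = (g(), f());
        (a, b)
    })
    .unwrap_err();
    println!("{}", err.downcast_ref::<&str>().unwrap()); // "exception from g"
}
```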
Oh, there's tons of other allowed reorderings. For instance, two functions that definitely terminate but may also read arbitrary parts of memory can be reordered with each other.