From fw at deneb.enyo.de Sat Oct 3 14:59:26 2009 From: fw at deneb.enyo.de (Florian Weimer) Date: Sat Oct 3 14:37:13 2009 Subject: Unsafe hGetContents In-Reply-To: <1253192223-sup-3700@peray> (Nicolas Pouillard's message of "Thu, 17 Sep 2009 14:58:59 +0200") References: <877hvylk57.fsf@mid.deneb.enyo.de> <1253192223-sup-3700@peray> Message-ID: <87zl888fsh.fsf@mid.deneb.enyo.de> * Nicolas Pouillard: > Excerpts from Florian Weimer's message of Wed Sep 16 22:17:08 +0200 2009: >> Are there any plans to get rid of hGetContents and the semi-closed >> handle state for Haskell Prime? >> >> (I call hGetContents unsafe because it adds side effects to pattern >> matching, stricly speaking invalidating most of the transformations >> which are expected to be valid in a pure language.) > > Would you consider something like [1] as an acceptable replacement? > > [1]: http://hackage.haskell.org/package/safe-lazy-io It only addresses two known issues with lazy I/O, doesn't it? It still injects input operations into pure code not in the IO monad. From marlowsd at gmail.com Tue Oct 6 08:59:06 2009 From: marlowsd at gmail.com (Simon Marlow) Date: Tue Oct 6 08:36:50 2009 Subject: Unsafe hGetContents In-Reply-To: <87zl888fsh.fsf@mid.deneb.enyo.de> References: <877hvylk57.fsf@mid.deneb.enyo.de> <1253192223-sup-3700@peray> <87zl888fsh.fsf@mid.deneb.enyo.de> Message-ID: <4ACB3F1A.9090803@gmail.com> On 03/10/2009 19:59, Florian Weimer wrote: > * Nicolas Pouillard: > >> Excerpts from Florian Weimer's message of Wed Sep 16 22:17:08 +0200 2009: >>> Are there any plans to get rid of hGetContents and the semi-closed >>> handle state for Haskell Prime? >>> >>> (I call hGetContents unsafe because it adds side effects to pattern >>> matching, stricly speaking invalidating most of the transformations >>> which are expected to be valid in a pure language.) >> >> Would you consider something like [1] as an acceptable replacement? >> >> [1]: http://hackage.haskell.org/package/safe-lazy-io > > It only addresses two known issues with lazy I/O, doesn't it? It > still injects input operations into pure code not in the IO monad. While what you say is true, and I've complained about the same thing myself in the past, it turns out to be quite difficult to demonstrate the unsafety. Try it! Here's the rules. - write a program that gives different results when compiled with different optimisation flags only. (one exception: you're not allowed to take advantage of -fno-state-hack). - Using exceptions is not allowed (they're non-determinstic). - A difference caused by resources (e.g. stack overflow) doesn't count. - The only "unsafe" operation you're allowed to use is hGetContents. - You're allowed to use any other I/O operations, including from libraries, as long as they're not unsafe, and as long as the I/O itself is deterministic. The reason it's hard is that to demonstrate a difference you have to get the lazy I/O to commute with some other I/O, and GHC will never do that. If you find a way to do it, then we'll probably consider it a bug in GHC. You can get lazy I/O to commute with other lazy I/O, and perhaps with some cunning arrangement of pipes (or something) that might be a way to solve the puzzle. Good luck! Cheers, Simon From nicolas.pouillard at gmail.com Tue Oct 6 09:18:27 2009 From: nicolas.pouillard at gmail.com (Nicolas Pouillard) Date: Tue Oct 6 08:53:03 2009 Subject: Unsafe hGetContents In-Reply-To: <4ACB3F1A.9090803@gmail.com> References: <877hvylk57.fsf@mid.deneb.enyo.de> <1253192223-sup-3700@peray> <87zl888fsh.fsf@mid.deneb.enyo.de> <4ACB3F1A.9090803@gmail.com> Message-ID: <1254835066-sup-8275@peray> Excerpts from Simon Marlow's message of Tue Oct 06 14:59:06 +0200 2009: > On 03/10/2009 19:59, Florian Weimer wrote: > > * Nicolas Pouillard: > > > >> Excerpts from Florian Weimer's message of Wed Sep 16 22:17:08 +0200 2009: > >>> Are there any plans to get rid of hGetContents and the semi-closed > >>> handle state for Haskell Prime? > >>> > >>> (I call hGetContents unsafe because it adds side effects to pattern > >>> matching, stricly speaking invalidating most of the transformations > >>> which are expected to be valid in a pure language.) > >> > >> Would you consider something like [1] as an acceptable replacement? > >> > >> [1]: http://hackage.haskell.org/package/safe-lazy-io > > > > It only addresses two known issues with lazy I/O, doesn't it? It > > still injects input operations into pure code not in the IO monad. > > While what you say is true, and I've complained about the same thing > myself in the past, it turns out to be quite difficult to demonstrate > the unsafety. > > Try it! Here's the rules. > > - write a program that gives different results when compiled with > different optimisation flags only. (one exception: you're not > allowed to take advantage of -fno-state-hack). > > - Using exceptions is not allowed (they're non-determinstic). > > - A difference caused by resources (e.g. stack overflow) doesn't > count. > > - The only "unsafe" operation you're allowed to use is hGetContents. > > - You're allowed to use any other I/O operations, including from > libraries, as long as they're not unsafe, and as long as the I/O > itself is deterministic. > > The reason it's hard is that to demonstrate a difference you have to get > the lazy I/O to commute with some other I/O, and GHC will never do that. > If you find a way to do it, then we'll probably consider it a bug in GHC. > > You can get lazy I/O to commute with other lazy I/O, and perhaps with > some cunning arrangement of pipes (or something) that might be a way to > solve the puzzle. Good luck! Oleg's example is quite close, don't you think? URL: http://www.haskell.org/pipermail/haskell/2009-March/021064.html Cheers, -- Nicolas Pouillard http://nicolaspouillard.fr From marlowsd at gmail.com Tue Oct 6 09:50:32 2009 From: marlowsd at gmail.com (Simon Marlow) Date: Tue Oct 6 09:28:13 2009 Subject: Unsafe hGetContents In-Reply-To: <1254835066-sup-8275@peray> References: <877hvylk57.fsf@mid.deneb.enyo.de> <1253192223-sup-3700@peray> <87zl888fsh.fsf@mid.deneb.enyo.de> <4ACB3F1A.9090803@gmail.com> <1254835066-sup-8275@peray> Message-ID: <4ACB4B28.6080907@gmail.com> On 06/10/2009 14:18, Nicolas Pouillard wrote: > Excerpts from Simon Marlow's message of Tue Oct 06 14:59:06 +0200 2009: >> On 03/10/2009 19:59, Florian Weimer wrote: >>> * Nicolas Pouillard: >>> >>>> Excerpts from Florian Weimer's message of Wed Sep 16 22:17:08 +0200 2009: >>>>> Are there any plans to get rid of hGetContents and the semi-closed >>>>> handle state for Haskell Prime? >>>>> >>>>> (I call hGetContents unsafe because it adds side effects to pattern >>>>> matching, stricly speaking invalidating most of the transformations >>>>> which are expected to be valid in a pure language.) >>>> >>>> Would you consider something like [1] as an acceptable replacement? >>>> >>>> [1]: http://hackage.haskell.org/package/safe-lazy-io >>> >>> It only addresses two known issues with lazy I/O, doesn't it? It >>> still injects input operations into pure code not in the IO monad. >> >> While what you say is true, and I've complained about the same thing >> myself in the past, it turns out to be quite difficult to demonstrate >> the unsafety. >> >> Try it! Here's the rules. >> >> - write a program that gives different results when compiled with >> different optimisation flags only. (one exception: you're not >> allowed to take advantage of -fno-state-hack). >> >> - Using exceptions is not allowed (they're non-determinstic). >> >> - A difference caused by resources (e.g. stack overflow) doesn't >> count. >> >> - The only "unsafe" operation you're allowed to use is hGetContents. >> >> - You're allowed to use any other I/O operations, including from >> libraries, as long as they're not unsafe, and as long as the I/O >> itself is deterministic. >> >> The reason it's hard is that to demonstrate a difference you have to get >> the lazy I/O to commute with some other I/O, and GHC will never do that. >> If you find a way to do it, then we'll probably consider it a bug in GHC. >> >> You can get lazy I/O to commute with other lazy I/O, and perhaps with >> some cunning arrangement of pipes (or something) that might be a way to >> solve the puzzle. Good luck! > > Oleg's example is quite close, don't you think? > > URL: http://www.haskell.org/pipermail/haskell/2009-March/021064.html Ah yes, if you have two lazy input streams both referring to the same underlying stream, that is enough to demonstrate a problem. As for whether Oleg's example is within the rules, it depends whether you consider fdToHandle as "unsafe": Haskell's IO library is carefully designed to not run into this problem on its own. It's normally not possible to get two Handles with the same FD, however GHC.IO.Handle.hDuplicate also lets you do this. Cheers, Simon From duncan.coutts at googlemail.com Tue Oct 6 09:57:41 2009 From: duncan.coutts at googlemail.com (Duncan Coutts) Date: Tue Oct 6 09:35:19 2009 Subject: Unsafe hGetContents In-Reply-To: <1254835066-sup-8275@peray> References: <877hvylk57.fsf@mid.deneb.enyo.de> <1253192223-sup-3700@peray> <87zl888fsh.fsf@mid.deneb.enyo.de> <4ACB3F1A.9090803@gmail.com> <1254835066-sup-8275@peray> Message-ID: <1254837461.28525.1371.camel@localhost> On Tue, 2009-10-06 at 15:18 +0200, Nicolas Pouillard wrote: > > The reason it's hard is that to demonstrate a difference you have to get > > the lazy I/O to commute with some other I/O, and GHC will never do that. > > If you find a way to do it, then we'll probably consider it a bug in GHC. > > > > You can get lazy I/O to commute with other lazy I/O, and perhaps with > > some cunning arrangement of pipes (or something) that might be a way to > > solve the puzzle. Good luck! > > Oleg's example is quite close, don't you think? > > URL: http://www.haskell.org/pipermail/haskell/2009-March/021064.html I didn't think that showed very much. He showed two different runs of two different IO programs where he got different results after having bypassed the safety switch on hGetContents. It shows that lazy IO is non-deterministic, but then we knew that. It didn't show anything was impure. As a software engineering thing, it's recommended to use lazy IO in the cases where the non-determinism has a low impact, ie where the order of the actions with respect to other actions doesn't really matter. When it does matter then your programs will probably be more comprehensible if you do the actions more explicitly. For example we have the shoot-yourself-in-the-foot restriction that you can only use hGetContents on a handle a single time (this is the safety mechanism that Oleg turned off) and after that you cannot write to the same handle. That's not because it'd be semantically unsound if those restrictions were not there, but it would let you write some jolly confusing non-deterministic programs. Duncan From oleg at okmij.org Sat Oct 10 05:51:16 2009 From: oleg at okmij.org (oleg@okmij.org) Date: Sat Oct 10 05:32:26 2009 Subject: Unsafe hGetContents Message-ID: <20091010095116.97262176DE@Adric.metnet.navy.mil> Simon Marlow wrote: > Ah yes, if you have two lazy input streams both referring to the same > underlying stream, that is enough to demonstrate a problem. As for > whether Oleg's example is within the rules, it depends whether you > consider fdToHandle as "unsafe" I wasn't aware of the rules. Fortunately, UNIX (FreeBSD and Linux) give plenty of opportunities to shoot oneself. Here is the code from the earlier message without the offending fdToHandle: > {- Haskell98! -} > > module Main where > > import System.IO > > -- f1 and f2 are both pure functions, with the pure type. > -- Both compute the result of the subtraction e1 - e2. > -- The only difference between them is the sequence of > -- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1 > -- For really pure functions, that difference should not be observable > > f1, f2:: Int ->Int ->Int > > f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2 > f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2 > > read_int s = read . head . words $ s > > main = do > let h1 = stdin > h2 <- openFile "/dev/stdin" ReadMode > s1 <- hGetContents h1 > s2 <- hGetContents h2 > -- print $ f1 (read_int s1) (read_int s2) > print $ f2 (read_int s1) (read_int s2) It exhibits the same behavior that was described in http://www.haskell.org/pipermail/haskell/2009-March/021064.html I think Windows may have something similar. > The reason it's hard is that to demonstrate a difference you have to get > the lazy I/O to commute with some other I/O, and GHC will never do that. The keyword here is GHC. I may well believe that GHC is able to divine programmer's true intent and so it always does the right thing. But writing in the language standard ``do what the version x.y.z of GHC does'' does not seem very appropriate, or helpful to other implementors. > Haskell's IO library is carefully designed to not run into this > problem on its own. It's normally not possible to get two Handles > with the same FD... Is this behavior is specified somewhere, or is this just an artifact of a particular GHC implementation? From duncan.coutts at googlemail.com Sat Oct 10 10:38:32 2009 From: duncan.coutts at googlemail.com (Duncan Coutts) Date: Sat Oct 10 11:33:58 2009 Subject: Unsafe hGetContents In-Reply-To: <20091010095116.97262176DE@Adric.metnet.navy.mil> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> Message-ID: <1255185512.28525.6806.camel@localhost> On Sat, 2009-10-10 at 02:51 -0700, oleg@okmij.org wrote: > > The reason it's hard is that to demonstrate a difference you have to get > > the lazy I/O to commute with some other I/O, and GHC will never do that. > > The keyword here is GHC. I may well believe that GHC is able to divine > programmer's true intent and so it always does the right thing. But > writing in the language standard ``do what the version x.y.z of GHC > does'' does not seem very appropriate, or helpful to other > implementors. With access to unsafeInterleaveIO it's fairly straightforward to show that it is non-deterministic. These programs that bypass the safety mechanisms on hGetContents just get us back to having access to the non-deterministic semantics of unsafeInterleaveIO. > > Haskell's IO library is carefully designed to not run into this > > problem on its own. It's normally not possible to get two Handles > > with the same FD... > Is this behavior is specified somewhere, or is this just an artifact > of a particular GHC implementation? It is in the Haskell 98 report, in the design of the IO library. It does not not mention FDs of course. The IO/Handle functions it provides give no (portable) way to obtain two read handles on the same OS file descriptor. The hGetContents behaviour of semi-closing is to stop you from getting two lazy lists of the same read Handle. There's nothing semantically wrong with you bypassing those restrictions (eg openFile "/dev/fd/0") it just means you end up with a non-deterministic IO program, which is something we typically try to avoid. I am a bit perplexed by this whole discussion. It seems to come down to saying that unsafeInterleaveIO is non-deterministic and that things implemented on top are also non-deterministic. The standard IO library puts up some barriers to restrict the non-determinism, but if you walk around the barrier then you can still find it. It's not clear to me what is supposed to be surprising or alarming here. Duncan From iavor.diatchki at gmail.com Sat Oct 10 13:59:33 2009 From: iavor.diatchki at gmail.com (Iavor Diatchki) Date: Sat Oct 10 13:36:57 2009 Subject: Unsafe hGetContents In-Reply-To: <1255185512.28525.6806.camel@localhost> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> Message-ID: <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> Hello, well, I think that the fact that we seem to have a program context that can distinguish "f1" from "f2" is worth discussing because I would have thought that in a pure language they are interchangable. The question is, does the context in Oleg's example really distinguish between "f1" and "f2"? You seem to be saying that this is not the case: in both cases you end up with the same non-deterministic program that reads two numbers from the standard input and subtracts them but you can't assume anything about the order in which the numbers are extracted from the input---it is merely an artifact of the GHC implementation that with "f1" the subtraction always happens the one way, and with "f2" it happens the other way. I can (sort of) buy this argument, after all, it is quite similar to what happens with asynchronous exceptions (f1 (error "1") (error "2") vs f2 (error "1") (error "2")). Still, the whole thing does not "smell right": there is some impurity going on here, and trying to offload the problem onto the IO monad only makes reasoning about IO computations even harder (and it is petty hard to start with). So, discussion and alternative solutions should be strongly encouraged, I think. -Iavor On Sat, Oct 10, 2009 at 7:38 AM, Duncan Coutts wrote: > On Sat, 2009-10-10 at 02:51 -0700, oleg@okmij.org wrote: > >> > The reason it's hard is that to demonstrate a difference you have to get >> > the lazy I/O to commute with some other I/O, and GHC will never do that. >> >> The keyword here is GHC. I may well believe that GHC is able to divine >> programmer's true intent and so it always does the right thing. But >> writing in the language standard ``do what the version x.y.z of GHC >> does'' does not seem very appropriate, or helpful to other >> implementors. > > With access to unsafeInterleaveIO it's fairly straightforward to show > that it is non-deterministic. These programs that bypass the safety > mechanisms on hGetContents just get us back to having access to the > non-deterministic semantics of unsafeInterleaveIO. > >> > Haskell's IO library is carefully designed to not run into this >> > problem on its own. ?It's normally not possible to get two Handles >> > with the same FD... > >> Is this behavior is specified somewhere, or is this just an artifact >> of a particular GHC implementation? > > It is in the Haskell 98 report, in the design of the IO library. It does > not not mention FDs of course. The IO/Handle functions it provides give > no (portable) way to obtain two read handles on the same OS file > descriptor. The hGetContents behaviour of semi-closing is to stop you > from getting two lazy lists of the same read Handle. > > There's nothing semantically wrong with you bypassing those restrictions > (eg openFile "/dev/fd/0") it just means you end up with a > non-deterministic IO program, which is something we typically try to > avoid. > > I am a bit perplexed by this whole discussion. It seems to come down to > saying that unsafeInterleaveIO is non-deterministic and that things > implemented on top are also non-deterministic. The standard IO library > puts up some barriers to restrict the non-determinism, but if you walk > around the barrier then you can still find it. It's not clear to me what > is supposed to be surprising or alarming here. > > Duncan > > _______________________________________________ > Haskell-prime mailing list > Haskell-prime@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-prime > From fw at deneb.enyo.de Sun Oct 11 04:26:10 2009 From: fw at deneb.enyo.de (Florian Weimer) Date: Sun Oct 11 04:03:34 2009 Subject: Unsafe hGetContents In-Reply-To: <4ACB4B28.6080907@gmail.com> (Simon Marlow's message of "Tue, 06 Oct 2009 14:50:32 +0100") References: <877hvylk57.fsf@mid.deneb.enyo.de> <1253192223-sup-3700@peray> <87zl888fsh.fsf@mid.deneb.enyo.de> <4ACB3F1A.9090803@gmail.com> <1254835066-sup-8275@peray> <4ACB4B28.6080907@gmail.com> Message-ID: <87skdql4kd.fsf@mid.deneb.enyo.de> * Simon Marlow: >> Oleg's example is quite close, don't you think? >> >> URL: http://www.haskell.org/pipermail/haskell/2009-March/021064.html > > Ah yes, if you have two lazy input streams both referring to the same > underlying stream, that is enough to demonstrate a problem. As for > whether Oleg's example is within the rules, it depends whether you > consider fdToHandle as "unsafe": Is relying on seq to show the difference allowed, according to your rules on an insecurity proof? What about handles from System.Process? Do they count as well? From apfelmus at quantentunnel.de Sun Oct 11 05:35:55 2009 From: apfelmus at quantentunnel.de (Heinrich Apfelmus) Date: Sun Oct 11 05:13:42 2009 Subject: Unsafe hGetContents In-Reply-To: <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> Message-ID: Iavor Diatchki wrote: > Hello, > > well, I think that the fact that we seem to have a program context > that can distinguish "f1" from "f2" is worth discussing because I > would have thought that in a pure language they are interchangable. > The question is, does the context in Oleg's example really distinguish > between "f1" and "f2"? You seem to be saying that this is not the > case: in both cases you end up with the same non-deterministic > program that reads two numbers from the standard input and subtracts > them but you can't assume anything about the order in which the > numbers are extracted from the input---it is merely an artifact of the > GHC implementation that with "f1" the subtraction always happens the > one way, and with "f2" it happens the other way. > > I can (sort of) buy this argument, after all, it is quite similar to > what happens with asynchronous exceptions (f1 (error "1") (error "2") > vs f2 (error "1") (error "2")). Still, the whole thing does not > "smell right": there is some impurity going on here, and trying to > offload the problem onto the IO monad only makes reasoning about IO > computations even harder (and it is petty hard to start with). So, > discussion and alternative solutions should be strongly encouraged, I > think. To put it in different words, here an elaboration on what exactly the non-determinism argument is: Consider programs foo1 and foo2 defined as foo :: (a -> b -> c) -> IO String foo f = Control.Exception.catch (evaluate (f (error "1") (error "2")) >> return "3") (\(ErrorCall s) -> return s) foo1 = foo f1 where f1 x y = x `seq` y `seq` () foo2 = foo f2 where f2 x y = y `seq` x `seq` () Knowing how exceptions and seq behave in GHC, it is straightforward to prove that foo1 = return "1" foo2 = return "2" which clearly violates referential transparency. This is bad, so the idea is to disallow the proof. In particular, the idea is that referential transparency can be restored if we only allow proofs that work for all evaluation orders, which is equivalent to introducing non-determinism. In other words, we are only allowed to prove foo1 = return "1" or return "2" foo2 = return "1" or return "2" Moreover, we can push the non-determinism into the IO type and pretend that pure functions A -> B are semantically lifted to Nondet A -> Nondet B with some kind of fmap . The same goes for hGetContents : if you use it twice on the same handle, you're only allowed to prove non-deterministic behavior, which is not very useful if you want a deterministic program. But you are allowed to prove deterministic results if you use it with appropriate caution. In other words, the language semantics guarantees less than GHC actually does. In particular, the semantics only allows reasoning that is independent of the evaluation order and this means to treat IO as non-deterministic in certain cases. Regards, apfelmus -- http://apfelmus.nfshost.com From duncan.coutts at googlemail.com Sun Oct 11 10:05:41 2009 From: duncan.coutts at googlemail.com (Duncan Coutts) Date: Sun Oct 11 09:43:12 2009 Subject: Unsafe hGetContents In-Reply-To: <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> Message-ID: <1255269941.8777.54.camel@localhost> On Sat, 2009-10-10 at 10:59 -0700, Iavor Diatchki wrote: > Hello, > > well, I think that the fact that we seem to have a program context > that can distinguish "f1" from "f2" is worth discussing because I > would have thought that in a pure language they are interchangable. Crucially they are contexts in an IO program. > The question is, does the context in Oleg's example really distinguish > between "f1" and "f2"? You seem to be saying that this is not the > case: in both cases you end up with the same non-deterministic > program that reads two numbers from the standard input and subtracts > them but you can't assume anything about the order in which the > numbers are extracted from the input---it is merely an artifact of the > GHC implementation that with "f1" the subtraction always happens the > one way, and with "f2" it happens the other way. Right. > I can (sort of) buy this argument, after all, it is quite similar to > what happens with asynchronous exceptions (f1 (error "1") (error "2") > vs f2 (error "1") (error "2")). Still, the whole thing does not > "smell right": there is some impurity going on here, No, there's no impurity. > and trying to offload the problem onto the IO monad only makes > reasoning about IO computations even harder (and it is petty hard to > start with). Sure, reasoning about non-deterministic IO programs is tricky. But then nobody here is advocating writing non-deterministic IO programs. Lazy IO is sensible and useful when the non-determinism doesn't make any difference to the results. Lets look at a simplified case, instead of general IO and the whole OS API at our disposal, lets look at the case of a single thread of control and mutable variables, specifically the ST monad. We can construct a semantics for this based on a sequence of read / write events for the mutable variables. The ST monad bind gives guarantees about the ordering of the events. So the ST programs are deterministic. do x <- readSTRef v writeSTRef v (x+1) writeSTRef v (x+2) The semantics of this ST program is the trace read(v,x) write(v,x+1) write(v,x+2) We could introduce non-determinism to this system by allowing read / write events to be arbitrarily interleaved with other subsequent events: do x <- readSTRef v unsafeInterleaveST $ writeSTRef v (x+1) writeSTRef v (x+2) now we can have two traces: read(v,x) write(v,x+1) write(v,x+2) or read(v,x) write(v,x+2) write(v,x+1) The semantics is the set of traces, in this case just the two. Of course with this modified ST system we cannot allow a pure runST because we've got non-deterministic ST programs (or we could make it pure by returning the full set of traces). But it'd be ok for IO. Now working with and reasoning about these non-deterministic ST programs is tricky. Depending on the implementation choice for the interleaving we'll get different results and under some implementation choices we'll be able to influence the result by coding pure bits of the program differently. None of this changes the semantics since the semantics just says any possible interleaving is OK. Another interesting thing to note is that we can limit the interleaving somewhat by forcing deferred events to come before other subsequent events: do writeSTRef v 1 x <- unsafeInterleaveST $ readSTRef v writeSTRef v 2 evaluate x writeSTRef v 3 So in the traces for this program, x can have the value 1 or 2 but not 3 because of the partial order on events that we impose using evaluate. We can also do something like Oleg's example (simplified to only a single getChar rather than reading the whole input stream) do let fileContent = "hello" seekPoint <- newSTRef 0 let getChar = do s <- readIORef seekPoint writeIORef seekPoint (s+1) return (fileContent !! s) s1 <- unsafeInterleaveST getChar s2 <- unsafeInterleaveST getChar --evaluate (f1 s1 s2) evaluate (f2 s1 s2) Under some implementations of the interleaving we can expect to get different event interleavings for the f1 program vs the f2 program. So we apparently have a pure function influencing the event ordering. Of course the semantics says we have both event orderings anyway. It is also possible to write ST programs that produce the same result irrespective of the event interleaving. These programs might actually be useful. For example: do writeSTRef v 1 x <- unsafeInterleaveST $ readSTRef v ... -- no more writes to v So here we allow the read from v do be performed any time. We still have loads of different possible traces, but the value of x is the same in each, because the v variable is never written to again. In IO with the full OS API and other programs running concurrently it is harder to reason about. But we can see similar possibilities for non-deterministic primitives where we can still get a deterministic result. One of those is if we read from a mutable variable (a file) and can be sure that there are no other writes to that mutable variable. It's further complicated by the fact that we don't directly read from files but rather from a file descriptor which is a mutable variable containing a seek point into a file. When we read from a file we mutate the seek point. So we also have to guarantee that we are not doing that more than once on the same buffer/seek point (OS FD). If we can assure ourselves of all that then we should be able to show that the result is the same irrespective of the event interleavings (because we've banned the "interfering" events that would influence the result). Of course if we do not maintain those side conditions then we just get back to the non-deterministic hard-to-reason-about IO programs. Duncan From ml at isaac.cedarswampstudios.org Sun Oct 11 13:48:36 2009 From: ml at isaac.cedarswampstudios.org (Isaac Dupree) Date: Sun Oct 11 13:26:07 2009 Subject: Unsafe hGetContents In-Reply-To: <1255269941.8777.54.camel@localhost> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> <1255269941.8777.54.camel@localhost> Message-ID: <4AD21A74.5030704@isaac.cedarswampstudios.org> Hmm, Don't you think forkIO deserves some of the same complaints as unsafeInterleaveIO? Things happen in a nondeterministic order! I think what irritates us about unsafeInterleaveIO is that it's IO that tinkers with the internals of the Haskell evaluation system. The OS can't do it: in a C program it might, because there's libc and debuggers and all kinds of things that understand compiled C to some extent. But the Haskell runtime system is pretty much obfuscated to anyone except ourselves. This obfuscation maintains its conceptual purity to a greater extent than is really guaranteed by the standards. This obfuscation is supported in our minds by the fact that functions (->) cannot be compared for equality or deconstructed or serialized in any way, only applied. forkIO causes IO to happen in a nondeterministic order. So does unsafeInterleaveIO. But for unsafeInterleaveIO, the nondeterminism depends in part on how pure functions are written: partly because there is a compiler that makes arbitrary choices, and also partly affected by the strictness properties of the functions. This feels disconcerting to us. And worse: I am not sure if forkIO has a formal guarantee that its IO will complete, but we tend to assume that it will, sooner or later; unsafeInterleaveIO might not happen at all, and frequently does not, due to the observations of how pure functions are written. It's disconcerting. It can affect how we choose to write our pure code, the same way that efficiency and memory concerns can. But if 'catch' can catch a different exception depending even, conceptually, on the phase of the moon, it is a similarly strange stretch to imagine unsafeInterleaveIO doing so. It plays with chronology (like forkIO does) and with the ways Haskell functions are written (like 'catch' does) at the same time. A result is that it makes a lot of people confused when they do something they didn't intend with it. Also, it's a powerful enough tool that when you want to replace its formal nondeterminism with something more precise, you may have quite a bit of work cut out for you, restructuring your code (like Darcs did, IIRC). -Isaac From marlowsd at gmail.com Mon Oct 12 04:21:43 2009 From: marlowsd at gmail.com (Simon Marlow) Date: Mon Oct 12 03:59:13 2009 Subject: Unsafe hGetContents In-Reply-To: <87skdql4kd.fsf@mid.deneb.enyo.de> References: <877hvylk57.fsf@mid.deneb.enyo.de> <1253192223-sup-3700@peray> <87zl888fsh.fsf@mid.deneb.enyo.de> <4ACB3F1A.9090803@gmail.com> <1254835066-sup-8275@peray> <4ACB4B28.6080907@gmail.com> <87skdql4kd.fsf@mid.deneb.enyo.de> Message-ID: <4AD2E717.8070103@gmail.com> On 11/10/2009 09:26, Florian Weimer wrote: > * Simon Marlow: > >>> Oleg's example is quite close, don't you think? >>> >>> URL: http://www.haskell.org/pipermail/haskell/2009-March/021064.html >> >> Ah yes, if you have two lazy input streams both referring to the same >> underlying stream, that is enough to demonstrate a problem. As for >> whether Oleg's example is within the rules, it depends whether you >> consider fdToHandle as "unsafe": > > Is relying on seq to show the difference allowed, according to your > rules on an insecurity proof? Absolutely. > What about handles from System.Process? Do they count as well? Sure - we hopefully don't consider System.Process to be unsafe. Cheers, Simon From marlowsd at gmail.com Tue Oct 20 08:58:40 2009 From: marlowsd at gmail.com (Simon Marlow) Date: Tue Oct 20 08:35:40 2009 Subject: Unsafe hGetContents In-Reply-To: <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> Message-ID: <4ADDB400.1020902@gmail.com> On 10/10/2009 18:59, Iavor Diatchki wrote: > Hello, > > well, I think that the fact that we seem to have a program context > that can distinguish "f1" from "f2" is worth discussing because I > would have thought that in a pure language they are interchangable. > The question is, does the context in Oleg's example really distinguish > between "f1" and "f2"? You seem to be saying that this is not the > case: in both cases you end up with the same non-deterministic > program that reads two numbers from the standard input and subtracts > them but you can't assume anything about the order in which the > numbers are extracted from the input---it is merely an artifact of the > GHC implementation that with "f1" the subtraction always happens the > one way, and with "f2" it happens the other way. > > I can (sort of) buy this argument, after all, it is quite similar to > what happens with asynchronous exceptions (f1 (error "1") (error "2") > vs f2 (error "1") (error "2")). Still, the whole thing does not > "smell right": there is some impurity going on here, and trying to > offload the problem onto the IO monad only makes reasoning about IO > computations even harder (and it is petty hard to start with). So, > discussion and alternative solutions should be strongly encouraged, I > think. Duncan has found a definition of hGetContents that explains why it has surprising behaviour, and that's very nice because it lets us write the compilers that we want to write, and we get to tell the users to stop moaning because the strange behaviour they're experiencing is allowed according to the spec. :-) Of course, the problem is that users don't want the hGetContents that has non-deterministic semantics, they want a deterministic one. And for that, they want to fix the evaluation order (or something). The obvious drawback with fixing the evaluation order is that it ties the hands of the compiler developers, and makes a fundamental change to the language definition. Things will get a lot worse in the future as we experiment with more elaborate compiler optimisations and evaluation strategies. I predict that eventually we'll have to ditch hGetContents, at least in its current generality. Cheers, Simon > -Iavor > > > > > > > > On Sat, Oct 10, 2009 at 7:38 AM, Duncan Coutts > wrote: >> On Sat, 2009-10-10 at 02:51 -0700, oleg@okmij.org wrote: >> >>>> The reason it's hard is that to demonstrate a difference you have to get >>>> the lazy I/O to commute with some other I/O, and GHC will never do that. >>> >>> The keyword here is GHC. I may well believe that GHC is able to divine >>> programmer's true intent and so it always does the right thing. But >>> writing in the language standard ``do what the version x.y.z of GHC >>> does'' does not seem very appropriate, or helpful to other >>> implementors. >> >> With access to unsafeInterleaveIO it's fairly straightforward to show >> that it is non-deterministic. These programs that bypass the safety >> mechanisms on hGetContents just get us back to having access to the >> non-deterministic semantics of unsafeInterleaveIO. >> >>>> Haskell's IO library is carefully designed to not run into this >>>> problem on its own. It's normally not possible to get two Handles >>>> with the same FD... >> >>> Is this behavior is specified somewhere, or is this just an artifact >>> of a particular GHC implementation? >> >> It is in the Haskell 98 report, in the design of the IO library. It does >> not not mention FDs of course. The IO/Handle functions it provides give >> no (portable) way to obtain two read handles on the same OS file >> descriptor. The hGetContents behaviour of semi-closing is to stop you >> from getting two lazy lists of the same read Handle. >> >> There's nothing semantically wrong with you bypassing those restrictions >> (eg openFile "/dev/fd/0") it just means you end up with a >> non-deterministic IO program, which is something we typically try to >> avoid. >> >> I am a bit perplexed by this whole discussion. It seems to come down to >> saying that unsafeInterleaveIO is non-deterministic and that things >> implemented on top are also non-deterministic. The standard IO library >> puts up some barriers to restrict the non-determinism, but if you walk >> around the barrier then you can still find it. It's not clear to me what >> is supposed to be surprising or alarming here. >> >> Duncan >> >> _______________________________________________ >> Haskell-prime mailing list >> Haskell-prime@haskell.org >> http://www.haskell.org/mailman/listinfo/haskell-prime >> From duncan.coutts at googlemail.com Tue Oct 20 10:24:32 2009 From: duncan.coutts at googlemail.com (Duncan Coutts) Date: Tue Oct 20 10:01:28 2009 Subject: Unsafe hGetContents In-Reply-To: <4ADDB400.1020902@gmail.com> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> <4ADDB400.1020902@gmail.com> Message-ID: <1256048672.8777.3916.camel@localhost> On Tue, 2009-10-20 at 13:58 +0100, Simon Marlow wrote: > Duncan has found a definition of hGetContents that explains why it has > surprising behaviour, and that's very nice because it lets us write the > compilers that we want to write, and we get to tell the users to stop > moaning because the strange behaviour they're experiencing is allowed > according to the spec. :-) :-) > Of course, the problem is that users don't want the hGetContents that > has non-deterministic semantics, they want a deterministic one. And for > that, they want to fix the evaluation order (or something). The obvious > drawback with fixing the evaluation order is that it ties the hands of > the compiler developers, and makes a fundamental change to the language > definition. I've not yet seen anyone put forward any practical programs that have confusing behaviour but were not written deliberately to be as wacky as possible and avoid all the safety mechanism. The standard use case for hGetContents is reading a read-only file, or stdin where it really does not matter when the read actions occur with respect to other IO actions. You could do it in parallel rather than on-demand and it'd still be ok. There's the beginner mistake where people don't notice that they're not actually demanding anything before closing the file, that's nothing new of course. Duncan From marlowsd at gmail.com Tue Oct 20 10:45:09 2009 From: marlowsd at gmail.com (Simon Marlow) Date: Tue Oct 20 10:22:08 2009 Subject: Unsafe hGetContents In-Reply-To: <1256048672.8777.3916.camel@localhost> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> <4ADDB400.1020902@gmail.com> <1256048672.8777.3916.camel@localhost> Message-ID: <4ADDCCF5.20106@gmail.com> On 20/10/2009 15:24, Duncan Coutts wrote: > On Tue, 2009-10-20 at 13:58 +0100, Simon Marlow wrote: > >> Duncan has found a definition of hGetContents that explains why it has >> surprising behaviour, and that's very nice because it lets us write the >> compilers that we want to write, and we get to tell the users to stop >> moaning because the strange behaviour they're experiencing is allowed >> according to the spec. :-) > > :-) > >> Of course, the problem is that users don't want the hGetContents that >> has non-deterministic semantics, they want a deterministic one. And for >> that, they want to fix the evaluation order (or something). The obvious >> drawback with fixing the evaluation order is that it ties the hands of >> the compiler developers, and makes a fundamental change to the language >> definition. > > I've not yet seen anyone put forward any practical programs that have > confusing behaviour but were not written deliberately to be as wacky as > possible and avoid all the safety mechanism. > > The standard use case for hGetContents is reading a read-only file, or > stdin where it really does not matter when the read actions occur with > respect to other IO actions. You could do it in parallel rather than > on-demand and it'd still be ok. > > There's the beginner mistake where people don't notice that they're not > actually demanding anything before closing the file, that's nothing new > of course. If the parallel runtime reads files eagerly, that might hide a resource problem that would occur when the program is run on a sequential system, for example. Cheers, Simon From duncan.coutts at googlemail.com Tue Oct 20 11:39:40 2009 From: duncan.coutts at googlemail.com (Duncan Coutts) Date: Tue Oct 20 11:16:35 2009 Subject: Unsafe hGetContents In-Reply-To: <4ADDCCF5.20106@gmail.com> References: <20091010095116.97262176DE@Adric.metnet.navy.mil> <1255185512.28525.6806.camel@localhost> <5ab17e790910101059i4effbb88u5d6c9e6135c85050@mail.gmail.com> <4ADDB400.1020902@gmail.com> <1256048672.8777.3916.camel@localhost> <4ADDCCF5.20106@gmail.com> Message-ID: <1256053180.8777.3993.camel@localhost> On Tue, 2009-10-20 at 15:45 +0100, Simon Marlow wrote: > > I've not yet seen anyone put forward any practical programs that have > > confusing behaviour but were not written deliberately to be as wacky as > > possible and avoid all the safety mechanism. > > > > The standard use case for hGetContents is reading a read-only file, or > > stdin where it really does not matter when the read actions occur with > > respect to other IO actions. You could do it in parallel rather than > > on-demand and it'd still be ok. > > > > There's the beginner mistake where people don't notice that they're not > > actually demanding anything before closing the file, that's nothing new > > of course. > > If the parallel runtime reads files eagerly, that might hide a resource > problem that would occur when the program is run on a sequential system, > for example. That's true, but we have the same problem without doing any IO. There are many ways of generating large amounts of data. Duncan From mauricio.antunes at gmail.com Fri Oct 23 13:05:19 2009 From: mauricio.antunes at gmail.com (=?ISO-8859-1?Q?Maur=ED=ADcio_CA?=) Date: Fri Oct 23 12:46:52 2009 Subject: On support of C99 complex numbers Message-ID: Hi, I would like to make a formal request for inclusion of complex numbers in Foreign.C.Types. Which steps should I follow? Here is a start on what I would write in such request. ----- Motivation: As complex is available in C, libraries can pass then by value in function calls, making it difficult to directly wrap such functions in Haskell. GNU GSL, for instance, even if actually using C < 99, does that with a struct type that is supposed to be binary equivalent to C99 double complex. Syntax: My sugestion is to add these: CFComplex --> for 'float complex' CDComplex --> for 'double complex' CLDComplex --> for 'long double complex' I believe this is better than trying to use a kind *->* type, as that would bring little help when actually wrapping C functions. Also, I think it's not important to have field names for real and imaginary, as long as we have an easy convertion for Data.Complex. Also, if we want to take imaginary and real parts in "pure C" Haskell code, we can use available C functions like cimag, creal, cimagf, crealf etc. ----- Thanks for your attention, Maur?cio