From claus.reinke at talk21.com  Sat Sep  1 09:39:37 2007
From: claus.reinke at talk21.com (Claus Reinke)
Date: Sat Sep  1 09:30:20 2007
Subject: patch applied (packages/regex-base): Make setup script compile again after recent Cabal changes
References: <20070901115444.GA28872@cvs.haskell.org>
Message-ID: <007801c7ec9d$8b1c3250$54257ad5@cr3lt>

[subject taken from cvs-libraries@; discussion directed to libraries@]

does anyone else feel that there is something wrong in haskell library
land? i'm picking on cabal here as one of "the usual suspects", but it
seems to have become the rule that otherwise stable code has to be fixed
every so often, to accommodate new compiler versions, new library dependency
versions, new cabal versions, new xyz versions,..

it is almost as if everything feels free these days to evolve in
non-backwards-compatible ways, following the motto "what do i care
for my apis of yesterday?".

with the ongoing trend towards separately evolving libraries rather
than prepackaged kitchen-sink releases, this means that useful
libraries die quickly, and have to be revived continuously, or they
will be left behind. one symptom is "get the latest from hackage"
replacing useful extra libraries kept in sync with each other (never
mind that the hackage versions are no more likely to work without
fixes than the in-repository versions).

it is often small things ("that function/option has been renamed",
"you now need to import x instead of y", "you can work around
this by using 2 cabal files, then removing one depending on
context", etc.), but as all dependencies keep eroding in this way,
haskell projects are now built on sand rather than firm foundations,
requiring constant attention just to avoid falling behind - attention
that would better be focussed on development than maintenance.

just a thought,
claus

ps.
perhaps i've misunderstood, and there is in fact a haskell cabal trying to introduce as many version incompatibilities as possible, to ensure a market demand for cabal.. ?-) From ross at soi.city.ac.uk Sat Sep 1 09:55:03 2007 From: ross at soi.city.ac.uk (Ross Paterson) Date: Sat Sep 1 09:45:53 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <007801c7ec9d$8b1c3250$54257ad5@cr3lt> References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> Message-ID: <20070901135503.GB3474@soi.city.ac.uk> On Sat, Sep 01, 2007 at 02:39:37PM +0100, Claus Reinke wrote: > does anyone else feel that there is something wrong in haskell library > land? i'm picking on cabal here as one of "the usual suspects", but it > seems to have become the rule that otherwise stable code has to be fixed > every so often, to accomodate new compiler versions, new library dependency > versions, new cabal versions, new xyz versions,.. Cabal has a different development process from all the other libraries: you can't generalize from it. From nominolo at googlemail.com Sat Sep 1 10:45:46 2007 From: nominolo at googlemail.com (Thomas Schilling) Date: Sat Sep 1 10:36:36 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <007801c7ec9d$8b1c3250$54257ad5@cr3lt> References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> Message-ID: <1188657946.14131.9.camel@intothevoid> On Sat, 2007-09-01 at 14:39 +0100, Claus Reinke wrote: > [subject taken from cvs-libraries@; discussion directed to libraries@] > > does anyone else feel that there is something wrong in haskell library > land? 
i'm picking on cabal here as one of "the usual suspects", but it
> seems to have become the rule that otherwise stable code has to be
> fixed every so often, to accommodate new compiler versions, new
> library dependency versions, new cabal versions, new xyz versions,..
>
> it is almost as if everything feels free these days to evolve in
> non-backwards-compatible ways, following the motto "what do i care
> for my apis of yesterday?".
>
> with the ongoing trend towards separately evolving libraries rather
> than prepackaged kitchen-sink releases, this means that useful
> libraries die quickly, and have to be revived continuously, or they
> will be left behind. one symptom is "get the latest from hackage"
> replacing useful extra libraries kept in sync with each other (never
> mind that the hackage versions are no more likely to work without
> fixes than the in-repository versions).
>
> it is often small things ("that function/option has been renamed",
> "you now need to import x instead of y", "you can work around
> this by using 2 cabal files, then removing one depending on
> context", etc.), but as all dependencies keep eroding in this way,
> haskell projects are now built on sand rather than firm foundations,
> requiring constant attention just to avoid falling behind - attention
> that would better be focussed on development than maintenance.
>
> just a thought,
> claus
>
> ps. perhaps i've misunderstood, and there is in fact a haskell
> cabal trying to introduce as many version incompatibilities
> as possible, to ensure a market demand for cabal.. ?-)

I agree that Cabal is really unstable ATM.  The problem is that we added
new features, but are still missing lots of features and are facing as yet
unsolved problems.  Also, the Cabal code needs quite some cleanup, which we
do while adding new features.  The problem is that this breaks many
Setup.lhs files, also in base libraries.  But this should only affect ghc
HEAD, nothing else.
This is the development version, so you can't really expect it to be stable at all. So, could you please exclude Cabal (and stay with me when I say we hope to have things stable by the next ghc release), re-evaluate the situation excluding anything related to Cabal issues (ie, setup.lhs errors), and see if your issue still exists? Thanks, / Thomas From sven.panne at aedion.de Sat Sep 1 11:24:51 2007 From: sven.panne at aedion.de (Sven Panne) Date: Sat Sep 1 11:15:33 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <1188657946.14131.9.camel@intothevoid> References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> <1188657946.14131.9.camel@intothevoid> Message-ID: <200709011724.52017.sven.panne@aedion.de> On Saturday 01 September 2007 16:45, Thomas Schilling wrote: > On Sat, 2007-09-01 at 14:39 +0100, Claus Reinke wrote: > > [subject taken from cvs-libraries@; discussion directed to libraries@] There are even more patches sitting on my disk regarding this area, stay tuned... :-P > [...] > So, could you please exclude Cabal (and stay with me when I say we hope > to have things stable by the next ghc release), re-evaluate the > situation excluding anything related to Cabal issues (ie, setup.lhs > errors), and see if your issue still exists? The main problem is that I've been hearing the sentence "Cabal is unstable at the moment, but with the next GHC release everything will be fixed and rock-solid, never changing again, ..." at least for a year now. In my experience, Cabal is *the* #1 reason for breaking build for aeons, and this is really getting frustrating. When trying to build a GHC RPM at an arbitrary point in time, you have an almost 99% chance that it won't work. The GHC project has really made a step backwards in this respect, and I hope that this will improve again. I really wish back the good old times when "make" was king... 
(Does anybody remember changes in "make"'s basic syntax? I don't...) And while we are at regressions: Although darcs concepts are OK (although personally I would have been happier with Subversion's model, you can always easily do a 3-way merge for personal development), the performance for getting repositories is ridiculous: Due to various obscure things I've experienced, partial repositories are not an option for developers, but getting complete repositories for GHC + extra libs takes about half a day, even when you have a "fat" line to the Internet. The tarball snapshots of the repositories are not really an option in the long run IMHO and defeat the purpose of a versioning tool. To be usable, a speedup of at least factor 10 would be required. Is there any hope for this? The aging CVS at least scaled... Cheers, S. (going back to the next build failure...) From claus.reinke at talk21.com Sat Sep 1 12:20:23 2007 From: claus.reinke at talk21.com (Claus Reinke) Date: Sat Sep 1 12:11:22 2007 Subject: patch applied (packages/regex-base): Make setup scriptcompileagain after recent Cabal changes References: <20070901115444.GA28872@cvs.haskell.org><007801c7ec9d$8b1c3250$54257ad5@cr3lt> <1188657946.14131.9.camel@intothevoid> Message-ID: >> it is almost as if everything feels free these days to evolve in non- >> backwards-compatible ways, following the motto "what do i care >> for my apis of yesterday?". i picked cabal as an easily demonstrated *example*, not as the only culprit. for instance, if base gets split up into smaller parts, and the *remainder* gets called base, even though it no longer provides the same functionality as the old base package, that is not cabal's fault. i'm not interested in blame-games, and my concern is with a general trend rather than any particular tool or library. 
as far as _stability only_ is concerned, there is nothing wrong with *adding* features or with *cleaning up* existing code; the problems come when *externally used* features are *removed* or *new dependencies* are introduced. lets have a look at that base package example, for a change: if the remainder had been called base-core and base had become a proxy package that provides no functionality other than recording the new dependencies needed to get hold of the old functionality, by depending on the spin-off packages, then there wouldn't have been any externally visible changes to base, old code depending on base would still work as long as base was installed, and new code could bypass base, using base-core and other spin-off packages directly. same for cabal: if it learns to do new things, do old things in nicer ways, or interpret more information, *that should not affect any old code*. if old code breaks, that means that old information is no longer accepted, old dependencies are no longer sufficient, or old functionality is no longer provided. > This is the development version, so you can't really expect it to > be stable at all. i'm not concerned with the development versions of cabal (and again:none of this is specific to cabal), which i never expect to see. i'm concerned with the versions of cabal that are in active use, in ghc head, hackage or elsewhere. if a version of some centrally important software (be it library or tool) is used, then that version's functionality/api should be supported for as long as practical. compatibility packages and automatic translators are not ideal, but better than nothing; simply abandoning old clients is the worst situation. ideally, new versions of such software should still support old clients, while warning about uses of deprecated functionality (somehow, deprecation warnings have come to mean "it will be gone in the next version", when it used to mean "it will not be here indefinitely"). 
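The "old api as a deprecated thin proxy" policy described above can be sketched at the level of a single Haskell module. This is only an illustration: the names renderReport and showReport are hypothetical and do not come from any package discussed in this thread. The one real feature relied on is GHC's DEPRECATED pragma, which produces a warning in modules that import and use the old name, while old clients keep compiling.

```haskell
module Main where

-- New API: the name the library wants clients to use from now on.
-- (renderReport and showReport are hypothetical names for illustration.)
renderReport :: [(String, Int)] -> String
renderReport rows = unlines [name ++ ": " ++ show n | (name, n) <- rows]

-- Old API, kept as a thin proxy over the new one. GHC emits a warning
-- in importing modules that still use it, but their code keeps compiling.
{-# DEPRECATED showReport "use renderReport instead" #-}
showReport :: [(String, Int)] -> String
showReport = renderReport

main :: IO ()
main = putStr (renderReport [("base", 3), ("time", 1)])
```

Under this scheme, removing showReport becomes a separate, later decision, rather than a side effect of introducing renderReport.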
> So, could you please exclude Cabal (and stay with me when > I say we hope to have things stable by the next ghc release), > re-evaluate the situation excluding anything related to Cabal > issues (ie, setup.lhs errors), and see if your issue still exists? yes, it does. it just turns up in cabal so often because (a) there's a lot of development there (which is great!-), (b) cabal is aiming to become central to all haskell development (which is rather a huge burden of responsibility). the more cabal succeeds, the more maintenance issues will be related to cabal, and the more cabal is used, the higher the standards for backwards-compatibility it should aspire to achieve. :-) claus From ross at soi.city.ac.uk Sat Sep 1 14:13:17 2007 From: ross at soi.city.ac.uk (Ross Paterson) Date: Sat Sep 1 14:04:01 2007 Subject: patch applied (packages/regex-base): Make setup scriptcompileagain after recent Cabal changes In-Reply-To: References: <1188657946.14131.9.camel@intothevoid> Message-ID: <20070901181317.GA3942@soi.city.ac.uk> On Sat, Sep 01, 2007 at 05:20:23PM +0100, Claus Reinke wrote: > lets have a look at that base package example, for a change: if the > remainder had been called base-core and base had become a proxy > package that provides no functionality other than recording the new > dependencies needed to get hold of the old functionality, by > depending on the spin-off packages, then there wouldn't have been > any externally visible changes to base, old code depending on base > would still work as long as base was installed, and new code could > bypass base, using base-core and other spin-off packages directly. That wouldn't work, as packages can't re-export modules. And even if they could, this method would be very awkward. This is about the third time that base has shrunk, and hopefully not the last. This protocol would leave us with some rather awkward names. Fortunately the damage is limited. 
Module interfaces have largely observed the kind of stability you're asking for, it's just which packages contain those modules. So the needed change is confined to one line of the .cabal file. It's a pain to support different versions, but Thomas has just extended Cabal to make this much easier. > same for cabal: if it learns to do new things, do old things in nicer > ways, or interpret more information, *that should not affect any old > code*. if old code breaks, that means that old information is no > longer accepted, old dependencies are no longer sufficient, or old > functionality is no longer provided. Cabal really is a special case. The API is still much too immature to stabilize, which is why it's handled differently from the other libraries. It was never claimed to be stable. However a package that uses a boilerplate Setup.hs (as 90% of the packages on Hackage do) needs only backwards compatibility of the .cabal format, and that has been carefully preserved. With the recent improvements to Cabal, fewer packages will need the tailored setup scripts that are so fragile. To impose extra constraints on the Cabal API at this point would make it harder to make the wholesale cleanups and enhancements needed to benefit all the other packages, such as Thomas's recent implementation of configurations. There is a common issue, but it's the "birthpangs of a new package system". Pouring concrete over Cabal in its current state won't help. From sven.panne at aedion.de Sat Sep 1 14:34:43 2007 From: sven.panne at aedion.de (Sven Panne) Date: Sat Sep 1 14:25:22 2007 Subject: patch applied (packages/regex-base): Make setup scriptcompileagain after recent Cabal changes In-Reply-To: <20070901181317.GA3942@soi.city.ac.uk> References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> Message-ID: <200709012034.43698.sven.panne@aedion.de> On Saturday 01 September 2007 20:13, Ross Paterson wrote: > [...] 
However a package that uses a > boilerplate Setup.hs (as 90% of the packages on Hackage do) needs only > backwards compatibility of the .cabal format, and that has been carefully > preserved. Alas, that's true: Today I had to unbreak the "time" package exactly because the .cabal format has been changed. > [...] There is a common issue, but it's the "birthpangs of a new package > system". Pouring concrete over Cabal in its current state won't help. Fair enough, but giving a painful birth for months after months is really annoying. Our very big mistake has been to use Cabal in its current immature state in the GHC (+ libs) build system. Being progressive and open to new technologies is a good thing, but *not* in the area of versioning systems, build infrastructure and the like, at least this should be avoided when the SW in question is built by a large amount of people. Experiments should be done in isolation... Cheers, S. From sven.panne at aedion.de Sat Sep 1 14:35:55 2007 From: sven.panne at aedion.de (Sven Panne) Date: Sat Sep 1 14:26:34 2007 Subject: patch applied (packages/regex-base): Make setup scriptcompileagain after recent Cabal changes In-Reply-To: <200709012034.43698.sven.panne@aedion.de> References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <200709012034.43698.sven.panne@aedion.de> Message-ID: <200709012035.55154.sven.panne@aedion.de> On Saturday 01 September 2007 20:34, I wrote: > Alas, that's true: Today I had to unbreak the "time" package exactly [...] "*not* true", sorry... Cheers, S. 
From ross at soi.city.ac.uk Sat Sep 1 14:57:38 2007 From: ross at soi.city.ac.uk (Ross Paterson) Date: Sat Sep 1 14:48:22 2007 Subject: patch applied (packages/regex-base): Make setup scriptcompileagain after recent Cabal changes In-Reply-To: <200709012034.43698.sven.panne@aedion.de> References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <200709012034.43698.sven.panne@aedion.de> Message-ID: <20070901185738.GB3942@soi.city.ac.uk> On Sat, Sep 01, 2007 at 08:34:43PM +0200, Sven Panne wrote: > On Saturday 01 September 2007 20:13, Ross Paterson wrote: > > [...] However a package that uses a > > boilerplate Setup.hs (as 90% of the packages on Hackage do) needs only > > backwards compatibility of the .cabal format, and that has been carefully > > preserved. > > Alas, that's [not] true: Today I had to unbreak the "time" package > exactly because the .cabal format has been changed. Only because the development version of time.cabal was using the new extensions while they were being developed. The backwards compatibility is between released versions. The .cabal file in the time package in Hackage should still work with the new Cabal. From claus.reinke at talk21.com Sat Sep 1 18:48:34 2007 From: claus.reinke at talk21.com (Claus Reinke) Date: Sat Sep 1 18:41:07 2007 Subject: patch applied (packages/regex-base): Make setupscriptcompileagain after recent Cabal changes References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> Message-ID: > That wouldn't work, as packages can't re-export modules. And even if > they could, this method would be very awkward. This is about the third > time that base has shrunk, and hopefully not the last. This protocol > would leave us with some rather awkward names. that sounds like a design flaw (and i believe it isn't the first time module re-export has come up?). yes, proxies are awkward, but less so than breakage. 
and nice names that don't do what they say are not nice at all, either.
it is the problem itself that is awkward, and proxies are the only
solution i know that works without relying on all clients following
your development changes manually.

the idea of such proxies is to allow refactorings to cross externally
visible apis, from library code to client code, without breaking code:
you refactor your library, so that the old api becomes a thin proxy
over your new api; then you release your new library, together with
the proxy; then you deprecate the proxy; clients who can't or don't
want to refactor their code can keep using the proxies, whereas
clients who do get around to cleaning up their code can continue
the refactoring you started in your library code, in their client code,
eliminating their uses of the proxy.

i just did a little experiment, with ghc-6.6.1. consider:

    -- Y.hs
    module Main where
    import Data.Time
    main = print =<< getCurrentTime

this module will not compile without '-package time' or --make:

    $ ghc Y.hs
    Y.o(.text+0xb1):fake: undefined reference to
    `timezm1zi1zi1_DataziTimeziLocalTimeziLocalTime_zdf1_closure'
    Y.o(.text+0x115):fake: undefined reference to
    `timezm1zi1zi1_DataziTimeziClock_getCurrentTime_closure'
    Y.o(.rodata+0xc):fake: undefined reference to
    `timezm1zi1zi1_DataziTimeziLocalTimeziLocalTime_zdf1_closure'
    Y.o(.rodata+0x10):fake: undefined reference to
    `timezm1zi1zi1_DataziTimeziClock_getCurrentTime_closure'
    collect2: ld returned 1 exit status

now, consider this cabal package, the only purpose of which will
be to make Y.hs compileable:

    -- P.cabal
    License:
    BSD3
    Author:
    Homepage:
    Category:
    Build-Depends:   base, time
    Synopsis:        testing proxy packaging
    Exposed-modules: Data
    Extensions:

    -- Setup.hs
    import Distribution.Simple
    main = defaultMain

    -- Data.hs (yes..)
    module Data(module Time) where
    import Data.Time as Time

configure, build, install, and 'ghc -package P Y.hs' seems to work.
    $ ghc -package P Y.hs
    $ ./main.exe
    2007-09-01 22:35:05 UTC

would something like this work for base splitting?

> Cabal really is a special case.  The API is still much too immature to
> stabilize, which is why it's handled differently from the other libraries.
> It was never claimed to be stable.

it is now in use. therefore (apart from differences between theory and
practice;-) there should be a stable version. i assume most cabal-specific
breakage is due to early use beyond that stable api?

claus

From duncan.coutts at worc.ox.ac.uk  Sat Sep  1 20:31:26 2007
From: duncan.coutts at worc.ox.ac.uk (Duncan Coutts)
Date: Sat Sep  1 20:20:05 2007
Subject: patch applied (packages/regex-base): Make setup script compile again after recent Cabal changes
In-Reply-To: <20070901181317.GA3942@soi.city.ac.uk>
References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk>
Message-ID: <1188693086.10322.219.camel@localhost>

On Sat, 2007-09-01 at 19:13 +0100, Ross Paterson wrote:

> Cabal really is a special case.  The API is still much too immature to
> stabilize, which is why it's handled differently from the other libraries.
> It was never claimed to be stable.  However a package that uses a
> boilerplate Setup.hs (as 90% of the packages on Hackage do) needs only
> backwards compatibility of the .cabal format, and that has been carefully
> preserved.  With the recent improvements to Cabal, fewer packages will
> need the tailored setup scripts that are so fragile.  To impose extra
> constraints on the Cabal API at this point would make it harder to make
> the wholesale cleanups and enhancements needed to benefit all the other
> packages, such as Thomas's recent implementation of configurations.
>
> There is a common issue, but it's the "birthpangs of a new package
> system".  Pouring concrete over Cabal in its current state won't help.

I fully agree.

We've done a lot of work on Cabal recently fixing a huge long list of bugs,
complaints and feature requests.
None of this would have been possible without doing lots of refactoring
at the same time. Anyone is welcome to look at the internal code before
and after to see what I mean. We still need to make significant further
internal API changes to keep the code sane, to get features to be
implemented in one place rather than spread throughout the code base.

So the internal APIs are not stable and we can't promise to make them
stable without halting all significant future development.

On the other hand we have been extremely careful with not breaking the
Cabal file format (and most of the command line interface). It's
backwards compatible and in many ways future compatible too (ie adding
new fields does not break old versions of Cabal, though it does generate
warnings).

GHC adopting Cabal for its library build system has driven many
improvements in Cabal but yes has caused quite a bit of breakage in GHC
HEAD and the associated libraries. This isn't too bad since we have Ian
on hand to keep things in sync. For other libraries using trivial
Setup.hs files it's fine.

I really don't know what to do to make the base split up easier. The
main problem is that there really is nothing that identifies a piece of
code as the package or module it lives in changes. The modules in
packages can overlap so the module name isn't sufficient, though it's
probably good enough in most cases.

I've also been guilty of changing the internal api of ByteString, though
I do note that it is only the internal api which we never promised to
keep stable. The only changes in the public part of the api were as a
result of requests and discussion on this mailing list (see unsafe
CString bits and renaming .Base to .Internal and .Unsafe to reduce
confusion).
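The forward-compatibility property mentioned above for the .cabal format (unknown fields produce warnings rather than failures) can be illustrated with a small sketch. This is not Cabal's actual parser: the field list, the function names and the warning text below are all assumptions made for the example, and real .cabal syntax (continuation lines, sections, comments) is ignored.

```haskell
import Data.Char (isSpace, toLower)
import Data.Either (partitionEithers)

-- Hypothetical field list; not Cabal's actual field set.
knownFields :: [String]
knownFields = ["name", "version", "license", "build-depends", "exposed-modules"]

type Field = (String, String)

-- | Split "Key: value" into a lower-cased key and a trimmed value.
parseLine :: String -> Maybe Field
parseLine l =
  case break (== ':') l of
    (k, ':' : v) | not (null k) -> Just (map toLower (trim k), trim v)
    _                           -> Nothing
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Unknown fields become warnings (Left), known ones are accepted (Right).
classify :: Field -> Either String Field
classify f@(k, _)
  | k `elem` knownFields = Right f
  | otherwise            = Left ("warning: unknown field '" ++ k ++ "' ignored")

-- | Parse a description, returning warnings alongside the accepted fields.
parseDescription :: String -> ([String], [Field])
parseDescription src =
  partitionEithers [classify f | Just f <- map parseLine (lines src)]

main :: IO ()
main = do
  let (warnings, fields) =
        parseDescription "Name: P\nBuild-Depends: base, time\nFancy-New-Field: yes\n"
  mapM_ putStrLn warnings
  print fields
```

The design point is that an older tool reading a newer file degrades to a warning, so adding a field is not a breaking change for existing installations.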
Duncan From apfelmus at quantentunnel.de Sun Sep 2 04:19:38 2007 From: apfelmus at quantentunnel.de (apfelmus) Date: Sun Sep 2 04:10:21 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <007801c7ec9d$8b1c3250$54257ad5@cr3lt> References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> Message-ID: Claus Reinke wrote: > does anyone else feel that there is something wrong in haskell library > land? i'm picking on cabal here as one of "the usual suspects", but it > seems to have become the rule that otherwise stable code has to be fixed > every so often, to accomodate new compiler versions, new library > dependency versions, new cabal versions, new xyz versions,.. Wouldn't the fundamental solution to backward compatibility simply be to install old versions of a library? I mean, if the code only compiles with foo-1.0 but no longer with foo-2.2 then we'll just link against this old dependency foo-1.0. Viewed this way, backward compatibility of the API of foo-2.2 with foo-1.0 is just an optimization to save disk space since you don't have to keep two separate packages foo-1.0 and foo-2.2 around then. In general, I think that the notions "install" and "uninstall" are wrong for package management. IMHO, the right way is a purely functional one, along the lines of Nix http://nix.cs.uu.nl/index.html In essence, Nix treats every "package" (= populated directory tree) as a cell in some kind of _immutable_ memory (= disk space in this case). A set of installed packages corresponds to a set of allocated cells in memory. This is entirely analogous to storing Haskell values in memory. And not surprisingly, package contents can then be generated as values in a purely functional language, dependencies are just function arguments. 
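The "immutable memory" view of package installation described above can be sketched as a store that only ever grows, with each package recording the exact dependency versions it was built against. This is a toy model under stated assumptions, not how Nix or Cabal represent packages: the types Package and Store and the functions install and missingDeps are hypothetical names introduced for illustration.

```haskell
import qualified Data.Map as Map

-- Hypothetical model, not Nix's or Cabal's actual data structures.
type Name    = String
type Version = [Int]
type PkgId   = (Name, Version)

-- | A package records the exact dependencies it was built against.
data Package = Package { pkgId :: PkgId, pkgDeps :: [PkgId] }
  deriving (Eq, Show)

-- | The store only ever grows: existing entries are never overwritten.
newtype Store = Store (Map.Map PkgId Package)

emptyStore :: Store
emptyStore = Store Map.empty

-- | "Installing" allocates a new cell; an existing cell is left untouched.
install :: Package -> Store -> Store
install p (Store m) = Store (Map.insertWith (\_new old -> old) (pkgId p) p m)

-- | Pinned dependencies of a package that are absent from the store.
missingDeps :: Store -> Package -> [PkgId]
missingDeps (Store m) p = [d | d <- pkgDeps p, not (Map.member d m)]

main :: IO ()
main = do
  let foo1   = Package ("foo", [1, 0]) []
      foo2   = Package ("foo", [2, 2]) []
      oldApp = Package ("oldapp", [0, 1]) [("foo", [1, 0])]
      store  = install foo2 (install foo1 emptyStore)
  -- foo-2.2 sits next to foo-1.0 instead of replacing it,
  -- so oldapp's pinned dependency is still satisfied.
  print (missingDeps store oldApp)
```

Because installing foo-2.2 never removes foo-1.0, code that only compiles against the old version keeps building; the cost is disk space, as noted above.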
Regards, apfelmus From igloo at earth.li Sun Sep 2 16:59:24 2007 From: igloo at earth.li (Ian Lynagh) Date: Sun Sep 2 16:50:00 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <200709011724.52017.sven.panne@aedion.de> References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> <1188657946.14131.9.camel@intothevoid> <200709011724.52017.sven.panne@aedion.de> Message-ID: <20070902205924.GA4337@matrix.chaos.earth.li> On Sat, Sep 01, 2007 at 05:24:51PM +0200, Sven Panne wrote: > > (Does anybody remember changes in "make"'s basic syntax? I > don't...) No, as far as I am aware make still has all the irritations and limitations that it's had for decades. > even when you have a "fat" line to the Internet. The tarball snapshots of the > repositories are not really an option in the long run IMHO and defeat the > purpose of a versioning tool. I don't really see why it defeats the purpose. There are a lot of problems more serious than needing to download a tarball in order to get an up-to-date repo quickly. > To be usable, a speedup of at least factor 10 > would be required. Is there any hope for this? Patch application can be sped up with a planned change of hunk format. If downloading lots of small patch files individually is the problem then darcs could tar them up when checkpointed tags are made. So yes, I think there is hope; it just needs some development time put in. Thanks Ian From igloo at earth.li Sun Sep 2 17:24:50 2007 From: igloo at earth.li (Ian Lynagh) Date: Sun Sep 2 17:15:25 2007 Subject: patch applied (packages/regex-base): Make setupscriptcompileagain after recent Cabal changes In-Reply-To: References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> Message-ID: <20070902212450.GB4337@matrix.chaos.earth.li> On Sat, Sep 01, 2007 at 11:48:34PM +0100, Claus Reinke wrote: > > i just did a little experiment, with ghc-6.6.1. 
consider:
>
>     -- Y.hs
>     module Main where
>     import Data.Time
>     main = print =<< getCurrentTime
>
> now, consider this cabal package, the only purpose of which will
> be to make Y.hs compileable:
>
>     -- P.cabal
>     Exposed-modules: Data
>     Build-Depends: base, time
>
>     -- Data.hs (yes..)
>     module Data(module Time) where
>     import Data.Time as Time

I'm a bit confused; is this Data module necessary? If so, is its name
important? Are you proposing a new extension?

> configure, build, install, and 'ghc -package P Y.hs' seems to work.
>
>     $ ghc -package P Y.hs

Why is -package P better than -package time?

Incidentally, one thing that might help is to make an empty "time"
package for people with old GHCs to install (or even to make the time
package appear empty to such people, with configurations. Except they'd
have to upgrade Cabal to support configurations first).


Thanks
Ian

From claus.reinke at talk21.com  Sun Sep  2 18:10:04 2007
From: claus.reinke at talk21.com (Claus Reinke)
Date: Sun Sep  2 18:00:43 2007
Subject: patch applied (packages/regex-base): Make setup script compile again after recent Cabal changes
References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li>
Message-ID: <00f701c7edae$05265880$e4248351@cr3lt>

>>     -- P.cabal
>>     Exposed-modules: Data
>>     Build-Depends: base, time
>>
>>     -- Data.hs (yes..)
>>     module Data(module Time) where
>>     import Data.Time as Time
>
> I'm a bit confused; is this Data module necessary? If so, is its name
> important? Are you proposing a new extension?
for context:

- i suggested that there should be a base package that would
  depend on the packages split off from the old base, and would
  simply reexport their modules to provide the full functionality of
  the old base on top of the spin-off packages

- Ross pointed out that packages can't simply re-export modules,
  so the straightforward solution of a package without sources,
  just with a .cabal file, seems barred for the moment (though i
  don't understand why this restriction is there, so perhaps i am
  asking for an extension?)

- taking time as a small example, i was looking for a way around
  this limitation, to reexport time's modules via a different package:

    - it seems that cabal needs some sources for exported modules

    - module Data.Time where import Data.Time, then exposing
      Data.Time, does not work, because of cycle

    - module Data(module Time) import Data.Time as Time, then
      exposing Data, does work, as demonstrated

>> configure, build, install, and 'ghc -package P Y.hs' seems to work.
>>     $ ghc -package P Y.hs
>
> Why is -package P better than -package time?

not better, it just demonstrates that we can re-export a module from
package time via package P. so, presumably, one could re-export
the modules distributed over the packages split off from the old base
via a thin package base, avoiding breakage.

there may be other ways, i just needed one way to confirm that it is
possible.
claus From ross at soi.city.ac.uk Sun Sep 2 18:38:12 2007 From: ross at soi.city.ac.uk (Ross Paterson) Date: Sun Sep 2 18:28:48 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <200709011724.52017.sven.panne@aedion.de> References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> <1188657946.14131.9.camel@intothevoid> <200709011724.52017.sven.panne@aedion.de> Message-ID: <20070902223811.GA30444@soi.city.ac.uk> On Sat, Sep 01, 2007 at 05:24:51PM +0200, Sven Panne wrote: > The main problem is that I've been hearing the sentence "Cabal is unstable at > the moment, but with the next GHC release everything will be fixed and > rock-solid, never changing again, ..." at least for a year now. I don't recall that myself, but I certainly hope no-one's given you the impression that Cabal will be fixed and rock-solid, never changing again, after the upcoming GHC release. From simons at cryp.to Sun Sep 2 19:40:51 2007 From: simons at cryp.to (Peter Simons) Date: Sun Sep 2 19:31:38 2007 Subject: ByteString I/O Performance References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> Message-ID: <87ps107i0s.fsf@write-only.cryp.to> Donald Bruce Stewart writes: > It's been a while since I benchmarked the IO performance, so > looks like time to revisit this issue. Hi Bruce, my impression is that performance cannot be improved significantly without providing a modified input API. For some purposes reading input in large chunks is fine, but for some purposes it is not. A network proxy server, for example, cannot afford to buffer large amounts of data for every connection. An I/O buffer size of, say 256KB, would be really not a good choice for such a program. Something like 4KB typically is, but the small buffer size means that large amounts of data will be read using a fairly large number of hGet calls. 
As it is, hGet performs at least one malloc() per read() call. That will be slow, no matter how optimized that code is. One way to get malloc() out of the picture would be to provide a variant of hGet that takes an existing, pre-allocated buffer as an argument, so that the user can allocate a ByteString once and re-use it for every single hGet and hPut. A different approach would be to try to reduce the cost for malloc() by using some sort of pre-allocated pool of ByteStrings behind the scenes. Last but not least, it's also possible to decide and document that ByteStrings are not supposed to be used for those kinds of purposes and that users who need very high performance should rely on hGetBuf instead. I can't say what's best, those are simply the options I see. With kind regards, Peter From bos at serpentine.com Sun Sep 2 23:23:25 2007 From: bos at serpentine.com (Bryan O'Sullivan) Date: Sun Sep 2 23:15:57 2007 Subject: ByteString I/O Performance In-Reply-To: <87ps107i0s.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> Message-ID: <46DB7E2D.4080001@serpentine.com> Peter Simons wrote: > One way to get malloc() out of the picture would be to provide a > variant of hGet that takes an existing, pre-allocated buffer as an > argument, so that the user can allocate a ByteString once and re-use > it for every single hGet and hPut. This is already quite easy to do. See unsafeUseAsCStringLen in Data.ByteString.Base, and hGetBuf in System.IO. 
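A minimal sketch of the hGetBuf route mentioned above, assuming only the base library. One buffer is allocated up front with allocaBytes and reused for every read and write, so no per-read allocation happens. The name copyWithBuffer and the 4KB buffer size are illustrative choices, not part of any library, and the ByteString side (unsafeUseAsCStringLen) is deliberately left out.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO

-- | Copy everything from one handle to another through a single
-- buffer of the given size. The buffer is allocated once and reused
-- for every hGetBuf/hPutBuf pair. Returns the number of bytes copied.
-- (copyWithBuffer is a hypothetical helper name, not a library function.)
copyWithBuffer :: Int -> Handle -> Handle -> IO Int
copyWithBuffer size hIn hOut = allocaBytes size (loop 0)
  where
    loop :: Int -> Ptr Word8 -> IO Int
    loop total buf = do
      -- hGetBuf returns 0 only once end of file has been reached.
      n <- hGetBuf hIn buf size
      if n == 0
        then return total
        else do
          hPutBuf hOut buf n
          let total' = total + n
          total' `seq` loop total' buf

main :: IO ()
main = do
  mapM_ (`hSetBinaryMode` True) [stdin, stdout]
  n <- copyWithBuffer (4 * 1024) stdin stdout
  hFlush stdout
  hPutStrLn stderr ("copied " ++ show n ++ " bytes")
```

Compiled with ghc, this can be run as a plain filter from stdin to stdout. The buffer never escapes copyWithBuffer as an immutable value, so nothing else can observe it being overwritten between reads.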
References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <46DB7E2D.4080001@serpentine.com> Message-ID: <0c6201c7edde$47ac8640$d70592c0$@com> -----Original Message----- From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Bryan O'Sullivan Sent: Sunday, September 02, 2007 11:23 PM To: Peter Simons Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance Peter Simons wrote: > One way to get malloc() out of the picture would be to provide a > variant of hGet that takes an existing, pre-allocated buffer as an > argument, so that the user can allocate a ByteString once and re-use > it for every single hGet and hPut. This is already quite easy to do. See unsafeUseAsCStringLen in Data.ByteString.Base, and hGetBuf in System.IO. Is it possible without resorting to an unsafeXXX function? References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <46DB7E2D.4080001@serpentine.com> <0c6201c7edde$47ac8640$d70592c0$@com> Message-ID: <20070903040252.GC21914@cse.unsw.EDU.AU> seth: > > > -----Original Message----- > From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Bryan O'Sullivan > Sent: Sunday, September 02, 2007 11:23 PM > To: Peter Simons > Cc: libraries@haskell.org > Subject: Re: ByteString I/O Performance > > Peter Simons wrote: > > > One way to get malloc() out of the picture would be to provide a > > variant of hGet that takes an existing, pre-allocated buffer as an > > argument, so that the user can allocate a ByteString once and re-use > > it for every single hGet and hPut. > > This is already quite easy to do. See unsafeUseAsCStringLen in > Data.ByteString.Base, and hGetBuf in System.IO. > > Is it possible without resorting to an unsafeXXX function? They're all 'unsafe' for different reasons :) The question should be: why is this unsafe? 
(It's unsafe because it doesn't copy the C string, so you need to have a side condition that the string isn't modified by C). -- Don From seth at cql.com Mon Sep 3 00:08:06 2007 From: seth at cql.com (Seth Kurtzberg) Date: Sun Sep 2 23:59:09 2007 Subject: ByteString I/O Performance In-Reply-To: <20070903040252.GC21914@cse.unsw.EDU.AU> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <46DB7E2D.4080001@serpentine.com> <0c6201c7edde$47ac8640$d70592c0$@com> <20070903040252.GC21914@cse.unsw.EDU.AU> Message-ID: <0c6501c7ede0$07ea95e0$17bfc1a0$@com> -----Original Message----- From: Donald Bruce Stewart [mailto:dons@cse.unsw.edu.au] Sent: Monday, September 03, 2007 12:03 AM To: Seth Kurtzberg Cc: libraries@haskell.org Subject: Re: ByteString I/O Performance seth: > > > -----Original Message----- > From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Bryan O'Sullivan > Sent: Sunday, September 02, 2007 11:23 PM > To: Peter Simons > Cc: libraries@haskell.org > Subject: Re: ByteString I/O Performance > > Peter Simons wrote: > > > One way to get malloc() out of the picture would be to provide a > > variant of hGet that takes an existing, pre-allocated buffer as an > > argument, so that the user can allocate a ByteString once and re-use > > it for every single hGet and hPut. > > This is already quite easy to do. See unsafeUseAsCStringLen in > Data.ByteString.Base, and hGetBuf in System.IO. > > Is it possible without resorting to an unsafeXXX function? They're all 'unsafe' for different reasons :) The question should be: why is this unsafe? (It's unsafe because it doesn't copy the C string, so you need to have a side condition that the string isn't modified by C). OK. Assume that I'm not doing any C coding, so that the only C code that is invoked is called from within the implementation (in this case the implementation of System.IO). 
Can I assume that no implementation code modifies the string? In other words, is it valid to assume that the side condition is never violated so long as I don't violate the side condition in my own C code (if any)? -- Don From tomasz.zielonka at gmail.com Mon Sep 3 01:11:45 2007 From: tomasz.zielonka at gmail.com (Tomasz Zielonka) Date: Mon Sep 3 01:02:20 2007 Subject: ByteString I/O Performance In-Reply-To: <87ps107i0s.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> Message-ID: <20070903051145.GB19428@lambda> On Mon, Sep 03, 2007 at 01:40:51AM +0200, Peter Simons wrote: > One way to get malloc() out of the picture would be to provide a > variant of hGet that takes an existing, pre-allocated buffer as an > argument, so that the user can allocate a ByteString once and re-use > it for every single hGet and hPut. This seems dangerous. For example, consider that the ByteString can be referenced by some lazy computation, expecting it to contain the data from some earlier hGet. > A different approach would be to try to reduce the cost for malloc() > by using some sort of pre-allocated pool of ByteStrings behind the > scenes. I just wrote this, before I read your proposition: A safer alternative would be to keep a cache of pre-malloced buffers, populated by the ByteString finalizer. But the bookkeeping cost could outweigh the benefit of avoiding malloc.
Best regards Tomek From stefanor at cox.net Mon Sep 3 01:16:59 2007 From: stefanor at cox.net (Stefan O'Rear) Date: Mon Sep 3 01:07:37 2007 Subject: ByteString I/O Performance In-Reply-To: <20070903051145.GB19428@lambda> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <20070903051145.GB19428@lambda> Message-ID: <20070903051659.GA4294@localhost.localdomain> On Mon, Sep 03, 2007 at 07:11:45AM +0200, Tomasz Zielonka wrote: > On Mon, Sep 03, 2007 at 01:40:51AM +0200, Peter Simons wrote: > > One way to get malloc() out of the picture would be to provide a > > variant of hGet that takes an existing, pre-allocated buffer as an > > argument, so that the user can allocate a ByteString once and re-use > > it for every single hGet and hPut. > > This seems dangerous. For example, consider that the ByteString can be > referenced by some lazy computation, expecting it to contain the data > from some earlier hGet. > > > A different approach would be to try to reduce the cost for malloc() > > by using some sort of pre-allocated pool of ByteStrings behind the > > scenes. > > I just wrote this, before I read your proposition: > A safer alternative would be to keep a cache of pre-malloced buffers, > populated by the ByteString finalizer. But the bookkeeping cost could > outweigh the benefit of avoiding malloc. How are you getting these bytestrings? Normal bytestring allocation doesn't use malloc and doesn't use finalizers; it calls the (deceptively named) mallocForeignPtrBytes function, which allocates a block of data in the pinned GHC heap, nearly like any other Haskell object. Stefan
From tomasz.zielonka at gmail.com Mon Sep 3 01:49:01 2007 From: tomasz.zielonka at gmail.com (Tomasz Zielonka) Date: Mon Sep 3 01:39:37 2007 Subject: ByteString I/O Performance In-Reply-To: <20070903051659.GA4294@localhost.localdomain> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <20070903051145.GB19428@lambda> <20070903051659.GA4294@localhost.localdomain> Message-ID: <20070903054901.GC19428@lambda> On Sun, Sep 02, 2007 at 10:16:59PM -0700, Stefan O'Rear wrote: > On Mon, Sep 03, 2007 at 07:11:45AM +0200, Tomasz Zielonka wrote: > > I just wrote this, before I read your proposition: > > A safer alternative would be to keep a cache of pre-malloced buffers, > > populated by the ByteString finalizer. But the bookkeeping cost could > > outweigh the benefit of avoiding malloc. > > How are you getting these bytestrings? Normal bytestring allocation > doesn't use malloc and doesn't use finalizers; it calls the (deceptively > named) mallocForeignPtrBytes function, which allocates a block of data > in the pinned GHC heap, nearly like any other Haskell object. IIUC, the discussion was about malloced ByteStrings...
Best regards Tomek From simonpj at microsoft.com Mon Sep 3 03:57:57 2007 From: simonpj at microsoft.com (Simon Peyton-Jones) Date: Mon Sep 3 03:48:32 2007 Subject: patch applied (packages/regex-base): Make setupscriptcompileagain after recent Cabal changes In-Reply-To: <00f701c7edae$05265880$e4248351@cr3lt> References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> Message-ID: | - Ross pointed out that packages can't simply re-export modules, | so the straightforward solution of a package without sources, | just with a .cabal file, seems barred for the moment (though i | don't understand why this restriction is there, so perhaps i am | asking for an extension?) Yes this is undoubtedly a bug, in Cabal or GHC or (probably) both. Of *course* a package should be able to re-export a module from another package, just as a module can re-export a function imported from another module. I don't think there is any difficulty in principle; it just needs to be implemented. Simon From duncan.coutts at worc.ox.ac.uk Mon Sep 3 04:56:07 2007 From: duncan.coutts at worc.ox.ac.uk (Duncan Coutts) Date: Mon Sep 3 04:44:41 2007 Subject: ByteString I/O Performance In-Reply-To: <87ps107i0s.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> Message-ID: <1188809768.10322.255.camel@localhost> On Mon, 2007-09-03 at 01:40 +0200, Peter Simons wrote: > Donald Bruce Stewart writes: > > > It's been a while since I benchmarked the IO performance, so > > looks like time to revisit this issue. > > Hi Bruce, > > my impression is that performance cannot be improved significantly > without providing a modified input API. For some purposes reading > input in large chunks is fine, but for some purposes it is not. 
A > network proxy server, for example, cannot afford to buffer large > amounts of data for every connection. An I/O buffer size of, say > 256KB, would be really not a good choice for such a program. Something > like 4KB typically is, but the small buffer size means that large > amounts of data will be read using a fairly large number of hGet > calls. As it is, hGet performs at least one malloc() per read() call. > That will be slow, no matter how optimized that code is. > > One way to get malloc() out of the picture would be to provide a > variant of hGet that takes an existing, pre-allocated buffer as an > argument, so that the user can allocate a ByteString once and re-use > it for every single hGet and hPut. Stefan is right, the only place that C malloc() is used is in strict ByteString's hGetContents. Everywhere else we use GHC's pinned heap allocation. Strict ByteString ReadFile does not use hGetContents because in that case we know the file length, it uses hGetBuf. Also, createAndTrim does not do any copying when there is no trimming necessary, so in the best case we do only a single copy using hGetBuf. As I recall from when I profiled this for the ByteString paper, with a lazy ByteString implementation of unix 'cat' on disk files (rather than a network socket) we should only copy each chunk once (as far as I can see). The slowdown compared to a simple hGetBuf 'cat' was all down to cache locality, because we're cycling between a range of buffers rather than a single cache-hot buffer. The time overhead of memory allocation and GC is negligible. In the tests I did at the time I found that on fully cached files the slowdown compared to using a single mutable buffer was about 2-3x. I figured that overhead is not bad, considering it represents the worst case when no work is being done to transform the data in any way and we're not doing any real IO, just copying data from kernel memory to user space memory. 
If we're doing lots of short reads (like when reading from sockets) then there is an opportunity for improvement. We could read into an internal buffer and cut off as an immutable ByteString only the chunk that got filled. The remainder of the buffer could be used for the following reads until the whole buffer is exhausted and a new buffer has to be allocated. This is the buffering strategy we use in the binary library when serialising. Duncan From igloo at earth.li Mon Sep 3 08:12:57 2007 From: igloo at earth.li (Ian Lynagh) Date: Mon Sep 3 08:03:30 2007 Subject: patch applied (packages/regex-base): Make setupscriptcompileagain after recent Cabal changes In-Reply-To: <00f701c7edae$05265880$e4248351@cr3lt> References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> Message-ID: <20070903121257.GA26595@matrix.chaos.earth.li> On Sun, Sep 02, 2007 at 11:10:04PM +0100, Claus Reinke wrote: > >> > >> -- Data.hs (yes..) > >> module Data(module Time) where > >> import Data.Time as Time > > > >I'm a bit confused; is this Data module necessary? If so, is its name > >important? Are you proposing a new extension? > > for context: > > - taking time as a small example, i was looking for a way around > this limitation, to reexport time's modules via a different package: > > - it seems that cabal needs some sources for exported modules > - module Data.Time where import Data.Time, > then exposing Data.Time, does not work, because of cycle > - module Data(module Time) import Data.Time as Time, > then exposing Data, does work, as demonstrated But the above is making a module called Data which exports everything that Data.Time exports. The Data.Time user isn't importing Data, so I don't see how that can help. As far as I can see, if this does work then there is at least one misfeature involved. 
Thanks Ian From claus.reinke at talk21.com Mon Sep 3 08:36:33 2007 From: claus.reinke at talk21.com (Claus Reinke) Date: Mon Sep 3 08:27:12 2007 Subject: patch applied (packages/regex-base): Make setupscriptcompileagain after recent Cabal changes References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> <20070903121257.GA26595@matrix.chaos.earth.li> Message-ID: <00a701c7ee27$1112d2d0$08097ad5@cr3lt> >> >> -- Data.hs (yes..) >> >> module Data(module Time) where >> >> import Data.Time as Time >> >> - taking time as a small example, i was looking for a way around >> this limitation, to reexport time's modules via a different package: >> >> - it seems that cabal needs some sources for exported modules >> - module Data.Time where import Data.Time, >> then exposing Data.Time, does not work, because of cycle >> - module Data(module Time) import Data.Time as Time, >> then exposing Data, does work, as demonstrated > > But the above is making a module called Data which exports everything > that Data.Time exports. The Data.Time user isn't importing Data, so I > don't see how that can help. As far as I can see, if this does work then > there is at least one misfeature involved. yes, i continue to be rather surprised by all of this, that simple module re-export in packages is not implemented, that import has a "letrec semantics" (the only reason for having the current module in scope for import seems that, before packages, there was no nesting; but with packages, there is the question of how to refer to a module of the same name in the dependencies, isn't there? perhaps there should be an optional "from " qualifier for imports?), and that the variant i found works (if import Data.Time refers to Data's Time module in the client, why isn't the same import cyclic in Data.hs? just splitting the name breaks the cycle?).. 
but then, the draft hierarchical module spec didn't say much beyond "the names may now have dots". perhaps module Main where import Data.Time main = print =<< getCurrentTime and module Main where import Data(module Time) main = print =<< Time.getCurrentTime are really meant to be equivalent? claus From simonmarhaskell at gmail.com Mon Sep 3 09:03:56 2007 From: simonmarhaskell at gmail.com (Simon Marlow) Date: Mon Sep 3 08:54:33 2007 Subject: patch applied (packages/regex-base): Make setupscriptcompileagain after recent Cabal changes In-Reply-To: <00a701c7ee27$1112d2d0$08097ad5@cr3lt> References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> <20070903121257.GA26595@matrix.chaos.earth.li> <00a701c7ee27$1112d2d0$08097ad5@cr3lt> Message-ID: <46DC063C.8050506@gmail.com> Claus Reinke wrote: > but then, the draft hierarchical module spec didn't say much > beyond "the names may now have dots". perhaps > module Main where import Data.Time main = print =<< getCurrentTime > > and > module Main where import Data(module Time) > main = print =<< Time.getCurrentTime > > are really meant to be equivalent? The second example isn't syntactically correct. What are you getting at here? FWIW, there was a proposal to add a package qualifier to import declarations, but it wasn't unanimously viewed as the right thing at the time (check the libraries@haskell.org archives). Same goes for the various "grafting"/"mounting" proposals - we just haven't seen a proposal that has the right power to weight ratio. Meanwhile, the current package system seems to be scaling quite nicely, thank you. We do intend, I think, to allow packages to re-export modules, it just needs to be implemented. (I see it as a missing feature, rather than a bug or a "design flaw", though). 
Cheers, Simon From claus.reinke at talk21.com Mon Sep 3 10:34:36 2007 From: claus.reinke at talk21.com (Claus Reinke) Date: Mon Sep 3 10:25:13 2007 Subject: patch applied (packages/regex-base): Make setupscriptcompileagain after recent Cabal changes References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> <20070903121257.GA26595@matrix.chaos.earth.li> <00a701c7ee27$1112d2d0$08097ad5@cr3lt> <46DC063C.8050506@gmail.com> Message-ID: <00ec01c7ee37$8e928f10$08097ad5@cr3lt> > The second example isn't syntactically correct. What are you getting at here? is that really as completely unobvious to you as your question implies? -- A module Main where import Data.Time main = print =<< getCurrentTime -- B module Main where import Data main = print =<< Time.getCurrentTime given that package P only exposes a module Data (which, in turn, exports a module Time), it appears that A behaves as if it was B. is that intended? or is something else going on? which is what Ian was asking about, wasn't it? > FWIW, there was a proposal to add a package qualifier to import > declarations, but it wasn't unanimously viewed as the right thing at the > time (check the libraries@haskell.org archives). Same goes for the various > "grafting"/"mounting" proposals - we just haven't seen a proposal that has > the right power to weight ratio. Meanwhile, the current package system > seems to be scaling quite nicely, thank you. no offense intended;-) > We do intend, I think, to allow packages to re-export modules, it just > needs to be implemented. (I see it as a missing feature, rather than a bug > or a "design flaw", though). as long as we agree on the features we want, we don't need to agree on whether their lack is a flaw or a miss. 
claus From claus.reinke at talk21.com Mon Sep 3 10:52:43 2007 From: claus.reinke at talk21.com (Claus Reinke) Date: Mon Sep 3 10:43:35 2007 Subject: patch applied (packages/regex-base):Make setupscriptcompileagain after recent Cabal changes References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> <20070903121257.GA26595@matrix.chaos.earth.li><00a701c7ee27$1112d2d0$08097ad5@cr3lt> <46DC063C.8050506@gmail.com> <00ec01c7ee37$8e928f10$08097ad5@cr3lt> Message-ID: > -- A > module Main where > import Data.Time > main = print =<< getCurrentTime > > -- B > module Main where > import Data > main = print =<< Time.getCurrentTime that last line should be > main = print =<< getCurrentTime i probably thought for a moment that i was exporting a module, rather than its contents. was that the source of confusion? so why did the experiment reported in http://www.haskell.org/pipermail/libraries/2007-September/008062.html work, when Y.hs imports Data.Time, but the package P only exposes Data? is Y importing time's Data.Time directly, even though P doesn't expose it? 
claus From simonmarhaskell at gmail.com Mon Sep 3 11:13:26 2007 From: simonmarhaskell at gmail.com (Simon Marlow) Date: Mon Sep 3 11:04:04 2007 Subject: patch applied (packages/regex-base):Make setupscriptcompileagain after recent Cabal changes In-Reply-To: References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> <20070903121257.GA26595@matrix.chaos.earth.li><00a701c7ee27$1112d2d0$08097ad5@cr3lt> <46DC063C.8050506@gmail.com> <00ec01c7ee37$8e928f10$08097ad5@cr3lt> Message-ID: <46DC2496.2040900@gmail.com> Claus Reinke wrote: >> -- A >> module Main where >> import Data.Time >> main = print =<< getCurrentTime >> >> -- B >> module Main where >> import Data >> main = print =<< Time.getCurrentTime > > that last line should be > >> main = print =<< getCurrentTime > > i probably thought for a moment that i was exporting a module, rather > than its contents. was that the source of confusion? actually I was confused because you had import Data(module Time) and that isn't Haskell. But never mind, I think I now get the gist of what you're asking about... see below. > so why did the experiment reported in > > http://www.haskell.org/pipermail/libraries/2007-September/008062.html > > work, when Y.hs imports Data.Time, but the package P only > exposes Data? is Y importing time's Data.Time directly, even > though P doesn't expose it? Interesting. To recap, you had -- Y.hs module Main where import Data.Time main = print =<< getCurrentTime and you compiled it like this; $ ghc -package P Y.hs without failure. Package P simply depends on package time, that's all. You have to understand that the above command does two things: it compiles Y.hs to Y.o, and then it links Y.o with libraries to form a binary. 
In the first stage, all exposed packages are available (because you didn't say -hide-all-packages), so Data.Time from package time is in scope and you can successfully import it. At link time, all you have to do is make sure the required packages are linked in, and you did that by explicitly linking something (P) that depends on package time, so package time was linked in too. I don't think there's anything really "wrong" here, that's just the way it works when you don't use --make. Cheers, Simon From claus.reinke at talk21.com Mon Sep 3 13:02:05 2007 From: claus.reinke at talk21.com (Claus Reinke) Date: Mon Sep 3 12:52:43 2007 Subject: patch applied (packages/regex-base):Make setupscriptcompileagain after recent Cabal changes References: <1188657946.14131.9.camel@intothevoid> <20070901181317.GA3942@soi.city.ac.uk> <20070902212450.GB4337@matrix.chaos.earth.li> <00f701c7edae$05265880$e4248351@cr3lt> <20070903121257.GA26595@matrix.chaos.earth.li><00a701c7ee27$1112d2d0$08097ad5@cr3lt> <46DC063C.8050506@gmail.com> <00ec01c7ee37$8e928f10$08097ad5@cr3lt> <46DC2496.2040900@gmail.com> Message-ID: <016d01c7ee4c$2927d530$08097ad5@cr3lt> > You have to understand that the above command does two things: it compiles > Y.hs to Y.o, and then it links Y.o with libraries to form a binary. In the > first stage, all exposed packages are available (because you didn't say > -hide-all-packages), so Data.Time from package time is in scope and you can > successfully import it. At link time, all you have to do is make sure the > required packages are linked in, and you did that by explicitly linking > something (P) that depends on package time, so package time was linked in too. > > I don't think there's anything really "wrong" here, that's just the way it > works when you don't use --make. ah, thanks! somehow, i had been living under the misconception that, to get at any non-base package's modules, i would have to use either --make or -package. 
as if -hide-all-packages -package base was the default, in other words. instead, the distinction is exposed vs hidden, which makes sense. (*) which means that my test setup was useless, other than supporting link-time dependency recording. package dependencies are another matter, since for building packages -hide-all-packages is the default, and my setup did not, after all, re-export modules as intended. one might still re-export modules under their original names, by using two packages. for the time example: a package P can depend on base and time, importing Data.Time, exposing P.Data.Time; then, a package Q can depend on base and P, importing P.Data.Time, exposing Data.Time. with this alternate setup, ghc -hide-all-packages -package base -package Q Y.hs succeeds (this time hopefully for the right reasons?-). if i recall correctly, the grafting/mounting/package-qualifier proposals were attempting to formalise this linking of package modules into a module hierarchy, in various ways. perhaps providing package qualifiers (with a way to disambiguate between package and module names) could be reconsidered, given that mounting of package modules in the hierarchy may then be achieved via intermediate packages like Q above? (*) still, i find this two-things separation confusing, because the compiler only "half-knows" about things (being able to compile without complaint, then fail in linking with confusing errors), and because -package serves two separate purposes (exposing packages, and enabling linking). i'd rather keep the illusion that link objects are an internal representation of source modules, and either both compile and link succeed, or both fail, with a source-level error message. 
claus

From simons at cryp.to Mon Sep 3 15:47:31 2007
From: simons at cryp.to (Peter Simons)
Date: Mon Sep 3 15:38:34 2007
Subject: ByteString I/O Performance
References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost>
Message-ID: <87y7fnv8do.fsf@write-only.cryp.to>

Duncan Coutts writes:

 | As I recall from when I profiled this for the ByteString paper
 | [...], the slowdown compared to a simple hGetBuf 'cat' was all
 | down to cache locality, because we're cycling between a range
 | of buffers rather than a single cache-hot buffer.

I believe you are right. The following implementation performs
just fine:

 > import System.IO
 > import qualified Data.ByteString.Base as Str
 > import qualified Data.ByteString as Str
 > import Data.ByteString ( ByteString )
 >
 > bufsize :: Int
 > bufsize = 4 * 1024
 >
 > hGet :: Handle -> ByteString -> IO ByteString
 > hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n)
 >                 return (Str.unsafeTake i buf)
 >
 > catString :: Handle -> Handle -> IO ()
 > catString hIn hOut = Str.create bufsize (\_ -> return ()) >>= input
 >   where
 >   input buf = hGet hIn buf >>= output buf
 >   output buf b
 >     | Str.null b = return ()
 >     | otherwise  = Str.hPut hOut b >> input buf
 >
 > main :: IO ()
 > main = do
 >   mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ]
 >   catString stdin stdout

  time /bin/cat /dev/null

  real    0m2.093s
  user    0m0.024s
  sys     0m2.068s

  time ./cat-bytestring /dev/null

  real    0m2.753s
  user    0m0.568s
  sys     0m2.184s

Peter

From dons at cse.unsw.edu.au Mon Sep 3 16:19:19 2007
From: dons at cse.unsw.edu.au (Donald Bruce Stewart)
Date: Mon Sep 3 16:10:03 2007
Subject: ByteString I/O Performance
In-Reply-To: <87y7fnv8do.fsf@write-only.cryp.to>
References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to>
Message-ID: <20070903201919.GB15345@cse.unsw.EDU.AU>

simons:
> Duncan Coutts writes:
>
>  | As I recall from when I profiled this for the ByteString paper
>  | [...], the slowdown compared to a simple hGetBuf 'cat' was all
>  | down to cache locality, because we're cycling between a range
>  | of buffers rather than a single cache-hot buffer.
>
> I believe you are right. The following implementation performs
> just fine:
>
>  > import System.IO
>  > import qualified Data.ByteString.Base as Str
>  > import qualified Data.ByteString as Str
>  > import Data.ByteString ( ByteString )
>  >
>  > bufsize :: Int
>  > bufsize = 4 * 1024
>  >
>  > hGet :: Handle -> ByteString -> IO ByteString
>  > hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n)
>  >                 return (Str.unsafeTake i buf)
>  >
>  > catString :: Handle -> Handle -> IO ()
>  > catString hIn hOut = Str.create bufsize (\_ -> return ()) >>= input
>  >   where
>  >   input buf = hGet hIn buf >>= output buf
>  >   output buf b
>  >     | Str.null b = return ()
>  >     | otherwise  = Str.hPut hOut b >> input buf
>  >
>  > main :: IO ()
>  > main = do
>  >   mapM_ (\h -> hSetBuffering h NoBuffering) [ stdin, stdout ]
>  >   catString stdin stdout
>
>   time /bin/cat /dev/null
>
>   real    0m2.093s
>   user    0m0.024s
>   sys     0m2.068s
>
>   time ./cat-bytestring /dev/null
>
>   real    0m2.753s
>   user    0m0.568s
>   sys     0m2.184s

That's a useful benchmark. Thanks for looking into this.
-- Don From dons at cse.unsw.edu.au Mon Sep 3 19:21:28 2007 From: dons at cse.unsw.edu.au (Donald Bruce Stewart) Date: Mon Sep 3 19:12:06 2007 Subject: ByteString I/O Performance In-Reply-To: <0c6501c7ede0$07ea95e0$17bfc1a0$@com> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <46DB7E2D.4080001@serpentine.com> <0c6201c7edde$47ac8640$d70592c0$@com> <20070903040252.GC21914@cse.unsw.EDU.AU> <0c6501c7ede0$07ea95e0$17bfc1a0$@com> Message-ID: <20070903232128.GG15345@cse.unsw.EDU.AU> seth: > > > -----Original Message----- > From: Donald Bruce Stewart [mailto:dons@cse.unsw.edu.au] > Sent: Monday, September 03, 2007 12:03 AM > To: Seth Kurtzberg > Cc: libraries@haskell.org > Subject: Re: ByteString I/O Performance > > seth: > > > > > > -----Original Message----- > > From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] > On Behalf Of Bryan O'Sullivan > > Sent: Sunday, September 02, 2007 11:23 PM > > To: Peter Simons > > Cc: libraries@haskell.org > > Subject: Re: ByteString I/O Performance > > > > Peter Simons wrote: > > > > > One way to get malloc() out of the picture would be to provide a > > > variant of hGet that takes an existing, pre-allocated buffer as an > > > argument, so that the user can allocate a ByteString once and re-use > > > it for every single hGet and hPut. > > > > This is already quite easy to do. See unsafeUseAsCStringLen in > > Data.ByteString.Base, and hGetBuf in System.IO. > > > > Is it possible without resorting to an unsafeXXX function? > > They're all 'unsafe' for different reasons :) > The question should be: why is this unsafe? > > (It's unsafe because it doesn't copy the C string, so you need to have a > side condition that the string isn't modified by C). > > OK. Assume that I'm not doing any C coding, so that the only C code that is > invoked is called from within the implementation (in this case the > implementation of System.IO). 
Can I assume that no implementation code > modifies the string? In other words, is it valid to assume that the side > condition is never violated so long as I don't violate the side condition in > my own C code (if any)? > Yes, that's right. It's safe as long as you don't modify the string yourself in C. -- Don From dons at cse.unsw.edu.au Mon Sep 3 19:58:43 2007 From: dons at cse.unsw.edu.au (Donald Bruce Stewart) Date: Mon Sep 3 19:49:17 2007 Subject: Adding IdentityT to mtl In-Reply-To: <5ab17e790708281013j7102fd5uf7e3251432b9a271@mail.gmail.com> References: <20070601053514.GA468@cse.unsw.EDU.AU> <8dde104f0708280608o20b6a62fq699574bc79a82318@mail.gmail.com> <5ab17e790708281013j7102fd5uf7e3251432b9a271@mail.gmail.com> Message-ID: <20070903235843.GL15345@cse.unsw.EDU.AU> Thanks Iavor! iavor.diatchki: > Hi, > I completely forgot about this. I have added two new transformers to > monadLib: IdT and LiftT, the second one using a strict bind. These > changes are available in the darcs repository. When I get around to > playing around with them a bit more I will make a new package and put > it on hackage. > -Iavor > > On 8/28/07, Josef Svenningsson wrote: > > Whatever happened to the suggestion of extending mtl with IdentityT? I > > think it's reasonable, especially since we have a documented use case. > > > > /Josef > > > > On 6/1/07, Donald Bruce Stewart wrote: > > > I wanted an IdentityT today, for extending xmonad. (The idea is to > > > allow user-defined monad transformers, so users can plug in their own > > > semantics easily). > > > > > > By default it would use IdentityT, which I note is not in mtl!
> > >
> > > Here's roughly what it would be:
> > >
> > > -----------------------------------------------------------------------------
> > > -- |
> > > -- Module : Identity.hs
> > > -- License : BSD3-style (see LICENSE)
> > > --
> > > module IdentityT where
> > >
> > > import Control.Monad.Trans
> > >
> > > --
> > > -- IdentityT , a parameterisable identity monad, with an inner monad
> > > -- The user's default monad transformer
> > > --
> > >
> > > newtype IdentityT m a = IdentityT { runIdentityT :: m a }
> > >
> > > instance (Functor m, Monad m) => Functor (IdentityT m) where
> > >     fmap f = IdentityT . fmap f . runIdentityT
> > >
> > > instance (Monad m) => Monad (IdentityT m) where
> > >     return = IdentityT . return
> > >     m >>= k = IdentityT $ runIdentityT . k =<< runIdentityT m
> > >     fail msg = IdentityT $ fail msg
> > >
> > > instance (MonadIO m) => MonadIO (IdentityT m) where
> > >     liftIO = IdentityT . liftIO
> > >
> > > Any reasons why this shouldn't be in mtl?
> > >
> > > -- Don
> > > _______________________________________________
> > > Libraries mailing list
> > > Libraries@haskell.org
> > > http://www.haskell.org/mailman/listinfo/libraries
> > >
> >
> > _______________________________________________
> > Libraries mailing list
> > Libraries@haskell.org
> > http://www.haskell.org/mailman/listinfo/libraries
> >

From pgavin at gmail.com Mon Sep 3 21:16:24 2007
From: pgavin at gmail.com (Peter Gavin)
Date: Mon Sep 3 21:06:55 2007
Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes
In-Reply-To: <200709011724.52017.sven.panne@aedion.de>
References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> <1188657946.14131.9.camel@intothevoid> <200709011724.52017.sven.panne@aedion.de>
Message-ID: <37df87420709031816l1235c65dt2778796a4786574f@mail.gmail.com>

On 9/1/07, Sven Panne wrote:
> The main problem is that I've been hearing the sentence "Cabal is unstable at
> the moment, but with the
next GHC release everything will be fixed and > rock-solid, never changing again, ..." at least for a year now. In my > experience, Cabal is *the* #1 reason for breaking build for aeons, and this > is really getting frustrating. It doesn't sound like you'd disagree with me, I just want to make a couple comments about this... Autoconf had similar problems during its infancy. Packages were constantly having problems. It takes time to work out the kinks, and bugs should be expected for the time being. And if you consider that Cabal is trying to be autoconf, automake, and libtool, and make, in a single tool, there will be that many more kinks to work out :) Pete From seth at cql.com Tue Sep 4 10:45:49 2007 From: seth at cql.com (Seth Kurtzberg) Date: Tue Sep 4 10:36:59 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <37df87420709031816l1235c65dt2778796a4786574f@mail.gmail.com> References: <20070901115444.GA28872@cvs.haskell.org> <007801c7ec9d$8b1c3250$54257ad5@cr3lt> <1188657946.14131.9.camel@intothevoid> <200709011724.52017.sven.panne@aedion.de> <37df87420709031816l1235c65dt2778796a4786574f@mail.gmail.com> Message-ID: <0cb201c7ef02$48ca2990$da5e7cb0$@com> -----Original Message----- From: libraries-bounces@haskell.org [mailto:libraries-bounces@haskell.org] On Behalf Of Peter Gavin Sent: Monday, September 03, 2007 9:16 PM Cc: libraries@haskell.org; Claus Reinke Subject: Re: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes On 9/1/07, Sven Panne wrote: > The main problem is that I've been hearing the sentence "Cabal is unstable at > the moment, but with the next GHC release everything will be fixed and > rock-solid, never changing again, ..." at least for a year now. In my > experience, Cabal is *the* #1 reason for breaking build for aeons, and this > is really getting frustrating. 
It doesn't sound like you'd disagree with me, I just want to make a couple comments about this... Autoconf had similar problems during its infancy. Packages were constantly having problems. It takes time to work out the kinks, and bugs should be expected for the time being. And if you consider that Cabal is trying to be autoconf, automake, and libtool, and make, in a single tool, there will be that many more kinks to work out :) I completely agree. The level of instability is actually quite good given the location of Cabal in the software life cycle. Obviously, more stable is better, but IMO the developers have delivered a level of stability that, given the context, is above reasonable expectations Seth Kurtzberg Software Engineer Specializing in Security, Reliability, and the Hardware/Software Interface Pete _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries From sven.panne at aedion.de Tue Sep 4 12:25:03 2007 From: sven.panne at aedion.de (Sven Panne) Date: Tue Sep 4 12:15:34 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <37df87420709031816l1235c65dt2778796a4786574f@mail.gmail.com> References: <20070901115444.GA28872@cvs.haskell.org> <200709011724.52017.sven.panne@aedion.de> <37df87420709031816l1235c65dt2778796a4786574f@mail.gmail.com> Message-ID: <200709041825.03457.sven.panne@aedion.de> On Tuesday 04 September 2007 03:16, Peter Gavin wrote: > [...] And if you consider that > Cabal is trying to be autoconf, automake, and libtool, and make, in a > single tool, there will be that many more kinks to work out :) Huh? Cabal tries to be autoconf? 
Unless I've missed something, there's no way Cabal can help me to figure out the paths to OpenGL headers, the wildly varying linker options for OpenGL apps, if the installed OpenAL's alcCloseDevice returns void or an int, the matching Haskell type for an ALuint, etc. Cheers, S. From sven.panne at aedion.de Tue Sep 4 12:47:06 2007 From: sven.panne at aedion.de (Sven Panne) Date: Tue Sep 4 12:37:38 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <20070902205924.GA4337@matrix.chaos.earth.li> References: <20070901115444.GA28872@cvs.haskell.org> <200709011724.52017.sven.panne@aedion.de> <20070902205924.GA4337@matrix.chaos.earth.li> Message-ID: <200709041847.07442.sven.panne@aedion.de> On Sunday 02 September 2007 22:59, Ian Lynagh wrote: > On Sat, Sep 01, 2007 at 05:24:51PM +0200, Sven Panne wrote: > [...] > > To be usable, a speedup of at least factor 10 > > would be required. Is there any hope for this? > > Patch application can be sped up with a planned change of hunk format. > If downloading lots of small patch files individually is the problem > then darcs could tar them up when checkpointed tags are made. Will this improve the time for a "./darcs-all pull" for the GHC repository, too? With extra-libraries it takes over 20 minutes for me, even though only 3-4 small patches have actually been pulled and my 2Mbit DSL line sits basically idle during the whole time. This is fairly disappointing... :-( Cheers, S. From jgbailey at gmail.com Tue Sep 4 12:57:11 2007 From: jgbailey at gmail.com (Justin Bailey) Date: Tue Sep 4 12:47:45 2007 Subject: Request for code review - Knuth Morris Pratt for Data.Sequence In-Reply-To: References: Message-ID: Using the code developed for ByteStrings by myself, Chris Kuklewicz and Daniel Fischer, I've implemented Knuth-Morris-Pratt substring searching on Data.Sequence "Seq" values. Attached you'll find the library in kmp.zip.safe.
The algorithm is implemented in the module Data.Sequence.KMP. At the root, SpeedTest.hs can be compiled on Windows with the "prof_compile.bat" file included (you'll need to install the "regex-dfa" and "regex-base" packages from Hackage to build). SpeedTest searches for a known value in the 7 MB file "endo.dna" (which can be downloaded from http://www.icfpcontest.org/endo.zip) using several different algorithms and methods: strict and lazy bytestrings, regular expressions, and the KMP algorithm for "Seq" values. SpeedTest is pretty fast but I worry about its space usage. It may just be the nature of Seq values, but I cannot get it to run in constant space when I think it should be. I'm especially interested in help here. All comments and feedback are welcome. Since the zip file includes a darcs context file, feel free to send patches. Justin -------------- next part -------------- A non-text attachment was scrubbed... Name: kmp.zip.safe Type: application/octet-stream Size: 9785 bytes Desc: not available Url : http://www.haskell.org/pipermail/libraries/attachments/20070904/d680edf8/kmp.zip.obj From igloo at earth.li Tue Sep 4 13:20:52 2007 From: igloo at earth.li (Ian Lynagh) Date: Tue Sep 4 13:11:21 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <200709041847.07442.sven.panne@aedion.de> References: <20070901115444.GA28872@cvs.haskell.org> <200709011724.52017.sven.panne@aedion.de> <20070902205924.GA4337@matrix.chaos.earth.li> <200709041847.07442.sven.panne@aedion.de> Message-ID: <20070904172052.GA9046@matrix.chaos.earth.li> On Tue, Sep 04, 2007 at 06:47:06PM +0200, Sven Panne wrote: > On Sunday 02 September 2007 22:59, Ian Lynagh wrote: > > On Sat, Sep 01, 2007 at 05:24:51PM +0200, Sven Panne wrote: > > [...] > > > To be usable, a speedup of at least factor 10 > > > would be required. Is there any hope for this? > > > > Patch application can be sped up with a planned change of hunk format. 
> > If downloading lots of small patch files individually is the problem > > then darcs could tar them up when checkpointed tags are made. > > Will this improve the time for a "./darcs-all pull" for the GHC repository, > too? With extra-libraries it takes over 20 minutes for me, even though only > 3-4 small patches have actually been pulled and my 2Mbit DSL line sits > basically idle during the whole time. I'm not sure what's happening there, but I can't think why that should be slow. It might be related to the bug that means that "darcs send" patches have way too much context in. Either way, it ought to be fixable. It's just a matter of developer time. Thanks Ian From duncan.coutts at worc.ox.ac.uk Tue Sep 4 13:24:09 2007 From: duncan.coutts at worc.ox.ac.uk (Duncan Coutts) Date: Tue Sep 4 13:12:36 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <200709041825.03457.sven.panne@aedion.de> References: <20070901115444.GA28872@cvs.haskell.org> <200709011724.52017.sven.panne@aedion.de> <37df87420709031816l1235c65dt2778796a4786574f@mail.gmail.com> <200709041825.03457.sven.panne@aedion.de> Message-ID: <1188926649.10322.305.camel@localhost> On Tue, 2007-09-04 at 18:25 +0200, Sven Panne wrote: > On Tuesday 04 September 2007 03:16, Peter Gavin wrote: > > [...] And if you consider that > > Cabal is trying to be autoconf, automake, and libtool, and make, in a > > single tool, there will be that many more kinks to work out :) > > Huh? Cabal tries to be autoconf? Unless I've missed something, there's no way > Cabal can help me to figure out the paths to OpenGL headers, the wildly > varying linker options for OpenGL apps, Well, if the library you're binding to happens to be one of the 350+ that use pkg-config then cabal can indeed find the paths to the headers and the varying linker options (by asking pkg-config). pkgconfig-depends: gtk+-2.0 >= 2.8, cairo This is a new feature in Cabal-1.2. 
Of course it doesn't help for libs like GL that don't provide pkg-config files. > if the installed OpenAL's alcCloseDevice returns void or an int, That's trickier. > the matching Haskell type for an ALuint, etc. c2hs can do that. Sure, it's not an autoconf replacement and though you can use Setup.lhs to do some of those things, the api is not sufficiently stable to recommend using that yet. Duncan From sven.panne at aedion.de Tue Sep 4 13:50:24 2007 From: sven.panne at aedion.de (Sven Panne) Date: Tue Sep 4 13:40:56 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <1188926649.10322.305.camel@localhost> References: <20070901115444.GA28872@cvs.haskell.org> <200709041825.03457.sven.panne@aedion.de> <1188926649.10322.305.camel@localhost> Message-ID: <200709041950.24995.sven.panne@aedion.de> On Tuesday 04 September 2007 19:24, Duncan Coutts wrote: > Well, if the library you're binding to happens to be one of the 350+ > that use pkg-config then cabal can indeed find the paths to the headers > and the varying linker options (by asking pkg-config). > > pkgconfig-depends: gtk+-2.0 >= 2.8, cairo > > This is a new feature in Cabal-1.2. > > Of course it doesn't help for libs like GL that don't provide pkg-config > files. I once tried to use pkg-config, but it was a total disaster: It suffers from versionitis itself, various distros put various (sometimes wrong) things into the package configuration files, dynamic vs. static linking dependencies are usually totally broken, it is not available on Windows (Mac OS X?) etc. In general, I think that pkg-config is a nice idea, but typically poorly maintained. (Anyone remember problems with xmkmf? ;-) > > if the installed OpenAL's alcCloseDevice returns void or an int, > > That's trickier. > > > the matching Haskell type for an ALuint, etc. > > c2hs can do that. Hmmm, just another build-time dependency...
Note that I don't claim that autotools are perfect or beautiful, but I haven't seen something substantially better yet. (Yes, I know scons, cmake, ...) Cheers, S. From simons at cryp.to Tue Sep 4 14:07:05 2007 From: simons at cryp.to (Peter Simons) Date: Tue Sep 4 13:57:49 2007 Subject: ByteString I/O Performance References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> Message-ID: <87bqci719y.fsf@write-only.cryp.to> Donald Bruce Stewart writes: >> hGet :: Handle -> ByteString -> IO ByteString >> hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n) >> return (Str.unsafeTake i buf) > > That's a useful benchmark. Thanks for looking into this. It was a pleasure. How do you feel about providing this kind of input API to the user as part of the module? Note that the current hGet can be implemented on top of the combinator above, but the reverse is not true. So I guess that this API is strictly more powerful than the one we currently have. And for some use cases, it's significantly faster too. By the way, did you receive my e-mail about the race condition in readFile? 
Best regards, Peter From duncan.coutts at worc.ox.ac.uk Tue Sep 4 14:28:49 2007 From: duncan.coutts at worc.ox.ac.uk (Duncan Coutts) Date: Tue Sep 4 14:17:17 2007 Subject: ByteString I/O Performance In-Reply-To: <87bqci719y.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> Message-ID: <1188930529.10322.315.camel@localhost> On Tue, 2007-09-04 at 20:07 +0200, Peter Simons wrote: > Donald Bruce Stewart writes: > > >> hGet :: Handle -> ByteString -> IO ByteString > >> hGet h buf = do i <- Str.unsafeUseAsCStringLen buf (\(p,n) -> hGetBuf h p n) > >> return (Str.unsafeTake i buf) > > > > That's a useful benchmark. Thanks for looking into this. > > It was a pleasure. How do you feel about providing this kind of > input API to the user as part of the module? We can't provide this interface as it modifies the immutable input ByteString. What you're looking for is to copy into a mutable buffer. That buffer cannot be itself a ByteString as they're immutable. The best we can do is to copy into a newly allocated buffer as we do in hGet. In the best case we can do this with a single copy. We could look at the trimming again if that's an issue. So if you want an api that provides a mutable input buffer, you'll need something other than a ByteString. We do have something like this in the binary package, which we've been considering cleaning up and bringing into the bytestring package. However even that doesn't give you a mutable buffer that you'd want for fast 'cat', that really requires an api that guarantees that the buffer is no longer in use when the next chunk is read, destroying the previous content of the buffer. > By the way, did you receive my e-mail about the race condition in > readFile? Yes, it's a good point. 
We could fix it by doing more reads in the same style as hGetContents. Duncan From simons at cryp.to Tue Sep 4 15:35:49 2007 From: simons at cryp.to (Peter Simons) Date: Tue Sep 4 15:26:44 2007 Subject: ByteString I/O Performance References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> Message-ID: <87642q6x62.fsf@write-only.cryp.to> Duncan Coutts writes: > We can't provide this interface as it modifies the immutable > input ByteString. Well, one could call it unsafeHGet. :-) > [Mutable input buffer] really requires an api that guarantees > that the buffer is no longer in use when the next chunk is > read, destroying the previous content of the buffer. I see your point. It would be nice if the API would guarantee that the ByteString cannot be misused. Personally, I feel that is a minor point though. When you read data into a buffer, the previous contents of that buffer are lost. That is hardly a surprise and ByteString offers functions like 'copy' which allow the user to design his algorithms correctly. Programmers who don't want to face that problem can always use the hGet variant that creates a new buffer every time. Anyway, it's not a big thing. The modified hGet function is simple enough so that those who want it can write it themselves.
Thank you for your time, Peter From apfelmus at quantentunnel.de Wed Sep 5 04:14:39 2007 From: apfelmus at quantentunnel.de (apfelmus) Date: Wed Sep 5 04:05:17 2007 Subject: ByteString I/O Performance In-Reply-To: <87642q6x62.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> Message-ID: Peter Simons wrote: > Duncan Coutts writes: > > > We can't provide this interface as it modifies the immutable > > input ByteString. > > Well, one could call it unsafeHGet. :-) > > > > [Mutable input buffer] really requires an api that guarantees > > that the buffer is no longer in use when the next chunk is > > read, destroying the previous content of the buffer. > > I see your point. It would be nice if the API would guarantee > that the ByteString cannot be misused. Personally, I feel that is > a minor point though. Remember that Haskell expressions are evaluated lazily, that's why we have the IO monad for doing input/output. Hence, mutable values that look like pure ones become unpredictable and are considered a major sin in Haskell land, please don't do it. As catBuf crucially depends on the mutability of the buffer, ByteStrings are not the right data structure to use in that case, that's all there is to it. 
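For concreteness, a catBuf built directly on a mutable buffer could look roughly like the sketch below. It is not the program measured earlier in this thread: the name catBuf, the 4 KiB bufsize and the loop structure are assumptions made for illustration. It relies only on allocaBytes from Foreign.Marshal.Alloc and on hGetBuf/hPutBuf from System.IO, and the buffer is never exposed as a pure value.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO

-- Buffer size in bytes (an assumption; tune as needed).
bufsize :: Int
bufsize = 4 * 1024

-- Copy everything from hIn to hOut through one reused, mutable buffer.
-- The buffer lives only inside IO, so overwriting it on every read
-- cannot be observed by any pure code.
catBuf :: Handle -> Handle -> IO ()
catBuf hIn hOut = allocaBytes bufsize loop
  where
    loop :: Ptr Word8 -> IO ()
    loop buf = do
      n <- hGetBuf hIn buf bufsize      -- returns 0 at end of file
      if n == 0
        then return ()
        else hPutBuf hOut buf n >> loop buf
```

Because the buffer is only ever visible as a Ptr inside IO, reusing it for each chunk does not change the meaning of any pure expression.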
Regards, apfelmus From simons at cryp.to Wed Sep 5 04:31:07 2007 From: simons at cryp.to (Peter Simons) Date: Wed Sep 5 04:21:51 2007 Subject: ByteString I/O Performance References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> Message-ID: <87zm017buc.fsf@write-only.cryp.to> apfelmus writes: > Remember that Haskell expressions are evaluated lazily, that's > why we have the IO monad for doing input/output. I see. Thank you for the clarification. > Hence, mutable values that look like pure ones become > unpredictable and are considered a major sin in Haskell land, > please don't do it. It feels patronizing to tell someone else what he should or shouldn't do. What can I say? Outside of Haskell land there are people who believe that software should, like, work, instead of falling apart whenever you feed it input data larger than a few kilobytes and to reach that objective those people are absolutely prepared to face the wild unpredictability of -- *gasp* -- pointers! > As catBuf crucially depends on the mutability of the buffer, > ByteStrings are not the right data structure to use in that > case, that's all there is to it. A ByteString is a pointer, a byte size, and a byte offset. As such, it is the perfect data structure for a program like catBuf. Let's agree to disagree. 
Best regards, Peter

From apfelmus at quantentunnel.de Wed Sep 5 10:27:01 2007
From: apfelmus at quantentunnel.de (apfelmus)
Date: Wed Sep 5 10:17:27 2007
Subject: ByteString I/O Performance
In-Reply-To: <87zm017buc.fsf@write-only.cryp.to>
References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to>
Message-ID:

Peter Simons wrote:
>> Hence, mutable values that look like pure ones become
>> unpredictable and are considered a major sin in Haskell land,
>> please don't do it.
>
> It feels patronizing to tell someone else what he should or
> shouldn't do. What can I say? Outside of Haskell land there are
> people who believe that software should, like, work, instead of
> falling apart whenever you feed it input data larger than a few
> kilobytes and to reach that objective those people are absolutely
> prepared to face the wild unpredictability of -- *gasp* --
> pointers!

I didn't intend to patronize, I apologize for the harsh words. It's just that there's a difference between manipulating pointers

  peek :: Ptr Word8 -> IO Word8 -- :)
  poke :: Ptr Word8 -> Word8 -> IO ()

and breaking language semantics

  peek :: Ptr Word8 -> Word8 -- :(
  poke :: Ptr Word8 -> Word8 -> ()

> > As catBuf crucially depends on the mutability of the buffer,
> > ByteStrings are not the right data structure to use in that
> > case, that's all there is to it.
>
> A ByteString is a pointer, a byte size, and a byte offset. As
> such, it is the perfect data structure for a program like catBuf.

Not quite. ByteStrings are intended to be a memory-efficient representation of Strings and the memory efficiency is implemented in Haskell with buffers and unsafePerformIO.
But great care is taken to preserve language semantics in the exported API which means that ByteStrings have to be immutable. Note that the copy function is not for assuring immutability but for handling possible space leaks. Regards, apfelmus From simons at cryp.to Wed Sep 5 15:30:02 2007 From: simons at cryp.to (Peter Simons) Date: Wed Sep 5 15:20:52 2007 Subject: ByteString I/O Performance References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> Message-ID: <87bqcgzz9h.fsf@write-only.cryp.to> Hey Apfelmus, I have to apologize for being overly sensitive. I had a couple of rough days and am easily frustrated at the moment. That is not your fault. I am sorry. Your illustrative example is very good. It helped me to see more clearly the point I've been trying to make but couldn't quite articulate. > peek :: Ptr Word8 -> Word8 -- :( The ByteString package offers a number of pure functions to manipulate the underlying buffer. Some of them -- like 'take' and 'drop' -- are by all means supposed to be pure, because they manipulate merely the base pointer, the offset, or the size. Those functions depend only on the value of ByteString, not on the memory it references. Then there is this function: index :: ByteString -> Int -> Word8 This function does earn one of those inverse smilies. Personally, I would not have provided a dereferencing operation outside of the IO monad. My personal opinion is that a monadic 'index' would have been ever so slightly less convenient, but it would be far more robust than the function above.
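A monadic 'index' along those lines might look roughly like the sketch below. It is only an illustration: 'indexIO' is a made-up name, not part of the ByteString package, and the sketch assumes a bytestring release that exports unsafeUseAsCStringLen from Data.ByteString.Unsafe (the code earlier in this thread imports it from Data.ByteString.Base instead).

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Storable (peekByteOff)

-- Hypothetical monadic counterpart of 'index': the dereference happens
-- inside IO, so it is sequenced with any other effects on the buffer.
indexIO :: B.ByteString -> Int -> IO Word8
indexIO bs i = unsafeUseAsCStringLen bs $ \(p, n) ->
  if i < 0 || i >= n
    then ioError (userError "indexIO: index out of range")
    else peekByteOff p i   -- read one byte at offset i from the base pointer
```

The bounds check is done against the length handed over by unsafeUseAsCStringLen, so the read never leaves the ByteString's own slice of the buffer.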
As far as I can tell, the only reason why a function like 'unsafeUseAsCStringLen' has to be dubbed unsafe is because 'index' makes it unsafe. The limitation that ByteString has to be immutable is a consequence of the choice to provide 'index' as a pure function. Personally, I won't use 'index' in my code. I'll happily dereference the pointer in the IO monad, because I've found that to be no effort whatsoever. I love monads. For my purposes, 'unsafeUseAsCStringLen' is a perfectly safe function. The efficient variant of 'hGet' I posted can be implemented on top of it, so that 'hGet' is by all means a safe function in my code. There really is no risk at all, unless one uses 'index' or something that's based on it. The way I see it, there will be other people who'll find the performance limitations of standard 'hGet' a decisive factor in their design decisions. Chances are, those people will wonder about using the base pointer for hGetBuf and then they'll end up re-inventing the wheel we just came up with. Maybe I'll find the time to submit a patch to the documentation, so that fine points like an optimal buffer size etc. are explained in more detail than they are right now. It would be nice if some kind of result would come out of this discussion. Anyway, thank you. I appreciate everyone's efforts in helping me figure out why I/O with ByteString is more than two times slower than it could be. 
Take care, Peter From duncan.coutts at worc.ox.ac.uk Wed Sep 5 16:32:29 2007 From: duncan.coutts at worc.ox.ac.uk (Duncan Coutts) Date: Wed Sep 5 16:20:53 2007 Subject: ByteString I/O Performance In-Reply-To: <87bqcgzz9h.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> <87bqcgzz9h.fsf@write-only.cryp.to> Message-ID: <1189024349.10322.343.camel@localhost> On Wed, 2007-09-05 at 21:30 +0200, Peter Simons wrote: > As far as I can tell, the only reason why a function like > 'unsafeUseAsCStringLen' has to be dubbed unsafe is because 'index' makes > it unsafe. The limitation that ByteString has to be immutable is a > consequence of the choice to provide 'index' as a pure function. Well, it's not just index, all the functions that get data from the ByteString, like head/tail/uncons etc etc are pure. That is the whole point of the design of ByteString, to provide pure/immutable high performance strings. What you want is just fine, but it's a mutable interface not a pure one. We cannot provide any operations that mutate an existing ByteString without breaking the semantics of all the pure operations. It's very much like the difference between the MArray and IArray classes, for mutable and immutable arrays. One provides index in a monad, the other is pure. > Personally, I won't use 'index' in my code. I'll happily dereference the > pointer in the IO monad, because I've found that to be no effort > whatsoever. I love monads. For my purposes, 'unsafeUseAsCStringLen' is a > perfectly safe function. 
The efficient variant of 'hGet' I posted can be > implemented on top of it, so that 'hGet' is by all means a safe function > in my code. There really is no risk at all, unless one uses 'index' or > something that's based on it. Right, or if you were to hand out a ByteString and then change the contents of it when nobody is looking then that's very much unsafe. So the point is you can break the semantics locally and nobody will notice. It's not a technique we should encourage however. > The way I see it, there will be other people who'll find the performance > limitations of standard 'hGet' a decisive factor in their design > decisions. Chances are, those people will wonder about using the base > pointer for hGetBuf and then they'll end up re-inventing the wheel we > just came up with. I'd rather not provide a quick easy way to break the semantics. unsafeUseAsCStringLen and friends are already plenty enough rope... > Maybe I'll find the time to submit a patch to the documentation, so that > fine points like an optimal buffer size etc. are explained in more > detail than they are right now. It would be nice if some kind of result > would come out of this discussion. I really don't think we can provide anything that copies into an existing pre-allocated ByteString. As far as I can see, the best we can do is to allocate a fresh buffer and do a single copy into that. Mutating an existing buffer is fine, and System.IO already provides hGetBuf. But you have to be really really careful if you create a ByteString based on the contents of that mutable buffer, without making any copy first. > Anyway, thank you. I appreciate everyone's efforts in helping me figure > out why I/O with ByteString is more than two times slower than it could > be. Thanks very much for pointing out where we are copying more than necessary. 
As for the last bit of performance difference due to the cache benefits of reusing a mutable buffer rather than allocating and GCing a range of buffer, I can't see any way within the existing design how we can achieve that. Bear in mind, that these cache benefits are fairly small in real benchmarks as opposed to 'cat' on fully cached files. Usually you do some actual IO and some operation on the data rather than just copying it from one file descriptor to another. For example, my lazy bytestring binding to iconv performs exactly the same as the command line iconv. In that case we are doing a bit of work on the data which swamps the cache benefits that the command line iconv prog gets from using mutable buffers. If we are trying to optimise the 'cat' case however, eg for network servers, there are even lower level things we can do so that no copies of the data have to be made at all. eg mmap or linux's copyfile or splice. ByteString certainly isn't the right abstraction for that though. Duncan From igloo at earth.li Wed Sep 5 17:15:41 2007 From: igloo at earth.li (Ian Lynagh) Date: Wed Sep 5 17:06:06 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <200709041847.07442.sven.panne@aedion.de> References: <20070901115444.GA28872@cvs.haskell.org> <200709011724.52017.sven.panne@aedion.de> <20070902205924.GA4337@matrix.chaos.earth.li> <200709041847.07442.sven.panne@aedion.de> Message-ID: <20070905211541.GA22101@matrix.chaos.earth.li> On Tue, Sep 04, 2007 at 06:47:06PM +0200, Sven Panne wrote: > > too? With extra-libraries it takes over 20 minutes for me, even though only > 3-4 small patches have actually been pulled and my 2Mbit DSL line sits > basically idle during the whole time. By the way, if you're currently pulling over SSH then you'll probably find it much faster to pull over HTTP. 
With no patches to pull I get this for ghc + testsuite + corelibs: ./darcs-all pull -a 2.21s user 0.42s system 1% cpu 2:49.42 total and this for extralibs: ./darcs-all pull -a 0.82s user 0.33s system 1% cpu 1:40.47 total Thanks Ian From simons at cryp.to Wed Sep 5 20:30:28 2007 From: simons at cryp.to (Peter Simons) Date: Wed Sep 5 20:21:05 2007 Subject: ByteString I/O Performance References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> <87bqcgzz9h.fsf@write-only.cryp.to> <1189024349.10322.343.camel@localhost> Message-ID: <87wsv4prdn.fsf@write-only.cryp.to> Duncan Coutts writes: > What you want is just fine, but it's a mutable interface not a > pure one. We cannot provide any operations that mutate an > existing ByteString without breaking the semantics of all the > pure operations. Is that so? How exactly does mutating a ByteString break the semantics of the pure function 'take'? > It's very much like the difference between the MArray and > IArray classes, for mutable and immutable arrays. One provides > index in a monad, the other is pure. Right. Now I wonder: why does ByteString provide an immutable interface but not a mutable one? Apparently mutable interfaces are useful for some purposes, right? Why else would the Array package provide one? >> Personally, I won't use 'index' in my code. I'll happily >> dereference the pointer in the IO monad [...]. For my >> purposes, 'unsafeUseAsCStringLen' is a perfectly safe >> function. > > [So you can] break the semantics locally and nobody will > notice. It's not a technique we should encourage however. Why do you keep saying that I would break semantics? 
The code that breaks semantics is the one that uses unsafePerformIO, and ByteString does that, not my code. When I refrain from using those parts of Data.ByteString, then my use of the underlying pointer doesn't "break semantics locally", it doesn't break them at all. > Thanks very much for pointing out where we are copying more > than necessary. You are welcome. And please don't forget the race condition in readFile. I feel that problem is actually more significant than the one we're discussing right now, but my impression is not that the report would have been taken particularly seriously. > As for the last bit of performance difference due to the cache > benefits of reusing a mutable buffer rather than allocating > and GCing a range of buffer, I can't see any way within the > existing design how we can achieve that. Exactly, with the current API that problem cannot be solved. It's a shame. > Bear in mind, that these cache benefits are fairly small in > real benchmarks as opposed to 'cat' on fully cached files. Do I understand that right? It sounds as if you were saying that -- in the general case -- allocating a new buffer for every single read() is not significantly slower than re-using the same buffer every time. Is that what you meant to say? > If we are trying to optimise the 'cat' case however, eg for > network servers, there are even lower level things we can do > so that no copies of the data have to be made at all. eg mmap > or linux's copyfile or splice. Yes, that's true, but none of these functions is remotely portable -- or even available in any of the Haskell implementations. So these things concern me considerably less than the 'hGet' function that _is_ available. Or rather, the one that apparently won't be available because we Haskell are way too clever for efficient software. > ByteString certainly isn't the right abstraction for that > though. I am sorry, but that is nonsense. 
A ByteString is a tuple consisting of a pointer into raw memory, an integer signifying the size of the front gap, and an integer signifying the length of the payload. That data structure is near perfect for performing efficient I/O. To say that this abstraction isn't right for the task is absurd. What you mean to say is that you don't _intend_ it to be used that way, which is an altogether different thing. Best regards, Peter From sebastian.sylvan at gmail.com Wed Sep 5 21:06:30 2007 From: sebastian.sylvan at gmail.com (Sebastian Sylvan) Date: Wed Sep 5 20:56:56 2007 Subject: ByteString I/O Performance In-Reply-To: <87wsv4prdn.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> <87bqcgzz9h.fsf@write-only.cryp.to> <1189024349.10322.343.camel@localhost> <87wsv4prdn.fsf@write-only.cryp.to> Message-ID: <3d96ac180709051806h31ab79a6k11ffdf40c0213b03@mail.gmail.com> On 06 Sep 2007 02:30:28 +0200, Peter Simons wrote: > Duncan Coutts writes: > > > What you want is just fine, but it's a mutable interface not a > > pure one. We cannot provide any operations that mutate an > > existing ByteString without breaking the semantics of all the > > pure operations. > > Is that so? How exactly does mutating a ByteString break the > semantics of the pure function 'take'? > Because if you mutate the original bytestring the value of the other bytestring (returned from 'take') will change. Not pure. Bad. Evil. Etc. > > It's very much like the difference between the MArray and > > IArray classes, for mutable and immutable arrays. One provides > > index in a monad, the other is pure. > > Right. Now I wonder: why does ByteString provide an immutable > interface but not a mutable one? Apparently mutable interfaces > are useful for some purposes, right? Why else would the Array > package provide one? 
It doesn't provide two different interfaces to the same data structure, it provides two different data structures. You can't have a pure interface AND an impure one, as the impure one could then mutate values that are used with the pure interface, which would mean that the pure interface is broken (see above). > > Bear in mind, that these cache benefits are fairly small in > > real benchmarks as opposed to 'cat' on fully cached files. > > Do I understand that right? It sounds as if you were saying that > -- in the general case -- allocating a new buffer for every > single read() is not significantly slower than re-using the same > buffer every time. Is that what you meant to say? I think he said that most of the speed difference is due to better cache performance when reusing the same buffer, but in general you do "other stuff" as well which won't be as benign for the cache and the difference will be smaller (if at all noticeable). > > ByteString certainly isn't the right abstraction for that > > though. > > I am sorry, but that is nonsense. A ByteString is a tuple > consisting of a pointer into raw memory, an integer signifying > the size of the front gap, and an integer signifying the length > of the payload. That data structure is near perfect for > performing efficient I/O. To say that this abstraction isn't > right for the task is absurd. What you mean to say is that you > don't _intend_ it to be used that way, which is an altogether > different thing. A ByteString is an immutable data structure representing a string; if you need a mutable one then it's not the right abstraction *by definition*. Yes, a ByteString is not intended to be a mutable buffer, which is precisely what makes it not the right abstraction if you need that (not an "altogether different thing", it is THE thing). The fact that the internal representation would look similar to a different abstraction which did allow mutation doesn't mean that *this* abstraction is the right choice.
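The point about 'take' can be demonstrated with a small sketch. This is an illustration under assumptions, not code posted in the thread: it uses `Data.ByteString.Unsafe` from a present-day `bytestring`, and the names `original` and `prefix` are made up. The `B.copy` is there so that the bytes live in a freshly allocated, writable buffer rather than in a string literal.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Storable (pokeByteOff)

main :: IO ()
main = do
  -- Force the bytes into their own heap buffer.
  let original = B.copy (B.pack "hello world")
      -- 'take' does not copy: 'prefix' points into the same buffer.
      prefix = B.take 5 original
  B.putStrLn prefix                    -- shows "hello"
  -- Overwrite the first byte of 'original' behind the library's back.
  unsafeUseAsCStringLen original $ \(ptr, _len) ->
    pokeByteOff ptr 0 (74 :: Word8)    -- 74 is ASCII 'J'
  -- 'prefix' is supposedly a pure value, yet it is now observed
  -- with different contents ("Jello").
  B.putStrLn prefix
```

The same name denotes two different strings within one program, which is exactly the loss of referential transparency being objected to. Nothing in the sketch uses 'index'; sharing through 'take' is enough.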
This is analogous to Java, and C# - if you need a mutable string buffer the "string" class is not the right abstraction, you use the string builder classes. -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862 From bos at serpentine.com Wed Sep 5 23:50:14 2007 From: bos at serpentine.com (Bryan O'Sullivan) Date: Wed Sep 5 23:43:03 2007 Subject: ByteString I/O Performance In-Reply-To: <87wsv4prdn.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> <87bqcgzz9h.fsf@write-only.cryp.to> <1189024349.10322.343.camel@localhost> <87wsv4prdn.fsf@write-only.cryp.to> Message-ID: <46DF78F6.5090602@serpentine.com> Peter Simons wrote: > Right. Now I wonder: why does ByteString provide an immutable > interface but not a mutable one? Apparently mutable interfaces > are useful for some purposes, right? I understand from your earlier admission that you have been having some bad days recently, and I look forward to you kindly not sharing the fruits of said bad days with the list. Regards, I noticed that the HEAD readline package wasn't linking against the GNUreadline framework (as distributed for Mac OS X on the ghc-6.6.1 downloads page). The following patch fixes that. I've tested it on OS X (both with and without the framework), but someone should probably double-check that it works fine on non-Mac systems. Also note that for this to work correctly, it requires a bugfix that I just sent to cvs-ghc: http://permalink.gmane.org/gmane.comp.lang.haskell.cvs.ghc/23055 Thank, -Judah * Link against GNUreadline.framework, if it's available. This can be overridden by setting --with-readline-includes or --with-readline-libraries. 
M ./HsReadline_cbits.c -2 +1 M ./configure.ac -20 +39 M ./include/HsReadline.h -1 +7 M ./readline.buildinfo.in +1 -------------- next part -------------- A non-text attachment was scrubbed... Name: readline.patch Type: application/octet-stream Size: 7669 bytes Desc: not available Url : http://www.haskell.org/pipermail/libraries/attachments/20070905/b022dbfc/readline-0001.obj From simons at cryp.to Thu Sep 6 04:17:36 2007 From: simons at cryp.to (Peter Simons) Date: Thu Sep 6 04:08:19 2007 Subject: ByteString I/O Performance References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> <87bqcgzz9h.fsf@write-only.cryp.to> <1189024349.10322.343.camel@localhost> <87wsv4prdn.fsf@write-only.cryp.to> <46DF78F6.5090602@serpentine.com> Message-ID: <87lkbkyzq7.fsf@write-only.cryp.to> Bryan O'Sullivan writes: > I understand from your earlier admission that you have been > having some bad days recently, and I look forward to you > kindly not sharing the fruits of said bad days with the list. Bryan, my mood would be a lot better if you would address my technical points on this list instead of my mood. I realize this discussion is going nowhere. It's hard for me to understand how people manage to say things like "you can't do that" despite the fact that I posted code which does exactly that, but I guess I don't have to understand everything. 
Take care everyone, Peter From simonmarhaskell at gmail.com Thu Sep 6 06:14:00 2007 From: simonmarhaskell at gmail.com (Simon Marlow) Date: Thu Sep 6 06:04:25 2007 Subject: patch applied (packages/regex-base): Make setup script compileagain after recent Cabal changes In-Reply-To: <20070905211541.GA22101@matrix.chaos.earth.li> References: <20070901115444.GA28872@cvs.haskell.org> <200709011724.52017.sven.panne@aedion.de> <20070902205924.GA4337@matrix.chaos.earth.li> <200709041847.07442.sven.panne@aedion.de> <20070905211541.GA22101@matrix.chaos.earth.li> Message-ID: <46DFD2E8.7040409@gmail.com> Ian Lynagh wrote: > On Tue, Sep 04, 2007 at 06:47:06PM +0200, Sven Panne wrote: >> too? With extra-libraries it takes over 20 minutes for me, even though only >> 3-4 small patches have actually been pulled and my 2Mbit DSL line sits >> basically idle during the whole time. > > By the way, if you're currently pulling over SSH then you'll probably > find it much faster to pull over HTTP. With no patches to pull I get > this for ghc + testsuite + corelibs: > > ./darcs-all pull -a 2.21s user 0.42s system 1% cpu 2:49.42 total > > and this for extralibs: > > ./darcs-all pull -a 0.82s user 0.33s system 1% cpu 1:40.47 total And it's dead easy to have a local repo tree that is kept up to date by a cron job. A 'darcs-all pull -a' from the local tree then takes only a few seconds. Cheers, Simon From Christian.Maeder at dfki.de Thu Sep 6 08:28:25 2007 From: Christian.Maeder at dfki.de (Christian Maeder) Date: Thu Sep 6 08:18:55 2007 Subject: darcs patch: Link against GNUreadline.framework, if it's available. 
In-Reply-To: <6d74b0d20709052239s28c77458t547a45085bfdb6b5@mail.gmail.com> References: <6d74b0d20709052239s28c77458t547a45085bfdb6b5@mail.gmail.com> Message-ID: <46DFF269.5050508@dfki.de> I think your patch fixes ticket: http://hackage.haskell.org/trac/ghc/ticket/1395 Thanks Christian Judah Jacobson wrote: > I noticed that the HEAD readline package wasn't linking against the > GNUreadline framework (as distributed for Mac OS X on the ghc-6.6.1 > downloads page). The following patch fixes that. I've tested it on > OS X (both with and without the framework), but someone should > probably double-check that it works fine on non-Mac systems. > > Also note that for this to work correctly, it requires a bugfix that I > just sent to cvs-ghc: > http://permalink.gmane.org/gmane.comp.lang.haskell.cvs.ghc/23055 > > Thank, > -Judah > > > * Link against GNUreadline.framework, if it's available. > This can be overridden by setting --with-readline-includes or > --with-readline-libraries. > > M ./HsReadline_cbits.c -2 +1 > M ./configure.ac -20 +39 > M ./include/HsReadline.h -1 +7 > M ./readline.buildinfo.in +1 > > > ------------------------------------------------------------------------ > > _______________________________________________ > Libraries mailing list > Libraries@haskell.org > http://www.haskell.org/mailman/listinfo/libraries From apa3a at yahoo.com Thu Sep 6 11:05:36 2007 From: apa3a at yahoo.com (Andriy Palamarchuk) Date: Thu Sep 6 10:55:59 2007 Subject: Proposal: Test.HUnit documentation (#1632) In-Reply-To: <46CDE913.50105@leiffrenzel.de> Message-ID: <638034.28862.qm@web56411.mail.re3.yahoo.com> Hi Leif, glad to see you on this list. --- Leif Frenzel wrote: > I have recently written a tutorial for using HUnit > and noticed that > there was no Haddock library documentation yet, so I > thought I might contribute some :-) This annoyed me too when I started to work with HUnit. That's great that you got to fixing this.
> Please let me know if anything else > needs to be done to make this fit for the library Including the ticket URL in the message is a nice touch: http://hackage.haskell.org/trac/ghc/ticket/1632 I also suggest attaching the resulting generated Haddock HTML file to the ticket and including the direct link to it into the message as well, e.g (this one is from different ticket): http://hackage.haskell.org/trac/ghc/attachment/ticket/1611/Data-Map.html?format=raw Makes it much easier to review docs. Don't forget to embed the stylesheet. Could you add examples how to use HUnit in the Haddock documentation? Thanks, Andriy From igloo at earth.li Thu Sep 6 11:11:46 2007 From: igloo at earth.li (Ian Lynagh) Date: Thu Sep 6 11:02:08 2007 Subject: darcs patch: Link against GNUreadline.framework, if it's available. In-Reply-To: <6d74b0d20709052239s28c77458t547a45085bfdb6b5@mail.gmail.com> References: <6d74b0d20709052239s28c77458t547a45085bfdb6b5@mail.gmail.com> Message-ID: <20070906151145.GA8545@matrix.chaos.earth.li> On Wed, Sep 05, 2007 at 10:39:00PM -0700, Judah Jacobson wrote: > > * Link against GNUreadline.framework, if it's available. Thanks for the patch! I'll validate+apply. Ian From nominolo at googlemail.com Thu Sep 6 12:35:19 2007 From: nominolo at googlemail.com (Thomas Schilling) Date: Thu Sep 6 12:25:46 2007 Subject: ANNOUNCE: Cabal 1.2.0 released Message-ID: <1189096519.18535.2.camel@intothevoid> The Haskell Cabal The Common Architecture for Building Applications and Libraries. We are pleased to announce that Cabal version 1.2.0 is now available. Changes: The major new feature in this release is support for Cabal configurations.
This allows package authors to more easily adapt their package descriptions to different system parameters such as operating system, architecture, or compiler. In addition, some optional features may be enabled or disabled explicitly by the package user. Many other new features and tool support have been added, among others: - Support for hscolour (haddock links to coloured source code). - Support for pkg-config (allows specifying dependencies on many C libraries). - Specification of build-tool dependencies. - Better default installation paths on Windows. - Separate "includes" and "install-includes" fields - Install paths can be specified relative to each other. - Many more new small features, command line flags and bugfixes. For a more exhaustive list, see http://www.haskell.org/cabal/release/rc/changelog Please note that this is a .0 release, so we would appreciate any feedback or bug reports. Note also, that the hooks API changed, so it's quite likely that many non-trivial Setup.[l]hs files will break. We hope, however, that much of those files' functionality can now be expressed using configurations. This version (or a bug fix update) will be included in GHC version 6.8.1. For other Haskell implementations or older versions of GHC you can install it separately: Download: http://www.haskell.org/cabal/download.html http://haskell.org/cabal/release/rc/cabal-1.2.0.tar.gz Documentation: http://www.haskell.org/cabal/release/rc/doc/users-guide/ http://www.haskell.org/cabal/release/rc/doc/API/Cabal See both the README file and the changelog for interface changes: http://www.haskell.org/cabal/release/rc/changelog Bugs: Report bugs using our bug tracker: http://hackage.haskell.org/trac/hackage/newticket or at the libraries@haskell.org (please CC to cabal-devel@haskell.org) mailing list (note that this is subscriber only).
From bulat.ziganshin at gmail.com Thu Sep 6 12:46:43 2007 From: bulat.ziganshin at gmail.com (Bulat Ziganshin) Date: Thu Sep 6 12:37:26 2007 Subject: ANNOUNCE: Cabal 1.2.0 released In-Reply-To: <1189096519.18535.2.camel@intothevoid> References: <1189096519.18535.2.camel@intothevoid> Message-ID: <1739835292.20070906204643@gmail.com> Hello Thomas, Thursday, September 6, 2007, 8:35:19 PM, you wrote: > We are pleased to announce that Cabal version 1.2.0 is now available. > The major new feature in this release is support for Cabal > configurations. our big thanks! why it not planned to be included in ghc 6.8.0? which ghc/hugs/*hc versions are compatible with this release? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com From Malcolm.Wallace at cs.york.ac.uk Thu Sep 6 12:57:55 2007 From: Malcolm.Wallace at cs.york.ac.uk (Malcolm Wallace) Date: Thu Sep 6 12:50:01 2007 Subject: ANNOUNCE: Cabal 1.2.0 released In-Reply-To: <1739835292.20070906204643@gmail.com> References: <1189096519.18535.2.camel@intothevoid> <1739835292.20070906204643@gmail.com> Message-ID: <20070906175755.14205929.Malcolm.Wallace@cs.york.ac.uk> Bulat Ziganshin wrote: > > We are pleased to announce that Cabal version 1.2.0 is now > > available. > > why it not planned to be included in ghc 6.8.0? It will be included. But ghc-6.8.0 will be called ghc-6.8.1. Regards, Malcolm From duncan.coutts at worc.ox.ac.uk Thu Sep 6 13:13:53 2007 From: duncan.coutts at worc.ox.ac.uk (Duncan Coutts) Date: Thu Sep 6 13:02:14 2007 Subject: ANNOUNCE: Cabal 1.2.0 released In-Reply-To: <1739835292.20070906204643@gmail.com> References: <1189096519.18535.2.camel@intothevoid> <1739835292.20070906204643@gmail.com> Message-ID: <1189098833.10322.376.camel@localhost> On Thu, 2007-09-06 at 20:46 +0400, Bulat Ziganshin wrote: > Hello Thomas, > > Thursday, September 6, 2007, 8:35:19 PM, you wrote: > > We are pleased to announce that Cabal version 1.2.0 is now available. 
> > > The major new feature in this release is support for Cabal > > configurations. > which ghc/hugs/*hc versions are compatible with this release? In theory: ghc-6.2.x - ghc-6.8.x hugs jhc However we've only actually tested ghc-6.6 and 6.8. If you find any problems with the others please report them. nhc98 support is still not ready yet. Sadly there is no yhc support at all yet. Duncan From bulat.ziganshin at gmail.com Thu Sep 6 13:11:54 2007 From: bulat.ziganshin at gmail.com (Bulat Ziganshin) Date: Thu Sep 6 13:02:27 2007 Subject: ANNOUNCE: Cabal 1.2.0 released In-Reply-To: <20070906175755.14205929.Malcolm.Wallace@cs.york.ac.uk> References: <1189096519.18535.2.camel@intothevoid> <1739835292.20070906204643@gmail.com> <20070906175755.14205929.Malcolm.Wallace@cs.york.ac.uk> Message-ID: <164721299.20070906211154@gmail.com> Hello Malcolm, Thursday, September 6, 2007, 8:57:55 PM, you wrote: > It will be included. But ghc-6.8.0 will be called ghc-6.8.1. thank you. please keep us informed about new names for 6.8.1, 7.0.0 and so on :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com From bos at serpentine.com Thu Sep 6 13:25:01 2007 From: bos at serpentine.com (Bryan O'Sullivan) Date: Thu Sep 6 13:17:56 2007 Subject: ByteString I/O Performance In-Reply-To: <87lkbkyzq7.fsf@write-only.cryp.to> References: <87d4x62jkh.fsf@write-only.cryp.to> <20070830222058.GB13953@cse.unsw.EDU.AU> <87ps107i0s.fsf@write-only.cryp.to> <1188809768.10322.255.camel@localhost> <87y7fnv8do.fsf@write-only.cryp.to> <20070903201919.GB15345@cse.unsw.EDU.AU> <87bqci719y.fsf@write-only.cryp.to> <1188930529.10322.315.camel@localhost> <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> <87bqcgzz9h.fsf@write-only.cryp.to> <1189024349.10322.343.camel@localhost> <87wsv4prdn.fsf@write-only.cryp.to> <46DF78F6.5090602@serpentine.com> <87lkbkyzq7.fsf@write-only.cryp.to> Message-ID: <46E037ED.30508@serpentine.com> Peter Simons wrote: > Bryan, my mood would be a lot better if you would
address my > technical points on this list instead of my mood. I find that people are more responsive when not being flamed, and when they get the sense that their helpful words aren't falling on deaf ears. > I realize this discussion is going nowhere. It's hard for me to > understand how people manage to say things like "you can't do > that" despite the fact that I posted code which does exactly > that, but I guess I don't have to understand everything. I think the manner in which people are talking past each other revolves around what's *possible* versus what's *sensible*. It is perfectly true that you can take an existing ByteString and smoosh its innards however you like, because the authors sensibly made this possible. However, doing so breaks referential transparency, so it's not encouraged as a general principle. The fact that a ByteString has a pointer to a nice flat piece of memory inside is an implementation detail, and doesn't change the fact that it's intended to be immutable. If you personally know for sure that you're not going to accidentally screw yourself by running a lazy computation on a ByteString that you then modify while the computation is still thunked, then by all means, go to town. But any claim that this is generally a safe thing to do would be completely wrong. It's very much in the realm of "here's the gun, there's your foot, be careful". References: <87642q6x62.fsf@write-only.cryp.to> <87zm017buc.fsf@write-only.cryp.to> <87bqcgzz9h.fsf@write-only.cryp.to> <1189024349.10322.343.camel@localhost> <87wsv4prdn.fsf@write-only.cryp.to> <46DF78F6.5090602@serpentine.com> <87lkbkyzq7.fsf@write-only.