From bulat.ziganshin at gmail.com Thu Jan 17 12:18:24 2008
From: bulat.ziganshin at gmail.com (Bulat Ziganshin)
Date: Thu Jan 17 12:23:07 2008
Subject: Proposal: hands off the base! :)
In-Reply-To: <478E8C95.7020500@gmail.com>
References: <478E8C95.7020500@gmail.com>
Message-ID: <295830643.20080117201824@gmail.com>

Hello Twan,

Thursday, January 17, 2008, 2:00:37 AM, you wrote:

> An often requested function is 'split', to split a list into parts delimited by
> some separator. ByteString has the functions split and splitWith for this
> purpose. I propose we add equivalents to Data.List:

one more proposal to ruin all the programs that define this function
themselves. i wonder whether all who propose to add a function or two
have ever seen the MissingH package? it includes a lot of such popular
functions, and anyone who needs them can install this package (or just
borrow the code)

i make a counter-proposal: freeze the base library in its 6.8 state and
add to the ghc distribution new libs with all the popular Monad, List
and other functions. this will make it possible to:

1) precisely control which functions are available, by importing exact
versions of all libs (except base, which will be frozen anyway)
2) use all the new functions regardless of the ghc version you are using.
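For reference, a minimal sketch of the kind of list `split`/`splitWith` under discussion, modelled on the ByteString functions of the same names. This is an illustration only: the names and the edge-case behaviour (an empty input gives `[]`, adjacent separators give empty chunks) are chosen to mirror ByteString and are not an agreed Data.List API.

```haskell
-- Hypothetical list analogues of ByteString's splitWith/split.
-- Not part of Data.List; written here only to illustrate the proposal.

-- | Split at every element satisfying the predicate, dropping the
-- separators. As with ByteString.splitWith, an empty input yields [].
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith _ [] = []
splitWith p xs = go xs
  where
    go ys = case break p ys of
      (chunk, [])       -> [chunk]          -- no separator left
      (chunk, _ : rest) -> chunk : go rest  -- drop the separator, continue

-- | Split on one specific separator element.
split :: Eq a => a -> [a] -> [[a]]
split sep = splitWith (== sep)

main :: IO ()
main = do
  print (split ',' "a,b,,c")   -- ["a","b","","c"]
  print (split 'a' "aXaXaXa")  -- ["","X","X","X",""]
  print (split ',' "")         -- []
```

A module that defines its own top-level `split` and imports Data.List unqualified gets an "ambiguous occurrence" error at each unqualified use once Data.List exports a function of that name; importing Data.List qualified, or with an explicit import list, avoids the clash.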
otherwise, adding a `split` function to base actually means that no one
except core hackers can use it for a year, because all these changes to
base will reach a production GHC version only at the end of 2008, and
because a program that defines its own split function will automatically
become incompatible with the next GHC version

overall, i want to use GHC for production open-source programming, and i
can formalize the requirements that would make this possible:

1) a frozen base library interface, except for the GHC.* modules

2) if we want to improve some base code, we *duplicate* it into a new
lib (with modified module names), publish the first version of this
library with exactly the original code (and therefore equivalent
interfaces) and then start to improve it, publishing newer and newer
versions. imagine, for example, that we want to improve Data.Array.*:

step 1: create library NewArray with modules Data.NewArray.* copied
one-to-one from Data.Array.* and publish it as version 1

step 2: raise the NewArray version to 2.0 and start making changes. once
we've finished, raise the version to 3.0 and keep the interfaces for 3.*
intact, so anyone can import NewArray 3.* and get the latest version with
exactly the same interface as they used

yes, this means that every piece of functionality improved relative to
the base package would be installed in two versions: one inside base and
one in the new package. but this is very natural, given that we can't
change base without breaking all the code that relies on it. so, if we
want new arrays, Handles or Exceptions, we will need to keep the old
version in base and add the new one in another lib. overall, it should
be recommended not to import anything directly from base, but to use
separate libraries instead

3) the GHC distribution should include all the popular libs (which, in
the end, should stop the rush to include "popular functions" in base!)
in *MULTIPLE* versions, i.e. the last 1.* version of NewArray, the last
2.* version and so on.
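The selection rule described above (depend on a major series such as NewArray 3.* and receive its newest release; ship the last release of each major series) can be sketched with `Data.Version` from base. This is only an illustration under assumptions: the release history is made up, and `latestInSeries`/`latestPerSeries` are hypothetical helpers, not part of any existing tool.

```haskell
import Data.List (nub, sort)
import Data.Version (Version, makeVersion, showVersion, versionBranch)

-- | Does a version belong to the given major series (e.g. 3 for "3.*")?
inSeries :: Int -> Version -> Bool
inSeries major v = take 1 (versionBranch v) == [major]

-- | Newest available release in one major series, if there is one.
-- This is what a dependency on "NewArray 3.*" would resolve to.
latestInSeries :: Int -> [Version] -> Maybe Version
latestInSeries major vs =
  case filter (inSeries major) vs of
    [] -> Nothing
    xs -> Just (maximum xs)  -- Version's Ord compares the numeric branch

-- | What a distribution following the proposal would bundle:
-- the newest release of every major series that exists.
latestPerSeries :: [Version] -> [Version]
latestPerSeries vs =
  [ v | m <- majors, Just v <- [latestInSeries m vs] ]
  where
    majors = nub (sort [ m | (m : _) <- map versionBranch vs ])

-- | Made-up release history for the hypothetical NewArray package.
releases :: [Version]
releases = map makeVersion [[1, 0], [1, 1], [2, 0], [2, 3], [3, 0], [3, 2, 1]]

main :: IO ()
main = do
  print (fmap showVersion (latestInSeries 3 releases))  -- Just "3.2.1"
  print (map showVersion (latestPerSeries releases))    -- ["1.1","2.3","3.2.1"]
```

Keying everything on the first version component is the whole contract here: a major-version bump is the only signal that the interface may have changed.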
this will ensure that a program developed in 2007 will continue to
compile with the newest ghc versions in 2008, 2009 and so on. we can
drop a library from the ghc distribution after, say, 3 years.

i also propose that the Haskell' committee decide every year which libs,
with exact major versions, to include in the Haskell standard library
set. for example:

year 2007: BS 1.*, Collections 2.*, HDBC 1.*
year 2008: BS 2.*, Collections 2.*, HSQL 1.*
year 2009: BS 2.*, Collections 2.*, HSQL 2.*

ghc-2009 should include the std libs from the last 3 years, i.e.
BS 1.*/2.*, Collections 2.*, HDBC 1.*, HSQL 1.*/2.*

ghc-2010 may drop BS 1.* and HDBC 1.* support, and of course should add
the newer libs from the HL-2010 standard. other haskell compilers should
do the same.

this will significantly improve the situation with Haskell standard
libraries:

1) the Haskell' committee will not need to develop an artificial
"standard libraries" set. actually, i think it can't, and any fixed set
would become obsolete the next year anyway. libraries are the most
important part of any language, and nowadays we can't develop a proper
stdlib set "by committee". it should be made by the community, and the
committee should only "sign off" on the final results

2) every haskell distribution will include a guaranteed minimum of
common, up-to-date libraries. any program written to the H2009
specification will continue to run with any major Haskell compiler for
the next 3 years. this should overcome "libraries hell" for mid-sized
apps

3) any book or course teaching Haskell can declare, for example, that it
covers Haskell-2009, and any Haskell2009-compatible compiler will
provide both the syntax and the libs discussed in the book. it will also
mean that when you hire a "Haskell2009-certified" specialist, you can be
sure that he knows not only the language itself but also a basic set of
libs, the equivalent of the STL for C++

defining a large standard set of libs was the major source of success
for Java/C#/C++ in the last decade.
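The yearly-standard scheme is mechanical enough to check by hand or in code. A minimal sketch, using only the example sets from this message (the `standardSets` table and `requiredFor` function are illustrative, not an existing tool): a compiler released in year Y ships the union of the standard sets for years Y-2 through Y.

```haskell
import Data.List (nub, sort)

type Year = Int
type Lib = String
type Major = Int

-- | Yearly standard sets, copied from the example in this message.
standardSets :: [(Year, [(Lib, Major)])]
standardSets =
  [ (2007, [("BS", 1), ("Collections", 2), ("HDBC", 1)])
  , (2008, [("BS", 2), ("Collections", 2), ("HSQL", 1)])
  , (2009, [("BS", 2), ("Collections", 2), ("HSQL", 2)])
  ]

-- | Library major versions a compiler released in year y must ship:
-- the union of the standard sets of the last 3 years (y-2 .. y).
requiredFor :: Year -> [(Lib, Major)]
requiredFor y =
  sort (nub (concat [ libs | (year, libs) <- standardSets
                           , year > y - 3, year <= y ]))

main :: IO ()
main = do
  print (requiredFor 2009)
  print (requiredFor 2010)
```

`requiredFor 2009` reproduces the ghc-2009 list given above. `requiredFor 2010`, computed from the 2008 and 2009 sets alone (the example has no 2010 set), no longer contains BS 1 or HDBC 1, matching the sliding-window behaviour described for ghc-2010.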
we can go one step further and combine development by the community
with standardization by the committee. this should make Haskell a better
solution for developing large, long-lived products, by providing a RICH
set of MODERN STANDARD libs that are guaranteed to run across compilers
and years. the criteria for including a library in this set are
(obviously):

1) popularity (this can be fairly measured by downloads/installations/user votes)
2) open-source, free license, unix/win and ghc/hugs compatibility

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

From dons at galois.com Thu Jan 17 14:15:42 2008
From: dons at galois.com (Don Stewart)
Date: Thu Jan 17 14:15:48 2008
Subject: Proposal: hands off the base! :)
In-Reply-To: <295830643.20080117201824@gmail.com>
References: <478E8C95.7020500@gmail.com> <295830643.20080117201824@gmail.com>
Message-ID: <20080117191542.GE10636@scytale.galois.com>

bulat.ziganshin:
> i make a counter-proposal: freeze the base library in its 6.8 state and
> add to the ghc distribution new libs with all the popular Monad, List
> and other functions. this will make it possible to:
>
> 2) if we want to improve some base code, we *duplicate* it into a new
> lib (with modified module names), publish the first version of this
> library with exactly the original code (and therefore equivalent
> interfaces) and then start to improve it, publishing newer and newer
> versions. imagine, for example, that we want to improve Data.Array.*:
>
> step 1: create library NewArray with modules Data.NewArray.* copied
> one-to-one from Data.Array.* and publish it as version 1

Data.Array is in the external 'array' package now.

-- Don

From laurent.deniau at cern.ch Mon Jan 21 04:01:35 2008
From: laurent.deniau at cern.ch (Laurent Deniau)
Date: Mon Jan 21 03:58:37 2008
Subject: Proposal: hands off the base! :)
In-Reply-To: <295830643.20080117201824@gmail.com>
References: <478E8C95.7020500@gmail.com> <295830643.20080117201824@gmail.com>
Message-ID: <47945F6F.8060403@cern.ch>

Bulat Ziganshin wrote:
> step 1: create library NewArray with modules Data.NewArray.* copied
> one-to-one from Data.Array.* and publish it as version 1
>
> step 2: raise the NewArray version to 2.0 and start making changes. once
> we've finished, raise the version to 3.0 and keep the interfaces for 3.*
> intact, so anyone can import NewArray 3.* and get the latest version with
> exactly the same interface as they used

Why not use a version-numbering convention, a bit like the Linux
kernel's (odd minor = dev, even minor = stable)? Instead of creating
NewArray, just tag Array with version 1.0.x and create a new branch
1.1.y as a development version of 1.0.x. Your NewArray 2.0 proposal
would then become Array 1.2.x for the stable version and 1.3.y for the
development (unstable) version (your version 3.0 would become 1.4.x and
1.5.y respectively). Moving to versions 2.0.x/2.1.y would mean a major
change across all standard libraries (cross-cutting changes), such as
adopting features of the latest standard or reflecting a reorganization
of the standard classes.

It would make it possible to keep track of many dev + stable releases in
parallel without changing the library names. It also allows development
versions to be released more often than stable versions.

Best regards,

ld.

From johan.tibell at gmail.com Mon Jan 21 04:13:24 2008
From: johan.tibell at gmail.com (Johan Tibell)
Date: Mon Jan 21 04:13:14 2008
Subject: Proposal: hands off the base! :)
In-Reply-To: <295830643.20080117201824@gmail.com>
References: <478E8C95.7020500@gmail.com> <295830643.20080117201824@gmail.com>
Message-ID: <90889fe70801210113p285698e7x46368e9f849a0ea0@mail.gmail.com>

Hi Bulat,

On Jan 17, 2008 6:18 PM, Bulat Ziganshin wrote:
> step 1: create library NewArray with modules Data.NewArray.* copied
> one-to-one from Data.Array.* and publish it as version 1

Using words like "new" in names for the purpose of versioning is quite
confusing, because a library which is new at some point will eventually
become old, and then the name is misleading. Versioning doesn't belong
in module/function names IMHO.

--
Johan

From johan.tibell at gmail.com Wed Jan 23 07:12:55 2008
From: johan.tibell at gmail.com (Johan Tibell)
Date: Wed Jan 23 07:12:51 2008
Subject: [Haskell-cafe] Has character changed in GHC 6.8?
In-Reply-To: <4c88418c0801230347ic5c2869j1710de4badcea075@mail.gmail.com>
References: <4795B764.8010305@therning.org> <20080122154508.GA4826@matrix.chaos.earth.li> <47960507.4060507@telenet.be> <87abmwkh8a.fsf@nmd9999.imr.no> <4797131C.4050602@telenet.be> <47971D51.3000704@jellybean.co.uk> <90889fe70801230259r4af63ea1o6ab10bee9e333813@mail.gmail.com> <47972141.8070700@jellybean.co.uk> <4c88418c0801230347ic5c2869j1710de4badcea075@mail.gmail.com>
Message-ID: <90889fe70801230412k66a56bc6y14bff1fdb1d2c1e0@mail.gmail.com>

> > > > What *does* matter to the programmer is what encodings putStr and
> > > > getLine use. AFAIK, they use "lower 8 bits of unicode code point" which
> > > > is almost functionally equivalent to latin-1.
> > >
> > > Which is terrible! You should have to be explicit about what encoding
> > > you expect. Python 3000 does it right.
> >
> > Presumably there wasn't a sufficiently good answer available in time for
> > haskell98.
>
> Will there be one for haskell prime ?

The I/O library needs an overhaul, but I'm not sure how to do this in a
backwards-compatible manner, which would probably be required for
inclusion in Haskell'.
One could, like Python 3000, break backwards compatibility. I'm not sure
about the implications of doing this. Maybe introducing a new
System.IO.Unicode module would be an option.

If one wants to keep the interface but change the semantics slightly,
one could define e.g. getChar as:

  getChar :: IO Char
  getChar = getWord8 >>= decodeChar latin1

This assumes latin-1 is what's used now. The benefit would be that if the
input is not in latin-1 an exception could be thrown rather than
returning a Char representing the wrong Unicode code point.

I recommend reading about the Python I/O system overhaul for Python
3000, which is outlined in PEP 3116:

http://www.python.org/dev/peps/pep-3116/

My proposal is for I/O functions to specify the encoding they use if
they accept or return Chars (and Strings). If they deal in terms of
bytes (e.g. socket functions) they should accept and return Word8s.

Optionally, text I/O functions could default to the system locale
setting.

--
Johan

From johan.tibell at gmail.com Wed Jan 23 07:43:54 2008
From: johan.tibell at gmail.com (Johan Tibell)
Date: Wed Jan 23 07:43:37 2008
Subject: [Haskell-cafe] Has character changed in GHC 6.8?
In-Reply-To: <479731C9.6020400@jellybean.co.uk>
References: <4795B764.8010305@therning.org> <47960507.4060507@telenet.be> <87abmwkh8a.fsf@nmd9999.imr.no> <4797131C.4050602@telenet.be> <47971D51.3000704@jellybean.co.uk> <90889fe70801230259r4af63ea1o6ab10bee9e333813@mail.gmail.com> <47972141.8070700@jellybean.co.uk> <4c88418c0801230347ic5c2869j1710de4badcea075@mail.gmail.com> <90889fe70801230412k66a56bc6y14bff1fdb1d2c1e0@mail.gmail.com> <479731C9.6020400@jellybean.co.uk>
Message-ID: <90889fe70801230443u91fca12v80de5bee6494b021@mail.gmail.com>

> > The benefit would be that if the input is not in latin-1 an exception
> > could be thrown rather than returning a Char representing the wrong
> > Unicode code point.
>
> I'm not sure what you mean here. All 256 possible values have a meaning.

You're of course right.
So we don't have a problem here. Maybe I was thinking of an encoding
(7-bit ASCII?) where some of the 256 values are invalid.

> > My proposal is for I/O functions to specify the encoding they use if
> > they accept or return Chars (and Strings). If they deal in terms of
> > bytes (e.g. socket functions) they should accept and return Word8s.
>
> I would be more inclined to suggest they default to a particular well
> understood encoding, almost certainly UTF8. Another interface could give
> access to other encodings.

That might be a good option. However, it would be nice if beginners
could write simple console programs using System.IO and have them work
correctly even if their system's encoding is not byte-compatible with
UTF-8. People who do I/O over the network etc. need to be more careful
and should specify the encoding used. How would a UTF-8 default work on
different Windows versions?

> > Optionally, text I/O functions could default to the system locale
> > setting.
>
> That is a disastrous idea.

I'm not sure about that, as long as decode is called on the input to
make sure that the input bytes are valid in that encoding. Same point as
above. What I would like to avoid is having to write:

  main = do
    putStrLn systemLocalEncoding "What's your name?"
    name <- getLine systemLocalEncoding
    putStrLn systemLocalEncoding $ "Hi " ++ name ++ "!"

I guess we could solve this by putting the functions in different
modules:

  System.IO                  -- requires explicit encoding
  System.IO.DefaultEncoding  -- implicit use of system locale setting

and have the modules export the same functions. Another option would be
to make the fact that the encoding is implied part of the function's
name.

Maybe we should start by writing down some type signatures and function
names. That often helps my thinking. I'll try to write something down
when I get home from work.
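The decoding side of this discussion can be sketched in pure code. Everything here is hypothetical (the `Encoding` type, `decodeByte`, `decodeBytes` and `decodeOrThrow` are not an existing System.IO API), and only single-byte encodings are covered: Latin-1 maps each byte to the code point with the same value and so can never fail, whereas 7-bit ASCII rejects bytes 0x80 and above, which is where throwing an exception, as suggested for getChar, would actually matter.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical encoding type; only single-byte encodings, for brevity.
data Encoding = Latin1 | Ascii
  deriving (Show, Eq)

-- | Decode one byte. Latin-1 is total: byte n is code point n.
-- 7-bit ASCII only defines bytes 0x00-0x7F.
decodeByte :: Encoding -> Word8 -> Either String Char
decodeByte Latin1 w = Right (chr (fromIntegral w))
decodeByte Ascii w
  | w < 0x80  = Right (chr (fromIntegral w))
  | otherwise = Left ("byte " ++ show w ++ " is not valid 7-bit ASCII")

-- | Decode a whole byte sequence, stopping at the first invalid byte.
decodeBytes :: Encoding -> [Word8] -> Either String String
decodeBytes enc = mapM (decodeByte enc)

-- | Report invalid input as an IO exception instead of returning a
-- Char for the wrong code point.
decodeOrThrow :: Encoding -> [Word8] -> IO String
decodeOrThrow enc = either (ioError . userError) return . decodeBytes enc

main :: IO ()
main = do
  print (decodeBytes Latin1 [0x48, 0xE9])  -- Right "H\233"
  print (decodeBytes Ascii [0x48, 0xE9])   -- Left "byte 233 is not valid 7-bit ASCII"
  s <- decodeOrThrow Ascii [0x48, 0x69]
  putStrLn s                               -- Hi
```

Keeping the decoder pure and pushing the failure into IO only at the edge means the same function could back both an explicit-encoding module and a default-encoding one.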
--
Johan

From ketil+haskell at ii.uib.no Wed Jan 23 09:15:56 2008
From: ketil+haskell at ii.uib.no (Ketil Malde)
Date: Wed Jan 23 09:16:04 2008
Subject: [Haskell-cafe] Has character changed in GHC 6.8?
In-Reply-To: <90889fe70801230443u91fca12v80de5bee6494b021@mail.gmail.com> (Johan Tibell's message of "Wed, 23 Jan 2008 13:43:54 +0100")
References: <4795B764.8010305@therning.org> <47960507.4060507@telenet.be> <87abmwkh8a.fsf@nmd9999.imr.no> <4797131C.4050602@telenet.be> <47971D51.3000704@jellybean.co.uk> <90889fe70801230259r4af63ea1o6ab10bee9e333813@mail.gmail.com> <47972141.8070700@jellybean.co.uk> <4c88418c0801230347ic5c2869j1710de4badcea075@mail.gmail.com> <90889fe70801230412k66a56bc6y14bff1fdb1d2c1e0@mail.gmail.com> <479731C9.6020400@jellybean.co.uk> <90889fe70801230443u91fca12v80de5bee6494b021@mail.gmail.com>
Message-ID: <87abmwiq5f.fsf@nmd9999.imr.no>

"Johan Tibell" writes:

>>> The benefit would be that if the input is not in latin-1 an exception
>>> could be thrown rather than returning a Char representing the wrong
>>> Unicode code point.

>> I'm not sure what you mean here. All 256 possible values have a meaning.

OTOH, going the other way could be more troublesome; I'm not sure that
outputting a truncated value is what you want.

> You're of course right. So we don't have a problem here. Maybe I was
> thinking of an encoding (7-bit ASCII?) where some of the 256 values
> are invalid.

Well, each byte can be converted to the equivalent code point, but
0x80-0x9F are control characters, and some of those are left undefined.
Perhaps instead of truncating on output, we should map code points
above 0xFF to such a value? E.g. 0x81 is undefined in both Unicode and
Windows 1252.

-k
--
If I haven't seen further, it is by standing in the footprints of giants

From johan.tibell at gmail.com Wed Jan 23 09:20:47 2008
From: johan.tibell at gmail.com (Johan Tibell)
Date: Wed Jan 23 09:20:39 2008
Subject: [Haskell-cafe] Has character changed in GHC 6.8?
In-Reply-To:
References: <4795B764.8010305@therning.org> <47960507.4060507@telenet.be> <87abmwkh8a.fsf@nmd9999.imr.no> <4797131C.4050602@telenet.be> <47971D51.3000704@jellybean.co.uk> <90889fe70801230259r4af63ea1o6ab10bee9e333813@mail.gmail.com> <47972141.8070700@jellybean.co.uk> <4c88418c0801230347ic5c2869j1710de4badcea075@mail.gmail.com> <90889fe70801230412k66a56bc6y14bff1fdb1d2c1e0@mail.gmail.com>
Message-ID: <90889fe70801230620l203efeb5td529ffa7e4df1f07@mail.gmail.com>

On Jan 23, 2008 2:11 PM, Magnus Therning wrote:
> Yes, this reflects my recent experience, Char is not a good representation
> for an 8-bit byte. This thread came out of my attempt to add a module to
> dataenc[1] that would make base64-string[2] obsolete. As you probably can
> guess I came to the conclusion that a function for data encoding with type
> 'String -> String' is plain wrong. :-)

Yes. Functions that deal with bytes shouldn't use Char. Char should be
seen as an ADT representing Unicode code points. It has nothing to do
with bytes.

--
Johan
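To make the "bytes are Word8, Char is a code point" distinction concrete, here is a sketch of the output direction, where the choice of encoding matters most. It hand-rolls a UTF-8 encoder (assuming the input contains no surrogate code points) and contrasts two ways of writing Latin-1: truncating to the low 8 bits, and substituting 0x81 for unrepresentable characters as Ketil suggests earlier in the thread. The function names are hypothetical; a real library would also need a decoder and proper error handling.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a String (a sequence of Unicode code points) as UTF-8 bytes.
-- Assumption: the input contains no surrogate code points (0xD800-0xDFFF).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (encodeCodePoint . ord)
  where
    encodeCodePoint :: Int -> [Word8]
    encodeCodePoint c
      | c < 0x80    = [byte c]
      | c < 0x800   = [0xC0 .|. byte (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. byte (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. byte (c `shiftR` 18)
                      , cont (c `shiftR` 12), cont (c `shiftR` 6), cont c ]

    -- Convert an Int that is known to fit into a byte.
    byte :: Int -> Word8
    byte = fromIntegral

    -- A continuation byte: 10xxxxxx carrying the low 6 bits of x.
    cont :: Int -> Word8
    cont x = 0x80 .|. byte (x .&. 0x3F)

-- | Latin-1 output by truncation: code points above 0xFF silently wrap
-- to their low 8 bits.
encodeLatin1Truncating :: String -> [Word8]
encodeLatin1Truncating = map (fromIntegral . ord)

-- | Latin-1 output with substitution: code points above 0xFF become 0x81
-- instead of an unrelated character.
encodeLatin1Lossy :: String -> [Word8]
encodeLatin1Lossy = map encodeChar
  where
    encodeChar ch
      | ord ch <= 0xFF = fromIntegral (ord ch)
      | otherwise      = 0x81

main :: IO ()
main = do
  print (encodeUtf8 "A\233\8364")          -- [65,195,169,226,130,172]
  print (encodeLatin1Truncating "\8364")   -- [172]
  print (encodeLatin1Lossy "a\8364")       -- [97,129]
```

Truncating "€" (U+20AC) yields the byte 0xAC, which reads back as '¬': a silent corruption. The substituting version at least emits a byte that, as noted in the thread, is undefined in Windows-1252. The same reasoning is why a base64 encoder is better typed over `[Word8]` than `String -> String`.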