From simonpj at microsoft.com  Thu Mar 15 04:24:02 2007
From: simonpj at microsoft.com (Simon Peyton-Jones)
Date: Thu Mar 15 04:23:58 2007
Subject: [Hs-Generics] FW: [Haskell-cafe] SYB vs HList (again)
Message-ID:

Dear generics mailing list

This message from Alex should be of interest, if you have not already seen it.

(Alex: this group is trying to create a single library --- or perhaps a couple with well-articulated tradeoffs --- for generic programming. Maybe you should join it.)

Simon

-----Original Message-----
From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of S. Alexander Jacobson
Sent: 14 March 2007 23:40
To: haskell-cafe@haskell.org
Subject: [Haskell-cafe] SYB vs HList (again)

Right now I am looking at using either SYB (Scrap Your Boilerplate) or HList Records to eliminate boilerplate for:

* parsing URLEncoded strings into application data structures
* generating XML/JSON from application data structures
* handling adding new fields to serialized data structures
* creating indexed collections of application data structures

I am looking for insights from people here on which approach they think is better and why.

Here are my current thoughts on the issue

== Both HList and SYB require data structure redesign ==

HLists require you to define Labels and basically only use label values that are themselves either scalar or HLists. SYB basically requires the same thing except that you use data/newtype to define labels instead of HList's more cumbersome label constructions.

== Defaults: HList gives you compile time errors, SYB only runtime errors ==

SYB does not seem to provide a way of having the compiler tell you that you are accessing a field that is unavailable in the type. HList will give you a type error if you do that.

I don't know exactly how HList handles default values but I assume you can restrict use of those values to explicit deserialization contexts. Is that correct?
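
The compile-time versus run-time contrast can be sketched in a few lines. This is only an illustration, not the SYB library itself: it uses Data.Data from base, and the names Person, joe and fieldsOfType are hypothetical. A lookup for a field type the record does not contain type-checks and simply comes back empty at run time.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
-- Sketch only: SYB-style lookup of a field by its type, written against
-- Data.Data from base. Person, joe and fieldsOfType are made-up names.
import Data.Data (Data, gmapQ)
import Data.Maybe (catMaybes)
import Data.Typeable (Typeable, cast)

newtype Name   = N String deriving (Show, Eq, Data)
newtype Salary = S Float  deriving (Show, Eq, Data)
newtype Dept   = D String deriving (Show, Eq, Data)

data Person = Person Name Salary deriving (Show, Data)

-- Visit every immediate field and keep those whose run-time type matches
-- the requested one. The traversal is linear in the number of fields and
-- each step is a dynamic type comparison.
fieldsOfType :: (Data r, Typeable a) => r -> [a]
fieldsOfType r = catMaybes (gmapQ cast r)

joe :: Person
joe = Person (N "Joe") (S 1000)

-- Finds the single Salary field of joe.
salaries :: [Salary]
salaries = fieldsOfType joe

-- Person has no Dept field, yet this compiles; the result is empty.
depts :: [Dept]
depts = fieldsOfType joe
```

Loading this in GHCi and evaluating salaries and depts shows the difference: the missing field is only noticed when the program runs.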

== HList allows more informality ==

With HList you can specify the type of a record ad hoc using obj::(Record Foo .*. Bar .*. Baz). SYB requires that you define data structures in separate data declarations.

It would be really nice if there was some way to tell Haskell that HLists have no more fields than the ones you happen to be getting and setting in your code. Effectively that would mean you get data structure inference not just function type inference which would be really cool! That is probably not possible but it couldn't hurt to ask (Oleg?).

== SYB doesn't require template haskell to make it usable ==

With SYB you create field labels using newtype (or data) declarations e.g.

   data Salary = S {salary::Float}

With HList, label declarations are really verbose e.g.

   data SalaryLabel deriving(Typeable)
   type Salary = Field (Proxy SalaryLabel) Int
   salary = proxy :: Proxy SalaryLabel

You can make this more concise using TemplateHaskell but TH looks alien and adds fear to the use of any code.

== Performance issues ==

SYB requires a linear traversal of all field elements using dynamic to get or transform a value. HList traverses an HCons list. I don't know how bad this is as compared with traditional data structure access using pattern matching or field labels.

My current bias is towards using HList because if we are going to force a conversion to a new data structure convention I'd rather have the type system on my side. With SYB it is too easy to let Haskell field labels creep into your data definitions and end up with subtle errors to correct.

Any opinions on these issues would be much appreciated.

-Alex-

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

From oleg at pobox.com  Fri Mar 16 02:18:47 2007
From: oleg at pobox.com (oleg@pobox.com)
Date: Fri Mar 16 02:19:58 2007
Subject: [Hs-Generics] FW: [Haskell-cafe] SYB vs HList (again)
In-Reply-To:
Message-ID: <20070316061847.E1F25AD3A@Adric.metnet.fnmoc.navy.mil>

[Please follow-up to generics@haskell.org]

S. Alexander Jacobson wrote:

> HLists require you to define Labels and basically only use label
> values that are themselves either scalar or HLists.
> ...
> With SYB you create field labels using newtype (or data) declarations
> e.g.
>
>    data Salary = S {salary::Float}
>
> With HList, label declarations are really verbose e.g.
>
>    data SalaryLabel deriving(Typeable)
>    type Salary = Field (Proxy SalaryLabel) Int
>    salary = proxy :: Proxy SalaryLabel

Actually there is no requirement that HList record names must be scalar `labels', must be Proxies and require such a complex declaration. From HList's high-level point of view, any collection can be a record provided the type of each item is unique and there is some way to extract the value associated with that type. The HList library provides two implementations of Records (and there was one more, obsolete now). There could be more. For example, I have just committed yet another implementation,

	http://darcs.haskell.org/HList/src/RecordD.hs

Here a record is a list of things that have a type and a value and provide a way to extract that value. The example from the end of this file seems worth quoting:

> data Name = Name String String deriving Show
> newtype Salary = S Float deriving Show
> data Dept = D String Int deriving Show
>
> person = (Name "Joe" "Doe") .*. (S 1000) .*. (D "CIO" 123) .*. emptyRecord
>
> -- could be derived automatically, like Typeable...
> instance Fieldish Name (String,String) where
>     fromField (Name s1 s2) = (s1,s2)
> instance Fieldish Salary Float where
>     fromField (S n) = n
> instance Fieldish Dept (String,Int) where
>     fromField (D s n) = (s,n)
>
> test1 = show person
> -- When a field acts as a label, only its type matters, not the contents
> test2 = person .!. (Name undefined undefined)
> test3 = person .!. (undefined::Salary)
> test5 = person .!. (D "xxx" 111)

> I don't know exactly how HList handles default values but I assume you
> can restrict use of those values to explicit deserialization contexts.
> Is that correct?

I'm not sure what you mean about the restriction of default values to deserialization contexts. Anyway, HList provides a left-biased union of two records: hLeftUnion r1 r2 is the record r1 augmented with all the fields from r2 that didn't occur in r1. One may consider r2 to be the record with default fields and the corresponding values.

> It would be really nice if there was some way to tell Haskell that
> HLists have no more fields than the ones you happen to be getting and
> setting in your code. Effectively that would mean you get data
> structure inference not just function type inference which would be
> really cool!

I'm not sure I follow. Could you outline an example of the code you wish to work?

Incidentally, a lot of the library depends on the record types being members of some specific classes. One can define

> newtype ClosedRecord r = ClosedRecord r

To make a ClosedRecord a record from which we can extract the values of some fields, we merely need to say

> instance HasField l r v => HasField l (ClosedRecord r) v
>     where hLookupByLabel l (ClosedRecord r) = hLookupByLabel l r

Since we did not make this record a member of HExtend or HAppend, it is not extensible.
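
The two ideas in this message, a record indexed by the types of its items and a closed wrapper that permits lookup but not extension, can be sketched without the HList library. Everything below is an assumption made for illustration: HNil, HCons, Has, get and Closed are stand-in names, not the HList or RecordD API, and the lookup relies on GHC's overlapping instances.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses #-}
-- Sketch only: a tiny type-indexed record. None of these names come from
-- HList; they are hypothetical stand-ins.

data HNil      = HNil      deriving (Eq, Show)
data HCons a r = HCons a r deriving (Eq, Show)

infixr 2 .*.
(.*.) :: a -> r -> HCons a r
(.*.) = HCons

-- "r contains an item of type a". The type of the item is the label.
class Has a r where
  get :: r -> a

-- The item at the head has the requested type.
instance {-# OVERLAPPING #-} Has a (HCons a r) where
  get (HCons x _) = x

-- Otherwise keep looking in the tail.
instance Has a r => Has a (HCons b r) where
  get (HCons _ r) = get r

-- A closed record: lookup is forwarded to the wrapped record, but no
-- extension operation is provided for the wrapper itself.
newtype Closed r = Closed r

instance Has a r => Has a (Closed r) where
  get (Closed r) = get r

newtype Name   = Name String deriving (Eq, Show)
newtype Salary = S Float     deriving (Eq, Show)

person :: HCons Name (HCons Salary HNil)
person = Name "Joe" .*. S 1000 .*. HNil

closedPerson :: Closed (HCons Name (HCons Salary HNil))
closedPerson = Closed person

-- The result type selects the field, for the open and the closed record.
salaryOf :: Salary
salaryOf = get closedPerson

nameOf :: Name
nameOf = get person
```

Asking for a type that is not in the record, for example get person at a type with no matching item, is rejected by the type checker because no instance can be found for the empty tail.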

From oleg at pobox.com  Sun Mar 25 20:52:33 2007
From: oleg at pobox.com (oleg@pobox.com)
Date: Sun Mar 25 20:53:12 2007
Subject: [Hs-Generics] SYB vs HList (again)
In-Reply-To:
Message-ID: <20070326005233.B8F94AD35@Adric.metnet.fnmoc.navy.mil>

S. Alexander Jacobson wrote:

> I'd like to be able to do something like this:
>
>    $(label Salary Int) -- template haskell to define salary label
>    main = do
>       person <- readFile "blah" >>= return . read
>       print $ person # salary
>
> In this case, haskell would assume that person has only one label,
> salary. The read function would ignore all the other labels. If I
> changed the code to this:
>
>    $(label Salary Int) -- template haskell to define salary label
>    $(label Name String)
>
>    main = do
>       person <- readFile "blah" >>= return . read
>       print $ person # salary
>       print $ show (person::Name .*. Salary)
>
> Then the code would assume that a person has both a name and a salary.

I see. You would like the way a data structure is used to tell the reader which data structure it should have read. This reminds me of how `read' is supposed to be used, although it doesn't quite work when polymorphism is involved (cf. Num-erals). This is an interesting problem; I should think about it.

BTW, in the second example, you supply an annotation `Name .*. Salary'. If you're willing to do that, the problem can be solved. I mean a function asShapeOf that is operationally an identity. You would use it like

	let _ = person `asShapeOf` (undefined::Name .*. Salary)
or
	let _ = person `asShapeOf` (Name .*. Salary)

If you find that approach appropriate, it could be easily implemented.

> Separately, I would really like hrecords not to have order dependency.
> It seems strange to me that (Foo .*. Bar .*. HNil) is a different type
> from (Bar .*. Foo .*. HNil).

That is indeed strange, and inevitable given the way record polymorphism is attained.
One normally does not care about this distinction, because many (most) polymorphic record consumers are sufficiently polymorphic and so accept either type. That is, if a function accepts any subtype of the particular record type, then the issue of the order of record fields should not arise. One may think of the type of a record with permuted fields as being a subtype of the original record type.

There are however many cases where the record type should be closed. Most frequently that case arises when we want to store records in a data structure (e.g., a list). This is also the case for functions like (==), which take two arguments of the same type (and won't take a subtype). In the OOHaskell library, we have a coercion function (actually, we have a bunch of such functions, which compute either a meet or a join of several record types). That function will rearrange the fields if necessary (as well as remove extra fields). In HList, it could be implemented via h2projectByLabels. So, one may write

	[rec1, coerce rec2, coerce rec3]

and not care about the order of fields in rec2 or rec3. Granted, the current implementations of the coercion functions are quite inefficient; I've been meaning to re-write them for more than a year. Perhaps now is the time...
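
A coercion of this kind can be sketched as a projection driven by the target type. The code below is an assumption-laden illustration and not the OOHaskell or HList implementation: HNil, HCons, Has, Coerce and coerceRec are hypothetical names, and each target field is found by a linear search, so the whole coercion is quadratic in the number of fields.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses #-}
-- Sketch only: reorder (and drop) record fields to fit a target type.
-- None of these names come from OOHaskell or HList.

data HNil      = HNil      deriving (Eq, Show)
data HCons a r = HCons a r deriving (Eq, Show)

infixr 2 .*.
(.*.) :: a -> r -> HCons a r
(.*.) = HCons

-- Lookup of an item by its type.
class Has a r where
  get :: r -> a

instance {-# OVERLAPPING #-} Has a (HCons a r) where
  get (HCons x _) = x

instance Has a r => Has a (HCons b r) where
  get (HCons _ r) = get r

-- Build the target record s field by field out of the source record r.
class Coerce r s where
  coerceRec :: r -> s

-- Nothing more is wanted: any remaining source fields are dropped.
instance Coerce r HNil where
  coerceRec _ = HNil

-- Fetch the next wanted field from the source, then build the rest.
instance (Has a r, Coerce r s) => Coerce r (HCons a s) where
  coerceRec r = HCons (get r) (coerceRec r)

newtype Name   = Name String deriving (Eq, Show)
newtype Salary = S Float     deriving (Eq, Show)
newtype Dept   = D String    deriving (Eq, Show)

type NameSalary = HCons Name (HCons Salary HNil)

rec1 :: NameSalary
rec1 = Name "Joe" .*. S 1000 .*. HNil

-- Same information in a different order, plus an extra Dept field.
rec2 :: HCons Salary (HCons Dept (HCons Name HNil))
rec2 = S 2000 .*. D "CIO" .*. Name "Ann" .*. HNil

-- The list fixes the element type, so coerceRec knows what to build.
staff :: [NameSalary]
staff = [rec1, coerceRec rec2]
```

Because the element type of the list determines the target, no annotation is needed at the call site; a source record lacking one of the wanted fields is rejected at compile time.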