[Haskell-cafe] Call for discussion: OverloadedLists extension

Tue Sep 25 10:25:31 CEST 2012

Simon Peyton-Jones wrote:
> |  I remember a similar discussion a few years ago. The question of
> whether
> |  or not overloading list literals a good idea notwithstanding, the
> problem
> |  with this is that fromList for vectors is highly inefficient. So if
> |  something like this gets implemented and if vector/array literals are
> one
> |  of the main motivations then I really hope there will be no lists
> |  involved.
>
> Would you like to remind us why it is so inefficient?  Can't the vector
> construction be a fold over the list?  Ah... you need to know the *length*
> of the list, don't you?  So that you can allocate a suitably-sized vector.
>  Which of course we do for literal lists.
>
> So what if fromList went
> 	fromList :: Int -> [b] -> a b
> where the Int is the length of the list?

That's part of a problem. There are really two aspects to it. Firstly, a
naive list-based implementation would be a loop. But when I write ([x,y]
:: Vector Double) somewhere in an inner loop in my program, I *really*
don't want a loop with two iterations at runtime - I want just an
allocation and two writes. I suppose this could be solved by doing
something like this:

  {-# INLINE fromList #-}
  fromList [] = V.empty
  fromList [x] = V.singleton x
  fromList [x,y] = ...
  -- and so on up to 8? 16? 32?
  fromList xs = fromList_loop xs

But it's ugly and, more importantly, inlines a huge term for every literal.

The other problem is with literals where all values are known at compile
time. Suppose I have ([2.5,1.4] :: Vector Double) in an inner loop. Here,
I don't want a complicated CAF for the constant vector which would have to
be entered on every loop iteration. I'd much rather just have a pointer to
the actual data somewhere in memory and use that. This is more or less
what happens for strings at the moment, even though you have to use
rewrite rules to get at the pointer which, in my opinion, is neither ideal
nor really necessary. IMO, the "right" design shouldn't rely on rewrite
rules. Also, strings give you an Addr# whereas vector supports ByteArray#,
too.

Since enumerated literals have been mentioned in a different post, I'll
just mention that the Enum class as it is now can't support those
efficiently for arrays because there is no way to determine either the
length or the nth element of [x..y] in constant time. This would have to
be fixed.

Roman