[Haskell-cafe] Re: Allowing hyphens in identifiers

Mon Dec 14 21:04:43 EST 2009

On Dec 14, 2009, at 5:11 PM, Daniel Fischer wrote:
> 1. I wasn't playing in the under_score vs. camelCase game, just proposing
> a possible reason why the camelCase may have been chosen for Haskell's
> standard libraries.

But the insanely abbreviated example did not provide such a reason.
You still haven't explained what the reason is supposed to be:  it
can't be that baStudlyCaps salvages the readability of abbreviation
because it doesn't.  Indeed, it makes it worse, because you can't
always tell where one abbreviation ends and another begins.

In teaching an information retrieval paper, one of my favourite examples
is unionised.  Does it mean
	(union+ise)+ed	"having had the workers organised into a union"
or	un+(ion+ised)   "not having had its molecules turned into ions".
When I mean the latter, I always write un-ionised.

Now consider an actual Java class name,
where I genuinely didn't know what the answer was.
	INSURL
baStudlyCaps style for Java doesn't allow underscores in class names.
(This is actual Sun code.)  Is this something to do with insurance?
Is this something to do with URLs for the US Immigration and
Naturalization Service?  Are Inertial Navigation Systems involved?
Is the mention of 'URL' anything to do with URLs, or should this be
parsed something like (I) (NSU) (RL)?  With underscores, the actual
parsing, INS_URL, would be unambiguous.

Or take NVList, another real name.  Is it an NV_List (where I don't
know what NV is), an N_V_List (where I don't know what N and V are),
or an N_VList (where I do know what a vlist is)?  In fact it's a
Name_Value_List.  I _might_ have had a clue with N_V_List...

My point here is that if you separate words with spaces, dots,
hyphens, underscores, backslashes, or almost anything, you are going
to have _much_ less trouble with abbreviations than if you just jam
them together baStudlyCaps-style.
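
To make the ambiguity concrete, here is a minimal Haskell sketch (the
function names are made up for illustration, not taken from any
library).  A run of capitals such as INSURL carries no boundary
information, so every way of cutting it up is a candidate reading,
whereas an underscored name has exactly one.

```haskell
-- | Every way to cut a string into non-empty consecutive pieces.
-- A string of n characters (n >= 1) has 2^(n-1) such segmentations.
segmentations :: String -> [[String]]
segmentations [] = [[]]
segmentations s =
  [ prefix : rest
  | k <- [1 .. length s]
  , let (prefix, suffix) = splitAt k s
  , rest <- segmentations suffix
  ]

-- | Split a name at its underscores; there is exactly one answer.
splitUnderscores :: String -> [String]
splitUnderscores s = case break (== '_') s of
  (w, [])       -> [w]
  (w, _ : rest) -> w : splitUnderscores rest
```

In GHCi, length (segmentations "INSURL") gives 32, and both
["INS","URL"] and ["I","NSU","RL"] are among the results, while
splitUnderscores "INS_URL" can only give ["INS","URL"].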

As for my "parody" of baStudlyCaps, thatIsExactlyHowItLooksToMe.
>
> I think you could find that written in many texts on aesthetic relativism.

They are empirically wrong.

> Both are judgments based on their respective preferences and nothing else

I disagree.  Sometimes, people can articulate _why_ they like or dislike
things.  For example, I like anything spacious and bright.  This explains
very well why I prefer landscapes (spacious) to portraits (not spacious).
When it comes to depictions of plants, animals, people, and so on, I
prefer healthy to unhealthy, friendly appearance to hostile/dangerous.
Given that, you could probably predict my response to most paintings
fairly well.  If I and anyone I personally knew disagreed about which of
two paintings was "better", I would expect to find that we quickly reached
agreement about what features were _present_ to what _degree_ and about
the technical standard of the work (on a rather coarse scale, but enough).
The differences could be explained by the relative _weights_ we gave to
the various features.  Just as I have learned how to prepare tea and cook
onions so that my wife will enjoy them, although I dislike the one and
detest the other, so I would expect to be able to learn how to predict
someone's aesthetic taste fairly well.

Maybe we do agree.  It wasn't clear whether by "preferences" you meant
"weights" or "outcomes".  The thing is, if "preferences" means "outcomes",
there's no reason to expect that people will ever agree, whereas if it
means "weights", then it should be possible to find or construct examples
differing in a single feature where two people with different weights
will agree on which is better.

In the same way, when it comes to coding style, it may well be that
we are responding to the same objective properties of styles, but
weighting them differently.  It appears, for example, that we both
perceive abbreviation, and we both give it a negative weight.  It is
therefore to be expected that given two versions of a program in which
the *only* thing changed is the degree and/or nature of abbreviation,
we'll agree which is better and which is worse.

For me to accept "personal preference" as a final explanation of
something would be to accept an end to rational investigation.
>>
>> If it were just a matter of experience, then this experience should
>> surely have taught me to love baStudlyCaps.
>
> No. It should have taught you to *read* camelCase - unless your aversion
> is so strong that you actively refuse to learn reading it.

Where did you ever get the idea that I can't *read* baStudlyCaps?
Just because I can read it doesn't mean that I can't read something
else *better*.  Life is hard enough without accepting unnecessary
difficulties, even if they are moderately small ones.
>>
>>> Sourcecode is so different from ordinary text (a line of sourcecode
>>> rarely contains more than four or five words),

I gave the wrong response to that yesterday.  Later in the common room
I realised what the perfect answer is:

	newspapers are ordinary text,
	newspaper columns are typically four or five words across.

The number of words per line is therefore not a useful way to
distinguish source code from ordinary text.

>>
>> baStudlyCaps doesn't read any better with short lines.
>
> I have no trouble reading either version. And that although this is not
> what camelCase is intended for (as far as I know, the purpose of it is to
> mark word boundaries within *one token* [identifier]).

You missed the point of the example, which was that the words that
were joined (whether by underscores or by baStudlyJunctions) were the
ones that formed sensible units.  The junctions were not arbitrary.

[1]
> So? Whitespace helps tokenising and thus increases readability (for me,
> at least).
>
[2]
> What's the relation to the question whether camel case and underscore are
> readable or not?

In quotation [1], you concede the argument against baStudlyCaps.
If white space helps finding units of meaning and thus increases
readability for you, then white-or-functionally-white space
should help finding units of meaning in program text, and
baStudlyCaps should be less readable than separated_words.

The only way to have your cake and eat it is to deny that the
words making up a compound identifier _are_ units of meaning
that should be perceived as such, or at least the only way that
I can see.  This seems an odd position to hold.

>> "Persuade a man against his will, he's of the same opinion still."
>> How _much_ evidence?
>
> Replicated studies with enough participants from enough different
> environments/cultures showing that more than 99% of the participants find
> it clearly more readable.

OK, there is no point in my continuing this.
Such a level of study is not practically attainable.
>
> That's due to the *objectively*; for such a strong claim, you need
> unusually strong evidence.

This is NOT one of those extraordinary claims that require extraordinary
evidence.  It's an entirely humdrum claim that what makes ordinary text
more readable makes something strongly resembling ordinary text more
readable, and as such, perfectly ordinary experimental evidence should do.
>

> I take the widespread presence of both as an indication that the majority
> isn't very large, so you'd have a little work to do to convince me.

You are making the assumption that the word separation style of
programmers reflects their OWN initial preference.  I am aware of no
reason to believe that.  People writing Pascal (which didn't _have_
an underscore because there wasn't one in the 6-bit character set it
was designed for) or Smalltalk (which didn't _have_ an underscore
because there wasn't one in the 7-bit character set it was designed
for) simply didn't have a choice.  Java's designers seem to have
fairly mindlessly copied Smalltalk, and Java's users _have_ to use
Java's vast range of predefined baStudlyCaps identifiers, so in
effect have no choice.  (Unless like me they have a preprocessor.)

I dare to say that we agree that Java has many advantages over some
of its rivals, so that using an uncomfortable word separation style
may be compensated for by something else (such as NetBeans or Eclipse,
maybe).

I'm quite sure that we agree that Haskell has *huge* advantages.
The unfortunate word separation style offsets that enough that it's
worth programming around (using a preprocessor, for example), but
not enough to make me stop using it.
>>
>>

>> You're asking me to sacrifice readability everywhere else
>> for the sake of one line in every 2850?  (Not that I do
>> find that line more readable in baStudlyCaps.)
>
> Not at all. What gave you that idea?

The form of your argument.
>

> You prefer to read and write code in underscore style. Others prefer camel
> case. Without an easy way to convert, at least one group won't be happy.

But there *IS* an easy way to convert.

> If I can help improving it and making it more usable, I'd be happy to
> (there are a couple of points where the transformation is not trivial,
> {-# OPTIONS_GHC #-}, foreign import).

Changes are not made inside {-...-} or "...", only to Haskell identifiers.

There's one bug I'm aware of:  --<symbol> is treated as a comment.
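
As an illustration only, a converter along these lines might look like
the Haskell sketch below.  This is not the preprocessor discussed
above: toCamel and its helpers are hypothetical names, and the
behaviour is assumed from the description (leave {-...-} and "..."
alone, rewrite identifiers only).  The sketch treats a run of two or
more dashes as a comment only when the whole symbol run consists of
dashes, so an operator such as --> is left as code.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, toUpper)

-- | Rewrite under_score identifiers as camelCase, leaving string
-- literals, block comments (including pragmas) and line comments alone.
-- Known gaps in this sketch: escaped character literals such as '\"',
-- string gaps, and name clashes (foo_bar and fooBar become the same).
toCamel :: String -> String
toCamel = code
  where
    code [] = []
    code ('{' : '-' : rest) = "{-" ++ block 1 rest
    code ('\'' : '"' : '\'' : rest) = "'\"'" ++ code rest
    code ('"' : rest) = '"' : string rest
    code s@(c : rest)
      | isSymbolChar c =
          let (op, rest') = span isSymbolChar s
          in if length op >= 2 && all (== '-') op
               then let (cmt, rest'') = break (== '\n') rest'
                    in op ++ cmt ++ code rest''   -- line comment: copy as-is
               else op ++ code rest'              -- ordinary operator
      | isDigit c = let (num, rest') = span isIdentChar s in num ++ code rest'
      | isIdentStart c =
          let (ident, rest') = span isIdentChar s in camel ident ++ code rest'
      | otherwise = c : code rest

    -- Inside "...": copy verbatim, stepping over backslash escapes.
    string ('\\' : c : rest) = '\\' : c : string rest
    string ('"' : rest) = '"' : code rest
    string (c : rest) = c : string rest
    string [] = []

    -- Inside {- ... -}: copy verbatim, tracking the nesting depth.
    block :: Int -> String -> String
    block n ('{' : '-' : rest) = "{-" ++ block (n + 1) rest
    block n ('-' : '}' : rest)
      | n == 1 = "-}" ++ code rest
      | otherwise = "-}" ++ block (n - 1) rest
    block n (c : rest) = c : block n rest
    block _ [] = []

    isIdentStart c = isAlpha c || c == '_'
    isIdentChar c = isAlphaNum c || c == '_' || c == '\''
    isSymbolChar c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- | Drop each underscore that is followed by a letter and capitalise
-- that letter; leading underscores are kept.
camel :: String -> String
camel ident = lead ++ go body
  where
    (lead, body) = span (== '_') ident
    go ('_' : c : cs) | isAlpha c = toUpper c : go cs
    go (c : cs) = c : go cs
    go [] = []
```

Loaded into GHCi, toCamel "a_b --> c_d" gives "aB --> cD", and
a pragma line such as {-# OPTIONS_GHC -Wall #-} passes through unchanged
because it is handled as a block comment.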