<div dir="ltr"><div>Thank you very much for the detailed explanation. It surely was an enlightenment for me. Especially the comments<br><br><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">
Java makes "obtain print version as a string" the basic form and
"append print version to output stream" a derived form.<br></blockquote><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">Smalltalk makes "append print version to output stream"<br>
the basic form and "obtain print version as a string" a derived form.</blockquote><br></div>I wonder, this seems a good example of how such a seemingly 'small' design decision can mean so much in terms of performance difference.<br>
<div><div class="gmail_extra"><br></div><div class="gmail_extra">Surely, in Java I will have to take a detour to get around this handicap by implementing something 'shows' on my own or may be use an equivalent function from some library BUT in any case I will have to be aware of this.<br>
<br></div><div class="gmail_extra">Please give me a more general principle that can be learnt from this design lesson, which you might think of. It doesn't quite seem to be the 'kiss' principle i.e. "keep it simple stupid".<br>
</div><div class="gmail_extra">What is really behind the Smalltalk's decision?<br></div><div class="gmail_extra"><br clear="all"></div><div class="gmail_extra"><div>Thanks and regards,<br>-Damodar Kulkarni<br></div>
<br><br><div class="gmail_quote">On Tue, Sep 3, 2013 at 12:06 PM, Richard A. O'Keefe <span dir="ltr"><<a href="mailto:ok@cs.otago.ac.nz" target="_blank">ok@cs.otago.ac.nz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
On 3/09/2013, at 5:17 PM, damodar kulkarni wrote:<br>
> I didn't want to clutter that thread so I am asking a question here.<br>
> Where do I find foundational and/or other good references on the topic of "stream interface vs string interface to convert objects to text"? I tried google but failed.<br>
<br>
</div>It has to do with the cost of string concatenation.<br>
<br>
Let's take a simple thing.<br>
A Set of Strings.<br>
<br>
Smalltalk has<br>
<br>
String<br>
printOn: aStream<br>
aStream nextPut: $'.<br>
self do: [:each |<br>
each = $' ifTrue: [aStream nextPut: $'].<br>
aStream nextPut: each].<br>
aStream nextPut: $'.<br>
<br>
(Smalltalk uses '' for strings with quote doubling for embedded<br>
quotes and has no character escapes. s nextPut: c writes character<br>
c to stream s. do: [:...] is a loop.)<br>
<br>
Set<br>
printOn: aStream<br>
|started|<br>
started := false.<br>
aStream nextPutAll: 'a Set ('.<br>
self do: [:each |<br>
started ifTrue: [aStream space] ifFalse: [started := true].<br>
each printOn: aStream].<br>
aStream nextPut: $'.<br>
<br>
Object<br>
printString<br>
|stream|<br>
stream := WriteStream on: (String new: 40).<br>
self printOn: stream.<br>
^stream contents<br>
<br>
(A WriteStream is an internal stream. It starts with the<br>
array-like object you give it and grows it, typically by<br>
doubling, as needed. `contents' returns the part actually<br>
written to.)<br>
<br>
Let's actually do something subtly different.<br>
Each Set will contain the printString of a number<br>
and also another set, so<br>
a Set('3' a Set('2' a Set('1' a Set())))<br>
<br>
s := Set new.<br>
1 to: 1000*1000 do: [:i |<br>
s := Set with: s with: i printString].<br>
s printOn: Transcript.<br>
<br>
The set having been created, how much allocation is done<br>
by the output phase? *NONE*. And the time is *LINEAR*<br>
in the size of the output.<br>
<br>
To summarise: Smalltalk makes "append print version to output stream"<br>
the basic form and "obtain print version as a string" a derived form.<br>
The result is that printing (acyclic) objects of any size takes time<br>
linear in the size of the output.<br>
<br>
Now consider the Java version.<br>
Java makes "obtain print version as a string" the basic form and<br>
"append print version to output stream" a derived form.<br>
<br>
There's a nasty little wrinkle which is that "foo".toString() is<br>
"foo" instead of the expected "\"foo\"" in Java. So the output<br>
will be [3, [2, [1, []]] or something like that.<br>
<br>
String {<br>
String toString() {<br>
return this;<br>
}<br>
}<br>
<br>
HashSet {<br>
String toString() {<br>
StringBuilder b = new StringBuilder();<br>
boolean started = false;<br>
b.append("[");<br>
for (Object o: this) {<br>
if (started) b.append(", "); else started = true;<br>
b.append(o.toString());<br>
}<br>
b.append("]");<br>
return b.toString();<br>
}<br>
}<br>
<br>
This takes *QUADRATIC* time, but it's annoyingly hard to demonstrate<br>
because it keeps crashing with a stack overflow for even quite small n.<br>
Thankfully, the -Xss option comes to the rescue.<br>
<br>
n time (seconds)<br>
100 0.16<br>
200 0.18<br>
500 0.22<br>
1000 0.30<br>
2000 0.62<br>
5000 2.58<br>
10000 12.08<br>
20000 51.54<br>
<br>
Not only does it take an obscene amount of time to print a large<br>
object, it turns over a totally unwarranted amount of space.<br>
<br>
Now you might object that sets like this are not realistic.<br>
If you are trying to write large circuits or abstract syntax<br>
trees or the like to a file, or using this *style* even if<br>
not this specific *interface* to write XML documents, the<br>
example errs by being unrealistically *easy* for Java.<br>
It's easy to understand what's going wrong.<br>
<br>
Consider [2, [1, []]]<br>
When visiting {}, we create "[]". So far so good.<br>
When visiting {1, {}}, we *copy* the "[]" into "[1, []]".<br>
When visiting {2, {1, {}}}, we *copy* the "[1, []]" into<br>
"[2, [1, []]]".<br>
And so it goes. This does an awful lot of copying and<br>
*none* of it is needed given a sane interface.<br>
<br>
What is the take of Haskell on this?<br>
There is a *reason* 'shows' exists.<br>
<br>
newtype Date = Date (Int,Int,Int)<br>
instance Show Date<br>
where showsPrec _ (Date (y,m,d)) rest =<br>
"Date (" ++ shows y ("," ++ shows m ("," ++ shows d (")" ++ rest)))<br>
<br>
The only uses of ++ here are where the left operand is a short literal.<br>
Using this approach, Haskell programs can produce strings in linear time<br>
and linear space.<br>
<br>
For lazy I/O, using shows in Haskell is a good analogue of using<br>
#printOn: in Smalltalk. The basic form is "include this as PART of<br>
a stream", with "convert this to a whole string" as a derived form.<br>
<br>
What the equivalent of this would be for Iteratees I don't yet<br>
understand.<br>
<br>
<br>
</blockquote></div><br></div></div></div>