Proposal for a new I/O library design

John Meacham john@repetae.net
Tue, 29 Jul 2003 03:39:59 -0700


On Tue, Jul 29, 2003 at 10:19:21AM +0200, Sven Panne wrote:
> Ben Rudiak-Gould wrote:
> >On Mon, 28 Jul 2003, Simon Marlow wrote:
> >>[...] lookupFileByPathname must open the file, without knowing whether the
> >>file will be used for reading or writing in the future.
> >I know; I'm hoping against hope that this isn't an insurmountable problem.
> 
> Well, I fear it is, at least on POSIX...
> 
> >If the OS provides a "reopen" function which is like open except that it
> >takes a file handle instead of a pathname,
> 
> On POSIX, I'm not aware of anything like that, only dup/dup2, but you
> can't change the access mode after duplicating the fd (at least fcntl
> on Linux is not capable of doing it).

fcntl(2) (a wonderful catch-all for everything file-related) can change
the status flags on an open file. however, it cannot change the access
mode on every system; on Linux, for example, F_SETFL ignores the access
mode bits.
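
to make the distinction concrete, here is a minimal sketch of what fcntl
can do from Haskell. it assumes the System.Posix.IO module from the unix
package is available; toggleNonBlocking is just a name made up for this
example. it flips a status flag on an already-open descriptor and reads
it back. note that FdOption has no constructor for the access mode, so
"reopening" a read-only fd for writing cannot be expressed this way.

```haskell
import System.Posix.IO
  ( FdOption (..), closeFd, createPipe, queryFdOption, setFdOption )

-- Hypothetical helper: toggle a status flag on an already-open descriptor
-- and report its value before and after. A pipe is used so that the
-- example does not depend on any file existing on disk.
toggleNonBlocking :: IO (Bool, Bool)
toggleNonBlocking = do
  (readEnd, writeEnd) <- createPipe
  before <- queryFdOption readEnd NonBlockingRead  -- reads the flags (F_GETFL)
  setFdOption readEnd NonBlockingRead True         -- sets O_NONBLOCK (F_SETFL)
  after <- queryFdOption readEnd NonBlockingRead
  closeFd readEnd
  closeFd writeEnd
  return (before, after)

main :: IO ()
main = toggleNonBlocking >>= print
```

the second component of the result should be True, since the flag was
just set. the access mode chosen when the descriptor was created stays
whatever it was.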

> >[...] a File contains a handle with minimal access permissions
> >and maximal sharing permissions,
> 
> The next problem: How should one get a file descriptor on POSIX without
> knowing the access mode in advance? If the file is not readable O_RDONLY
> will fail, if it is only writeable O_WRONLY will fail, O_RDWR is even
> worse... OK, we could stat the file first, but there is no guarantee
> that the file permissions are still the same when we later want to
> "reopen" it.
> 
> >[...] If there's a way to open files by unique ID instead of pathname, that
> >would also work.
> 
> I'm not aware of this on POSIX (open a file by inode/fs?).

yeah. this is not possible in POSIX, and even if such a call existed, i
would imagine it would interact oddly with non-unixy and network
filesystems.

> >[...] All we need here is a way to change the access and sharing rights on an
> >already-open handle. I find it hard to believe that after decades of use
> >by millions of people, the UNIX file API provides no way to do this
> >safely.
> 
> Personally, I think this is a sign that one is heading towards the wrong 
> direction...
> :-)

yeah. part of the problem is that with some filesystems, access rights
are not a property of the file itself, but of its name. for instance
/foo and /bar might point to the same file, but one is read-only. this
is generally not the case on traditional unix filesystems, but it is
sometimes the preferred method of access control on some systems. in
fact, in EROS it was the ONLY method of access control. knowing
something's name gave you power over it, and things could have several
names :). what fun.

> >[...] What are the practical problems with relying on finalizers? As far as I
> >can see, the "no more filehandles available" problem is completely solved
> >by forcing a major GC and trying again when it occurs.
> 
> But on quite a few systems there is an upper limit on the *global* number of
> open files, so you would be a "bad citizen" for such a system.

also, with many OS architectures, operations involving fds get slower as
the number of open fds increases. they have to be looked up in some sort
of data structure on the kernel side, and searching for the lowest free
one when allocating can be slow. treat fds as precious. never hold
something open longer than necessary. if nothing else, it clutters up
the lsof output and obscures what your program is doing when viewed by
system tools.
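
as a rough illustration of not holding things open longer than
necessary, here is a small sketch using bracket from Control.Exception.
countBytes is a made-up helper, not part of any proposed library; the
point is only that the handle is closed as soon as the action finishes,
even if it throws, instead of waiting for a finalizer to run.

```haskell
import Control.Exception (bracket)
import System.Directory (removeFile)
import System.IO

-- Hypothetical helper: report a file's size while holding the descriptor
-- only for the duration of the query. bracket runs hClose whether
-- hFileSize returns normally or raises an exception.
countBytes :: FilePath -> IO Integer
countBytes path =
  bracket (openBinaryFile path ReadMode) hClose hFileSize

main :: IO ()
main = do
  -- Create a scratch file so that the example is self-contained.
  (path, h) <- openTempFile "." "fd-demo.txt"
  hPutStr h "hello"
  hClose h
  n <- countBytes path
  removeFile path
  print n
```

withFile and withBinaryFile in System.IO wrap up the same pattern, so in
practice one would probably reach for those.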

> >>How did you intend text encodings to work?  I see several possibilities:
> >>
> >>  textDecode :: TextEncoding -> [Octet] -> [Char]
> >>
> >>or
> >> 
> >>  decodeInputStream :: TextEncoding -> InputStream -> TextInputStream
> >>  getChar :: TextInputStream -> IO Char
> >>  etc.
> >>
> >>or
> >> 
> >>  setInputStreamCoding :: InputStream -> TextEncoding -> IO ()
> >>  getChar :: InputStream -> IO Char
> >
> >
> >I was thinking of the second. It could easily be implemented as the third
> >under the hood. But I was hoping someone else would worry about it. :-)
> 
> In the non-IO versions you have a problem if the encoder/decoder encounters
> an error because of a malformed InputStream. In the IO case one can simply
> raise an IO exception. And using "Maybe TextInputStream" won't help, because
> this would essentially make the encoder/decoder strict in its InputStream
> argument.

please can we figure out portable binary IO before worrying about i18n?
the problems are relatively orthogonal, but the 'right thing to do' for
i18n is not as clear, and arguably not as important, since with portable
binary IO one can implement any sort of character processing on top of
it. it is my opinion that haskell dropped the ball big big time by
specifying IO in terms of undefined OS character set encodings. always
raw binary would have been so much more useful. you can always write a
bit more code to do \r\n <-> \n or convert utf8 to Chars yourself, but
there is no way to ever turn an undefined operation into anything
useful.
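
to show how little code "a bit more code" can be, here is a sketch of
both conversions layered on plain octets. the names decodeUtf8 and
crlfToLf are made up for this example, and the choice to emit U+FFFD for
malformed input (rather than raising an error) is an assumption of the
sketch, not something anyone in this thread has agreed on. it does keep
the decoder lazy in its input.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode UTF-8 octets lazily. A malformed or truncated sequence yields
-- U+FFFD for its lead byte and decoding resumes at the following byte.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : rest)
  | b < 0x80 = chr (fromIntegral b) : decodeUtf8 rest
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = '\xFFFD' : decodeUtf8 rest
  where
    -- n continuation bytes expected, lead holds the payload bits of the
    -- first byte, minCode rejects overlong encodings.
    multi :: Int -> Int -> Int -> String
    multi n lead minCode =
      let (conts, rest') = splitAt n rest
          step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
          code = foldl step lead conts
          valid =
            length conts == n
              && all (\c -> c .&. 0xC0 == 0x80) conts
              && code >= minCode
              && code <= 0x10FFFF
              && (code < 0xD800 || code > 0xDFFF) -- no surrogates
       in if valid
            then chr code : decodeUtf8 rest'
            else '\xFFFD' : decodeUtf8 rest

-- Collapse each "\r\n" pair into a single '\n'; a lone '\r' is kept.
crlfToLf :: String -> String
crlfToLf ('\r' : '\n' : cs) = '\n' : crlfToLf cs
crlfToLf (c : cs) = c : crlfToLf cs
crlfToLf [] = []

main :: IO ()
main = print (crlfToLf (decodeUtf8 [0x68, 0x69, 0x0D, 0x0A, 0xC3, 0xA9]))
```

both functions are ordinary list transformers, so they compose with
whatever produces the octets, and neither needs to know anything about
the OS's idea of a character set.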

just some thoughts and a (hopefully small and somewhat relevant) rant.
:)     -John

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john@foo.net
---------------------------------------------------------------------------