[database-devel] Request for Feedback on COPY IN/OUT support for postgresql-simple

Leon Smith leon.p.smith at gmail.com
Fri Jul 12 02:09:40 CEST 2013


OK, I've stubbed out an approach to an async-based binding to libpq
(sans implementation, of course):

https://github.com/lpsmith/postgresql-libpq/commit/96c82197b558019de44cc4b5ef728b95de99f0ab

What do you think?
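
For concreteness, here is roughly the shape of the result-gathering loop
I have in mind, built on the existing nonblocking entry points.  This is
only a sketch: execAsync and drainResults are placeholder names, and
error handling is mostly elided.

    import Control.Concurrent (threadWaitRead)
    import qualified Data.ByteString as B
    import Database.PostgreSQL.LibPQ

    -- Send a query and collect every PGresult, waiting on the
    -- connection's socket via the GHC IO manager instead of
    -- blocking inside libpq.
    execAsync :: Connection -> B.ByteString -> IO [Result]
    execAsync conn q = do
        ok <- sendQuery conn q
        if ok then drainResults conn else return []

    drainResults :: Connection -> IO [Result]
    drainResults conn = do
        busy <- isBusy conn
        if busy
          then do
            mfd <- socket conn
            case mfd of
              Nothing -> return []        -- connection is in a bad state
              Just fd -> do
                threadWaitRead fd         -- interruptible on Unix
                ok <- consumeInput conn
                if ok then drainResults conn
                      else return []      -- read error
          else do
            mres <- getResult conn
            case mres of
              Nothing  -> return []       -- NULL: no more results
              Just res -> do
                rest <- drainResults conn
                return (res : rest)

A real implementation would presumably also stop collecting as soon as
it sees a PGRES_COPY_* status, per your notes below.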


> The libpq source may be helpful.  Take a look at PQexecFinish in
> fe-exec.c, which reveals some interesting considerations:
>
>  * When PQexec is given multiple SQL commands, it discards all but the
> last PGresult, but it combines error messages from these results.
>
>  * When PQexec encounters a result status of PGRES_COPY_*, it stops
> returning results, to allow the application to perform the transfer.
>
> One thing to bear in mind is that threadWait* can be interrupted on Unix
> (that's one motivation for using the async API in the first place).  You'll
> have to figure out what 'exec' should do when interrupted while waiting for
> a result.  Perhaps it should use PQcancel to cancel the request, then
> continue waiting for results.  If interrupted yet again, a subsequent
> 'exec' will need to call PQgetResult to discard results from the
> interrupted query before calling PQsendQuery again.
>

Yeah, I did plan to read the libpq source.  In any case, thanks for the
comments; they are helpful.

Async exceptions do present a thorny semantic problem, but my use cases
would basically assume the connection is dead.  I mean, didn't we have
this conversation with respect to (timeout (readMVar foo)) and
(timeout (readChan foo))?  The upshot was that you should treat the
resource as dead, because there is some small possibility that you were
interrupted after you took the resource but before you actually
returned it.
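
To spell out that race in miniature (recvWithTimeout is a made-up name):

    import Control.Concurrent.Chan
    import System.Timeout (timeout)

    -- If the timeout's async exception arrives after readChan has
    -- dequeued an element but before that element is returned, the
    -- element is lost.  So a Nothing result can't guarantee the
    -- channel was left intact; the safe reading is "compromised".
    recvWithTimeout :: Chan a -> Int -> IO (Maybe a)
    recvWithTimeout chan usecs = timeout usecs (readChan chan)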

Calling PQcancel and waiting it out also seems problematic from the
point of view of my use cases: namely, detecting a flaky network, and
detecting in a timely fashion that the backend was restarted.
(Although, come to think of it, the latter case should be quickly
detectable without timeouts, so perhaps I should dig into why my
clients are taking so long to reconnect to a restarted server.)

However, perhaps there are other good justifications for attempting
more complex behavior along the lines you suggest.
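
If we did attempt it, I imagine the interrupt cleanup would look
something like this sketch (cancelAndDrain is a placeholder name, and a
full async version would pump consumeInput/threadWaitRead while isBusy
rather than let getResult block inside libpq):

    import Database.PostgreSQL.LibPQ

    -- Run when 'exec' is interrupted mid-query: cancel the request,
    -- then discard any leftover PGresults so the next PQsendQuery
    -- starts from a clean slate.
    cancelAndDrain :: Connection -> IO ()
    cancelAndDrain conn = do
        mc <- getCancel conn
        case mc of
          Nothing -> return ()      -- no cancel handle; treat conn as dead
          Just c  -> do
            _ <- cancel c           -- best-effort PQcancel
            drain
      where
        drain = do
          mres <- getResult conn
          case mres of
            Nothing -> return ()    -- NULL: fully drained
            Just _  -> drain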

Best,
Leon
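
P.S.  For what it's worth, the raw server-to-server transfer you
describe below ("send COPY data from one server to another") might look
roughly like this, assuming COPY bindings along the lines we've been
discussing end up in postgresql-libpq.  The table name is a placeholder
and error handling is elided:

    {-# LANGUAGE OverloadedStrings #-}
    import Database.PostgreSQL.LibPQ

    -- Pipe raw COPY rows from src to dst without touching fields.
    pipeCopy :: Connection -> Connection -> IO ()
    pipeCopy src dst = do
        _ <- exec src "COPY tbl TO STDOUT"
        _ <- exec dst "COPY tbl FROM STDIN"
        loop
      where
        loop = do
          row <- getCopyData src False       -- False: blocking read
          case row of
            CopyOutRow r -> do
              _ <- putCopyData dst r         -- forward the raw row
              loop
            CopyOutDone  -> do
              _ <- putCopyEnd dst Nothing    -- finish the COPY IN
              return ()                      -- (final getResult on both
                                             -- connections elided)
            _            -> do
              _ <- putCopyEnd dst (Just "interrupted")  -- abort on error
              return ()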


On Fri, Jun 28, 2013 at 5:12 AM, Joey Adams <joeyadams3.14159 at gmail.com> wrote:

> On Thu, Jun 27, 2013 at 11:40 PM, Leon Smith <leon.p.smith at gmail.com> wrote:
>
>> Yeah, eventually adding some level of understanding of the formats would
>> be nice,   but honestly my immediate need is a higher-performance (and
>> somewhat specialized) replication solution;  I only need a tiny bit of
>> understanding of the data I'm shipping from server to server.    But at the
>> same time,  a complete solution would need to deal with all three formats
>> and the numerous flavors thereof,  not just CSV.  I don't doubt that the
>> current FromField/ToField interface isn't particularly well suited to
>> dealing with copy data,  though.  =)
>>
>
> In response to a proposal to add COPY escaping functionality to libpq, Tom
> Lane pointed out that "one of the key requirements for anything dealing
> with COPY is that it be fast" [1].  postgresql-copy-escape as-is provides a
> faster way to insert data than multi-row INSERT.  We don't need to support
> all of the formats for this.  I didn't even implement COPY TO support since
> PostgreSQL returns result sets pretty efficiently through the normal
> interface.
>
> Your case is even simpler: send COPY data from one server to another.
>  This shouldn't require packing/unpacking fields.
>
>  [1]: http://www.postgresql.org/message-id/19641.1331821069@sss.pgh.pa.us
>
> As for threadWaitWrite,  I understand that,  but the code is actually dead
>> code at the moment because postgresql-simple doesn't use libpq connections
>> in asynchronous mode;  it's just there for when postgresql-simple starts
>> using async libpq calls.    Which is something I would like to do,  but it
>> would also be a disaster on Windows at the moment.  I'm thinking perhaps
>> that the most sensible thing to do would be to reimplement the blocking
>> libpq calls in haskell using nonblocking libpq calls,  and then only use
>> this module on unix systems.    Hopefully that would cut down on the CPP
>> hackery.
>>
>
> The libpq source may be helpful.  Take a look at PQexecFinish in
> fe-exec.c, which reveals some interesting considerations:
>
>  * When PQexec is given multiple SQL commands, it discards all but the
> last PGresult, but it combines error messages from these results.
>
>  * When PQexec encounters a result status of PGRES_COPY_*, it stops
> returning results, to allow the application to perform the transfer.
>
> One thing to bear in mind is that threadWait* can be interrupted on Unix
> (that's one motivation for using the async API in the first place).  You'll
> have to figure out what 'exec' should do when interrupted while waiting for
> a result.  Perhaps it should use PQcancel to cancel the request, then
> continue waiting for results.  If interrupted yet again, a subsequent
> 'exec' will need to call PQgetResult to discard results from the
> interrupted query before calling PQsendQuery again.
>