IO::Select can_read() without timeout hangs whereas can_read(<timeout>) works


perl version: 5.8.8 and up
Operating System: i486-linux-gnu-thread-multi

Using a generic IO::Select loop, I somehow ended up with a child whose exit the loop never notices. The basic loop is below (it is a pretty standard select loop, comparable to the examples you'll see in any select docs):

    use IPC::Open3 qw(open3);
    use IO::Select;
    use Symbol qw(gensym);

    # Give the child /dev/null as its stdin
    open(local *nostdin, '<', '/dev/null') or die $!;
    my $pid = open3('<&nostdin',
       my $outH = gensym(),
       my $errH = gensym(),
       $command
    );

    my $sel = IO::Select->new();
    $sel->add($outH);
    $sel->add($errH);

    # Select loop: ends once both handles have been removed
    while (my @ready = $sel->can_read()) {
        foreach my $handle (@ready) {
            my $bytesRead = sysread($handle, my $buf='', 1024);
            # undef means a read error, 0 means EOF
            if (!defined($bytesRead) || $bytesRead==0) {
                warn("Error reading command out $!\n  Command: $command\n") unless defined($bytesRead);
                $sel->remove($handle);
                next;
            }
            syswrite(STDERR,"OUT [$buf]") if $handle==$outH;
            syswrite(STDERR,"ERR [$buf]") if $handle==$errH;
        }
    }
When the child exits, it should close its I/O handles, the select should see that they are closed, and the loop should terminate. That is not what happens for one specific command, which completes fine when run from the command line. It's a complicated makepp/makefile that runs a proprietary CAD command.

Oddly, what we see instead is that $outH closes (and we remove it from the select set), but then the next call to can_read() hangs. The child then exits, yet can_read() never returns.
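For reference, here is one situation in which the "child exits, so the handles close" expectation is known not to hold: a pipe only reports EOF once every process holding its write end has closed it, so a background grandchild that inherits stderr keeps $errH open after the direct child has exited. Whether that is what happens inside the makepp command above is an assumption and has not been confirmed. A minimal sketch (the helper name seconds_until_eof and the sh command line are made up for illustration):

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use IO::Select;
use Symbol qw(gensym);
use Time::HiRes qw(time);

# Hypothetical helper: run a command through the same kind of select loop
# and return how many seconds pass before both pipes report EOF.
sub seconds_until_eof {
    my (@command) = @_;

    open(local *NOSTDIN, '<', '/dev/null') or die "Cannot open /dev/null: $!";
    my ($outH, $errH) = (gensym(), gensym());
    my $start = time();
    my $pid = open3('<&NOSTDIN', $outH, $errH, @command);

    my $sel = IO::Select->new($outH, $errH);
    while (my @ready = $sel->can_read()) {
        foreach my $handle (@ready) {
            my $buf = '';
            my $bytesRead = sysread($handle, $buf, 1024);
            # undef (error) or 0 (EOF): stop watching this handle
            $sel->remove($handle) unless $bytesRead;
        }
    }
    waitpid($pid, 0);
    return time() - $start;
}

# The shell exits immediately, but the background "sleep" has inherited
# stderr (only its stdout is redirected), so the stderr pipe stays open
# until the sleep finishes.
my $elapsed = seconds_until_eof('/bin/sh', '-c', 'sleep 1 >/dev/null & exit 0');
printf "both pipes reached EOF after %.1f seconds\n", $elapsed;
```

In this sketch the direct child is gone almost at once, yet the loop only finishes when the background process ends, which resembles the delay described here.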

Interestingly, if we replace can_read() with can_read(<timeout>), the can_read(<timeout>) calls keep timing out for about ten seconds (even after waitpid() shows that the child has exited!), and then can_read(<timeout>) eventually notices that $errH is closed.
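A rough sketch of how the timeout variant can be combined with a non-blocking waitpid() so the loop stops waiting once the child has been reaped and the pipes have stayed quiet for one timeout period. The helper name run_with_timeout, the 0.5 second timeout, and the test command are made up for illustration, and this is a workaround under those assumptions rather than a fix for the underlying hang:

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use IO::Select;
use Symbol qw(gensym);
use POSIX qw(WNOHANG);

# Hypothetical helper: returns (exit code, stdout text, stderr text).
sub run_with_timeout {
    my ($timeout, @command) = @_;

    open(local *NOSTDIN, '<', '/dev/null') or die "Cannot open /dev/null: $!";
    my ($outH, $errH) = (gensym(), gensym());
    my $pid = open3('<&NOSTDIN', $outH, $errH, @command);

    my $sel = IO::Select->new($outH, $errH);
    my ($out, $err) = ('', '');
    my $status;    # set once waitpid() has reaped the child

    while ($sel->count) {
        my @ready = $sel->can_read($timeout);

        foreach my $handle (@ready) {
            my $buf = '';
            my $bytesRead = sysread($handle, $buf, 1024);
            if (!$bytesRead) {    # undef = read error, 0 = EOF
                warn "Error reading from child: $!\n" unless defined $bytesRead;
                $sel->remove($handle);
                next;
            }
            if ($handle == $outH) { $out .= $buf } else { $err .= $buf }
        }

        # Non-blocking check: has the direct child exited yet?
        if (!defined $status && waitpid($pid, WNOHANG) == $pid) {
            $status = $?;
        }

        # Child is gone and nothing arrived during this timeout: stop waiting for EOF.
        last if defined $status && !@ready;
    }

    # Both pipes closed before the child was reaped: wait for it now.
    if (!defined $status) {
        waitpid($pid, 0);
        $status = $?;
    }
    return ($status >> 8, $out, $err);
}

my ($exit, $out, $err) = run_with_timeout(
    0.5, $^X, '-e', 'print "o\n"; print STDERR "e\n"; exit 3'
);
print "exit=$exit\nOUT [$out]\nERR [$err]\n";
```

The trade-off is that anything written to the pipes after the cut-off (for example by a process that outlives the direct child) is lost, so the timeout should be chosen with that in mind.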

The above loop works with most commands. The hang shows up only when we run a layer of Perl scripts that are proprietary to the company I work at (though they are mostly just build/environment scripts). Editing them down to find what causes the problem is a large undertaking, so I'm probably never going to get a chance to root-cause this or file a bug. Bummer, since it's still showing up in perl v5.20.