summary refs log tree commit diff stats
path: root/lib/pure/net.nim
diff options
context:
space:
mode:
authorMarius Andra <marius.andra@gmail.com>2024-05-08 17:33:43 +0200
committerGitHub <noreply@github.com>2024-05-08 09:33:43 -0600
commite6f66e4d131fb8cf3236df6cb14e219dbb5d0c77 (patch)
treeeb07246f455353bc5bd842e1bfd2559fb75221f9 /lib/pure/net.nim
parente662043fd1c43e8447fb7c30e97823f7d0a46a83 (diff)
downloadNim-e6f66e4d131fb8cf3236df6cb14e219dbb5d0c77.tar.gz
fixes 12381, HttpClient socket handle leak (#23575)
## Bug

Fixes https://github.com/nim-lang/Nim/issues/12381 - HttpClient socket
handle leak

To replicate the bug, run the following code in a loop:

```nim
import httpclient
while true:
    echo "New loop"
    var client = newHttpClient(timeout = 1000)
    try:
        let response = client.request("http://10.44.0.4/bla", httpMethod = HttpPost, body = "boo")
        echo "HTTP " & $response.status
    except CatchableError as e:
        echo "Error sending logs: " & $e.msg
    finally:
        echo "Finally"
        client.close()
```

Note the IP address as the hostname. I'm directly connecting to a
plausible local IP, but one that does not resolve, as I have everything
under 10.4.x.x.

The output looks like this to me:

```
New loop
Error sending logs: Operation timed out
Finally
New loop
Error sending logs: Operation timed out
Finally
New loop
...
```

In Nim 2.0.4, running the code above leaks the socket:

<img width="944" alt="Screenshot 2024-05-05 at 22 00 13"
src="https://github.com/nim-lang/Nim/assets/53387/ddac67db-d7df-45e6-b7a5-3d42f79775ea">

## Fix

With the added line of code, each old socket is cleanly removed:

<img width="938" alt="Screenshot 2024-05-05 at 21 54 18"
src="https://github.com/nim-lang/Nim/assets/53387/5b0b4b2d-d4f0-4e74-a9cf-74aec0c50d2e">

I believe the line below, `closeUnusedFds(ord(domain))` was supposed to
clean up the failed connection attempts, but it failed to do so for the
last one, assuming it succeeded. Yet it didn't. This fix makes sure
failed connections are closed immediately.

## Tests 

I don't have a test with this PR. When testing locally, the
`connect(lastFd, ..)` call on line 2032 blocks for ~75 seconds, ignoring
the http timeout. I fear any test I could add would either 1) take way
too long, 2) one day run in an environment where my randomly chosen IP
is real, yielding in weird flakes.

The only bug i can imagine is if running `lastFd.close()` twice is a bad
idea. I tested by actually running it twice, and... no crash/op? So
seems safe? I'm hoping the CI run will be green, and this will be
enough. However I'm happy to take feedback on how I should test this,
and do the necessary changes.

~Edit: looks like a test does fail, so moving to a draft while I figure
this out.~ Attempt 2 fixed it.
Diffstat (limited to 'lib/pure/net.nim')
-rw-r--r--lib/pure/net.nim2
1 files changed, 2 insertions, 0 deletions
diff --git a/lib/pure/net.nim b/lib/pure/net.nim
index d77ab5db1..7f6a36557 100644
--- a/lib/pure/net.nim
+++ b/lib/pure/net.nim
@@ -2040,8 +2040,10 @@ proc dial*(address: string, port: Port,
   if success:
     result = newSocket(lastFd, domain, sockType, protocol, buffered)
   elif lastError != 0.OSErrorCode:
+    lastFd.close()
     raiseOSError(lastError)
   else:
+    lastFd.close()
     raise newException(IOError, "Couldn't resolve address: " & address)
 
 proc connect*(socket: Socket, address: string,