It runs on my machine 05

November 10, 2025

Previously

Previously, in post 4, I worked on getting the plugin to send properly formatted completion requests to the llama.cpp server. The requests now include the prompt field, context window settings (nPrefix and nSuffix), and all the parameters that llama.vim uses.

But there's a problem: the plugin fires a completion request on every single keystroke. Type "hello" and you've just sent five HTTP requests to the server. This is wasteful, slow, and not how good completion plugins work.

Here's what needs to happen:

Debouncing - Wait for the user to pause typing before sending a request
Cancellation - Cancel in-flight requests when new ones come in
Request avoidance - Skip requests entirely in certain contexts (whitespace, comments, etc.)

The InfillHttpClient already has a cancelPreviousRequest() method that I built in post 3, but I'm not actually using it anywhere. Time to fix that.

Five requests for five characters

Let me add some logging to see what's actually happening. I'll modify the getCompletion method to log every request:

private suspend fun getCompletion(prefix: String, suffix: String, prompt: String): String {
    return withContext(Dispatchers.IO) {
        try {
            logger.info("Making completion request for prompt: '$prompt'")
            // ... rest of the method

Now when I type "func" in a PHP file, the logs show:

Making completion request for prompt: 'f'
Making completion request for prompt: 'fu'
Making completion request for prompt: 'fun'
Making completion request for prompt: 'func'

Four requests. For four characters. And if the server is slow, the completions might arrive out of order - the completion for "f" might show up after I've already typed "func". Not ideal.

Cancelling requests that will never be used

Looking at the code again, the current implementation of the InfillHttpClient already calls cancelPreviousRequest() at the start of every post() call! So this should already be working, right?

But before I pat myself on the back, let me think about how this actually works. In JavaScript, I'd write something like:

const controller = new AbortController();
const response = await fetch(url, { signal: controller.signal });

// To cancel from another context:
controller.abort();

This works because fetch() is non-blocking - the await doesn't freeze the thread, it just suspends the async function until the response arrives. And AbortController provides a clean way to signal cancellation from anywhere else in the code.

But in my Kotlin code, I'm using HttpURLConnection. Let me look at the critical line in InfillHttpClient.kt:

val responseCode = connection.responseCode

This looks innocent, but connection.responseCode is a blocking call. Even though I'm using withContext(Dispatchers.IO) to run on a background thread, that thread is blocked, waiting for the server to respond.

So when I call cancelPreviousRequest() from InfillHttpClient.kt:38-41:

fun cancelPreviousRequest() {
    currentConnection?.disconnect()
    currentConnection = null
}

I'm calling disconnect() on a connection that might be blocked in another thread. Let me research whether this is even safe.

The disconnect() problem

A bit of searching turned up a Stack Overflow discussion about calling disconnect() from another thread. The answer is not encouraging:

"Instances of this class are not thread safe." - HttpURLConnection documentation

So calling disconnect() while another thread is blocked on getResponseCode() is undefined behavior. It might work, it might not, it might crash.

Even worse, disconnect() often doesn't help when the thread is waiting for a response. It only works once data starts flowing from the server.

So my "working" cancellation code is actually a race condition built on undefined behavior.

What should I use instead?

HttpURLConnection is ancient (it's been around since Java 1.1, released in 1997). It predates modern async I/O patterns, has no clean cancellation support, and isn't thread-safe.

I need a modern HTTP client that supports:

Proper async/non-blocking requests
Clean cancellation from any thread
Thread-safe operations

Since I'm building an IntelliJ plugin, I'm going to use the HTTP client from Ktor, JetBrains' own networking framework. It's built for Kotlin coroutines, with proper suspend function support and automatic cancellation when coroutines are cancelled. Using JetBrains' tools for a JetBrains plugin just makes sense.

Installing Ktor

My plan for this post was ambitious: implement request cancellation, add debouncing, and maybe even tackle request avoidance logic. But as often happens, reality had other plans.

Let me add Ktor to the build.gradle.kts file. I need several dependencies:

dependencies {
    implementation("org.json:json:20240303")

    // Ktor client for HTTP requests
    implementation("io.ktor:ktor-client-core:3.0.2")
    implementation("io.ktor:ktor-client-cio:3.0.2")
    implementation("io.ktor:ktor-client-content-negotiation:3.0.2")
    implementation("io.ktor:ktor-serialization-kotlinx-json:3.0.2")

    // ... rest of dependencies
}

What each one does:

ktor-client-core: The core client API with suspend functions
ktor-client-cio: The CIO (Coroutine I/O) engine - the actual HTTP implementation
ktor-client-content-negotiation: For automatic JSON serialization/deserialization
ktor-serialization-kotlinx-json: The JSON serializer implementation

I've included the JSON pieces because my code already handles JSON and, for an IntelliJ plugin, using what the IDE makers use should not hurt.

I run ./gradlew test to see if everything compiles... and the tests hang. They never finish.

The coroutines conflict

After a long stretch of digging and debugging, I found the problem.

Saying "I found the problem" almost undersells the debugging sessions it took. AI does help with reading the output of the gradlew dependencies command, but it still took a while.

IntelliJ Platform bundles its own patched fork of kotlinx-coroutines. This fork includes internal methods that the testing framework needs, like runBlockingWithParallelismCompensation.

When Ktor brings in the standard kotlinx-coroutines library, it conflicts with IntelliJ's patched version, causing tests to hang.

The fix is to exclude the coroutines dependencies from Ktor and let it use IntelliJ's version:

dependencies {
    implementation("org.json:json:20240303")

    // Ktor client for HTTP requests
    // Note: Exclude kotlinx-coroutines from Ktor dependencies to avoid version conflicts.
    // IntelliJ Platform bundles a patched fork of kotlinx-coroutines that includes internal
    // methods required by the testing framework. Using the standard version causes
    // NoSuchMethodError for methods like runBlockingWithParallelismCompensation.
    implementation("io.ktor:ktor-client-core:3.0.2") {
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core")
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core-jvm")
    }
    implementation("io.ktor:ktor-client-cio:3.0.2") {
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core")
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core-jvm")
    }
    implementation("io.ktor:ktor-client-content-negotiation:3.0.2") {
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core")
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core-jvm")
    }
    implementation("io.ktor:ktor-serialization-kotlinx-json:3.0.2") {
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core")
        exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core-jvm")
    }

    // ... rest of dependencies
}

In simple terms, this tells Gradle, the Composer of Java (I'm a PHP guy after all), that when a Ktor package asks for kotlinx-coroutines-core or kotlinx-coroutines-core-jvm, it should not pull them in through Ktor's dependency tree. It should use the version that comes from JetBrains and is embedded in the IDE.
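The four near-identical blocks could probably be collapsed into a loop. This is a minimal sketch, assuming the same four artifacts and the same version as above; the ktorVersion name is mine and is not in the actual build file:

```kotlin
dependencies {
    implementation("org.json:json:20240303")

    // Hypothetical helper value; the version matches the one used above.
    val ktorVersion = "3.0.2"

    // Apply the same two exclusions to every Ktor artifact.
    listOf(
        "ktor-client-core",
        "ktor-client-cio",
        "ktor-client-content-negotiation",
        "ktor-serialization-kotlinx-json",
    ).forEach { artifact ->
        implementation("io.ktor:$artifact:$ktorVersion") {
            exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core")
            exclude(group = "org.jetbrains.kotlinx", module = "kotlinx-coroutines-core-jvm")
        }
    }

    // ... rest of dependencies
}
```

The behavior should be the same as the long version. The only gain is that a fifth Ktor artifact cannot be added without the exclusions by accident.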

After this change, ./gradlew test completes successfully. The tests pass.

Refactoring InfillHttpClient to use Ktor

Now that Ktor is installed, let me refactor the InfillHttpClient class to use it instead of HttpURLConnection.

The current implementation uses HttpURLConnection with blocking I/O:

fun post(headers: Map<String, String>, body: String): HttpResponse {
    val uri = URI(url).toURL()
    val connection = uri.openConnection() as HttpURLConnection

    // Blocking call: thread freezes here waiting for response.
    val responseCode = connection.responseCode
}

With Ktor, I can make post() a suspend function:

suspend fun post(headers: Map<String, String>, body: String): HttpResponse {
    // Non-blocking: the coroutine suspends here and the thread can do other work.
    val response = client.post(url) {
        headers.forEach { (key, value) ->
            header(key, value)
        }
        setBody(body)
    }

    // ... rest of the method
}

This follows the same async pattern I showed earlier with AbortController - suspend functions in Kotlin work like async functions in JavaScript. The execution suspends without blocking the thread, and cancellation is built-in.

After updating the InfillHttpClient to use Ktor, the tests fail because post() is now a suspend function. I need to wrap test calls in runBlocking (which is like making async code run synchronously in tests):

fun testSuccessfulPostRequest() {
    server.respondWith(200, """{"content": "hello world"}""")

    val response = runBlocking {  // Blocks the test thread until the suspend call returns.
        client.post(
            headers = mapOf("Content-Type" to "application/json"),
            body = """{"input_prefix": "test"}"""
        )
    }

    assertTrue(response is HttpResponse.Success)
}

After updating all the tests, ./gradlew test passes. The refactoring is complete.

Capturing the coroutine Job

In Kotlin coroutines, every suspend function runs in a coroutine, and every coroutine has a Job. I can access it through coroutineContext:

suspend fun post(headers: Map<String, String>, body: String): HttpResponse {
    cancelPreviousRequest()

    // Capture the current coroutine's Job
    currentJob = coroutineContext[Job]

    val response = client.post(url) {
        // ...
    }
}

Now when post() is called, it:

Cancels the previous request (if any)
Captures its own Job in currentJob
Makes the HTTP request

If another post() call happens before the first one finishes, it will cancel the first request by calling currentJob?.cancel().

The pattern is similar to the AbortController approach I described earlier. The difference is that in Kotlin coroutines, cancellation is built into the Job - I don't need a separate controller object.
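To see the capture-and-cancel pattern in isolation, here is a minimal sketch that swaps the HTTP call for delay(). LatestOnly, run, demo and the timings are made-up names for illustration, not code from the plugin, and it assumes kotlinx-coroutines is on the classpath. currentCoroutineContext()[Job] reads the same Job as coroutineContext[Job] does in the snippet above.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.Job
import kotlinx.coroutines.async
import kotlinx.coroutines.currentCoroutineContext
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for InfillHttpClient: only the newest call survives.
class LatestOnly {
    private var currentJob: Job? = null

    suspend fun run(label: String, workMillis: Long): String {
        // Cancel whichever call is still in flight, then register this one.
        currentJob?.cancel()
        currentJob = currentCoroutineContext()[Job]

        delay(workMillis) // Stand-in for the HTTP request; delay() is cancellable.
        return label
    }
}

// Runs two overlapping calls and reports (first was cancelled, result of the second).
fun demo(): Pair<Boolean, String> = runBlocking {
    val latest = LatestOnly()
    val first = async { latest.run("first", 2_000) }
    delay(50) // Let the first call register its Job.
    val second = async { latest.run("second", 10) }

    val firstCancelled = try {
        first.await()
        false
    } catch (e: CancellationException) {
        true
    }
    firstCancelled to second.await()
}
```

Two caveats with this sketch. The captured Job belongs to the caller's coroutine, so cancelling it cancels everything that coroutine was doing, not only the request. And it runs on a single thread; a client shared across threads would need to guard currentJob.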

Testing that cancellation works

Now comes the critical part: does it actually work? I updated the test to verify cancellation:

fun testCancelPreviousRequest() {
    // Configure server to respond slowly (2 seconds).
    server.respondWithDelay(2000, 200, """{"content": "slow response"}""")

    // Start a request in a background thread.
    Thread {
        try {
            runBlocking {
                client.post(
                    headers = mapOf("Content-Type" to "application/json"),
                    body = """{"test": "data"}"""
                )
            }
        } catch (e: Exception) {
            // Expected - request was cancelled.
        }
    }.start()

    // Give it time to start.
    Thread.sleep(100)

    // Cancel the request.
    client.cancelPreviousRequest()

    // Test passes if we don't hang waiting for the slow response.
    assertTrue(true)
}

The test works, but it does not look great. If this was PHP code I would look for a way to run this test without a hard-coded sleep call, but I'm not (yet) good enough at Kotlin to have the solution. There is a weaker spot too: the background thread is never joined and assertTrue(true) cannot fail, so what the test really shows is that calling cancelPreviousRequest() neither throws nor hangs. It's good enough for now.

Running ./gradlew test - the test passes, and it finishes without waiting for the 2-second response.
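For the record, here is one way such a test could avoid both the fixed sleep and the always-true assertion. It is only a sketch under assumptions: slowRequest is a made-up stand-in for the HTTP call, and it signals its own start through a CompletableDeferred. The real test would need some equivalent hook, such as the test server reporting that a request arrived, which I have not built.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a slow request: signals that it started, then "waits for the server".
suspend fun slowRequest(started: CompletableDeferred<Unit>): String {
    started.complete(Unit)
    delay(2_000)
    return "slow response"
}

// Returns (job was cancelled, elapsed milliseconds).
fun cancelWithoutSleeping(): Pair<Boolean, Long> = runBlocking {
    val started = CompletableDeferred<Unit>()
    val startNanos = System.nanoTime()

    val job = launch { slowRequest(started) }
    started.await()     // Wait until the request is in flight; no fixed sleep.
    job.cancelAndJoin() // Cancel, then wait until the coroutine has actually stopped.

    val elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000
    job.isCancelled to elapsedMillis
}
```

A test built this way can assert that the job reports isCancelled and that the elapsed time is well under the 2-second delay. Both assertions can actually fail if cancellation stops working.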

Next steps

With proper cancellation in place, I now have the foundation for implementing debouncing. Instead of firing a request on every keystroke and cancelling the previous ones, I can add a small delay before making the request at all. That's for the next post.
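As a rough preview of the idea, and not the implementation the plugin will end up with, debouncing on top of coroutines can be sketched like this. Debouncer, submit, typeFunc and the 100 ms window are all hypothetical names and values chosen for the example:

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: runs only the last action submitted within waitMillis.
class Debouncer(private val scope: CoroutineScope, private val waitMillis: Long) {
    private var pending: Job? = null

    fun submit(action: suspend () -> Unit) {
        pending?.cancel()     // A newer keystroke replaces the pending one.
        pending = scope.launch {
            delay(waitMillis) // Cancelled here if another keystroke arrives in time.
            action()
        }
    }
}

// Simulates typing "func" quickly and returns the prompts that were actually "sent".
fun typeFunc(): List<String> = runBlocking {
    val sent = mutableListOf<String>()
    val debouncer = Debouncer(this, waitMillis = 100)
    for (prompt in listOf("f", "fu", "fun", "func")) {
        debouncer.submit { sent.add(prompt) }
        delay(10) // Keystrokes arrive faster than the debounce window.
    }
    delay(300) // Pause typing; the last submission fires.
    sent
}
```

With these timings only the final prompt, "func", is sent. The first three submissions are cancelled while they are still waiting in delay().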

Another thing I've noticed is that cancelling a request, even as part of a "legitimate" flow, throws an exception (the one I'm catching in the tests). Since the whole point of the client is to abstract that concern away, I will update the InfillHttpClient::post method to catch and handle it, so that calling code does not need to know about it or deal with it.