It runs on my machine 08

Over the last few posts (first post, last post) I've been trying to port features from the llama.vim plugin to an IntelliJ plugin called "completamente" (a wordplay on the Italian word for "completely" which, read literally, means "complete mind").

It was not as easy as I had expected.

As the saying goes, "third time's the charm". Well, it took me four rewrites to get it right enough to have something working as expected.

Theory building and disposable code

Amazed, as many others are, at the powers of LLMs, I first tried a straight vim/neovim to Kotlin (the language used in the IntelliJ platform) translation. The code built and ran, but there was always another issue making it unusable in practice.

The second and third rewrites (the supposedly charmed one) did not go much better. What went wrong?
The best answer I can provide comes from this article, which in turn links to this paper. In essence, at least this is my takeaway, programming is about "building a theory" of the program under construction: a knowledge of its desired behaviour, constraints and boundaries.

Every translation I attempted was a "mechanical" act of sorts, where the LLM (Claude Code in my case) would produce a technically correct translation of the original plugin with little to no understanding of the constraints and boundaries of either the source or the destination.

Until I took the time to understand those in the context of the starting code (the llama.vim plugin) and the destination code (my IntelliJ plugin), any further attempt was bound to fail.

In this phase of understanding and theory building I used what I think is one of the most powerful features of LLMs: the ability to form a theory in my mind about something and then have it generate some disposable code, a testing harness, to check whether that theory is right or wrong.
While the temptation would be to ask the LLM to explain the code, my experience is that it will provide credible-sounding answers even when it gets things completely wrong. The "you're absolutely right" meme exists for a reason, and it happens frequently enough to undermine my confidence in the model for anything more complex than function bodies.
In a concurrent and asynchronous context like that of the plugin I'm building, I prefer spending my time testing my wrong theories with exact tools rather than reviewing possibly incorrect explanations with no tools at all.

An example is a tool I created to help me understand how the original context-building function (fim_ctx_local) works and what outputs it produces for given inputs.
The tool is composed of a bash script that runs a vim script, which in turn makes requests to a Python mock server that logs the results to a JSON file.
The files used in the vim script tests (an empty one and a large one) are the same ones used by the plugin tests, to ensure the Kotlin translation of the code behaves the same way.

The files make for a moderately interesting read, but what is valuable to me is how quickly I could validate theories about what the original plugin was doing without getting bogged down in vim script syntax.

Coroutines and their cancellations

The first part of the theory I was missing was about how the IntelliJ platform uses a suggestion provider like the plugin I've developed:

  • it calls the InlineCompletionProvider::getSuggestion() method in a coroutine
  • if the user waits for the completion provider to come up with a suggestion, the coroutine completes
  • otherwise, the coroutine is cancelled as soon as the user types a new character

Working this out, again through rounds of disposable code and the mock server, showed me exactly what is cancelled and when.

I will stop here since my understanding of Kotlin and the IntelliJ platform is limited and I do not want to pretend to know more.

The takeaway is that coroutines have scopes, and any task started in a coroutine scope will be cancelled if, and when, the coroutine that owns the scope is cancelled.
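To make that concrete, here is a minimal, self-contained sketch (not the plugin code) of that behaviour: a scope standing in for the per-request coroutine gets cancelled while a child job is still suspended, and the child never completes.

import kotlinx.coroutines.*

fun main() = runBlocking {
    // Stand-in for the per-request coroutine scope the platform creates.
    val requestScope = CoroutineScope(Job())

    val suggestionJob = requestScope.launch {
        try {
            delay(500) // pretend this is the slow fim request
            println("suggestion ready")
        } catch (e: CancellationException) {
            println("suggestion cancelled")
            throw e
        }
    }

    delay(100) // the user types a new character before the suggestion is ready...
    requestScope.cancel() // ...and the platform cancels the request scope
    suggestionJob.join()
    println("anything launched in that scope is gone with it")
}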

Running the background jobs (extra input processing, fim requests, caching, et cetera) in a separate scope required understanding a second part of how the IntelliJ platform works.

Services everywhere

Most of the platform hooks and extension points require the registration of a "service".

The main service of my plugin is an InlineCompletionProvider service:

package com.github.lucatume.completamente.services

class Completion() : InlineCompletionProvider {}

That is registered in the plugin configuration file:

<!-- Plugin Configuration File. Read more: https://plugins.jetbrains.com/docs/intellij/plugin-configuration-file.html -->
<idea-plugin>
    <id>com.github.lucatume.completamente</id>
    <name>completamente</name>
    <vendor>lucatume</vendor>

    <depends>com.intellij.modules.platform</depends>

    <extensions defaultExtensionNs="com.intellij">
        <inline.completion.provider implementation="com.github.lucatume.completamente.services.Completion"/>
        <postStartupActivity implementation="com.github.lucatume.completamente.startup.CompletamenteStartupActivity"/>
    </extensions>
</idea-plugin>

As a PHP developer, I'm reminded of the Java ecosystem's contributions to the PHP one (Kotlin runs on the JVM) every time I use a "service locator" or a "service provider".

The second service, CompletamenteStartupActivity, bootstraps all the other services that need to hook into further entry points for the plugin's operations (monitoring cursor movement, file operations, et cetera).
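I will not go into its details here, but a minimal sketch of what such a startup activity could look like, assuming it does little more than request the project-level services so the platform instantiates them and they can register their own listeners (the body is my assumption, not necessarily the plugin's actual code):

package com.github.lucatume.completamente.startup

import com.github.lucatume.completamente.services.BackgroundJobs
import com.github.lucatume.completamente.services.ChunksRingBuffer
import com.intellij.openapi.components.service
import com.intellij.openapi.project.Project
import com.intellij.openapi.startup.ProjectActivity

class CompletamenteStartupActivity : ProjectActivity {
    override suspend fun execute(project: Project) {
        // Requesting the services is enough for the platform to construct them;
        // from there each service can hook into the editor events it cares about.
        project.service<BackgroundJobs>()
        project.service<ChunksRingBuffer>()
    }
}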

Notable among those bootstrapped services is BackgroundJobs:

package com.github.lucatume.completamente.services

@Service(Service.Level.PROJECT)
class BackgroundJobs : CoroutineScope, Disposable {}

Its only responsibility is providing a coroutine scope to the other services and managing its lifecycle.
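Filled in, a minimal version of it could look like the following sketch. It assumes a SupervisorJob, so one failing background job does not cancel its siblings, and cancellation of everything still running when the service is disposed; the actual class may differ.

import com.intellij.openapi.Disposable
import com.intellij.openapi.components.Service
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlin.coroutines.CoroutineContext

@Service(Service.Level.PROJECT)
class BackgroundJobs : CoroutineScope, Disposable {
    private val job = SupervisorJob()

    // Every job launched through this service runs on this context.
    override val coroutineContext: CoroutineContext = Dispatchers.Default + job

    // Disposing the service (e.g. when the project closes) cancels whatever is still running.
    override fun dispose() {
        job.cancel()
    }
}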

An example of its use, in the getSuggestionPure() function implementation:

suspend fun getSuggestionPure(
    services: Services,
    request: InlineCompletionRequest,
    prev: List<String>?,
    indentLast: Int,
    lastFile: String?,
    lastLine: Int?
): SuggestionResult {
    val currentFile = request.editor.virtualFile.canonicalPath
    val currentLine = request.editor.caretModel.logicalPosition.line
    val extraContext = services.chunksRingBuffer.getRingChunks()

    // [...]

    // Create a channel for coroutines to communicate with the cache
    val channel = Channel<String>()
    services.backgroundJobs.runWithDebounce({
        val llmSuggestion = fim(
            localContext,
            extraContext,
            services.settings,
            services.cache,
            services.httpClient
        )

        // Send the completion back to the main thread.
        channel.send(llmSuggestion ?: "")
    }, 100)

    val suggestion: String = channel.receive()

    val rendered = fimRender(localContext, suggestion, request)
    
    // [...]

    // This is the closest we can get to the llama.vim code:
    //  if s:hint_shown
    //      call llama#fim(l:pos_x, l:pos_y, v:true, s:fim_data['content'], v:true)
    //  endif
    // The suggestion is going to be shown to the user: start a speculative request for the code as if the user
    // had accepted the suggestion.
    services.backgroundJobs.runWithDebounce({
        val speculativeContext = buildLocalContext(
            request = request,
            settings = services.settings,
            prev = updatedPrev,
            indentLast = localContext.indent
        )

        // Start a speculative request to get the completion as if the user had accepted the suggestion.
        fim(
            speculativeContext,
            extraContext,
            services.settings,
            services.cache,
            services.httpClient
        )
    }, 100)

    return SuggestionResult(
        StringSuggestion(rendered ?: ""),
        updatedPrev,
        updatedIndentLast
    )
}
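
The runWithDebounce() helper is not shown above. Conceptually it schedules the given block in the BackgroundJobs scope after a small delay, cancelling whatever was scheduled before it, so that only the last request within the debounce window actually runs. A minimal sketch of how it could be added to the BackgroundJobs class sketched earlier (my assumption, not the exact plugin code):

// Inside BackgroundJobs, which is itself a CoroutineScope.
private var debounced: Job? = null

fun runWithDebounce(block: suspend () -> Unit, delayMs: Long) {
    debounced?.cancel()   // drop whatever was scheduled before...
    debounced = launch {
        delay(delayMs)    // ...wait out the debounce window...
        block()           // ...then run the actual work
    }
}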

Functional(ish) core, imperative shell

I was inspired by this article to try and write the plugin using a "functional core, imperative shell" approach.
I had to make concessions to the service-based architecture of the IntelliJ platform, but I am pretty satisfied with how the code turned out.

An example of the imperative shell is the main completion service:

package com.github.lucatume.completamente.services

class Completion() : InlineCompletionProvider {
    // State carried over from the previous completion request (initial values are illustrative).
    private var prev: List<String>? = null
    private var indentLast: Int = 0
    private var lastFile: String? = null
    private var lastLine: Int? = null

    override suspend fun getSuggestion(request: InlineCompletionRequest): InlineCompletionSuggestion {
        val project: Project = request.editor.project ?: return StringSuggestion("")

        val services = Services(
            settings = ApplicationManager.getApplication().getService(Settings::class.java),
            cache = project.service<SuggestionCache>(),
            chunksRingBuffer = project.service<ChunksRingBuffer>(),
            backgroundJobs = project.service<BackgroundJobs>(),
            httpClient = project.service<HttpClient>().getHttpClient()
        )

        val suggestionResult: SuggestionResult = getSuggestionPure(
            services,
            request,
            prev,
            indentLast,
            lastFile,
            lastLine
        )

        lastFile = request.editor.virtualFile.canonicalPath
        lastLine = request.editor.caretModel.logicalPosition.line

        prev = suggestionResult.prev
        indentLast = suggestionResult.indentLast

        return suggestionResult.suggestion
    }
}

The service then calls the getSuggestionPure() function, passing it the required services.

Is the getSuggestionPure() function "pure", then? Hell, no. There are file reads, cache lookups and requests to the LLM server, but all its dependencies are injected and they can be easily mocked in tests. Personally, that is all I care about: the function is "pure enough" to be tested easily.
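A toy reduction of what "pure enough" buys you, deliberately unrelated to the plugin's actual types: the collaborators are plain interfaces, the function only touches what it is handed, and a test needs nothing more than two in-memory fakes.

import kotlinx.coroutines.runBlocking

interface CompletionSource { suspend fun complete(prefix: String): String? }
interface CompletionCache { fun get(key: String): String?; fun put(key: String, value: String) }

// "Pure enough": every side effect goes through an injected dependency.
suspend fun suggest(source: CompletionSource, cache: CompletionCache, prefix: String): String {
    cache.get(prefix)?.let { return it }
    val suggestion = source.complete(prefix) ?: ""
    cache.put(prefix, suggestion)
    return suggestion
}

// The "test": no IDE, no HTTP server, just fakes.
fun main() = runBlocking {
    val cache = object : CompletionCache {
        val map = mutableMapOf<String, String>()
        override fun get(key: String) = map[key]
        override fun put(key: String, value: String) { map[key] = value }
    }
    val source = object : CompletionSource {
        override suspend fun complete(prefix: String) = "$prefix()"
    }

    check(suggest(source, cache, "println") == "println()")
    check(cache.get("println") == "println()") // a second call would be served from the cache
    println("ok")
}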

Closing thoughts

This was a fun and useful project.

I knew nothing about Kotlin before and I know little now, but at least I have a better understanding of how I can extend the family of IDEs I work with every day (the IntelliJ ones).

I now have a copilot solution that runs locally and that, on my pretty beefy Mac, can provide inline suggestions based on the Qwen 30b model with very low latency. For someone using "airplane mode" all too often, this is a boon.

Finally, it was a good lesson in understanding and planning an architecture, and in how an LLM can help in that phase.
The need to write pure (in the functional sense of the word) code in ecosystems more and more integrated with non-deterministic LLMs (same input, different output) forced a number of architectural decisions that were fun to make and reflect on.

I've pushed the code online in all its imperfect glory: there is no support for it and no plan to release it on the IntelliJ marketplace yet, but who knows.
Once I'm confident I've squashed all the major issues, I will give it some thought.