I stopped listing specific AI tools per project because it became impossible to keep accurate without turning it into noise – so I use the "GenAI" tag instead, as shorthand for my whole workflow.
I started smiling as soon as I began writing this. Less than a year ago, I was working on the practical part of my thesis and choosing which LLMs to include in a simple benchmark. Even before I wrote a single line of the benchmark code, I knew the results wouldn’t stay relevant for long – a few months, maybe. I was wrong: they were outdated before I even submitted the work.
The same thing happened on this website. I used to list specific GenAI tools next to each project – tags like "Cursor • Codex" – because I wanted to capture exactly what I used: which IDE, which extension, which CLI. But lately I'm constantly trying new workflows, revisiting old projects, and re-running the same work with better tools. Not just as a hobby, but because it makes an actual difference.
I think this is a good place to mention an unspoken phenomenon I’ve noticed lately among developers (especially those with years of experience), and I’ve felt it too. It’s a strange, unsettling feeling when, in a fraction of the time spent prompting, you improve or polish an old project you once spent weeks or months on manually, in the pre-AI era. But for me, that unease is clearly outweighed by curiosity. It’s more interesting to think about what can actually be achieved with these tools than to get stuck ruminating that “years of studying and working were useless,” or to resist the idea that they could, in fact, help.
If I tried to keep those tool lists accurate across all projects, I'd end up listing everything – and it would turn into noise. So from now on, I just use the "GenAI" tag.
The stack
Going into 2026, my IDE of choice is Cursor. I experimented with Windsurf for a few weeks, but it now feels largely stagnant compared to Cursor – slower updates and almost no visible communication. VS Code is immortal – the backbone of most AI-powered IDEs, and I still go back and forth between Cursor and VS Code depending on what I'm doing.
IDE of the year: Cursor.
On the "tools from the companies that build the models" side, Claude Code won me over. I still use Codex too, and for certain tasks (like planning) it still beats everything else. I should also mention that Cursor built its first in-house model, "Composer." For a while it was great for specific workflows because it was fast, but with new models from the big labs coming out nonstop, I stopped using Composer about a month after its release.
Coding agent of the year: Claude Code.
When it comes to the models themselves, I genuinely think Opus 4.5 (Thinking) and GPT 5.2 Codex Extra High can change the minds of a lot of AI skeptics. I prefer fast iteration over spending minutes writing prompts and then waiting minutes for replies, over and over. Opus 4.5 fits that workflow better.
Model of the year: Opus 4.5 (Thinking).
I also think it’s important to test models beyond the frontier labs and keep an eye on open-source solutions. DeepSeek offers incredible value, and I’ve had good results with their models. Kimi K2 Thinking (Moonshot AI) has been solid too, and GLM-4.7 (Z.ai) is next on my list after hearing good reviews. I’ll keep testing lesser-known tools and non-frontier models for a simple reason: the landscape moves faster than any fixed list of "recommended" options. Some people and companies also have real constraints about where their data can go, which changes what you can realistically use – but if you're okay sharing a few thousand lines of code from a side project, it's a no-brainer to try these models just to see what you get.
The mindstack
This has always been true, but AI makes it even harder to ignore: curiosity gives you an edge. Most people have the same tools – not everyone has the urge to explore them.
One feature I keep coming back to is Deep Research. It hasn’t even been a year since the first implementations of this feature appeared in GenAI tools. OpenAI shipped it in ChatGPT in February 2025, and in that very month I was giving mini-lectures about it to everyone around me. I’ve since tried Deep Research in Google’s Gemini as well, and it’s just as good – it’s worth running the same prompts in both tools in parallel. I want to underline just how brutally underappreciated this feature is, so instead of an abstract description, I’d rather mention concrete use cases off the top of my head.
- If you’re writing any kind of work that requires hundreds of hours of research, Deep Research can significantly catalyze your work by collecting important information into one place – work you would otherwise have to do manually.
- While working on the Easy WebP software, I needed a better understanding of how the technical pipeline of converting images to WebP works. Instead of wading through documentation, I could feed Deep Research the lines of code in my software responsible for the conversion, along with what I didn’t understand and what my options were – and instead of several hours of manual work, wait a few tens of minutes for an answer.
- I would be very careful with this use case, but I tried writing a large prompt about my options for building a certain type of business. I specified, for example, how big a warehouse I have available, how large an initial investment I’m willing to make, and which target market I want to operate in. In a single response, I got several proposals based on the sources it browsed. After a fairly thorough manual check of all the ideas, one of them was surprisingly feasible.
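As an aside on that WebP use case: the conversion step itself is small once you understand the pipeline. Here is a minimal sketch using Pillow – the filenames and the `to_webp` helper are illustrative, not the actual Easy WebP code:

```python
from PIL import Image

def to_webp(src: str, dst: str, quality: int = 80) -> None:
    """Convert an image to WebP (illustrative helper, not Easy WebP itself).

    quality: 0-100 lossy quality setting.
    method:  0 (fastest) to 6 (slowest encode, smallest file).
    """
    with Image.open(src) as im:
        im.save(dst, format="WEBP", quality=quality, method=6)
```

The interesting decisions live in the parameters (lossy vs. lossless, quality, encoding effort), which is exactly the kind of trade-off a Deep Research run can summarize for you.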
In general, I think it’s necessary to see GenAI tools as catalysts for what was, until recently, manual work – with the caveat that most of the output still needs to be checked manually. Deep Research is, for me personally, the most underestimated feature across GenAI tools. With the right input, you can get genuinely valuable outputs.
Feature of the year: Deep Research.
My GenAI workflow in this post ends with image generation. Just like with coding, you get a big advantage if you can do a clean manual pass afterwards – fix small details, adjust composition, and polish the output into something you’d actually ship. I’m glad I picked up design tools early (I used to mess around in Photoshop as a kid), and later I ended up following Serif and switching to their Affinity apps. These days I mostly use Affinity Photo for raster work and Affinity Designer for curves, and paired with Nano Banana Pro it unlocks another set of doors.
Personally, this was the first model where, after a clear prompt, the generated image landed very close to what I had in my head. As a practical use case for almost anyone, I’d recommend photo upscaling – for example, a headshot. Until now, image models had this problem: if you asked for a clean resolution boost, they would change facial features so much you were barely recognizable. With this model, that struggle ends.
Image model of the year: Nano Banana Pro.