Async Architecture for AI Agents: What We Learned Mentoring a Capstone Team

Husni Nur Fadillah · 6 min read

So, I got the chance to mentor a team in the Dicoding Asah Program, and honestly, it was pretty interesting from the start.

The team got assigned a capstone project that required building an AI agent system for generating personalized recommendations. Not a simple machine learning model that just predicts something, but a full-blown AI agent that could reason, retrieve information, and generate recommendations dynamically.

It's a solid project brief, but it comes with complexity. The team was already divided into three sub-teams: Front-end, Back-end, and Machine Learning. That's good: it meant they could work in parallel.

In this article, I'm sharing how we designed this system and why we made certain choices.

The Challenge

When the team started diving deeper into the project, the Backend team came to me with a bunch of specific questions:

- What does the backend need from the ML side?
- How do we connect a Node.js backend with a Python ML agent?
- What backend framework should we use?
- How should we structure the backend code?
- Where do we deploy, and how?

These are solid questions. But here's the thing: they were all interconnected. You can't answer 
"what framework should we use?" without first understanding the architecture. You can't talk about 
deployment without knowing how the components communicate.

So instead of answering each question in isolation, I realized I needed to show them the entire 
picture first. Not just "use this framework" or "deploy on this platform," but "here's why this 
architecture makes sense, and here's how each piece fits together."

That's when I decided to walk them through the full system design—starting with the high-level 
architecture, then breaking down each component, and finally showing how all the pieces connect.

The key insight: they needed to move from synchronous, blocking calls to an asynchronous, 
event-driven architecture. Because an AI agent doesn't work like a simple API—it takes time. 
Sometimes a lot of time. And you can't just make the user wait.

That's where NATS and webhooks came in.

System Design Overview

Instead of jumping into framework recommendations or deployment strategies, I showed them the full system design first. Here's the architecture we landed on:

Frontend → Backend (HTTP) → NATS (async task) → ML Agent → NATS (result) → Backend (webhook) → Frontend (polling/WebSocket)

Now that you see the overall architecture, let's break down each component and what it does.

1. Frontend (React/Vue)

The frontend's job is simple: send the request and wait for the result without blocking the user.

What it does:

- Sends the recommendation request to the backend over HTTP
- Immediately shows a "processing" state so the UI stays responsive
- Receives the final result via polling or a WebSocket connection

2. Backend (Node.js)

The backend is the orchestrator. It's not doing the heavy computation—that's the ML agent's job. 
Instead, it's:

What it does:

- Receives the HTTP request and validates it
- Persists the task to the database
- Publishes the task to NATS and returns immediately
- Receives the result via webhook when the ML agent is done
- Notifies the frontend (polling endpoint or WebSocket push)

Key point: The backend doesn't wait for the ML agent. It's fully non-blocking.
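To make "publish and return immediately" concrete, here's a minimal Python sketch. An `asyncio.Queue` stands in for NATS (in production this would be a `publish` call on a real NATS client), and the function name `handle_recommendation_request` is illustrative, not the team's actual code:

```python
import asyncio
import json
import uuid

# In-memory stand-in for the NATS connection. In production this would be
# a real client publishing to a subject like "tasks.recommend".
task_queue: asyncio.Queue = asyncio.Queue()

async def handle_recommendation_request(user_id: str) -> dict:
    """Accept the request, enqueue the task, and return immediately."""
    task_id = str(uuid.uuid4())
    payload = json.dumps({"task_id": task_id, "user_id": user_id})
    await task_queue.put(payload)  # fire-and-forget publish
    # HTTP 202 Accepted: the work is queued, not done.
    return {"status": 202, "task_id": task_id}
```

The caller gets back a task id and a 202-style status right away; the actual ML work hasn't even started yet.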

3. ML Agent (Python)

This is where the actual AI work happens.

What it does:

- Listens to NATS for incoming tasks
- Runs the reasoning and retrieval pipeline
- Generates the personalized recommendations
- Publishes the result back to NATS when done

Key point: The ML agent is completely decoupled. It works at its own pace and doesn't care about timeouts or waiting frontend users.
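A hedged sketch of that worker loop, again with `asyncio.Queue` standing in for the two NATS subjects, and a placeholder `generate_recommendations` in place of the real reasoning pipeline:

```python
import asyncio
import json

# Stand-ins for the task and result subjects the agent would use on NATS.
task_queue: asyncio.Queue = asyncio.Queue()
result_queue: asyncio.Queue = asyncio.Queue()

def generate_recommendations(user_id: str) -> list:
    # Placeholder for the real reasoning/retrieval pipeline.
    return [f"item-for-{user_id}"]

async def worker() -> None:
    """Consume one task, do the slow work, publish the result."""
    raw = await task_queue.get()
    task = json.loads(raw)
    recs = generate_recommendations(task["user_id"])  # may take minutes
    await result_queue.put(json.dumps({
        "task_id": task["task_id"],
        "recommendations": recs,
    }))
```

Nothing here knows about HTTP, timeouts, or users; the worker only sees messages in and messages out.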

4. NATS (Message Broker)

NATS is the glue that connects backend and ML agent. It's a publish-subscribe system.

What it does:

- Receives task messages published by the backend
- Delivers them to the subscribed ML agent
- Carries results back in the other direction
- Queues messages when a consumer is down, so nothing is lost

Key point: NATS decouples backend and ML agent. They don't need to know about each other directly.

5. Database

Stores:

- Task records and their current status
- Generated recommendations
- User history, so past results survive a refresh

Why needed: Even though NATS carries messages, we need persistence. If backend crashes, 
we still have the task history. If user refreshes, we can retrieve their past recommendations.
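A minimal sketch of that persistence layer, assuming a simple `tasks` table with a status column. SQLite and the function names here are mine, chosen just for illustration:

```python
import sqlite3

# Illustrative schema: one row per task; status moves
# pending -> done (a real system would add processing/failed states).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        task_id TEXT PRIMARY KEY,
        user_id TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'pending',
        result  TEXT
    )
""")

def create_task(task_id: str, user_id: str) -> None:
    """Persist the task before publishing it to NATS."""
    conn.execute("INSERT INTO tasks (task_id, user_id) VALUES (?, ?)",
                 (task_id, user_id))

def complete_task(task_id: str, result: str) -> None:
    """Called from the webhook handler when the ML agent finishes."""
    conn.execute("UPDATE tasks SET status = 'done', result = ? WHERE task_id = ?",
                 (result, task_id))

def get_task(task_id: str):
    """What the frontend's polling endpoint reads."""
    return conn.execute("SELECT status, result FROM tasks WHERE task_id = ?",
                        (task_id,)).fetchone()
```

Because the row exists before the message is published, a crash after publish still leaves a queryable record of what was in flight.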

The Connection:

Frontend → Backend (HTTP) → NATS (async task) → ML Agent → NATS (result) → Backend (webhook) 
→ Frontend (polling/WebSocket)

Each component is independent. One failing doesn't block the others. ML agent takes 5 minutes? 
User sees "processing" but can still use the app. Backend crashes? NATS has the messages 
in a queue, ready to process when it's back.
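The whole chain above can be simulated in a few lines, with queues standing in for the NATS subjects and the result delivery playing the role of the webhook callback (everything here is a toy stand-in, not the team's code):

```python
import asyncio

async def demo() -> list:
    tasks: asyncio.Queue = asyncio.Queue()    # stand-in for the NATS task subject
    results: asyncio.Queue = asyncio.Queue()  # stand-in for the NATS result subject
    seen: list = []

    async def backend():
        await tasks.put("user-1")        # publish and move on
        seen.append("processing")        # what the frontend sees immediately
        result = await results.get()     # webhook-style callback delivery
        seen.append(result)

    async def ml_agent():
        user = await tasks.get()
        await asyncio.sleep(0.01)        # pretend this takes minutes
        await results.put(f"recs-for-{user}")

    await asyncio.gather(backend(), ml_agent())
    return seen
```

The frontend-visible state appears before the agent has done any work, which is exactly the non-blocking behavior the architecture is built around.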

Why This Approach?

You might ask: why not just have the backend call the ML agent directly and wait for the response?

Option 1: Synchronous Approach (Simple but Problematic)

Frontend → Backend → Python ML Agent (blocking call) → wait for response → return

Pros:

- Dead simple to implement and debug
- No extra infrastructure to run

Cons:

- The backend blocks while the agent works, sometimes for minutes
- HTTP timeouts become a constant battle
- The user stares at a spinner the whole time
- One slow or crashed component takes the whole request down
- Backend and ML agent can't scale independently

Option 2: Asynchronous Approach with NATS (What We Chose)

Architecture:
Frontend → Backend → NATS → ML Agent (async, non-blocking)
Backend returns immediately → ML Agent works in background → webhook callback → Frontend notified

Pros:

- Backend returns immediately; nothing blocks
- ML agent works at its own pace in the background
- Components fail independently: queued messages survive a backend crash
- Each piece scales on its own resources

Cons:

- More moving parts (NATS broker, webhook handling, task persistence)
- Results arrive eventually, so the frontend needs polling or WebSockets
- Harder to trace and debug a single request across components

Key Takeaways

So, to recap what we discussed with the Backend team:

What does backend need from ML?
→ Just a webhook callback when the recommendation is ready. Everything else is async.

How do we connect Node.js backend with Python ML agent?
→ Through NATS message broker. They don't talk directly, they publish/subscribe messages.

What backend framework should we use?
→ Node.js with Express works fine. Just make sure the endpoints are non-blocking.

How should we structure the backend code?
→ Simple: routes for API, services for business logic, separate module for NATS publishing.

Where do we deploy, and how?
→ NATS broker on one server, Backend on another, ML agent on another. 
They communicate via message queue, not direct connections. Easier to scale independently.

The bigger lesson:

The team initially thought in terms of "how do I call function X from language Y?" But the real architecture question is "how do these components communicate when they work at different speeds and need different resources?"

That's when async, event-driven architecture starts making sense.

For the Backend team specifically: your job isn't to wait for ML. Your job is to orchestrate, persist, and notify. Let the ML agent work in the background. That's how you build systems that actually scale.
