So, I got the chance to mentor a team in the Dicoding Asah Program, and honestly, it was pretty interesting from the start.
The team got assigned a capstone project that required building an AI agent system for generating personalized recommendations. Not a simple machine learning model that just predicts something. A full-blown AI agent that could reason, retrieve information, and generate recommendations dynamically.
It's a solid project brief, but it comes with complexity. The team was already divided into three sub-teams: Front-end, Back-end, and Machine Learning. Which is good: it meant they could work in parallel.
In this article, I'm sharing how we designed this system and why we made certain choices.
The Challenge
When the team started diving deeper into the project, the Backend team came to me with a bunch
of specific questions:
- "What exactly does the backend need from the ML team?"
- "How do we connect our Node.js backend with a Python AI model?"
- "What backend framework should we actually use?"
- "How should we structure the backend code—routes, controllers, services—so it's maintainable?"
- "Where do we deploy this thing, and how?"
These are solid questions. But here's the thing: they were all interconnected. You can't answer
"what framework should we use?" without first understanding the architecture. You can't talk about
deployment without knowing how the components communicate.
So instead of answering each question in isolation, I realized I needed to show them the entire
picture first. Not just "use this framework" or "deploy on this platform," but "here's why this
architecture makes sense, and here's how each piece fits together."
That's when I decided to walk them through the full system design—starting with the high-level
architecture, then breaking down each component, and finally showing how all the pieces connect.
The key insight: they needed to move from synchronous, blocking calls to an asynchronous,
event-driven architecture. Because an AI agent doesn't work like a simple API—it takes time.
Sometimes a lot of time. And you can't just make the user wait.
That's where NATS and webhooks came in.
System Design Overview
Instead of jumping into framework recommendations or deployment strategies, I showed them the full system design first. Here's the architecture we landed on:
Now that you see the overall architecture, let's break down each component and what it does.
1. Frontend (React/Vue)
The frontend's job is simple: send the request and wait for the result without blocking the user.
What it does:
- User fills in preferences and clicks "Get Recommendation"
- Frontend sends the request to the backend with the user data
- Backend returns immediately with a task_id and status "processing"
- Frontend stores this task_id and starts polling or listening via WebSocket
- When the status changes to "completed", frontend fetches and displays the result
Key point: Frontend never waits for the AI agent to finish. It gets an immediate response,
shows "processing..." to the user, and updates when ready.
2. Backend (Node.js)
The backend is the orchestrator. It's not doing the heavy computation—that's the ML agent's job.
Instead, it's:
- Receiving requests from the frontend
- Validating them
- Publishing tasks to NATS
- Storing results when the ML agent is done
- Serving results back to the frontend
What it does:
- REST API to receive recommendation requests
- Publish task to NATS message broker
- Return immediately to frontend with task_id
- Receive webhook callback from ML agent when done
- Store result in database
- Serve result back to frontend on request
Key point: Backend doesn't wait for ML agent. It's fully non-blocking.
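Here's roughly what that orchestration looks like with Express and the `nats` npm package. The routes, the subject name, and the in-memory Map standing in for the database are illustrative only, not the team's actual code:

```typescript
// Sketch of the backend orchestrator: receive, publish, return, then
// get notified via webhook. Runs against a local NATS server.
import express from "express";
import { connect, JSONCodec } from "nats";
import { randomUUID } from "crypto";

const app = express();
app.use(express.json());

const jc = JSONCodec();
const tasks = new Map<string, { status: string; result?: unknown }>(); // stand-in for the real database

async function main() {
  const nc = await connect({ servers: "nats://localhost:4222" });

  // 1. Receive a request, publish the task, return immediately
  app.post("/api/recommendations", (req, res) => {
    const taskId = randomUUID();
    tasks.set(taskId, { status: "processing" });
    nc.publish("recommendation.create", jc.encode({ taskId, preferences: req.body }));
    res.status(202).json({ task_id: taskId, status: "processing" }); // no waiting
  });

  // 2. Webhook the ML agent calls when it's done
  app.post("/api/webhooks/recommendation", (req, res) => {
    const { taskId, result } = req.body;
    tasks.set(taskId, { status: "completed", result }); // persist the result
    res.sendStatus(204);
  });

  // 3. Serve results back to the frontend (this is what the poller hits)
  app.get("/api/recommendations/:taskId", (req, res) => {
    const task = tasks.get(req.params.taskId);
    if (!task) {
      res.status(404).json({ error: "unknown task" });
      return;
    }
    res.json({ task_id: req.params.taskId, ...task });
  });

  app.listen(3000);
}

main();
```

Notice there's no `await` on anything ML-related in the POST handler: publish and respond, done.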
3. ML Agent (Python)
This is where the actual AI work happens. The ML agent:
- Listens to NATS for incoming tasks from the backend
- Initializes the AI agent framework (LangChain, CrewAI, etc.)
- Calls LLM API to generate recommendations
- Publishes result back to NATS
- Sends webhook to backend to notify completion
What it does:
- Subscribe to the NATS subject "recommendation.create"
- Process the task (could take 10 seconds to minutes)
- Call LLM API with user preferences
- Generate recommendation
- Publish result to NATS
- Hit backend webhook to signal completion
Key point: ML agent is completely decoupled. It works at its own pace, doesn't care about
timeouts or frontend users waiting.
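The team's agent lives in Python (nats-py plus LangChain or CrewAI), but the flow itself is simple enough to sketch. To keep all the examples in one language, here it is in TypeScript, with `callLLM` as a hypothetical stand-in for the real agent call and an assumed webhook URL:

```typescript
// Sketch of the worker loop: subscribe, process at its own pace, publish
// the result, notify the backend. The real agent would be Python.
import { connect, JSONCodec } from "nats";

const jc = JSONCodec();

// Hypothetical stand-in for the real LLM / agent-framework call
async function callLLM(preferences: unknown): Promise<string> {
  return `recommendation for ${JSON.stringify(preferences)}`;
}

async function main() {
  const nc = await connect({ servers: "nats://localhost:4222" });

  const sub = nc.subscribe("recommendation.create");
  for await (const msg of sub) {
    const { taskId, preferences } = jc.decode(msg.data) as {
      taskId: string;
      preferences: unknown;
    };

    const result = await callLLM(preferences); // may take seconds to minutes

    // Publish the result to NATS for any interested subscriber...
    nc.publish("recommendation.result", jc.encode({ taskId, result }));

    // ...and hit the backend webhook to signal completion
    await fetch("http://localhost:3000/api/webhooks/recommendation", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ taskId, result }),
    });
  }
}

main();
```

Scaling is just running more copies of this process; NATS distributes the work.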
4. NATS (Message Broker)
NATS is the glue that connects backend and ML agent. It's a publish-subscribe system.
What it does:
- Backend publishes task → NATS
- ML agent subscribes to task → Gets notified
- ML agent publishes result → NATS
- Backend can optionally subscribe to results
Key point: NATS decouples backend and ML agent. They don't need to know about each other directly.
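If you've never used a message broker, here's a tiny self-contained demo of the pattern, assuming a local NATS server (e.g. `docker run -p 4222:4222 nats`). Notice the publisher never references the subscriber:

```typescript
// Minimal pub/sub demo with the `nats` npm package: both sides only know
// the subject name, never each other.
import { connect, StringCodec } from "nats";

async function main() {
  const nc = await connect({ servers: "nats://localhost:4222" });
  const sc = StringCodec();

  // Subscriber: reacts to whatever shows up on the subject
  const sub = nc.subscribe("recommendation.create", { max: 1 });
  const done = (async () => {
    for await (const msg of sub) {
      console.log("got task:", sc.decode(msg.data));
    }
  })();

  // Publisher: fire-and-forget; it has no idea who (if anyone) is listening
  nc.publish("recommendation.create", sc.encode("task-123"));

  await done; // subscription auto-closes after one message (max: 1)
  await nc.close();
}

main();
```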
5. Database
Stores:
- User data
- Task status (processing, completed, failed)
- Recommendation results
- Historical data for analytics
Why needed: Even though NATS carries messages, we need persistence. If the backend crashes,
we still have the task history. If the user refreshes, we can retrieve their past recommendations.
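A minimal sketch of what a task record might look like, written as a TypeScript type. The field names are assumptions; the actual schema is the team's call:

```typescript
// Illustrative shape for a persisted task record
type TaskStatus = "processing" | "completed" | "failed";

interface RecommendationTask {
  taskId: string;       // the id returned to the frontend
  userId: string;       // who asked for the recommendation
  status: TaskStatus;   // updated by the webhook handler
  preferences: unknown; // the input sent to the ML agent
  result?: unknown;     // the recommendation, once completed
  createdAt: Date;      // useful for the history/analytics use case
  completedAt?: Date;
}
```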
The Connection:
Frontend → Backend (HTTP) → NATS (async task) → ML Agent → NATS (result) → Backend (webhook)
→ Frontend (polling/WebSocket)
Each component is independent. One failing doesn't block the others. ML agent takes 5 minutes?
User sees "processing" but can still use the app. Backend crashes? With JetStream persistence
enabled, NATS keeps the messages queued, ready to process when it's back.
Why This Approach?
Why not just have the backend call the ML agent directly and wait for the response?
Option 1: Synchronous Approach (Simple but Problematic)
Frontend → Backend → Python ML Agent (blocking call) → wait for response → return
Pros:
- Simple to understand
- Easier to debug (everything happens in order)
- No message broker setup needed
Cons:
- If ML agent is slow, entire request times out
- No scalability (can't handle many concurrent requests)
- Backend can't do anything else while waiting
- Poor UX (loading spinner spinning for minutes)
Option 2: Asynchronous Approach with NATS (What We Chose)
Architecture:
Frontend → Backend → NATS → ML Agent (async, non-blocking)
Backend returns immediately → ML Agent works in background → webhook callback → Frontend notified
Pros:
- User gets immediate response (task_id, "processing")
- UX is smooth (no long wait)
- Scalable (can queue many tasks)
- Resilient (if the ML agent crashes, NATS still has the message, given JetStream persistence)
- Each component is independent
- Easy to add more workers (run multiple ML agent instances)
Cons:
- More complex setup (NATS, webhooks, polling/WebSocket)
- Harder to debug (async flow is trickier)
Key Takeaways
So, to recap what we discussed with the Backend team:
What does the backend need from the ML team?
→ Just a webhook callback when the recommendation is ready. Everything else is async.
How do we connect the Node.js backend with the Python ML agent?
→ Through the NATS message broker. They don't talk directly; they publish and subscribe to messages.
What backend framework should we use?
→ Node.js with Express works fine. Just make sure the endpoints are non-blocking.
How should we structure the backend code?
→ Simple: routes for the API, services for business logic, a separate module for NATS publishing (there's a layout sketch after this recap).
Where do we deploy, and how?
→ NATS broker on one server, backend on another, ML agent on another.
They communicate via the message broker, not direct connections, so each piece scales independently.
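On the structure question: here's one minimal layout that matches "routes, services, separate NATS module". The folder and file names are a suggestion, nothing more:

```
src/
├── routes/
│   └── recommendations.ts   # HTTP endpoints: create task, fetch result, webhook
├── services/
│   └── taskService.ts       # business logic: create/update task records
├── messaging/
│   └── nats.ts              # the one module that publishes to NATS
├── db/
│   └── tasks.ts             # persistence for tasks and results
└── app.ts                   # Express wiring
```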
The bigger lesson:
The team initially thought in terms of "how do I call function X from language Y?" But the real architecture question is "how do these components communicate when they work at different speeds and need different resources?"
That's when async, event-driven architecture starts making sense.
For the Backend team specifically: your job isn't to wait for ML. Your job is to orchestrate, persist, and notify. Let the ML agent work in the background. That's how you build systems that actually scale.