So, I got the chance to mentor a team in the Dicoding Asah Program, and honestly, it was pretty interesting from the start.
The team got assigned a capstone project that required building an AI agent system for generating personalized recommendations. Not a simple machine learning model that just predicts something. A full-blown AI agent that could reason, retrieve information, and generate recommendations dynamically.
It's a solid project brief, but it comes with complexity. The team was already divided into three sub-teams: Front-end, Back-end, and Machine Learning. Which is good: it meant they could work in parallel.
In this article, I'm sharing how we designed this system and why we made certain choices.
The Challenge
When the team started diving deeper into the project, the Backend team came to me with a bunch
of specific questions:
- "What exactly does the backend need from the ML team?"
- "How do we connect our Node.js backend with a Python AI model?"
- "What backend framework should we actually use?"
- "How should we structure the backend code—routes, controllers, services—so it's maintainable?"
- "Where do we deploy this thing, and how?"
These are solid questions. But here's the thing: they were all interconnected. You can't answer
"what framework should we use?" without first understanding the architecture. You can't talk about
deployment without knowing how the components communicate.
So instead of answering each question in isolation, I realized I needed to show them the entire
picture first. Not just "use this framework" or "deploy on this platform," but "here's why this
architecture makes sense, and here's how each piece fits together."
That's when I decided to walk them through the full system design—starting with the high-level
architecture, then breaking down each component, and finally showing how all the pieces connect.
The key insight: they needed to move from synchronous, blocking calls to an asynchronous,
event-driven architecture. Because an AI agent doesn't work like a simple API—it takes time.
Sometimes a lot of time. And you can't just make the user wait.
That's where NATS and webhooks came in.
System Design Overview
Instead of jumping into framework recommendations or deployment strategies, I showed them the full system design first. Here's the architecture we landed on:
Now that you see the overall architecture, let's break down each component and what it does.
1. Frontend (React/Vue)
The frontend's job is simple: send the request and wait for the result without blocking the user.
What it does:
- User fills in preferences and clicks "Get Recommendation"
- Frontend sends the request to the backend with the user data
- Backend returns immediately with a task_id and status "processing"
- Frontend stores this task_id and starts polling or listening via WebSocket
- When the status changes to "completed", frontend fetches and displays the result
Key point: Frontend never waits for the AI agent to finish. It gets an immediate response,
shows "processing..." to the user, and updates when ready.
2. Backend (Node.js)
The backend is the orchestrator. It's not doing the heavy computation—that's the ML agent's job.
Instead, it's:
- Receiving requests from the frontend
- Validating them
- Publishing tasks to NATS
- Storing results when the ML agent is done
- Serving results back to the frontend
What it does:
- REST API to receive recommendation requests
- Publish task to NATS message broker
- Return immediately to frontend with task_id
- Receive webhook callback from ML agent when done
- Store result in database
- Serve result back to frontend on request
Key point: Backend doesn't wait for ML agent. It's fully non-blocking.
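Here's roughly what that orchestration looks like with Express and the `nats` npm package. The routes, the subject name, and the in-memory Map standing in for the database are illustrative only, not the team's actual code:

```typescript
// Sketch of the backend orchestrator: receive, publish, return, then
// get notified via webhook. Runs against a local NATS server.
import express from "express";
import { connect, JSONCodec } from "nats";
import { randomUUID } from "crypto";

const app = express();
app.use(express.json());

const jc = JSONCodec();
const tasks = new Map<string, { status: string; result?: unknown }>(); // stand-in for the real database

async function main() {
  const nc = await connect({ servers: "nats://localhost:4222" });

  // 1. Receive a request, publish the task, return immediately
  app.post("/api/recommendations", (req, res) => {
    const taskId = randomUUID();
    tasks.set(taskId, { status: "processing" });
    nc.publish("recommendation.create", jc.encode({ taskId, preferences: req.body }));
    res.status(202).json({ task_id: taskId, status: "processing" }); // no waiting
  });

  // 2. Webhook the ML agent calls when it's done
  app.post("/api/webhooks/recommendation", (req, res) => {
    const { taskId, result } = req.body;
    tasks.set(taskId, { status: "completed", result }); // persist the result
    res.sendStatus(204);
  });

  // 3. Serve results back to the frontend (this is what the poller hits)
  app.get("/api/recommendations/:taskId", (req, res) => {
    const task = tasks.get(req.params.taskId);
    if (!task) {
      res.status(404).json({ error: "unknown task" });
      return;
    }
    res.json({ task_id: req.params.taskId, ...task });
  });

  app.listen(3000);
}

main();
```

Notice there's no `await` on anything ML-related in the POST handler: publish and respond, done.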
3. ML Agent (Python)
This is where the actual AI work happens. The ML agent:
- Listens to NATS for incoming tasks from the backend
- Initializes the AI agent framework (LangChain, CrewAI, etc.)
- Calls LLM API to generate recommendations
- Publishes result back to NATS
- Sends webhook to backend to notify completion
What it does:
- Subscribe to the NATS subject "recommendation.create"
- Process the task (could take 10 seconds to minutes)
- Call LLM API with user preferences
- Generate recommendation
- Publish result to NATS
- Hit backend webhook to signal completion
Key point: ML agent is completely decoupled. It works at its own pace, doesn't care about
timeouts or frontend users waiting.
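The team's agent lives in Python (nats-py plus LangChain or CrewAI), but the flow itself is simple enough to sketch. To keep all the examples in one language, here it is in TypeScript, with `callLLM` as a hypothetical stand-in for the real agent call and an assumed webhook URL:

```typescript
// Sketch of the worker loop: subscribe, process at its own pace, publish
// the result, notify the backend. The real agent would be Python.
import { connect, JSONCodec } from "nats";

const jc = JSONCodec();

// Hypothetical stand-in for the real LLM / agent-framework call
async function callLLM(preferences: unknown): Promise<string> {
  return `recommendation for ${JSON.stringify(preferences)}`;
}

async function main() {
  const nc = await connect({ servers: "nats://localhost:4222" });

  const sub = nc.subscribe("recommendation.create");
  for await (const msg of sub) {
    const { taskId, preferences } = jc.decode(msg.data) as {
      taskId: string;
      preferences: unknown;
    };

    const result = await callLLM(preferences); // may take seconds to minutes

    // Publish the result to NATS for any interested subscriber...
    nc.publish("recommendation.result", jc.encode({ taskId, result }));

    // ...and hit the backend webhook to signal completion
    await fetch("http://localhost:3000/api/webhooks/recommendation", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ taskId, result }),
    });
  }
}

main();
```

Scaling is just running more copies of this process; NATS distributes the work.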
4. NATS (Message Broker)
NATS is the glue that connects backend and ML agent. It's a publish-subscribe system.
What it does:
- Backend publishes task → NATS
- ML agent subscribes to task → Gets notified
- ML agent publishes result → NATS
- Backend can optionally subscribe to results
Key point: NATS decouples backend and ML agent. They don't need to know about each other directly.
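If you've never used a message broker, here's a tiny self-contained demo of the pattern, assuming a local NATS server (e.g. `docker run -p 4222:4222 nats`). Notice the publisher never references the subscriber:

```typescript
// Minimal pub/sub demo with the `nats` npm package: both sides only know
// the subject name, never each other.
import { connect, StringCodec } from "nats";

async function main() {
  const nc = await connect({ servers: "nats://localhost:4222" });
  const sc = StringCodec();

  // Subscriber: reacts to whatever shows up on the subject
  const sub = nc.subscribe("recommendation.create", { max: 1 });
  const done = (async () => {
    for await (const msg of sub) {
      console.log("got task:", sc.decode(msg.data));
    }
  })();

  // Publisher: fire-and-forget; it has no idea who (if anyone) is listening
  nc.publish("recommendation.create", sc.encode("task-123"));

  await done; // subscription auto-closes after one message (max: 1)
  await nc.close();
}

main();
```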
5. Database
Stores:
- User data
- Task status (processing, completed, failed)
- Recommendation results
- Historical data for analytics
Why needed: Even though NATS carries messages, we need persistence. If the backend crashes,
we still have the task history. If the user refreshes, we can retrieve their past recommendations.
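A minimal sketch of what a task record might look like, written as a TypeScript type. The field names are assumptions; the actual schema is the team's call:

```typescript
// Illustrative shape for a persisted task record
type TaskStatus = "processing" | "completed" | "failed";

interface RecommendationTask {
  taskId: string;       // the id returned to the frontend
  userId: string;       // who asked for the recommendation
  status: TaskStatus;   // updated by the webhook handler
  preferences: unknown; // the input sent to the ML agent
  result?: unknown;     // the recommendation, once completed
  createdAt: Date;      // useful for the history/analytics use case
  completedAt?: Date;
}
```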
The Connection:
Frontend → Backend (HTTP) → NATS (async task) → ML Agent → NATS (result) → Backend (webhook)
→ Frontend (polling/WebSocket)
Each component is independent. One failing doesn't block the others. ML agent takes 5 minutes?
User sees "processing" but can still use the app. Backend crashes? With JetStream persistence
enabled, NATS keeps the messages queued, ready to process when it's back.
Why This Approach?
Why not just have the backend call the ML agent directly and wait for the response?
Option 1: Synchronous Approach (Simple but Problematic)
Frontend → Backend → Python ML Agent (blocking call) → wait for response → return
Pros:
- Simple to understand
- Easier to debug (everything happens in order)
- No message broker setup needed
Cons:
- If ML agent is slow, entire request times out
- No scalability (can't handle many concurrent requests)
- Backend can't do anything else while waiting
- Poor UX (loading spinner spinning for minutes)
Option 2: Asynchronous Approach with NATS (What We Chose)
Architecture:
Frontend → Backend → NATS → ML Agent (async, non-blocking)
Backend returns immediately → ML Agent works in background → webhook callback → Frontend notified
Pros:
- User gets immediate response (task_id, "processing")
- UX is smooth (no long wait)
- Scalable (can queue many tasks)
- Resilient (if the ML agent crashes, NATS still has the message, given JetStream persistence)
- Each component is independent
- Easy to add more workers (run multiple ML agent instances)
Cons:
- More complex setup (NATS, webhooks, polling/WebSocket)
- Harder to debug (async flow is trickier)
Key Takeaways
So, to recap what we discussed with the Backend team:
What does the backend need from the ML team?
→ Just a webhook callback when the recommendation is ready. Everything else is async.
How do we connect the Node.js backend with the Python ML agent?
→ Through the NATS message broker. They don't talk directly; they publish and subscribe to messages.
What backend framework should we use?
→ Node.js with Express works fine. Just make sure the endpoints are non-blocking.
How should we structure the backend code?
→ Simple: routes for the API, services for business logic, a separate module for NATS publishing (there's a layout sketch after this recap).
Where do we deploy, and how?
→ NATS broker on one server, backend on another, ML agent on another.
They communicate via the message broker, not direct connections, so each piece scales independently.
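On the structure question: here's one minimal layout that matches "routes, services, separate NATS module". The folder and file names are a suggestion, nothing more:

```
src/
├── routes/
│   └── recommendations.ts   # HTTP endpoints: create task, fetch result, webhook
├── services/
│   └── taskService.ts       # business logic: create/update task records
├── messaging/
│   └── nats.ts              # the one module that publishes to NATS
├── db/
│   └── tasks.ts             # persistence for tasks and results
└── app.ts                   # Express wiring
```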
The bigger lesson:
The team initially thought in terms of "how do I call function X from language Y?" But the real architecture question is "how do these components communicate when they work at different speeds and need different resources?"
That's when async, event-driven architecture starts making sense.
For the Backend team specifically: your job isn't to wait for ML. Your job is to orchestrate, persist, and notify. Let the ML agent work in the background. That's how you build systems that actually scale.