From Next.js Monolith to Event-Driven Architecture: Why We Started and What We Built

Part 1 of the IHA Migration Series — documenting the move from a Next.js monolith to a distributed, event-driven architecture using RabbitMQ, and the three pain points that made it necessary.

This is the first post in a short series documenting the migration of It Happened Again (IHA) from a Next.js monolith to a distributed, event-driven architecture. The series covers what was built, what broke, and what I learned along the way.


What Is IHA?

IHA is a community platform for tracking recurring events - things that keep happening again and again. Users submit and verify occurrences, leave comments, add tags, earn badges, and get notified when patterns they follow are updated. You can think of it as a structured, community-verified record of things that keep repeating themselves.

The backend handles a few distinct workloads:

  • Write path: comments, occurrences, reactions, verifications - transactional, user-facing, latency-sensitive.
  • Side effects: search indexing, notifications, badge awarding, visit aggregation - can tolerate latency, must not block writes.
  • Read path: record pages, occurrence timelines, user profiles - heavily cached, served via RSC and Redis.
  • Real-time: SSE streams for live notification delivery to connected users.

For the last few months, all of this lived in a single Next.js application. That made sense at the MVP stage, when having everything in one place was a feature. But as the codebase grew, each change started touching too many moving parts.


The Monolith

At the beginning, the backend was simple and clean. A few API routes, a handful of services, a PostgreSQL database, Redis for caching and queues, Meilisearch for full-text search. Standard Next.js setup with some discipline around layering: Presentation -> Application -> Service/Domain -> Infrastructure, dependencies pointing inward.

Then features piled up and refactors became routine, and the monolith started showing cracks.

By early 2026, the numbers looked like this:

  • Over 130 API routes under api/
  • 70+ services registered in ServiceRegistry
  • 4 infrastructure dependencies: PostgreSQL (via Drizzle ORM), Redis (caching + BullMQ queues), Meilisearch, and our newest addition, RabbitMQ
  • A ServiceRegistry with async initialization, health checks, bidirectional dependency wiring, and HMR-safe global state management

The ServiceRegistry itself became a significant chunk of infrastructure. It handles registration order, async factory functions, timeout protection on initializeAll(), HMR re-registration detection via globalThis.__servicesRegistered, and graceful degradation in production when a service fails to initialize.

None of that came from bad engineering; it came from solving real problems inside one process. At some point, though, that same complexity became a sign that those parts should live in separate processes.


The Three Pain Points Problem

1. Tight Coupling Between Write and Side-Effect Logic

The clearest example is CommentService.createComment(). Before the migration, the method that saved a comment to the database also called SearchService.indexDocumentNow() and NotificationService.sendCommentNotifications() synchronously before returning.

What it meant in practice:

  • A Meilisearch timeout causes the comment creation endpoint to return a 500.
  • A notification delivery failure rolls back a successful write.
  • Adding a new side effect (say, awarding a badge for a first comment) requires modifying CommentService and adding another service dependency.

The issue was not code quality, but the boundary between services. Saving a comment and indexing a comment have different failure and latency requirements. When they are coupled, the write path inherits failures from side effects.

2. SSE Scaling

Our NotificationService maintains live SSE connections in an in-process data structure. And this works perfectly for a single instance. But when you start to run two instances - for a deploy, for load, for anything - the connections are split across processes. A notification triggered in instance A cannot reach a user connected to instance B. Simple as that.

The usual fix is Redis pub/sub as a connection broker. I could have bolted that into the monolith, but that would add even more state and coupling to an already busy service. A cleaner approach was to move notification delivery into its own process so it owns SSE connections directly and consumes events.

3. Deploy Coupling

As with the previous point, every backend change - a new field in a payload, a fix to a background queue handler, a tweak to notification logic - triggers a full Next.js rebuild. Turbopack helps in development, but the production build is still a monolithic artifact. You cannot deploy just the notification logic without redeploying the entire application.

When notification delivery, search indexing, badge jobs, and the API evolve at different speeds, that coupling starts to create real friction.


Why RabbitMQ Instead of Something Simpler?

A fair question, since IHA already had BullMQ + Redis in the stack. And don’t get me wrong, BullMQ is excellent: the app uses it for scheduled jobs, retry queues, and batch processing. The real question was whether to extend BullMQ to cover all event-driven side effects or to introduce RabbitMQ.

The case for BullMQ alone:

  • Already in the stack, no new infrastructure
  • BullBoard can be used for queue inspection
  • First-class TypeScript support
  • Solid dead letter queue semantics via removeOnFail

The case for RabbitMQ:

  • Topic exchanges with routing key patterns: a single events exchange can fan out comment.created to both search-indexer-comment and notifications-comment queues simultaneously, without the publisher knowing about either consumer (BullMQ queues are point-to-point by design).
  • Consumer isolation: each consumer service declares its own durable queue bound to the exchange. Adding a new consumer (let’s say, a moderation service that wants to review new comments) requires no changes to the publisher.
  • Cross-process fan-out: when the notification service and search service become separate processes, they each connect to RabbitMQ and bind their own queues. The publisher (EventBusService.emit()) doesn’t change.
  • Protocol-level durability: messages survive broker restarts when declared persistent, and RabbitMQ’s AMQP acknowledgement semantics give finer control over retry behavior than polling a BullMQ queue.
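To make those routing-key patterns concrete, here is a dependency-free sketch of topic matching. RabbitMQ performs this matching inside the broker; the function below, and the hypothetical audit-log binding, are purely illustrative:

```typescript
// Illustrative sketch of topic-exchange matching semantics:
// '*' matches exactly one dot-separated word, '#' matches zero or more.
function topicMatches(pattern: string, routingKey: string): boolean {
  const p = pattern.split(".");
  const k = routingKey.split(".");
  const match = (pi: number, ki: number): boolean => {
    if (pi === p.length) return ki === k.length;
    if (p[pi] === "#") {
      // '#' can consume zero or more words
      for (let skip = ki; skip <= k.length; skip++) {
        if (match(pi + 1, skip)) return true;
      }
      return false;
    }
    if (ki === k.length) return false;
    if (p[pi] === "*" || p[pi] === k[ki]) return match(pi + 1, ki + 1);
    return false;
  };
  return match(0, 0);
}

// One published event fans out to every queue whose binding matches.
// The audit-log binding is a made-up example of a later-added consumer.
const bindings = [
  { queue: "search-indexer-comment", pattern: "comment.*" },
  { queue: "notifications-comment", pattern: "comment.*" },
  { queue: "audit-log", pattern: "#" },
];

const targets = bindings
  .filter((b) => topicMatches(b.pattern, "comment.created"))
  .map((b) => b.queue);
// All three queues receive a copy of the same message.
```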

The answer: use BullMQ for scheduled and batch work (badge processing, visit aggregation, data retention), and RabbitMQ for domain events that need fan-out to multiple consumers.

BullMQ is here to stay; the two coexist because they solve different problems.


The RabbitMQ POC: What Was Built

The proof-of-concept had basically three goals:

  1. Prove the routing topology works (topic exchange + multiple queue bindings per event)
  2. Prove that CommentService.createComment() can emit an event instead of calling services directly
  3. Prove the consumer can reconstruct the required side effects from the event payload

The Interface

As usual, I started with an interface so that the EventBusService is not coupled to RabbitMQ:

The interface defines four methods: connect(), close(), publish(), and subscribe(). This means we can swap RabbitMQ for any other broker - or for a no-op stub when the feature flag is disabled. The publish method returns a Promise<boolean>, where false signals backpressure, and subscribe is generic over the message type T.
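Here is a minimal sketch of the shape of that interface, with simplified signatures (the real ones may carry extra options like headers or prefetch counts):

```typescript
// Sketch of IMessageBrokerProvider with simplified signatures.
interface IMessageBrokerProvider {
  connect(): Promise<void>;
  close(): Promise<void>;
  // Resolves to false to signal backpressure or a disabled/unavailable broker.
  publish(routingKey: string, message: unknown): Promise<boolean>;
  subscribe<T>(
    queue: string,
    routingKey: string,
    handler: (message: T) => Promise<void>
  ): Promise<void>;
}

// The same contract makes the no-op stub trivial: every method resolves
// immediately, and publish reports "not delivered".
class NoopBrokerProvider implements IMessageBrokerProvider {
  async connect(): Promise<void> {}
  async close(): Promise<void> {}
  async publish(_routingKey: string, _message: unknown): Promise<boolean> {
    return false;
  }
  async subscribe<T>(
    _queue: string,
    _routingKey: string,
    _handler: (message: T) => Promise<void>
  ): Promise<void> {}
}
```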

The Event Names and Schemas

Events follow a {entity}.{action} routing key convention. All constants live in src/lib/events/constants.ts:

EVENT_NAMES contains COMMENT_CREATED mapped to the string literal comment.created, COMMENT_UPDATED to comment.updated, RECORD_CREATED to record.created, OCCURRENCE_VERIFIED to occurrence.verified, and so on.

QUEUES maps consumer identifiers to their durable queue names: SEARCH_INDEXER_COMMENT maps to search-indexer-comment, NOTIFICATIONS_COMMENT to notifications-comment, and so on for each consumer-purpose pair.

Payloads are validated with Zod schemas at consumer boundaries. CommentCreatedPayloadSchema requires a commentId as UUID, a userId as string, an occurrenceId as string, a content string, and an optional parentId UUID.
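For a dependency-free illustration of that contract, here is a hand-rolled guard with the same shape; the real code uses Zod's CommentCreatedPayloadSchema.parse() at the consumer boundary instead:

```typescript
// Simplified stand-in for the Zod schema described above.
interface CommentCreatedPayload {
  commentId: string; // UUID
  userId: string;
  occurrenceId: string;
  content: string;
  parentId?: string; // UUID, present for nested comments
}

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Throws on malformed input, mirroring schema.parse() behaviour.
function parseCommentCreatedPayload(input: unknown): CommentCreatedPayload {
  const p = input as Partial<CommentCreatedPayload>;
  if (
    typeof p?.commentId !== "string" || !UUID_RE.test(p.commentId) ||
    typeof p.userId !== "string" ||
    typeof p.occurrenceId !== "string" ||
    typeof p.content !== "string" ||
    (p.parentId !== undefined &&
      (typeof p.parentId !== "string" || !UUID_RE.test(p.parentId)))
  ) {
    throw new Error("Invalid comment.created payload");
  }
  return p as CommentCreatedPayload;
}
```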

The EventBusService

The new EventBusService is a thin orchestration layer over IMessageBrokerProvider. Its two public methods are emit() and on(). It has message buffering built in: emit() calls before initialize() completes are queued in pendingMessages and flushed once connected, rather than failing silently. The isReady flag guards all publish paths.
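The buffering behaviour can be sketched like this; the provider type and field names are simplified stand-ins for the real service:

```typescript
// Minimal sketch of pre-connection message buffering in the event bus.
type Provider = {
  publish(routingKey: string, message: unknown): Promise<boolean>;
};

class EventBusSketch {
  private isReady = false;
  private pendingMessages: Array<{ event: string; payload: unknown }> = [];

  constructor(private provider: Provider) {}

  async initialize(): Promise<void> {
    // The real service awaits the broker connection before this point.
    this.isReady = true;
    // Flush anything emitted before the connection was established.
    for (const m of this.pendingMessages.splice(0)) {
      await this.provider.publish(m.event, m.payload);
    }
  }

  async emit(event: string, payload: unknown): Promise<boolean> {
    if (!this.isReady) {
      this.pendingMessages.push({ event, payload });
      return false; // buffered, not yet delivered
    }
    return this.provider.publish(event, payload);
  }
}
```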

The Emit Site

In CommentService, the change from direct calls to event emission reduces the method to its core function: persist the comment, then emit an event. The event payload carries just enough data for consumers to fetch what they need - the commentId, userId, occurrenceId, content, and optional parentId (for nested comments).

Before: createComment() awaited both searchService.indexDocumentNow() and notificationService.sendCommentNotifications() before returning. A failure in either blocked or failed the entire operation.

After: createComment() awaits eventBus.emit(EVENT_NAMES.COMMENT_CREATED, payload) and returns. Search indexing and notification delivery happen asynchronously in consumer handlers. The comment creation endpoint succeeds or fails on its own merits.
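In sketch form, the "after" shape of the method looks roughly like this, with hypothetical stand-in types for the repository and event bus:

```typescript
// Illustrative "after" shape: persist, emit, return.
interface CommentRepo {
  insert(data: {
    userId: string;
    occurrenceId: string;
    content: string;
  }): Promise<{ id: string }>;
}
interface EventBus {
  emit(event: string, payload: unknown): Promise<boolean>;
}

async function createComment(
  repo: CommentRepo,
  eventBus: EventBus,
  input: { userId: string; occurrenceId: string; content: string }
): Promise<{ id: string }> {
  // 1. The only thing the write path owns: the transactional insert.
  const comment = await repo.insert(input);
  // 2. Fire the domain event; consumers handle indexing and notifications.
  //    A broker problem surfaces as a false return, not a failed write.
  await eventBus.emit("comment.created", {
    commentId: comment.id,
    userId: input.userId,
    occurrenceId: input.occurrenceId,
    content: input.content,
  });
  return comment;
}
```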

The Consumer

CommentCreatedConsumer registers two queue bindings for the same event - one for search indexing, and one for notifications. Both subscriptions use eventBus.on() with the event name EVENT_NAMES.COMMENT_CREATED, but different queue names (QUEUES.SEARCH_INDEXER_COMMENT and QUEUES.NOTIFICATIONS_COMMENT).

RabbitMQ delivers a copy of each published message to both queues. The two handlers then execute independently. A failure in handleIndexing does not affect handleNotifications, and vice versa.

handleIndexing fetches the full comment object via commentService.getComment(commentId) - the event payload carries just the ID, not the full object, to keep payloads small and avoid serializing potentially stale data. Then it calls searchService.indexDocumentNow(). If this throws, the error propagates and RabbitMQProvider NACKs the message, triggering the DLX retry pattern.

handleNotifications delegates to CommentNotificationHelper, which holds the complex notification rules (notify occurrence subscribers, notify parent comment authors, skip the author of the triggering comment, and more). This helper existed before the event bus - the consumer just instantiates it with its dependencies rather than duplicating the logic.
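A condensed sketch of the dual-binding consumer - the queue names mirror the post, but the class body and the injected handler types are illustrative:

```typescript
// Sketch: one event, two independent durable queue bindings.
interface EventBus {
  on(
    event: string,
    queue: string,
    handler: (payload: { commentId: string }) => Promise<void>
  ): void;
}

class CommentCreatedConsumerSketch {
  constructor(
    private eventBus: EventBus,
    private indexComment: (id: string) => Promise<void>,
    private notifyForComment: (id: string) => Promise<void>
  ) {}

  register(): void {
    // Two queues bound to the same routing key: the broker delivers a copy
    // of every comment.created message to each, so the handlers fail and
    // retry independently of one another.
    this.eventBus.on("comment.created", "search-indexer-comment", (p) =>
      this.indexComment(p.commentId)
    );
    this.eventBus.on("comment.created", "notifications-comment", (p) =>
      this.notifyForComment(p.commentId)
    );
  }
}
```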


POC: The Good

The core design held up. The topic exchange fan-out worked as expected (emitting comment.created once delivers a copy to both search-indexer-comment and notifications-comment). Adding a third consumer (e.g. a moderation review queue) requires zero changes to the publisher, which is one of the main reasons to use events in the first place.

The IMessageBrokerProvider interface paid off immediately. For unit tests and for the ENABLE_EVENT_BUS=false code path, I registered a no-op stub with connect, close, publish, and subscribe all returning immediately (or returning false for publish). No RabbitMQ process is needed for tests or for environments where the event bus is disabled.

The CommentNotificationHelper was reused as-is. The consumer delegates complex notification rules to the same helper class that the synchronous path used, rather than just duplicating logic. The helper is instantiated with its dependencies passed as constructor arguments, and it is straightforward to test in isolation.

Typing is solid: CommentCreatedPayloadSchema provides runtime validation at consumer entry points, and the inferred TypeScript type CommentCreatedPayload gives compile-time safety on the payload fields. The EVENT_NAMES and QUEUES as const objects prevent string literal typos at emit and subscribe sites.


POC: The Bad - Three Bugs That Had To Be Fixed

The initial POC had a few defects. These three were the most interesting and the most useful to fix early.

Bug 1: Single Channel Shared Between Publisher and Consumer

The original RabbitMQProvider used one AMQP channel for both publish() and subscribe(). That looked fine in simple tests, but it caused trouble under heavier synthetic load: AMQP flow control is channel-scoped, so slow consumers could backpressure the same channel used for publishing.

The fix was to split responsibilities: publishChannel for publish(), consumeChannel for subscribe(). Now slow consumption does not stall publishing. I also run setupDLXInfrastructure() on publishChannel, since exchange assertions can run on any channel.

Bug 2: No Dead Letter Exchange

The first POC had no proper retry or dead-lettering logic. If a handler threw (for example when Meilisearch was temporarily unavailable), messages could just bounce in unhelpful ways and failed payloads were hard to inspect.

I implemented a Dead Letter Exchange (DLX) pattern. The infrastructure now uses three exchanges and three queue variants per consumer:

  • events.dlx: the dead letter exchange. Messages go here when NACKed with requeue=false.
  • events.retry: a retry exchange. Failed messages are published here with a TTL and then expire back to the main events exchange via x-dead-letter-exchange (on the retry queue).
  • Per-queue DLQ (e.g. search-indexer-comment.dlq): the permanent dead letter queue for messages that have exhausted all retries.

The retry counter lives in the x-retries message header. On each failed delivery, the handler increments the counter and republishes to the retry queue. After MAX_RETRY_COUNT (3) failures, the message is NACKed with requeue=false and lands in the DLQ, where it can be inspected and replayed from the RabbitMQ Management UI.

RETRY_DELAY_MS is set to 5000 (5 seconds). It is simple and predictable for now. Exponential backoff can be added later.
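The retry decision itself is a small piece of pure logic. A sketch, using the header name and constants from above (the FailureAction type is an illustrative stand-in; the real code acts on an AMQP message object):

```typescript
// Sketch of the retry bookkeeping on handler failure.
const MAX_RETRY_COUNT = 3;
const RETRY_DELAY_MS = 5000;

type FailureAction =
  | { kind: "retry"; headers: { "x-retries": number }; delayMs: number }
  | { kind: "dead-letter" };

function onHandlerFailure(headers: Record<string, unknown>): FailureAction {
  const retries =
    typeof headers["x-retries"] === "number" ? (headers["x-retries"] as number) : 0;
  if (retries >= MAX_RETRY_COUNT) {
    // NACK with requeue=false; the broker routes the message to the DLQ.
    return { kind: "dead-letter" };
  }
  // Republish to the retry queue with an incremented counter; the TTL on
  // the retry queue sends it back to the main exchange after the delay.
  return {
    kind: "retry",
    headers: { "x-retries": retries + 1 },
    delayMs: RETRY_DELAY_MS,
  };
}
```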

Bug 3: Consumer Registration Timing

The first draft of initializeServices() registered consumers before serviceRegistry.initializeAll(). CommentCreatedConsumer called registerConsumer(), which resolved EventBusService before its async initialization finished. Subscription calls then ran against a bus that was not connected yet.

The fix was to defer consumer registration until after initializeAll() completes. The initializeServices() bootstrap sequence now uses this order:

  1. registerProviders(), registerEventBusServices(), registerCoreServices(), and all other registration calls. These only register service factories with the ServiceRegistry - they do not instantiate or connect anything.
  2. serviceRegistry.initializeAll(). This one awaits each factory in registration order, including the async EventBusService factory that establishes the RabbitMQ connection.
  3. setupBidirectionalDependencies(). Wires up circular references between services that cannot be expressed as constructor dependencies.
  4. initializeRabbitMQConsumers(), guarded by the ENABLE_EVENT_BUS flag. By this point, EventBusService is guaranteed to be connected and ready. Consumers then can safely call eventBus.on() and the broker will accept the subscription.
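The four steps can be sketched as an ordered bootstrap function - the names mirror the list above, but the injected callbacks are placeholders for illustration:

```typescript
// Sketch of the bootstrap ordering; returns the order for inspection.
async function initializeServicesSketch(opts: {
  registerAll: () => void;
  initializeAll: () => Promise<void>;
  wireBidirectionalDeps: () => void;
  registerConsumers: () => Promise<void>;
  enableEventBus: boolean;
}): Promise<string[]> {
  const order: string[] = [];

  opts.registerAll(); // 1. factories only, nothing connects yet
  order.push("register");

  await opts.initializeAll(); // 2. async factories run, broker connects
  order.push("initializeAll");

  opts.wireBidirectionalDeps(); // 3. circular references wired post-init
  order.push("wire");

  if (opts.enableEventBus) {
    await opts.registerConsumers(); // 4. the bus is guaranteed connected here
    order.push("consumers");
  }
  return order;
}
```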

The ENABLE_EVENT_BUS Feature Flag

Shipping a fundamental change to how side effects are triggered requires a way to turn it off instantly if something goes wrong. The ENABLE_EVENT_BUS environment variable controls whether EventBusService is backed by a real RabbitMQ connection or a no-op stub.

When ENABLE_EVENT_BUS is not set or is false:

  • No RabbitMQ connection is attempted at startup.
  • EventBusService is registered with a no-op provider that implements the full IMessageBrokerProvider interface but in practice does nothing: connect() resolves immediately, publish() returns false, subscribe() resolves immediately.
  • emit() calls in application code do not throw and do not block.
  • CommentService can safely call getService(SERVICE_NAMES.EVENT_BUS_SERVICE) and emit events that won’t be delivered.
  • The legacy synchronous side-effect path remains in place as a fallback.

When ENABLE_EVENT_BUS=true:

  • RabbitMQProvider connects during initializeAll().
  • EventBusService.initialize() awaits the connection before returning.
  • CommentCreatedConsumer registers its queue bindings against the live broker.
  • createComment() emits events instead of calling services directly.
  • The synchronous fallback is bypassed.

The no-op path creates a stub that fully satisfies the IMessageBrokerProvider interface contract. The real path creates a RabbitMQProvider, wraps it in EventBusService, and awaits the connection inside an async factory so ServiceRegistry only marks the service ready once the broker is connected.

This means the migration can be shipped behind the flag, enabled in staging, load tested, validated, and enabled in production without a code change. A rollback is a single environment variable update and a process restart.
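The flag-controlled factory can be sketched like this; the names are illustrative, and in the real app the enabled branch constructs a RabbitMQProvider and awaits its connection inside the async service factory:

```typescript
// Sketch of flag-based provider selection.
interface IMessageBrokerProvider {
  connect(): Promise<void>;
  publish(routingKey: string, message: unknown): Promise<boolean>;
}

class NoopProvider implements IMessageBrokerProvider {
  async connect(): Promise<void> {}
  async publish(_routingKey: string, _message: unknown): Promise<boolean> {
    return false; // nothing is delivered when the event bus is disabled
  }
}

// The real provider is injected as a factory so this sketch stays
// self-contained and the disabled path never constructs it.
function createBrokerProvider(
  enableEventBus: boolean,
  makeRabbitProvider: () => IMessageBrokerProvider
): IMessageBrokerProvider {
  return enableEventBus ? makeRabbitProvider() : new NoopProvider();
}
```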


Architecture Diagrams

Monolith Architecture (Before)

The state before the migration:

graph TB
    subgraph "Browser"
        Client[React Client]
    end
    
    subgraph "Next.js Monolith (iha-web)"
        AppRouter[App Router / RSC]
        APIRoutes["~130 API Routes"]
        ServiceRegistry["ServiceRegistry<br/>(70+ services)"]
        
        subgraph "Services"
            CommentSvc[CommentService]
            RecordSvc[RecordService]
            OccurrenceSvc[OccurrenceService]
            SearchSvc[SearchService]
            NotifSvc[NotificationService]
            BadgeSvc[BadgeService]
        end
        
        subgraph "Providers"
            Drizzle[Drizzle ORM]
            Redis[Redis Client]
            Meili[MeiliSearch Client]
        end
    end
    
    subgraph "Infrastructure"
        Postgres[(PostgreSQL)]
        RedisDB[(Redis)]
        MeiliDB[(MeiliSearch)]
    end
    
    Client --> AppRouter
    Client --> APIRoutes
    AppRouter --> ServiceRegistry
    APIRoutes --> ServiceRegistry
    ServiceRegistry --> CommentSvc
    ServiceRegistry --> RecordSvc
    CommentSvc --> SearchSvc
    CommentSvc --> NotifSvc
    RecordSvc --> SearchSvc
    RecordSvc --> BadgeSvc
    Drizzle --> Postgres
    Redis --> RedisDB
    Meili --> MeiliDB

Target Event-Driven Architecture

The target state with all extracted services:

graph TB
    subgraph "Clients"
        Browser[Browser]
    end
    
    subgraph "CDN / Proxy"
        CDN[Cloudflare / Traefik]
    end
    
    subgraph "apps/web - Next.js BFF"
        SSR[SSR + RSC Pages]
        BFF[BFF Proxy /api/*]
        Auth[Better Auth<br/>Session Management]
        CMS[Payload CMS]
    end
    
    subgraph "apps/api - Hono Backend"
        HonoRoutes[~130 Routes]
        SvcRegistry[ServiceRegistry]
        EventBus[EventBusService]
    end
    
    subgraph "apps/notification-service"
        SSEMgr[SSE Connection Manager]
        NotifConsumer[Event Consumers]
        EmailPipeline[Email Pipeline]
    end
    
    subgraph "apps/search-service"
        SearchConsumer[Indexing Consumers]
        SearchAPI[Search Query API]
    end
    
    subgraph "apps/worker-service"
        BadgeWorker[Badge Processor]
        VisitWorker[Visit Aggregator]
        RetentionWorker[Data Retention]
    end
    
    subgraph "Message Broker"
        RabbitMQ[RabbitMQ<br/>events exchange]
    end
    
    subgraph "Infrastructure"
        Postgres[(PostgreSQL)]
        Redis[(Redis)]
        MeiliSearch[(MeiliSearch)]
    end
    
    Browser --> CDN
    CDN --> SSR
    CDN --> BFF
    CDN --> SSEMgr
    BFF --> HonoRoutes
    HonoRoutes --> SvcRegistry
    HonoRoutes --> EventBus
    EventBus --> RabbitMQ
    RabbitMQ --> NotifConsumer
    RabbitMQ --> SearchConsumer
    RabbitMQ --> BadgeWorker
    NotifConsumer --> SSEMgr
    SSEMgr --> Redis
    SearchConsumer --> MeiliSearch
    SvcRegistry --> Postgres
    SvcRegistry --> Redis

What Comes Next

The POC established that the pattern works and the bugs are fixable. What remains is doing this systematically.

Phase 1: Systematic decoupling within the monolith

The next services to migrate are RecordService, OccurrenceService, BadgeService (as a consumer), and NotificationService (as a consumer). Each follows the same pattern:

  1. Define the event schema in src/lib/events/schemas/.
  2. Add the EVENT_NAMES and QUEUES constants.
  3. Replace direct side-effect calls in the service with eventBus.emit().
  4. Write a consumer class that handles the side effects.
  5. Register the consumer in initializeRabbitMQConsumers().
  6. Write unit tests for the consumer handlers in tests/unit/.

By the end of Phase 1, the monolith, although more modular, is still a monolith, but the coupling between write paths and side-effect paths is severed. Every side effect is now a consumer of a message queue, and the synchronous call graph becomes significantly shallower.

Phase 2: Turborepo monorepo and extracting a Hono backend

Once the event boundaries are clean, the next step is process extraction. The plan is to move the ~130 API routes to a standalone Hono application (apps/api) on the Bun runtime and use Next.js purely as a BFF (Backend for Frontend) that handles SSR and proxies API calls with additional headers attached. The notification service becomes its own process (apps/notification-service) that subscribes to RabbitMQ and manages SSE connections backed by Redis pub/sub. The search indexer and worker jobs move to apps/search-service and apps/worker-service.

The new Turborepo monorepo structure will share types and schemas through workspace packages (packages/events, packages/db, packages/shared), so the extracted services have compile-time safety without duplicating definitions.

Part 2 of this series covers Phase 1: the systematic decoupling work, the event schema design decisions, and what I have learned about ordering consumer registration across a service graph with bidirectional dependencies. Stay tuned!