Automation & Monitoring

The moment your product is live, things can break at any time. You need a smoke detector.

You need systems that tell you when something goes wrong before your users do — and routines that handle the boring stuff while you sleep.

The Smoke Detector

Breakage doesn't keep a schedule. It happens at 3 AM. On a holiday. During the one hour you decided to take off. Without monitoring, the first person to notice is a user.

RESTAURANT: A smoke detector doesn't cook the food or serve the guests. But without one, a small grease fire becomes a closed restaurant. Monitoring is your smoke detector. Automation is the system that turns off the stove before the fire starts.

What to Monitor

Edge function errors. Every failed request should trigger a notification. Not an email you'll check tomorrow — an instant notification on your phone. Use a Discord webhook: when an error occurs, your edge function sends a message to a Discord channel. Your phone buzzes. You know immediately.
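Here is a minimal sketch of that error-to-Discord path, assuming a runtime with a global fetch (Deno or Node 18+). The function names and the message format are illustrative, not part of any library. The only Discord details relied on are that a webhook accepts a JSON body with a content field and caps it at 2000 characters.

```typescript
// Discord rejects webhook messages whose content exceeds 2000 characters.
const DISCORD_CONTENT_LIMIT = 2000;

interface DiscordPayload {
  content: string;
}

// Build the JSON body for a Discord webhook message from a caught error.
// functionName is whatever label you use for the edge function (hypothetical).
function buildErrorAlert(functionName: string, error: unknown): DiscordPayload {
  const detail = error instanceof Error ? error.message : String(error);
  const content = `[ERROR] ${functionName}: ${detail}`;
  return { content: content.slice(0, DISCORD_CONTENT_LIMIT) };
}

// Post the alert. Returns false instead of throwing, so a broken alert
// channel can never mask the original failure.
async function notifyDiscord(webhookUrl: string, payload: DiscordPayload): Promise<boolean> {
  try {
    const res = await fetch(webhookUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    return res.ok;
  } catch {
    return false;
  }
}
```

In an edge function you would call these from the catch block of your request handler, reading the webhook URL from an environment variable rather than hard-coding it.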

Payment failures. Stripe webhook failures are silent killers. A failed webhook means a user paid but didn't get activated, or cancelled but didn't get deactivated. Monitor your webhook receiver for any non-2xx response, since Stripe counts anything outside the 2xx range as a failed delivery.
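One way to catch these is to wrap the webhook handler so every failed response raises an alert. The sketch below assumes a fetch-style handler (Request in, Response out), as used by Deno-based edge functions. withFailureAlerts and the notify callback are hypothetical names, and anything outside the 2xx range is treated as a failure.

```typescript
type Handler = (req: Request) => Promise<Response>;
type Notify = (message: string) => Promise<void>;

// A delivery counts as failed unless the status is in the 2xx range.
function isFailedDelivery(status: number): boolean {
  return status < 200 || status >= 300;
}

// Wrap a webhook handler so that thrown errors and non-2xx responses
// both trigger a notification. Notification errors are swallowed so
// they cannot change the response sent back to the caller.
function withFailureAlerts(name: string, handler: Handler, notify: Notify): Handler {
  return async (req: Request): Promise<Response> => {
    try {
      const res = await handler(req);
      if (isFailedDelivery(res.status)) {
        await notify(`${name} responded with ${res.status}`).catch(() => undefined);
      }
      return res;
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      await notify(`${name} threw: ${detail}`).catch(() => undefined);
      return new Response("internal error", { status: 500 });
    }
  };
}
```

The notify callback can be anything that reaches your phone, for example a function that posts to a Discord webhook.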

Database health. Watch connection counts, slow queries, and table sizes. Supabase provides a dashboard, but a dashboard only helps while you're looking at it, so set threshold alerts: if active connections exceed 80% of your limit, you need to know before users start getting errors.
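The threshold check itself is a few lines. This sketch assumes you already have the two numbers in hand, for example from Postgres's pg_stat_activity view and the max_connections setting. The ConnectionStats shape and the message wording are made up for illustration.

```typescript
interface ConnectionStats {
  active: number; // connections currently open
  limit: number;  // maximum connections your plan or config allows
}

// Returns an alert message when usage exceeds the threshold, otherwise null.
// The default of 0.8 matches the 80% rule of thumb.
function connectionAlert(stats: ConnectionStats, threshold = 0.8): string | null {
  if (stats.limit <= 0) {
    return "DB connection limit is unknown or zero";
  }
  const usage = stats.active / stats.limit;
  if (usage <= threshold) {
    return null;
  }
  return `DB connections at ${Math.round(usage * 100)}% (${stats.active}/${stats.limit})`;
}
```

Run it from a scheduled job and send any non-null result to your alert channel.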

NOTE: The most dangerous failures are the silent ones. The feature that fails but doesn't throw an error — it just returns bad data. Add explicit checks for impossible states: a user with a subscription but no Stripe ID, a conversation with zero duration, a payment with no associated user.
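The three impossible states named in the note can be checked with a small function like the one below. It is a sketch under assumed row shapes: the field names (hasSubscription, stripeCustomerId, durationSeconds, userId) are hypothetical and will differ in your schema.

```typescript
interface UserRow { id: string; hasSubscription: boolean; stripeCustomerId: string | null; }
interface ConversationRow { id: string; durationSeconds: number; }
interface PaymentRow { id: string; userId: string | null; }

// Scan rows for states that should never exist and describe each one found.
function findImpossibleStates(
  users: UserRow[],
  conversations: ConversationRow[],
  payments: PaymentRow[],
): string[] {
  const problems: string[] = [];
  for (const u of users) {
    if (u.hasSubscription && !u.stripeCustomerId) {
      problems.push(`user ${u.id}: subscription but no Stripe ID`);
    }
  }
  for (const c of conversations) {
    if (c.durationSeconds <= 0) {
      problems.push(`conversation ${c.id}: zero duration`);
    }
  }
  for (const p of payments) {
    if (!p.userId) {
      problems.push(`payment ${p.id}: no associated user`);
    }
  }
  return problems;
}
```

An empty result means nothing looks wrong. Anything else is worth an alert, because no exception will ever surface these on its own.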

What to Automate

Scheduled health checks. A cron job that runs every hour and verifies your critical systems are responding. Edge functions, database, third-party APIs. If any check fails, it alerts you immediately.
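A health check runner might look like the sketch below. It assumes each check is an async function that throws on failure. The HealthCheck shape, the 5-second default timeout, and the failure message format are illustrative choices. Scheduling is left to whatever cron mechanism your platform provides.

```typescript
interface HealthCheck {
  name: string;
  run: () => Promise<void>; // should throw (reject) when the system is unhealthy
}

// Reject if the promise does not settle within ms milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run all checks in parallel and return one message per failed check,
// in the same order as the input. A hung check counts as a failure.
async function runHealthChecks(checks: HealthCheck[], timeoutMs = 5000): Promise<string[]> {
  const results = await Promise.all(
    checks.map(async (check): Promise<string | null> => {
      try {
        await withTimeout(check.run(), timeoutMs);
        return null;
      } catch (err) {
        const detail = err instanceof Error ? err.message : String(err);
        return `${check.name}: ${detail}`;
      }
    }),
  );
  const failures: string[] = [];
  for (const r of results) {
    if (r !== null) failures.push(r);
  }
  return failures;
}
```

The timeout matters as much as the error handling: a dependency that hangs never throws, so without it the check would appear to pass forever.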

Data cleanup. Expired sessions, stale tokens, orphaned records. Run these nightly during low-traffic hours. Don't let garbage accumulate in your database.
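As a rough sketch of the nightly cleanup logic, assuming sessions carry an expiresAt timestamp (a hypothetical field name): select what has expired, then delete in small batches so one giant delete doesn't hold locks for long. The actual delete call depends on your database client and is left out.

```typescript
interface SessionRow {
  id: string;
  expiresAt: Date;
}

// IDs of sessions that expired before `now`, minus an optional grace period.
function expiredSessionIds(sessions: SessionRow[], now: Date, graceMs = 0): string[] {
  const cutoff = now.getTime() - graceMs;
  return sessions.filter((s) => s.expiresAt.getTime() < cutoff).map((s) => s.id);
}

// Split a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

The same pattern applies to stale tokens and orphaned records: a pure selection step you can test, followed by batched deletes.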

Content generation. If your product uses AI-generated content, automate the pipeline. Generate content in batches during off-peak hours, when you're not competing with user-facing requests for resources. Some providers also discount batch or off-peak usage, so check your API's pricing.

The Alert Severity Framework

Not every alert deserves the same response. Categorize them:

Critical: Users are affected right now. Payment failures, auth outages, complete function failures. You stop what you're doing and fix it.

Warning: Something is degraded but still working. Elevated error rates, slow response times, approaching capacity limits. You investigate within the hour.

Info: Nothing is broken but something is worth knowing. Unusual traffic patterns, new user spikes, content generation completion. You check it when convenient.
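The framework translates naturally into a small lookup table. In this sketch the response times come straight from the three levels above. The AlertSignal shape and the pushToPhone routing are assumptions added for illustration.

```typescript
type Severity = "critical" | "warning" | "info";

interface AlertPolicy {
  respondWithin: string;
  pushToPhone: boolean; // assumed routing rule, adjust to taste
}

const POLICIES: Record<Severity, AlertPolicy> = {
  critical: { respondWithin: "immediately", pushToPhone: true },
  warning: { respondWithin: "within the hour", pushToPhone: true },
  info: { respondWithin: "when convenient", pushToPhone: false },
};

interface AlertSignal {
  usersAffected: boolean; // users are hitting the failure right now
  degraded: boolean;      // still working, but slower or more error-prone
}

// Users affected wins over everything; degradation without user impact
// is a warning; anything else is informational.
function classify(signal: AlertSignal): Severity {
  if (signal.usersAffected) return "critical";
  if (signal.degraded) return "warning";
  return "info";
}
```

Keeping the policy in one table means every alert source classifies first and routes second, so a new monitor can't quietly invent its own urgency level.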

Chapter Appendix