The moment your product is live, things can break at any time. At 3 AM. On a holiday. During the one hour you decided to take off. You need systems that tell you when something goes wrong before your users do.
RESTAURANT: A smoke detector doesn't cook the food or serve the guests. But without one, a small grease fire becomes a closed restaurant. Monitoring is your smoke detector. Automation is the system that turns off the stove before the fire starts.
Edge function errors. Every failed request should trigger a notification. Not an email you'll check tomorrow — an instant notification on your phone. Use a Discord webhook: when an error occurs, your edge function sends a message to a Discord channel. Your phone buzzes. You know immediately.
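Here is a minimal sketch of that notification, assuming your runtime has a fetch-style HTTP client. The names buildErrorMessage, notifyDiscord, and PostFn are hypothetical, invented for this illustration. The only Discord-specific details are that a webhook accepts a JSON POST with a content field and that content is capped at 2000 characters.

```typescript
// Minimal shape of the fetch-like function this sketch needs. Declaring it
// here keeps the example free of runtime-specific typings and easy to fake.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Discord rejects message content longer than 2000 characters.
const DISCORD_CONTENT_LIMIT = 2000;

// Build the JSON body for a Discord webhook message (hypothetical helper).
function buildErrorMessage(functionName: string, error: unknown): { content: string } {
  const detail = error instanceof Error ? error.message : String(error);
  const text = `Error in ${functionName}: ${detail}`;
  return { content: text.slice(0, DISCORD_CONTENT_LIMIT) };
}

// Send the alert. Returns false instead of throwing, so a broken alert
// path can never crash the function it is watching.
async function notifyDiscord(
  webhookUrl: string,
  functionName: string,
  error: unknown,
  post: PostFn,
): Promise<boolean> {
  try {
    const res = await post(webhookUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildErrorMessage(functionName, error)),
    });
    return res.ok;
  } catch {
    return false;
  }
}
```

In a runtime with a global fetch you would likely pass fetch as the last argument and call notifyDiscord from the catch block of your edge function. Keep the webhook URL in an environment variable, not in source.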
Payment failures. Stripe webhook failures are silent killers. A failed webhook means a user paid but didn't get activated, or cancelled but didn't get deactivated. Monitor your webhook-receiver for any non-2xx response, because Stripe counts only a 2xx status as a successful delivery.
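One way to watch the receiver is to wrap its handler so that every failed response raises an alert. The sketch below is an assumption-heavy illustration, not Stripe's API: HandlerResult, WebhookHandler, AlertFn, and withFailureAlert are hypothetical names, and the handler is modeled as a plain function from a raw body to a status code. It treats anything outside the 2xx range as a failure; tighten that to exactly 200 if you prefer.

```typescript
// Hypothetical, simplified model of a webhook handler: raw body in,
// status and body out. A real receiver would also verify the signature.
interface HandlerResult {
  status: number;
  body: string;
}
type WebhookHandler = (rawBody: string) => Promise<HandlerResult>;
type AlertFn = (message: string) => void;

// Anything in the 2xx range counts as a successful delivery.
function isSuccessStatus(status: number): boolean {
  return status >= 200 && status < 300;
}

// Wrap a handler so every failed response, or thrown error, raises an alert.
function withFailureAlert(handler: WebhookHandler, alert: AlertFn): WebhookHandler {
  return async (rawBody: string): Promise<HandlerResult> => {
    try {
      const result = await handler(rawBody);
      if (!isSuccessStatus(result.status)) {
        alert(`webhook-receiver returned ${result.status}`);
      }
      return result;
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      alert(`webhook-receiver threw: ${detail}`);
      // Still answer the sender, so the failure shows up on its side too.
      return { status: 500, body: "internal error" };
    }
  };
}
```

The alert callback is where you would plug in your instant notification channel.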
Database health. Connection counts, slow queries, table sizes. Supabase provides a dashboard, but set alerts for thresholds: if active connections exceed 80% of your limit, you need to know before users start getting errors.
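The threshold rule is simple enough to sketch. This assumes you can already read the active connection count and the connection limit from somewhere; how you fetch them depends on your setup and is not shown. ConnectionStats and connectionAlert are hypothetical names.

```typescript
// Hypothetical snapshot of connection usage, however you obtain it.
interface ConnectionStats {
  active: number;
  limit: number;
}

// Fraction of the connection limit currently in use.
function connectionUsage(stats: ConnectionStats): number {
  if (stats.limit <= 0) throw new Error("limit must be positive");
  return stats.active / stats.limit;
}

// Returns an alert message when usage exceeds the threshold, else null.
// The default of 0.8 mirrors the 80% rule of thumb above.
function connectionAlert(stats: ConnectionStats, threshold = 0.8): string | null {
  const usage = connectionUsage(stats);
  if (usage <= threshold) return null;
  const percent = Math.round(usage * 100);
  return `Active connections at ${percent}% of limit (${stats.active}/${stats.limit})`;
}
```

Run a check like this on a schedule and send any non-null result to the same channel as your error notifications.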
NOTE: The most dangerous failures are the silent ones. The feature that fails but doesn't throw an error — it just returns bad data. Add explicit checks for impossible states: a user with a subscription but no Stripe ID, a conversation with zero duration, a payment with no associated user.
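Those three checks can be sketched as a single function over rows you have already loaded. The row shapes and field names here (hasSubscription, stripeCustomerId, durationSeconds, userId) are assumptions made for illustration; map them onto your real schema.

```typescript
// Hypothetical row shapes; substitute your own schema.
interface UserRow {
  id: string;
  hasSubscription: boolean;
  stripeCustomerId: string | null;
}
interface ConversationRow {
  id: string;
  durationSeconds: number;
}
interface PaymentRow {
  id: string;
  userId: string | null;
}

// Return one human-readable line per record in a state that should be impossible.
function findImpossibleStates(
  users: UserRow[],
  conversations: ConversationRow[],
  payments: PaymentRow[],
): string[] {
  const problems: string[] = [];
  for (const u of users) {
    if (u.hasSubscription && !u.stripeCustomerId) {
      problems.push(`user ${u.id}: subscription without Stripe ID`);
    }
  }
  for (const c of conversations) {
    if (c.durationSeconds <= 0) {
      problems.push(`conversation ${c.id}: zero duration`);
    }
  }
  for (const p of payments) {
    if (!p.userId) {
      problems.push(`payment ${p.id}: no associated user`);
    }
  }
  return problems;
}
```

An empty result means nothing looks wrong. Anything else is worth an alert, because no error was thrown to tell you about it.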
Scheduled health checks. A cron job that runs every hour and verifies your critical systems are responding. Edge functions, database, third-party APIs. If any check fails, it alerts you immediately.
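A minimal sketch of such a check runner follows. It assumes each check is a function that rejects when something is wrong; HealthCheck, runCheck, runHealthChecks, and summarizeFailures are hypothetical names, and the scheduling itself (the hourly cron trigger) is left to your platform. Each check gets a timeout, because a system that hangs is as broken as one that errors.

```typescript
// A check resolves when the system is healthy and rejects when it is not.
interface HealthCheck {
  name: string;
  run: () => Promise<void>;
}

// Run one check with a timeout. Returns null on success, or a failure line.
async function runCheck(check: HealthCheck, timeoutMs: number): Promise<string | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Promise.resolve().then(...) also captures synchronous throws from run().
    await Promise.race([Promise.resolve().then(() => check.run()), timeout]);
    return null;
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return `${check.name}: ${detail}`;
  } finally {
    clearTimeout(timer);
  }
}

// Run all checks in parallel and collect the failure lines.
async function runHealthChecks(checks: HealthCheck[], timeoutMs = 5000): Promise<string[]> {
  const results = await Promise.all(checks.map((c) => runCheck(c, timeoutMs)));
  return results.filter((r): r is string => r !== null);
}

// Turn failure lines into one alert message, or null when all checks passed.
function summarizeFailures(failures: string[]): string | null {
  if (failures.length === 0) return null;
  return `${failures.length} health check(s) failed:\n- ${failures.join("\n- ")}`;
}
```

The scheduled job would call runHealthChecks with one entry per critical system, pass the result to summarizeFailures, and send any non-null message to your alert channel.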
Data cleanup. Expired sessions, stale tokens, orphaned records. Run these nightly during low-traffic hours. Don't let garbage accumulate in your database.
Content generation. If your product uses AI-generated content, automate the pipeline. Generate content in batches during off-peak hours when API costs may be lower and you're not competing with user-facing requests for resources.
Not every alert deserves the same response. Categorize them:
Critical: Users are affected right now. Payment failures, auth outages, complete function failures. You stop what you're doing and fix it.
Warning: Something is degraded but still working. Elevated error rates, slow response times, approaching capacity limits. You investigate within the hour.
Info: Nothing is broken but something is worth knowing. Unusual traffic patterns, new user spikes, content generation completion. You check it when convenient.
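The three levels map naturally onto a small routing table. The sketch below is one possible encoding: the channel names and the Alert shape are hypothetical, and the response times simply restate the levels above (immediately, within the hour, when convenient).

```typescript
type Severity = "critical" | "warning" | "info";

interface Alert {
  severity: Severity;
  message: string;
}

interface ResponsePolicy {
  channel: string; // hypothetical destination name
  respondWithinMinutes: number | null; // null means "when convenient"
}

// One policy per level, so adding an alert never means deciding
// from scratch how loudly it should ring.
const RESPONSE_POLICY: Record<Severity, ResponsePolicy> = {
  critical: { channel: "phone-push", respondWithinMinutes: 0 },
  warning: { channel: "alerts-channel", respondWithinMinutes: 60 },
  info: { channel: "daily-digest", respondWithinMinutes: null },
};

// Look up where an alert goes and how fast it needs a response.
function routeAlert(alert: Alert): ResponsePolicy {
  return RESPONSE_POLICY[alert.severity];
}

// Order alerts so the most urgent come first.
function sortByUrgency(alerts: Alert[]): Alert[] {
  const rank: Record<Severity, number> = { critical: 0, warning: 1, info: 2 };
  return [...alerts].sort((a, b) => rank[a.severity] - rank[b.severity]);
}
```

Keeping the policy in one table makes it easy to audit: if too many alerts land in the critical row, the table is where you notice it.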