A technical case study on the design, architecture, and security posture of Minuto — a SaaS platform that turns confidential professional-services meetings into structured, auditable, encrypted institutional memory.
I. The Problem Nobody Wanted to Solve
In a 14th-floor conference room in Makati, a senior audit manager places her phone face-down on the polished table and presses record on a different device — a small black recorder she bought specifically because it does not connect to the internet. Across the table, three partners from a regional consumer-goods conglomerate are about to walk her through a revenue-recognition issue that, depending on how it is characterized in the working papers, could shift the company's reported earnings by eight figures.
She is not paranoid. She is doing her job.
For the better part of a decade, the global meeting-transcription market has been dominated by tools optimized for a very different buyer: the product manager who wants to remember what marketing said about Q3 launch dates. Otter.ai, Fireflies, Fathom, tl;dv, Bluedot, Fellow — a constellation of consumer-grade and prosumer-grade SaaS products built around a shared assumption that the meeting in question is not, fundamentally, a secret.
For audit firms, law firms, tax consultancies, and management consulting practices in Southeast Asia, that assumption is not just wrong. It is disqualifying. A tool that streams audio to a third-party model provider, stores the resulting transcript in plaintext on a shared S3 bucket, and lets any teammate with a link see the entire conversation, is a tool that cannot be deployed inside a Big Four practice. It cannot be deployed inside a mid-tier law firm. It cannot, in many cases, even be discussed at the partner level without triggering a compliance review.
This is the gap Minuto was built to close.
Minuto is a meeting intelligence platform — a system that ingests audio recordings of professional services engagements, transcribes them with speaker diarization, generates structured Minutes of Meeting (MoM) documents with an AI model, extracts action items into a Kanban workflow, and binds everything to a hierarchy of clients, engagements, and workspaces that mirrors how an audit or consulting firm actually operates. It is, on the surface, a category of software that already exists.
What makes Minuto different is what happens beneath the surface: a deliberately constructed zero-trust architecture in which the database itself does not believe its own users, where every sensitive field is encrypted with a key derived from the user's identity before it is written to disk, and where mandatory TOTP multi-factor authentication is enforced not by a login screen but by the row-level security policies on the tables themselves.
This is the story of how that architecture was built, why each decision was made, and what it costs — in latency, in complexity, and in product velocity — to treat a meeting transcript as if it were a financial ledger entry. Because, for Minuto's users, that is precisely what it is.
II. A Stack Built Around a Threat Model
Most SaaS architecture decisions begin with a question about scale. Minuto's began with a question about blast radius.
"If a single user's account is compromised," asked Lance Delariarte, the platform's founder and sole administrator of record in the production database, "what is the absolute maximum amount of data the attacker can access?" The answer the team kept arriving at — across whiteboard sessions, across three architecture rewrites, across one late-night incident in which a misconfigured RLS policy briefly exposed a single client name to the wrong workspace — was always the same. They wanted that number to be small. They wanted it, ideally, to be one meeting. One client. One conversation.
That ambition shaped the entire stack.
The frontend: deliberately boring
The application runs on React 18 with Vite 5 and TypeScript 5, styled with Tailwind CSS v3 and a curated set of shadcn/ui components built on Radix primitives. There is nothing experimental about this choice. The frontend team explicitly rejected proposals to adopt Next.js, Remix, or any framework requiring a server runtime, because every server-side rendering surface is also a server-side credential surface — a place where, in a worst-case incident, an attacker could potentially read environment variables containing service-role keys.
The application is therefore a pure client-side single-page application, deployed as a static bundle to Vercel and served from a custom domain at minuto.vibepoint.dev. There is no Node.js server in the production path. There is no Express middleware. There is no SSR cache that might inadvertently serve another tenant's data. Every authenticated request is made directly from the user's browser to the backend and carries a JWT issued at login.
Visually, the application leans into what the design memory file calls "Productivity SaaS aesthetic" — a near-white background, a single primary indigo (#4355B9), 10-pixel border radii, and a typographic system that uses Playfair Display exclusively for hero headlines and Inter for everything else. Dark editorial themes were considered and rejected; auditors, the team learned in user interviews, work in fluorescent-lit offices and prefer interfaces that resemble Microsoft Office more than they resemble Linear.
The backend: Lovable Cloud as a managed Supabase
For its backend, Minuto runs on Lovable Cloud, a managed Supabase environment that provides a Postgres database, authentication, object storage, edge functions on the Deno runtime, and realtime subscriptions over WebSockets. The team made an early decision to treat Lovable Cloud not as a backend service but as a backend substrate: a foundation on which the security-critical logic is implemented in two layers, row-level security policies in the database and edge functions for any operation that requires a service-role credential or a third-party API call.
The result is an architecture in which the client never holds a service-role key, the edge functions never trust the client's claims about who it is, and the database itself enforces every access decision through policies that cannot be bypassed by a clever query. There are, as of this writing, more than forty distinct RLS policies across the production schema, plus a smaller set of SECURITY DEFINER Postgres functions used to encapsulate operations that legitimately need to bypass RLS — all of which begin with an explicit auth.uid() check before doing anything else.
The data model: workspaces, engagements, clients, meetings
The schema is organized around four nested concepts that mirror the operational hierarchy of a professional services firm:
- A workspace is the top-level tenant boundary. It typically represents a firm, a practice, or a department. Every meeting, every client, every engagement, every credit balance is bound — directly or indirectly — to exactly one workspace.
- A client is an organization that the workspace serves. Clients carry their own metadata: a fiscal year, a service line, a specialization (audit, tax advisory, M&A, litigation), a list of members.
- An engagement is a unit of work for a client — an audit cycle, a quarterly advisory project, a specific litigation matter. Engagements have their own membership, their own team leader, and their own lifecycle (active, closed, archived).
- A meeting is a single recorded conversation, attached to a client and optionally to an engagement, with its own attendees, audio file, transcription, and generated MoM document.
This hierarchy is not cosmetic. It is the access-control graph. A user who is added to an engagement gains read access to that engagement's meetings, and only those meetings, through an RLS policy that joins meetings.engagement_id to engagement_members.user_id. A user added to a workspace can see the workspace's clients, but cannot see meetings unless they are also added to the relevant engagement. This separation, the team found, was non-negotiable for audit firms, where the partner running an engagement and the manager staffing it may not want every workspace member browsing the working papers.
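The access rule described above can be modelled in a few lines. This is a minimal in-memory sketch, not the production RLS policy: the `Meeting` and `EngagementMember` shapes, the `canReadMeeting` name, and the rule that an owner can always read their own meeting are assumptions made for illustration.

```typescript
// Hypothetical row shapes; only engagement_id / user_id are named in the text.
interface Meeting {
  id: string;
  workspaceId: string;
  engagementId: string | null; // meetings may exist without an engagement
  ownerId: string;
}

interface EngagementMember {
  engagementId: string;
  userId: string;
}

// Mirrors the policy: read access follows engagement membership,
// never workspace membership alone.
function canReadMeeting(
  userId: string,
  meeting: Meeting,
  members: EngagementMember[],
): boolean {
  // Assumption: the meeting owner can always read their own meeting.
  if (meeting.ownerId === userId) return true;
  // Assumption: a meeting with no engagement is visible to its owner only.
  if (meeting.engagementId === null) return false;
  return members.some(
    (m) => m.engagementId === meeting.engagementId && m.userId === userId,
  );
}
```

Note that `workspaceId` never appears in the check. That omission is the point: belonging to the workspace grants nothing at the meeting level.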
III. The Encryption Layer Nobody Talks About
The most distinctive technical decision in the Minuto codebase is one that, by design, the user never sees.
Every transcript, every MoM document, and every MoM version stored in the database is encrypted at the application layer with AES-256-GCM, using a key derived per user via HKDF-SHA-256 from a master key held only in the edge function environment. The implementation lives in supabase/functions/_shared/encryption.ts and is, in its entirety, fewer than eighty lines of code.
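```typescript
// Encoder shared by the salt and info parameters below (module scope).
const encoder = new TextEncoder();

// Derive a per-user AES-256-GCM key from the master key via HKDF-SHA-256.
async function deriveKey(userId: string): Promise<CryptoKey> {
  const masterKeyHex = Deno.env.get("ENCRYPTION_MASTER_KEY");
  // Fail closed if the secret is missing or is not an even-length hex string.
  if (!masterKeyHex || !/^([0-9a-fA-F]{2})+$/.test(masterKeyHex)) {
    throw new Error("ENCRYPTION_MASTER_KEY is missing or not valid hex");
  }
  const masterKeyBytes = new Uint8Array(
    masterKeyHex.match(/.{2}/g)!.map((b) => parseInt(b, 16))
  );
  const keyMaterial = await crypto.subtle.importKey(
    "raw", masterKeyBytes, "HKDF", false, ["deriveKey"]
  );
  return crypto.subtle.deriveKey(
    {
      name: "HKDF",
      hash: "SHA-256",
      salt: encoder.encode("minuto-encryption-salt"),
      info: encoder.encode(userId), // binds the derived key to this user
    },
    keyMaterial,
    { name: "AES-GCM", length: 256 },
    false, // non-extractable: the key cannot be exported from the isolate
    ["encrypt", "decrypt"]
  );
}
```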
The mechanics are deliberately simple. For each user, a unique 256-bit AES key is derived deterministically from the master key and the user's UUID. The master key never leaves the edge function isolate. The derived key never leaves the edge function isolate. Plaintext arrives at the edge function over TLS, is encrypted, and is then written to Postgres in a self-describing string format (enc:<iv-base64>:<ciphertext-base64>) that includes the GCM initialization vector inline.
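To make the storage format concrete, here is a minimal sketch of how a field could be sealed into, and recovered from, the `enc:<iv-base64>:<ciphertext-base64>` string using the Web Crypto API. The function names (`encryptField`, `decryptField`) and the base64 helpers are assumptions for illustration. The 12-byte IV is the conventional AES-GCM choice, not something the source states.

```typescript
// Base64 helpers (hypothetical; the real module may use different utilities).
function toB64(bytes: Uint8Array): string {
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}

function fromB64(b64: string) {
  const bin = atob(b64);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

// Encrypt one field and return the self-describing string.
async function encryptField(key: CryptoKey, plaintext: string): Promise<string> {
  // A fresh random IV per message; reusing an IV under one key breaks GCM.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return `enc:${toB64(iv)}:${toB64(new Uint8Array(ciphertext))}`;
}

// Parse the stored string and decrypt it; throws if the GCM tag does not verify.
async function decryptField(key: CryptoKey, stored: string): Promise<string> {
  const [prefix, ivB64, ctB64] = stored.split(":");
  if (prefix !== "enc" || !ivB64 || !ctB64) {
    throw new Error("value is not in enc:<iv>:<ciphertext> format");
  }
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: fromB64(ivB64) },
    key,
    fromB64(ctB64),
  );
  return new TextDecoder().decode(plaintext);
}
```

Because standard base64 never contains a colon, splitting on `:` is unambiguous. Because GCM is authenticated, a tampered ciphertext fails decryption instead of yielding garbage.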
The choice to encrypt at the application layer rather than relying solely on Supabase's at-rest disk encryption is the philosophical center of the platform. Disk encryption protects against an attacker who steals the physical hard drives. It does not protect against a misconfigured RLS policy, a SQL injection bug that reads rows directly, or, as the team is acutely aware, a future Supabase support engineer with database read access. Because the master key lives only in the edge function environment and never in Postgres, even a complete database dump in the wrong hands would yield only ciphertext, and each user's records are sealed under a different derived key.
The trade-offs are real and the team is honest about them. Search across encrypted transcripts cannot be done at the database level; it must be done by decrypting in the edge function and matching in memory, which is why the AI-powered cross-meeting search (powered by google/gemini-2.5-flash through the Lovable AI Gateway) routes all queries through a dedicated search-meetings edge function rather than a Postgres full-text index. Backups are similarly opaque: a restored database without the master key is, by intention, a wall of base64.
IV. The MFA Decision and the AAL2 Wall
In most consumer SaaS products, multi-factor authentication is offered as a setting and adopted by single-digit percentages of users. In Minuto, it is enforced at the database layer as a precondition for writing to any sensitive table — a design decision that has more in common with a banking application than with a productivity tool.
The mechanism is a Postgres function called is_aal2(), which returns true only if the JWT presented with the current request carries an Authenticator Assurance Level (AAL) of 2, Supabase's term for "this session was authenticated with both a password and a verified TOTP code." Every RLS policy that governs write operations on the meetings, transcriptions, mom_documents, and profiles tables includes a clause that reads, in essence: and the caller's session must be AAL2.
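The same check can be illustrated outside the database. The sketch below reads the `aal` claim that Supabase Auth places in its access tokens and combines it with a membership flag, the way the write policies are described as doing. It is an illustration of the logic only: it does not verify the token signature (assumed to be handled by the platform), and `decodeJwtPayload`, `isAal2`, and `canWrite` are hypothetical names.

```typescript
// Decode the payload segment of a JWT. No signature verification happens
// here; this sketch assumes the token has already been validated upstream.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const part = token.split(".")[1];
  if (!part) throw new Error("malformed JWT");
  // Convert base64url to standard base64 and restore padding.
  const b64 = part.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  return JSON.parse(atob(padded));
}

// True only when the session was stepped up with a second factor.
function isAal2(token: string): boolean {
  return decodeJwtPayload(token).aal === "aal2";
}

// Shape of a write policy: the ordinary ownership or membership test
// AND the assurance-level test must both pass.
function canWrite(token: string, callerIsAuthorized: boolean): boolean {
  return callerIsAuthorized && isAal2(token);
}
```

Placing the AAL test next to the authorization test means a password-only session fails writes even if the client skips the MFA screens entirely.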
The user-facing flow that grew up around this constraint is unusual. After signup, users land in a seven-step onboarding wizard that captures profile metadata (industry use case, organization, position, default workspace), and then — before the user can create a meeting, before they can save their profile, before they can do anything that writes to a sensitive table — they are routed to /setup-mfa, where a TOTP factor must be enrolled and verified. The application checks getAuthenticatorAssuranceLevel() on every protected route; users without an enrolled factor are bounced back to setup, and users with a factor whose session has decayed back to AAL1 are bounced to /auth to re-verify.
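The routing rule in this paragraph reduces to a small decision function over the two levels that Supabase's `getAuthenticatorAssuranceLevel()` reports: `currentLevel` for the session as it stands, and `nextLevel` for what the session could reach given the user's verified factors. The sketch below is a hedged reconstruction. `guardRoute` and its return values are hypothetical, and sending a user with no session to `/auth` is an assumption.

```typescript
type Aal = "aal1" | "aal2";

// Subset of the data returned by getAuthenticatorAssuranceLevel().
interface AalStatus {
  currentLevel: Aal | null;
  nextLevel: Aal | null;
}

type RouteDecision = "allow" | "/setup-mfa" | "/auth";

function guardRoute(status: AalStatus): RouteDecision {
  // Assumption: no session at all means the user must sign in.
  if (status.currentLevel === null) return "/auth";
  // nextLevel stays below aal2 until a TOTP factor is enrolled and verified.
  if (status.nextLevel !== "aal2") return "/setup-mfa";
  // A factor exists, but this session has only passed the password step.
  if (status.currentLevel !== "aal2") return "/auth";
  return "allow";
}
```

A caller would pass in the `data` object from the Supabase client and redirect whenever the result is not `"allow"`.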
This was, the team admits, the most contentious product decision in the platform's history. Forcing MFA before first use depresses signup conversion. But the alternative — allowing audit-firm users to start uploading transcripts of partner meetings while protected only by a password they may have reused on three other sites — was incompatible with the platform's stated SOC 2 Type 2 readiness commitment. The decision was made to optimize for the buyer (the firm's IT director, who needs to be able to defend the tool to compliance) over the user (the manager, who would prefer not to type a six-digit code every fifteen minutes).
A 15-minute idle timeout, implemented in useIdleTimeout.ts, completes the picture: sessions that go quiet for too long are silently terminated and the user is returned to the auth screen. Combined with password-complexity rules enforced both client-side in validation.ts and server-side in the signup edge function, the resulting authentication posture is close to what a regulated financial institution would require of its own internal tooling.
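The core of an idle timeout is independent of React. The following sketch shows the bookkeeping such a hook might wrap. The `IdleTracker` class and its method names are hypothetical, and the clock is passed in explicitly so the logic can be exercised without timers.

```typescript
const IDLE_LIMIT_MS = 15 * 60 * 1000; // the 15-minute window from the text

class IdleTracker {
  private limitMs: number;
  private lastActivity: number;

  constructor(limitMs: number, now: number) {
    this.limitMs = limitMs;
    this.lastActivity = now;
  }

  // Call on user activity (mouse, keyboard, touch) to reset the window.
  touch(now: number): void {
    this.lastActivity = now;
  }

  // True once the user has been inactive for the full window.
  isExpired(now: number): boolean {
    return now - this.lastActivity >= this.limitMs;
  }
}
```

In a browser, a hook would typically call `touch(Date.now())` from activity listeners and check `isExpired(Date.now())` on an interval, signing the user out when it returns true.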
V. The Meeting Pipeline: From Audio to Intelligence
The core user journey — recording a meeting, getting back a structured MoM document — is implemented as a multi-stage pipeline that crosses four distinct execution environments. Understanding it requires walking through what happens to a single audio file from the moment a user clicks "New Meeting" to the moment a partner receives an email containing a link to the finished minutes.
Stage one: the wizard
The journey begins at /new-meeting, a five-step wizard that captures the structural metadata required to file the meeting correctly: client (selected from existing or newly created), engagement (optional), service line, fiscal year, attendees with structured roles (Partner, Director, Senior Manager, Manager, Senior Associate, Associate, drawn from the firm's actual hierarchy), meeting date, venue, and language preference for transcription. The wizard is built around a ClientPickerDialog and a NewClientDialog that allow on-the-fly client creation, with industry-specific field filtering driven by the constants in src/constants/services.ts.
A single meeting "draft" row is inserted into the meetings table with status draft as soon as the wizard begins, so that the audio upload that follows can be associated with a known meeting ID. This intermediate state matters: it means a user who abandons the wizard halfway through leaves behind a recoverable draft rather than an orphan audio file.
Stage two: client-side audio normalization
When the user uploads audio (supported formats include WAV, MP3, M4A, OGG, FLAC, and WebM), the file is processed entirely in the browser before it touches any server. The application uses ffmpeg.wasm, served from public/ffmpeg/, to convert the uploaded file to a normalized 16 kHz mono WAV file optimized for speech-to-text. This conversion happens in a Web Worker so the main UI thread stays responsive, and the resulting normalized blob is uploaded to a private Supabase Storage bucket called meeting-audio under a path of the form {user_id}/{meeting_id}/{filename}.
The decision to do this client-side was driven by two considerations. First, it eliminates a tier of server-side compute that would otherwise need to scale with usage. Second, and more importantly, it means the original raw audio, which may contain ambient room conversation, side comments, and unintended captures, never leaves the user's device in its unprocessed form. The 16 kHz mono WAV that does get uploaded is stripped of stereo channel information and downsampled to a rate optimized for speech intelligibility, which also discards some of the acoustic detail that would give the file forensic value should it ever be exfiltrated.
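As a rough illustration of the normalization step, the helpers below build the ffmpeg argument list for a 16 kHz mono 16-bit PCM WAV and the storage path in the `{user_id}/{meeting_id}/{filename}` form. The ffmpeg flags themselves are standard. The function names, the choice of `pcm_s16le`, and the filename sanitization are assumptions, and the call into ffmpeg.wasm is left out because its API differs between versions.

```typescript
// Arguments for: resample to 16 kHz (-ar), downmix to mono (-ac 1),
// encode as 16-bit little-endian PCM inside a WAV container.
function normalizeArgs(input: string, output: string): string[] {
  return ["-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output];
}

// Build the object path inside the private meeting-audio bucket.
function storagePath(userId: string, meetingId: string, filename: string): string {
  // Assumption: replace anything outside a conservative character set so
  // the filename cannot introduce extra path segments.
  const safeName = filename.replace(/[^A-Za-z0-9._-]/g, "_");
  return `${userId}/${meetingId}/${safeName}`;
}
```

Putting the user ID first in the path lets a storage policy scope access with a simple prefix check on the caller's own ID.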
Stage three: transcription
Once the normalized audio is in storage, the application invokes the transcribe edge function, passing the meeting ID, the storage path, the user's chosen language (or "auto" for automatic detection), and diarization preferences (whether to identify distinct speakers, and if so, the expected min and max speaker counts).
The edge function — whose source lives at supabase/functions/transcribe/index.ts — is a careful piece of code. It validates the request body with a Zod schema. It rate-limits to ten transcriptions per user per ten minutes via an in-memory token bucket keyed on transcribe:{userId}. It generates a one-hour signed URL for the audio file using a service-role client, then submits a prediction request to Replicate running a WhisperX-derived model with optional pyannote-based speaker diarization (the HuggingFace token, when present, unlocks the diarization capability). The function polls Replicate's prediction endpoint at five-second intervals until the prediction either succeeds or fails — a pattern chosen over webhooks because it keeps the function self-contained and avoids the need for a publicly addressable callback endpoint.
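The rate limit described here can be sketched as a classic token bucket keyed by a string such as `transcribe:{userId}`. This is an illustrative implementation under stated assumptions, not the function's actual code: the class name, the continuous-refill strategy, and the injected clock are all hypothetical.

```typescript
interface Bucket {
  tokens: number;
  updatedAt: number;
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerMs: number;

  // capacity tokens are restored evenly over windowMs.
  constructor(capacity: number, windowMs: number) {
    this.capacity = capacity;
    this.refillPerMs = capacity / windowMs;
  }

  // Returns true and spends one token if the key is under its limit.
  tryConsume(key: string, now: number): boolean {
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: now };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsed = now - bucket.updatedAt;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsed * this.refillPerMs);
    bucket.updatedAt = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}

// Ten transcriptions per user per ten minutes.
const transcribeLimiter = new TokenBucketLimiter(10, 10 * 60 * 1000);
```

Because the state lives in process memory, each edge function isolate would keep its own buckets. The limit is therefore a per-isolate approximation rather than a global guarantee, which is usually acceptable for abuse throttling but worth knowing.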
When the transcription returns, the segments — each with start time, end time, speaker label, and text — are joined into a single full-text representation, encrypted with the meeting owner's per-user-derived key (both the full text and the segment array are encrypted independently), and inserted into the transcriptions table. The meeting status is updated to completed, and a notification row is inserted that triggers a downstream email via the send-notification-email function.
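The "joined into a single full-text representation" step might look like the sketch below. The `Segment` shape follows the fields named above. The output format (one line per speaker turn, consecutive segments from the same speaker merged) and the `UNKNOWN` fallback label are assumptions, since the source does not specify them.

```typescript
interface Segment {
  start: number; // seconds
  end: number; // seconds
  speaker?: string; // absent when diarization is disabled
  text: string;
}

// Collapse diarized segments into "SPEAKER: text" lines, merging
// consecutive segments spoken by the same speaker into one turn.
function toFullText(segments: Segment[]): string {
  const lines: string[] = [];
  let lastSpeaker: string | null = null;
  for (let i = 0; i < segments.length; i++) {
    const speaker = segments[i].speaker ?? "UNKNOWN";
    const text = segments[i].text.trim();
    if (text === "") continue; // skip empty segments
    if (speaker === lastSpeaker) {
      lines[lines.length - 1] += " " + text;
    } else {
      lines.push(`${speaker}: ${text}`);
      lastSpeaker = speaker;
    }
  }
  return lines.join("\n");
}
```

Both this string and the original segment array would then be passed through the field encryption separately, as the paragraph describes.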
Stage four: MoM generation
The MoM (Minutes of Meeting) generation is invoked separately, either automatically as part of the pipeline or on-demand from the meeting detail page. The generate-mom edge function pulls the encrypted transcription, decrypts it in memory, builds a structured prompt that includes the meeting metadata (client, attendees, service line, language) and the transcript, and submits it to a Gemini model through the Lovable AI Gateway — a managed inference proxy that handles API key rotation, model versioning, and quota enforcement without requiring the project to maintain its own LLM credentials.
The model is prompted to produce a JSON document conforming to a schema defined in src/lib/meetingTemplates.ts, including sections for meeting overview, attendees, agenda items, key discussion points, decisions made, action items (with assignees and due dates), and follow-up items. The structured output is then encrypted and written to the mom_documents table, with a copy archived in mom_versions for revision history.
The choice of Gemini Flash specifically — rather than a more capable model like GPT-5 or Gemini 2.5 Pro — was an explicit cost-versus-quality trade-off informed by the credit economics described later in this case study. Flash, the team found, was sufficient for structured extraction tasks where the source material was already organized (a meeting transcript follows predictable conversational patterns), and the latency advantage was meaningful for users who often refresh the page waiting for their MoM.
Stage five: editing, versioning, and export
Once the MoM exists, the user enters a Markdown-based editor implemented in MomEditor.tsx, where they can revise the generated content. Every save creates a new row in mom_versions, preserving the full history. Exports are available in four formats — .md, .txt, .pdf, and .docx — implemented in src/lib/exportMom.ts using client-side libraries so that no plaintext MoM content is ever sent to a server during export.
Exports are watermarked with the Minuto branding and a footer noting that the document was generated with AI assistance, unless the user spends 5 credits to remove the watermark — one of several monetization touchpoints woven into the pipeline.
VI. The Credit Economy
Minuto is a paid product, and the payment system itself is treated as a security-critical subsystem. The platform uses a credit model rather than a subscription model: new users receive 3 free credits, additional credits are purchased through Lemon Squeezy in USD, and credits are consumed at well-defined points in the pipeline (1 credit per MoM generation, 5 credits to export an unwatermarked document, with admin accounts exempt from deduction for testing purposes).
The credit ledger lives in the user_credits and credit_transactions tables, with a deduct_credit Postgres function that performs the deduction atomically and returns false (rather than throwing) if the user has insufficient balance — allowing the application to present a friendly "Buy more credits" dialog instead of an opaque error. Credit purchases flow through two edge functions: lemonsqueezy-checkout, which creates a checkout session and records a row in payment_sessions, and lemonsqueezy-webhook, which receives the asynchronous payment confirmation, verifies the webhook signature, and credits the user's balance.
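The contract of `deduct_credit` (check and decrement as one step, report insufficient funds as `false` rather than an exception) can be shown with an in-memory stand-in. This is a model of the behaviour only. The `CreditLedger` class and its fields are hypothetical, and in Postgres the atomicity would come from the function running in a single transaction, not from JavaScript's single thread.

```typescript
interface LedgerEntry {
  userId: string;
  delta: number; // positive for grants and purchases, negative for spend
  reason: string;
}

class CreditLedger {
  private balances = new Map<string, number>();
  readonly transactions: LedgerEntry[] = [];

  grant(userId: string, amount: number, reason: string): void {
    this.balances.set(userId, this.balanceOf(userId) + amount);
    this.transactions.push({ userId, delta: amount, reason });
  }

  // Returns false, and changes nothing, when the balance is too low.
  deduct(userId: string, amount: number, reason: string): boolean {
    const balance = this.balanceOf(userId);
    if (balance < amount) return false;
    this.balances.set(userId, balance - amount);
    this.transactions.push({ userId, delta: -amount, reason });
    return true;
  }

  balanceOf(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }
}
```

A caller can branch directly on the boolean: generate the MoM on `true`, open the "Buy more credits" dialog on `false`.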
The webhook function is, by design, paranoid. It verifies the Lemon Squeezy signature against the raw request body before parsing any JSON. It checks that the corresponding payment_sessions row exists and is in a pending state before crediting. It is idempotent on the Lemon Squeezy order ID, so a retried webhook delivery cannot double-credit. The team treats the credit system, in the language of its own internal documentation, as "a financial ledger — because it is."
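Two of those safeguards, signature verification over the raw body and idempotency on the order ID, are sketched below with the Web Crypto API. Lemon Squeezy signs webhook bodies with HMAC-SHA256 and sends the hex digest in a request header. Everything else here is an assumption: the function names, the simplified event shape (`orderId`, `userId`, `credits`), and the in-memory `processedOrders` set standing in for a database uniqueness check. The pending-session check is omitted for brevity.

```typescript
function hexToBytes(hex: string) {
  if (hex.length === 0 || hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) return null;
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  return out;
}

// Verify the HMAC over the exact bytes received, before any JSON parsing.
async function verifySignature(secret: string, rawBody: string, signatureHex: string): Promise<boolean> {
  const signature = hexToBytes(signatureHex);
  if (signature === null) return false;
  const enc = new TextEncoder();
  const key = await crypto.subtle.importKey(
    "raw", enc.encode(secret), { name: "HMAC", hash: "SHA-256" }, false, ["verify"],
  );
  return crypto.subtle.verify("HMAC", key, signature, enc.encode(rawBody));
}

// Stand-in for a unique constraint on the order ID in the database.
const processedOrders = new Set<string>();

async function handleWebhook(
  secret: string,
  rawBody: string,
  signatureHex: string,
  credit: (userId: string, credits: number) => void,
): Promise<number> {
  if (!(await verifySignature(secret, rawBody, signatureHex))) return 401;
  // Hypothetical, simplified event shape; the real payload is richer.
  const event = JSON.parse(rawBody) as { orderId: string; userId: string; credits: number };
  if (processedOrders.has(event.orderId)) return 200; // retry: acknowledge, do nothing
  processedOrders.add(event.orderId);
  credit(event.userId, event.credits);
  return 200;
}
```

Returning 200 for a duplicate matters: it tells the sender to stop retrying while guaranteeing the balance moves only once.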
VII. Sharing Without Trust
One of the more difficult product surfaces in any meeting platform is the share link — the ability to give someone outside the system read access to a single meeting, often a client or an external collaborator. Most platforms implement this with a long URL containing an unguessable token, and call it done.
Minuto's implementation adds three further controls: a short expiry, a password, and an audit trail.
A shared link, created from the meeting detail page, is a row in the shared_links table with a generated cryptographic token, an explicit 15-minute expiry, and a password hash computed via PBKDF2 with 100,000 iterations. The recipient is given the URL and, separately, the password — communicated through whatever out-of-band channel the sender chooses. To view the meeting, the recipient must enter the password, which is verified against the hash by the verify-share-link edge function. The function rate-limits attempts to 5 per 5 minutes per IP-token combination, and every successful or failed attempt is recorded in share_audit_logs along with the requesting IP address.
The sharing model is intentionally paranoid in three ways at once. The expiry is short enough that a leaked URL becomes useless quickly. The password requirement means the URL alone is insufficient. The audit log means the meeting owner can see exactly who has accessed the link and from where, supporting the kind of "who saw what, when" reporting that compliance-sensitive industries require.
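The password half of this design can be sketched with Web Crypto's PBKDF2 support. The iteration count comes from the text. The hash function (SHA-256), the 16-byte random salt, the `salt:hash` hex storage format, and all function names are assumptions made for the example.

```typescript
const PBKDF2_ITERATIONS = 100000;

function bytesToHex(bytes: Uint8Array): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) hex += bytes[i].toString(16).padStart(2, "0");
  return hex;
}

function hexToSalt(hex: string) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  return out;
}

async function deriveHash(password: string, saltHex: string): Promise<string> {
  const material = await crypto.subtle.importKey(
    "raw", new TextEncoder().encode(password), "PBKDF2", false, ["deriveBits"],
  );
  const bits = await crypto.subtle.deriveBits(
    { name: "PBKDF2", hash: "SHA-256", salt: hexToSalt(saltHex), iterations: PBKDF2_ITERATIONS },
    material,
    256,
  );
  return bytesToHex(new Uint8Array(bits));
}

// Produce "saltHex:hashHex" for storage alongside the shared link.
async function hashSharePassword(password: string): Promise<string> {
  const saltHex = bytesToHex(crypto.getRandomValues(new Uint8Array(16)));
  return `${saltHex}:${await deriveHash(password, saltHex)}`;
}

async function verifySharePassword(password: string, stored: string): Promise<boolean> {
  const [saltHex, expected] = stored.split(":");
  if (!saltHex || !expected) return false;
  const actual = await deriveHash(password, saltHex);
  // Compare without early exit so timing does not reveal the mismatch position.
  let diff = actual.length ^ expected.length;
  for (let i = 0; i < actual.length; i++) diff |= actual.charCodeAt(i) ^ expected.charCodeAt(i);
  return diff === 0;
}
```

The slow derivation is what makes the 5-attempts-per-5-minutes limit meaningful: each guess is expensive, and the number of guesses is capped.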
The shared meeting view itself, served at /shared/:token, is rendered by a public route that calls a dedicated decryption function, decrypt-meeting-data, which decrypts only the specific fields needed for the read view — never returning the full encrypted payload to the browser. The page is rendered without the application's main navigation, with a watermark, and with no editing affordances.
VIII. Collaboration as a Privacy-First Design Problem
The collaboration features — adding teammates to meetings, building friendship graphs, sharing engagements — were built around a constraint the team articulated early: users should be discoverable only by exact match on a username they themselves chose, never by browsing.
The reasoning is straightforward. In a platform serving competing audit firms, leaking the membership of one firm's workspace to another firm's user (even just a count of users, or a list of names) would be a meaningful information disclosure. So the global "discover people" feature was scrapped in favor of a search_by_username Postgres function that returns matches only on a username string the searcher already knows, and the get_profiles_by_ids function returns profile data only when the caller can already prove they need it (because of an existing friendship, shared meeting, or shared workspace).
Friend requests, implemented through the friendships table with a request/accept handshake, follow the same logic. Workspace and engagement invites are token-based, with the invite code generated server-side, an explicit expiry, and a maximum-uses counter.
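The invite rules (server-generated code, explicit expiry, maximum-uses counter) amount to a small state check at redemption time. The sketch below is illustrative only: the `Invite` shape, the result labels, and the 16-byte hex code are assumptions rather than details from the source.

```typescript
interface Invite {
  code: string;
  expiresAt: number; // epoch milliseconds
  maxUses: number;
  uses: number;
}

type RedeemResult = "ok" | "expired" | "exhausted";

// Random code from a CSPRNG; the length is an assumption.
function generateInviteCode(byteLength: number = 16): string {
  const raw = crypto.getRandomValues(new Uint8Array(byteLength));
  let hex = "";
  for (let i = 0; i < raw.length; i++) hex += raw[i].toString(16).padStart(2, "0");
  return hex;
}

// Validate and consume one use of the invite.
function redeemInvite(invite: Invite, now: number): RedeemResult {
  if (now >= invite.expiresAt) return "expired";
  if (invite.uses >= invite.maxUses) return "exhausted";
  invite.uses += 1;
  return "ok";
}
```

In a database-backed version, the check and the increment would need to happen in one statement or transaction so that two simultaneous redemptions cannot both claim the last use.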
IX. The Administrator's View
The platform supports a single administrative role, persisted in a dedicated user_roles table (never on the profiles table — a deliberate design choice to prevent privilege-escalation attacks via profile updates). Admin checks are performed by a has_role(user_id, role) SECURITY DEFINER function that is the only sanctioned path to verify admin status, called from both an AdminRoute component on the client and from the RLS policies on the admin-only tables.
The admin panel — accessible only to users in user_roles with role admin — exposes a set of admin_* RPC functions that allow listing all users, all meetings, all clients, all payments, and all audit logs across the platform; merging duplicate clients; manually adjusting credit balances (with a required reason that gets written to the audit log); deleting meetings; and managing workspace memberships directly without going through the invite flow.
Every administrative action is logged to the audit_logs table via the log_audit_event function, which is called from inside each admin RPC. The platform's single production administrator, by current count, is me@lmadelariarte.com — a deliberate concentration designed to keep the trusted set as small as possible until further role separation is needed.
X. The PWA and the Mobile Reality
Although Minuto's primary user is a knowledge worker at a desk, a meaningful fraction of the platform's traffic comes from mobile devices — partners checking action items during a taxi ride, managers reviewing minutes before walking into a follow-up meeting. The platform is built as a Progressive Web App with a network-first service worker registered only in production builds (development PWAs being a notorious source of stale-cache debugging incidents).
The mobile experience replaces the desktop sidebar with a bottom navigation bar at viewports below 768 pixels, with every tap target sized to a minimum of 44 pixels per platform accessibility guidelines. The application is installable on iOS and Android home screens, with a manifest at public/manifest.json and platform-appropriate splash screens.
Meeting recording on mobile is an explicit non-feature. The team considered and declined to implement in-app audio capture, on the grounds that the operating system's native voice recorder is more reliable, more familiar to users, and — crucially — less likely to be blocked by enterprise mobile-device-management policies that restrict microphone access for installed PWAs.
XI. The Engineering Choices That Did Not Make It
A case study that listed only successes would be uninteresting and, more importantly, incomplete. Several technical decisions in Minuto's history were reversed, and the reversals are instructive.
An early proposal to store raw audio indefinitely was rejected after a legal review noted that retention of voice recordings beyond the period necessary to generate the transcript would expose the platform to additional regulatory obligations under GDPR-equivalent Southeast Asian privacy regimes. Audio files are now eligible for deletion once their transcript has been generated, with the application offering users an explicit "delete audio" action and a forthcoming automatic purge policy.
An attempt to implement search via Postgres full-text indexes was abandoned when the team realized that application-layer encryption makes database-level text search structurally impossible: Postgres only ever sees ciphertext. The replacement, AI-powered cross-meeting search routed through Gemini in an edge function that decrypts in memory, is more expensive per query but consistent with the platform's threat model.
A first version of the share-link feature that omitted the password requirement was deployed for approximately three days before being replaced with the current PBKDF2-protected design. The change was prompted by a design review in which the team modeled a scenario in which a shared URL was accidentally forwarded in an email reply chain — a scenario the password requirement renders harmless.
A proposal to use Supabase's anonymous sign-in feature for share-link access was rejected because anonymous sessions would have polluted the auth metrics and complicated the audit log analysis. The current design uses no Supabase session at all for shared-link viewers; they are authenticated solely against the shared link's password hash.
XII. What This Case Study Is Really About
Strip away the specific technologies, the framework choices, the encryption modes, the rate-limit thresholds, and what remains is a single argument the Minuto team has been making, in code, for the better part of a year: that the conventions of consumer SaaS — implicit trust between teammates, plaintext storage at rest, optional MFA, share-by-link with no expiry, broad workspace visibility — are not safe defaults. They are choices made by product teams whose users are not, in any meaningful sense, adversaries to one another.
For the buyers Minuto serves, those defaults are precisely the obstacles preventing them from adopting any of the dozen tools competing for their budget. The platform's wager is that a meaningful and growing fraction of professional services firms, in Southeast Asia and eventually beyond, will pay a premium for software whose security posture matches the sensitivity of the conversations they record. The early evidence — paying customers, enterprise inquiries, an audit firm partner who, in a recent user interview, described the MFA enforcement as "the first thing that made me trust this" — suggests the wager is correct.
The 14th-floor conference room in Makati, with its black non-internet-connected recorder, is not the future the platform is selling against. It is the present it is trying to replace. The pitch is not "stop being paranoid." The pitch is: be exactly as paranoid as you already are, and use a tool that meets you there.
That, in the end, is what the encryption, the AAL2 wall, the per-user key derivation, the 15-minute share expiry, the audit logs, the username-only discovery, the watermarked exports, the credit ledger treated as a financial system, and the deliberately boring frontend stack are all in service of.
A meeting transcript, in the right industries, is not a productivity artifact. It is institutional memory under privilege. Minuto is the first system this team has built that takes that distinction seriously, all the way down to the schema.
Minuto is live at minuto.vibepoint.dev. The platform is in early production with a small but actively growing user base of audit, consulting, and legal services firms across Southeast Asia.