KeyLM Project Documentation
KeyLM is a hybrid free-tier and BYOK multi-provider chat app built with Next.js App Router, Prisma, and Postgres. This page documents the product flow, backend APIs, and data model in one place.
Overview
A single workspace where users can start on a shared Groq free pool, then move to their own OpenAI, Gemini, or Anthropic keys.
Hybrid Access
New accounts get a shared Groq fallback before switching to personal keys.
Model Catalog
Models are normalized across providers and cached for 24 hours per provider key.
Streaming Chat
Server-sent events deliver token deltas, with support for stopping a stream and safely retrying a request.
Threaded History
Threads persist provider, model, settings, message history, and token usage.
Quick Start
1. Install dependencies
npm install
2. Create the environment file
cp .env.example .env
# set DATABASE_URL, APP_AUTH_SECRET, APP_ENCRYPTION_KEY, GROQ_API_KEY
3. Generate an encryption key
node -e "console.log(Buffer.from(require('crypto').randomBytes(32)).toString('base64'))"
4. Run database migrations
npm run prisma:migrate
5. Start the dev server
npm run dev
Environment
Environment variables for local development and production. Entries marked optional, or that list a default, can be omitted.
- DATABASE_URL: Postgres connection string used by Prisma.
- APP_AUTH_SECRET: HMAC secret for signing session tokens.
- APP_ENCRYPTION_KEY: 32-byte base64 key for encrypting provider secrets.
- GROQ_API_KEY: Server-only API key used for the shared KeyLM Free Groq pool.
- GROQ_BASE_URL: Groq base URL. Defaults to https://api.groq.com/openai/v1.
- GROQ_FREE_MODEL: Fixed shared free model. Defaults to moonshotai/kimi-k2-instruct-0905.
- GROQ_FREE_FALLBACK_MODELS: Optional comma-separated Groq fallback models, used if the primary free model is unavailable.
- FREE_USER_DAILY_LIMIT: Per-user daily free request limit. Defaults to 50.
- FREE_GLOBAL_DAILY_LIMIT: Global daily free request limit. Defaults to 1000.
- RATE_LIMIT_PER_MINUTE: Optional request limit for chat and password reset endpoints.
- PASSWORD_RESET_TTL_MINUTES: Optional TTL for password reset tokens. Defaults to 60.
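A minimal sketch of how these variables might be read and validated at startup. The variable names and defaults come from the list above; the AppConfig shape and the loadConfig, required, and intOr helpers are hypothetical and not taken from the KeyLM codebase.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical config shape; only the variable names and defaults are from the docs.
interface AppConfig {
  databaseUrl: string;
  authSecret: string;
  encryptionKey: Buffer; // decoded APP_ENCRYPTION_KEY, expected to be 32 bytes
  groqApiKey: string;
  groqBaseUrl: string;
  groqFreeModel: string;
  groqFreeFallbackModels: string[];
  freeUserDailyLimit: number;
  freeGlobalDailyLimit: number;
  rateLimitPerMinute: number | undefined; // optional, no documented default
  passwordResetTtlMinutes: number;
}

type Env = Record<string, string | undefined>;

// Read a variable that must be present and non-empty.
function required(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Read a non-negative integer, falling back to the documented default.
function intOr(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer`);
  }
  return parsed;
}

function loadConfig(env: Env): AppConfig {
  const encryptionKey = Buffer.from(required(env, "APP_ENCRYPTION_KEY"), "base64");
  if (encryptionKey.length !== 32) {
    throw new Error("APP_ENCRYPTION_KEY must decode to 32 bytes");
  }
  const rawRateLimit = env["RATE_LIMIT_PER_MINUTE"];
  return {
    databaseUrl: required(env, "DATABASE_URL"),
    authSecret: required(env, "APP_AUTH_SECRET"),
    encryptionKey,
    groqApiKey: required(env, "GROQ_API_KEY"),
    groqBaseUrl: env["GROQ_BASE_URL"] || "https://api.groq.com/openai/v1",
    groqFreeModel: env["GROQ_FREE_MODEL"] || "moonshotai/kimi-k2-instruct-0905",
    // Split the comma-separated list and drop empty entries.
    groqFreeFallbackModels: (env["GROQ_FREE_FALLBACK_MODELS"] ?? "")
      .split(",")
      .map((model) => model.trim())
      .filter((model) => model.length > 0),
    freeUserDailyLimit: intOr(env, "FREE_USER_DAILY_LIMIT", 50),
    freeGlobalDailyLimit: intOr(env, "FREE_GLOBAL_DAILY_LIMIT", 1000),
    rateLimitPerMinute: rawRateLimit ? intOr(env, "RATE_LIMIT_PER_MINUTE", 0) : undefined,
    passwordResetTtlMinutes: intOr(env, "PASSWORD_RESET_TTL_MINUTES", 60),
  };
}
```

In a Node process this would typically be called once as loadConfig(process.env), so that a missing secret or a wrongly sized key fails at boot rather than on the first request.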
User Flow
- Create an account or sign in.
- Use KeyLM Free immediately if daily user/global Groq quota is still available.
- Add a provider key and validate it with a lightweight request when you want BYOK mode.
- Load the model list for connected providers and create BYOK threads.
- Send a message and stream responses via SSE.
- The app persists the assistant output and token usage so the thread can continue.
Architecture
The app is split into route handlers under src/app/api and reusable services under src/lib.
Auth and sessions
Email and password auth with signed, httpOnly session cookies.
Key management
Provider keys are stored encrypted, masked in UI, and audited.
Provider adapters
OpenAI, Gemini, Anthropic, and Groq adapters normalize models, streaming, and usage.
Model service
Model lists are cached per key and refreshed on demand.
Thread service
Threads and messages are persisted with idempotent request IDs.
Free-tier quotas
Per-user and global daily counters gate the shared Groq fallback.
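To illustrate the free-tier quota gate, here is a minimal in-memory sketch of reserving one request against both counters. It assumes the counters are keyed by UTC calendar day, mirroring the UserDailyFreeUsage and GlobalDailyFreeUsage records; the Maps stand in for database tables, and createFreeQuota, reserve, and the reason strings are hypothetical names.

```typescript
type ReserveResult =
  | { ok: true }
  | { ok: false; reason: "user_limit" | "global_limit" };

// UTC calendar day such as "2024-05-01"; a new day starts at 00:00 UTC.
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// In-memory stand-in for the per-user and global daily counters.
function createFreeQuota(userLimit = 50, globalLimit = 1000) {
  const userCounts = new Map<string, number>(); // key: `${userId}:${day}`
  const globalCounts = new Map<string, number>(); // key: day

  return {
    // Reserve one free request; both counters move together or not at all.
    reserve(userId: string, now: Date = new Date()): ReserveResult {
      const day = utcDay(now);
      const userKey = `${userId}:${day}`;
      const userUsed = userCounts.get(userKey) ?? 0;
      const globalUsed = globalCounts.get(day) ?? 0;

      if (userUsed >= userLimit) return { ok: false, reason: "user_limit" };
      if (globalUsed >= globalLimit) return { ok: false, reason: "global_limit" };

      userCounts.set(userKey, userUsed + 1);
      globalCounts.set(day, globalUsed + 1);
      return { ok: true };
    },
  };
}
```

A database-backed version would need to make the check and the increment atomic (for example inside a transaction) so that concurrent requests cannot exceed either limit; this sketch does not attempt that.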
Project Structure
src/app
App Router pages and API route handlers.
src/lib
Core services, providers, crypto, auth, and utilities.
prisma
Database schema and migrations.
src/app/globals.css
Shared theme and component styles.
API Endpoints
Auth
/api/auth/register
Create an account and start a session.
/api/auth/login
Authenticate and start a session.
/api/auth/logout
Clear the session cookie.
/api/auth/me
Return the current session user.
/api/auth/password-reset/request
Create a password reset token.
/api/auth/password-reset/confirm
Finish a password reset.
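The auth endpoints rely on signed session tokens. The sketch below shows one way an HMAC-signed token could be created and verified with Node's crypto module. The token layout (base64url payload, a dot, then an HMAC-SHA256 signature) and the function names are assumptions for illustration; the source only states that APP_AUTH_SECRET is an HMAC secret for signing session tokens.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC-SHA256 over the encoded payload, rendered as base64url.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Hypothetical token layout: "<base64url JSON payload>.<signature>".
function createSessionToken(userId: string, expiresAtMs: number, secret: string): string {
  const payload = Buffer.from(JSON.stringify({ userId, exp: expiresAtMs })).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the user id for a valid, unexpired token, otherwise null.
function verifySessionToken(
  token: string,
  secret: string,
  nowMs: number = Date.now(),
): string | null {
  const [payload, signature] = token.split(".");
  if (!payload || !signature) return null;

  // Compare signatures in constant time; lengths must match first.
  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;

  const data = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as {
    userId: string;
    exp: number;
  };
  return data.exp > nowMs ? data.userId : null;
}
```

The signature is checked before the payload is parsed, so a tampered payload is never passed to JSON.parse.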
Provider keys
/api/providers/:provider/keys
Validate and store a new key.
/api/providers/:provider/keys
List keys for a provider.
/api/providers/:provider/keys/:keyId/validate
Re-validate a stored key.
/api/providers/:provider/keys/:keyId
Revoke a key.
Models
/api/providers/:provider/models
Return cached models, with optional refresh=true.
/api/providers/:provider/models/refresh
Force a model refresh and update cache.
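The caching behavior behind these two endpoints can be sketched as a pair of pure functions: one decides whether a refresh is needed, the other decides what to serve once a refresh attempt has succeeded or failed. The 24-hour TTL and the stale flag come from this document; the type names, function names, and the simplification of models to plain strings are assumptions.

```typescript
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Simplified cache entry; real entries would hold normalized model objects.
interface ModelCacheEntry {
  models: string[];
  fetchedAt: number; // epoch milliseconds
  expiresAt: number; // epoch milliseconds
}

type RefreshOutcome = { ok: true; models: string[] } | { ok: false };

interface ModelListResult {
  models: string[];
  stale: boolean;
  entry: ModelCacheEntry;
}

// Refresh when forced, when nothing is cached, or when the entry has expired.
function needsRefresh(cached: ModelCacheEntry | null, now: number, force: boolean): boolean {
  return force || cached === null || cached.expiresAt <= now;
}

// Decide what to serve after a refresh attempt.
function applyRefresh(
  cached: ModelCacheEntry | null,
  outcome: RefreshOutcome,
  now: number,
): ModelListResult {
  if (outcome.ok) {
    const entry = { models: outcome.models, fetchedAt: now, expiresAt: now + CACHE_TTL_MS };
    return { models: entry.models, stale: false, entry };
  }
  if (cached !== null) {
    // Refresh failed: keep serving the old list, flagged as stale.
    return { models: cached.models, stale: true, entry: cached };
  }
  throw new Error("Model refresh failed and no cached models are available");
}
```

Keeping the decision logic free of I/O makes the stale-fallback path easy to unit test without mocking a provider.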
Free usage
/api/usage/free
Return the current user/global Groq free quota snapshot.
Threads and messages
/api/threads
Create a BYOK or KeyLM Free thread.
/api/threads
List threads for the user.
/api/threads/:threadId
Get a thread and its messages.
/api/threads/:threadId
Delete a thread.
/api/threads/:threadId/messages
Send a message, stream SSE deltas, and persist token usage.
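Because the messages endpoint streams server-sent events, a client has to reassemble events from arbitrary network chunks. The following is a minimal sketch of an incremental SSE parser that follows the standard framing (fields on separate lines, events separated by a blank line). The event names and JSON payloads KeyLM actually sends are not documented here, so "delta" and "done" in the usage note are hypothetical.

```typescript
interface SseEvent {
  event: string; // "message" when the stream sends no event field
  data: string;
}

// Returns a push function that accepts raw text chunks in arrival order.
function createSseParser(onEvent: (event: SseEvent) => void): (chunk: string) => void {
  let buffer = "";

  return function push(chunk: string): void {
    // Normalize CRLF after concatenating so a split "\r\n" is handled too.
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");

    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const frame = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);

      let event = "message";
      const dataLines: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) {
          event = line.slice(6).trim();
        } else if (line.startsWith("data:")) {
          // The format allows one optional space after the colon.
          dataLines.push(line.slice(5).replace(/^ /, ""));
        }
        // Comment lines (starting with ":") and other fields are ignored.
      }
      if (dataLines.length > 0) onEvent({ event, data: dataLines.join("\n") });

      boundary = buffer.indexOf("\n\n");
    }
  };
}
```

A caller would feed it decoded text from the response body, for example push('event: delta\ndata: {"text":"Hi"}\n\n'), and append each delta to the visible reply until a terminal event arrives.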
Data Model
User
id, email, passwordHash, createdAt
ProviderKey
provider, keyCiphertext, keyMask, status, lastValidatedAt, lastUsedAt
ProviderModelCache
provider, keyId, models, fetchedAt, expiresAt
Thread
provider, model, systemPrompt, settings, status, updatedAt
Message
threadId, role, content, providerMessageId, clientRequestId, metadata.usage
AuditLog
action, provider, keyId, metadata, createdAt
PasswordResetToken
tokenHash, expiresAt, usedAt
UserDailyFreeUsage
userId, day, count
GlobalDailyFreeUsage
day, count
UX Behavior
- Users without active keys can start on KeyLM Free while quota remains.
- Model dropdown appears only after a provider key is active.
- Model lists are cached for 24 hours and can be refreshed manually.
- Threads are locked to the provider and model chosen at creation, including Groq free threads.
- After 5 free requests, the UI shows a persistent reminder to connect a personal key for better output.
- Streaming responses show deltas in real time with stop support.
- Each assistant reply shows prompt, output, and total token usage when available.
Security
- Provider keys are encrypted at rest and never returned in plaintext.
- The shared Groq key stays server-side and is never exposed to clients.
- Passwords are hashed with bcrypt and sessions are signed server-side.
- Rate limiting protects chat streaming and password reset requests.
- Audit logs track key lifecycle events for traceability.
- Model and thread access is scoped to the authenticated user.
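As an illustration of encrypting provider keys at rest and masking them for display, here is a minimal sketch using AES-256-GCM from Node's crypto module with a 32-byte key. The choice of AES-256-GCM, the stored layout (IV, then auth tag, then ciphertext, base64-encoded), the mask format, and the function names are all assumptions; the source only specifies a 32-byte base64 encryption key, a keyCiphertext field, and a keyMask field.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12; // recommended nonce size for GCM
const TAG_BYTES = 16; // default GCM auth tag size

// Encrypts with a fresh random IV; output is base64(iv || tag || ciphertext).
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Throws if the key is wrong or the stored value was modified.
function decryptSecret(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Hypothetical mask format: short prefix and last four characters only.
function maskKey(secret: string): string {
  if (secret.length <= 8) return "*".repeat(secret.length);
  return `${secret.slice(0, 3)}...${secret.slice(-4)}`;
}
```

With this layout only the mask ever needs to leave the server, and decryption happens just before a provider request is made.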
Edge Cases
- A previously valid key can be revoked later; the validation endpoint updates its stored status.
- If a model refresh fails, cached models are served with a stale flag.
- Duplicate message requests are deduped via clientRequestId.
- Free quota resets at 00:00 UTC for both the user bucket and the global pool.
- Rate-limited requests receive a retryable 429 response.
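The deduplication of duplicate message requests can be sketched as an idempotent save keyed on clientRequestId. This in-memory version assumes the request ID is unique within a thread; the Map stands in for the Message table, and createMessageStore and saveOnce are hypothetical names.

```typescript
interface StoredMessage {
  threadId: string;
  clientRequestId: string;
  role: "user" | "assistant";
  content: string;
}

interface SaveResult {
  message: StoredMessage;
  created: boolean; // false when an earlier request with the same ID already won
}

// In-memory stand-in for message persistence with idempotent writes.
function createMessageStore() {
  const byRequest = new Map<string, StoredMessage>();

  return {
    // Store the message once; a retry with the same ID returns the original.
    saveOnce(message: StoredMessage): SaveResult {
      const key = `${message.threadId}:${message.clientRequestId}`;
      const existing = byRequest.get(key);
      if (existing) return { message: existing, created: false };
      byRequest.set(key, message);
      return { message, created: true };
    },
    count(): number {
      return byRequest.size;
    },
  };
}
```

In a Postgres-backed implementation the same guarantee would more likely come from a unique constraint on the request ID columns than from a read-then-write check.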
Testing
- Unit: provider adapters, crypto helpers, and validation schemas.
- Integration: key validation, free quota reservation, model caching, and thread persistence.
- E2E: use KeyLM Free, exhaust quota, connect a key, stream chat, and save history.
- Security: verify secrets never leak to logs or responses.
Roadmap
- Tool calling and structured output support.
- Vision attachments with capability gating.
- Usage analytics and per-model cost reporting.
- Team workspaces with shared key vaults.