Full-Stack, AI, RAG, FastAPI, Next.js, pgvector

Adaptive RAG

A full-stack retrieval-augmented generation platform that ingests PDFs, generates sentence-level embeddings, and returns grounded answers with page-level source citations. Five vector pruning strategies plus a live pipeline inspector.

Pipeline inspector and chat interface

Embedding visualizer and pruning report

PDF Ingestion

Drag-and-drop upload. Documents are chunked at the sentence level, with overlapping windows to improve recall at chunk boundaries.
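As a rough illustration of sentence-level chunking with overlap, here is a minimal sketch. The function names, the regex-based sentence splitter, and the `size` and `overlap` defaults are assumptions made for this example; the project's actual chunker may work differently.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_windows(sentences: list[str], size: int = 3, overlap: int = 1) -> list[str]:
    """Group sentences into chunks of `size`, sharing `overlap` sentences
    between consecutive chunks (hypothetical defaults)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        # Stop once a window has reached the final sentence.
        if start + size >= len(sentences):
            break
    return chunks


if __name__ == "__main__":
    demo = "A one. B two! C three? D four. E five."
    for chunk in sentence_windows(split_sentences(demo)):
        print(chunk)
```

With `size=3` and `overlap=1`, the last sentence of each chunk is repeated as the first sentence of the next, so a fact that straddles a boundary still appears whole in at least one chunk.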

Embedding Engine

768D sentence-transformer embeddings stored in pgvector. Supports cosine, whitened cosine, k-means, and MMR retrieval.
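One possible way to store and query 768-dimensional embeddings in pgvector is sketched below. The table name `chunks`, its columns, and the `%s` placeholders (psycopg style) are assumptions for illustration, not the project's real schema. The `vector(768)` type and the `<=>` cosine-distance operator are part of pgvector itself.

```python
# Hypothetical schema: one row per sentence-level chunk.
CREATE_TABLE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    page      integer NOT NULL,
    content   text NOT NULL,
    embedding vector(768) NOT NULL
);
"""

# `<=>` is pgvector's cosine distance, so similarity = 1 - distance.
# The query vector is passed twice: once for the score, once for ordering.
COSINE_TOP_K_SQL = """
SELECT id, page, content, 1 - (embedding <=> %s::vector) AS similarity
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values) -> str:
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


if __name__ == "__main__":
    print(to_vector_literal([1, 2.5, -0.25]))
```

With a psycopg-style driver, the query could be executed with parameters `(literal, literal, k)`, where `literal` is the output of `to_vector_literal` for the query embedding.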

Five Pruning Strategies

Cosine, Cosine Whitened, K-Means Clustering, MMR, Threshold. Switchable at query time via the UI.
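To make the difference between strategies concrete, here is a small pure-Python sketch of two of them: threshold pruning and MMR (maximal marginal relevance). The function names and the `min_sim`, `k`, and `lam` parameters are hypothetical, and this is not the project's implementation, only one standard way to express each idea.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def threshold_prune(query, candidates, min_sim: float = 0.5) -> list[int]:
    """Keep the indices of candidates whose similarity to the query is >= min_sim."""
    return [i for i, c in enumerate(candidates) if cosine(query, c) >= min_sim]


def mmr_prune(query, candidates, k: int = 3, lam: float = 0.7) -> list[int]:
    """Greedy MMR: each step picks the candidate maximizing
    lam * relevance - (1 - lam) * (max similarity to anything already picked)."""
    relevance = [cosine(query, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    q = [1.0, 0.0]
    # Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but distinct.
    cands = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]
    print(threshold_prune(q, cands, min_sim=0.9))
    print(mmr_prune(q, cands, k=2, lam=0.5))
```

In the demo data, a similarity threshold keeps both near-duplicates, while MMR with `lam=0.5` trades the second near-duplicate for the more distinct chunk. Setting `lam=1.0` reduces MMR to plain relevance ranking.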

Dev Mode

Live pipeline inspector exposes per-chunk pruning reports and an interactive 768D embedding visualizer.

Grounded Answers

Every response is anchored to page-level citations, so each claim can be traced back to its source page.

Stack

FastAPI, pgvector (PostgreSQL), Next.js, TypeScript, sentence-transformers, OpenAI

Problem

Standard RAG pipelines retrieve too many irrelevant chunks, inflating context and degrading answer quality. Adaptive RAG lets you tune the pruning strategy per use case.

Architecture

A decoupled FastAPI backend handles all ML inference. The Next.js frontend streams answers via server-sent events (SSE). pgvector handles approximate nearest-neighbor (ANN) search with an IVFFlat index.
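The SSE wire format itself is simple, and a minimal sketch of how answer tokens might be framed is shown below. The payload shape (`token`, `pages`) and the `done` event name are assumptions made for this example; only the `event:` / `data:` line format and the blank-line terminator come from the SSE specification.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE frame: optional 'event:' line, a 'data:' line, then a blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str], citations: list[int]) -> Iterator[str]:
    """Yield one frame per token, then a final 'done' frame carrying cited pages."""
    for token in tokens:
        yield sse_event({"token": token})
    yield sse_event({"pages": citations}, event="done")


if __name__ == "__main__":
    for frame in stream_answer(["Hi", " there"], [3, 7]):
        print(frame, end="")
```

In a FastAPI route, a generator like this could be wrapped in `StreamingResponse(..., media_type="text/event-stream")`, and the frontend could append each `token` as it arrives and render citations once the `done` event is received.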

My contribution

Built the full frontend, including the pipeline inspector, embedding visualizer, and chat interface. On the backend, worked on the cosine similarity, whitened cosine similarity, and k-means clustering pruning strategies.