AI Project

MagicLens — Real-Time Vision Recognition for the Web

MagicLens is a real-time multimodal visual recognition interface powered by Nuxt 3 and OpenAI Vision. It is designed for speed, structure, and real-world use.


MagicLens began as a Disney character detector and quickly evolved into a flexible vision layer capable of identifying flowers, foods, logos, objects in nature, and more.

The interface processes a live camera stream, sends frames to a lightweight Nuxt API route, and receives structured results from the OpenAI Vision model. Apart from the OpenAI API itself, no backend database or separate server is required.

How It Works

Camera Stream

A Nuxt camera component accesses the device’s camera using native browser APIs and captures frames at a controlled interval.
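As a rough illustration of capturing "at a controlled interval", the sketch below shows one way to gate frame capture by time. The helper name `createFrameGate` and the interval values are assumptions for illustration; the source does not say how MagicLens schedules its captures.

```typescript
// Hypothetical helper: decides whether enough time has passed to capture
// another frame. The 1000 ms default is an assumed value, not MagicLens's.
function createFrameGate(intervalMs: number = 1000) {
  let lastCapture = Number.NEGATIVE_INFINITY;

  // `now` is a timestamp in milliseconds (for example from performance.now()).
  // Returns true at most once per interval and records the capture time.
  return function shouldCapture(now: number): boolean {
    if (now - lastCapture < intervalMs) {
      return false;
    }
    lastCapture = now;
    return true;
  };
}

// Simulated timestamps: only frames at least 500 ms apart pass the gate.
const gate = createFrameGate(500);
const decisions = [0, 100, 499, 500, 900, 1000].map((t) => gate(t));
console.log(decisions);
```

In a component, a gate like this could be checked inside a `requestAnimationFrame` or `setInterval` callback before drawing the current video frame to a canvas and exporting it.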

Nuxt API Route

Frames are posted to a lightweight Nuxt server route (`/api/vision-identify`) which encodes the image and pairs it with a detection-mode prompt.
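The pairing step might look something like the sketch below. It assumes the client sends the frame as a base64 string (or a full data URL) together with a mode name; the prompt texts, `MODE_PROMPTS`, and `prepareFrame` are hypothetical placeholders, since the actual prompts and route code are not shown in the source.

```typescript
type DetectionMode = "disney" | "flowers" | "food" | "landmarks" | "logos";

// Placeholder prompts for illustration only; not the real MagicLens prompts.
const MODE_PROMPTS: Record<DetectionMode, string> = {
  disney: "Identify the Disney character in this image.",
  flowers: "Identify the flower in this image.",
  food: "Identify the food in this image.",
  landmarks: "Identify the landmark in this image.",
  logos: "Identify the logo in this image.",
};

// Accepts either a bare base64 string or a complete data URL.
function toDataUrl(image: string, mimeType: string = "image/jpeg"): string {
  return image.startsWith("data:") ? image : `data:${mimeType};base64,${image}`;
}

// Type guard so unknown mode strings are rejected before prompt lookup.
function isDetectionMode(value: string): value is DetectionMode {
  return Object.prototype.hasOwnProperty.call(MODE_PROMPTS, value);
}

// Pairs the encoded frame with the prompt for the requested mode.
function prepareFrame(mode: string, image: string): { prompt: string; imageUrl: string } {
  if (!isDetectionMode(mode)) {
    throw new Error(`Unknown detection mode: ${mode}`);
  }
  return { prompt: MODE_PROMPTS[mode], imageUrl: toDataUrl(image) };
}
```

Inside a Nuxt server route, a function like `prepareFrame` would typically be called with fields read from the POST body.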

OpenAI Vision Model

The server forwards the frame to the OpenAI Vision endpoint with a refined prompt for the active mode (Disney, Flowers, Food, Landmarks, Logos).
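The source does not say which OpenAI endpoint or model MagicLens calls. As one plausible shape, the sketch below builds a Chat Completions request body with an image part; the model name `gpt-4o-mini`, the `detail: "low"` setting, the system message wording, and the function name `buildVisionRequest` are all assumptions.

```typescript
// Subset of the Chat Completions request body used in this sketch.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail: "low" | "high" | "auto" } };

interface VisionRequestBody {
  model: string;
  messages: Array<{ role: "system" | "user"; content: string | ContentPart[] }>;
  response_format: { type: "json_object" };
}

// Builds the JSON body for one frame. The model name is an assumed default.
function buildVisionRequest(
  prompt: string,
  imageUrl: string,
  model: string = "gpt-4o-mini",
): VisionRequestBody {
  return {
    model,
    messages: [
      {
        role: "system",
        content: "Reply with a single JSON object with keys id, label, summary, confidence.",
      },
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // "low" detail trades image fidelity for speed and token cost.
          { type: "image_url", image_url: { url: imageUrl, detail: "low" } },
        ],
      },
    ],
    response_format: { type: "json_object" },
  };
}
```

Under these assumptions, the body would be sent as a POST to `https://api.openai.com/v1/chat/completions` with an `Authorization: Bearer` header holding an API key that stays on the server.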

Structured JSON Output

The backend enforces strict JSON output (id, label, summary, confidence), ensuring predictable rendering across every scanning mode.
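One way such enforcement could work is to validate the model's reply before it reaches the client. The sketch below is a minimal version under stated assumptions: the field types (three strings and a number) and the 0 to 1 confidence range are guesses, and `parseDetection` is a hypothetical name; the source lists only the four field names.

```typescript
interface Detection {
  id: string;
  label: string;
  summary: string;
  confidence: number; // assumed to be a score between 0 and 1
}

// Parses the raw model reply and returns a Detection, or null when the reply
// is not valid JSON or does not match the expected shape.
function parseDetection(raw: string): Detection | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return null;
  }
  const { id, label, summary, confidence } = data as Record<string, unknown>;
  if (typeof id !== "string" || typeof label !== "string" || typeof summary !== "string") {
    return null;
  }
  if (typeof confidence !== "number" || !Number.isFinite(confidence)) {
    return null;
  }
  if (confidence < 0 || confidence > 1) {
    return null;
  }
  // Return only the known fields so extra keys never reach the UI.
  return { id, label, summary, confidence };
}
```

Returning `null` for anything malformed lets the route answer with a single, predictable error shape instead of passing unexpected model output to the renderer.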

Architecture Overview

System Architecture

1. Camera Stream (Client)

A Nuxt 3 component accesses the device camera using native browser APIs and streams frames at a controlled interval.

2. Nuxt API Route

Frames are sent to a lightweight `/api/vision-identify` endpoint inside the Nuxt app. This endpoint handles image encoding, prompt selection, and request formatting.

3. OpenAI Vision Request

The API route sends the frame to OpenAI Vision with a curated prompt for the active mode (Disney, Flowers, Food, Landmarks, Logos). The request is optimized for low latency.

4. Structured JSON Response

The backend enforces strict JSON output (id, label, summary, confidence), ensuring consistent handling across all detection modes.

5. MagicLens UI Layer

The client interface animates detection results, confidence scoring, category labels, and user-selected mode changes in real time.

Tech Stack