Multimodal AI · Creative Tools · Privacy · Fine-Tuning

Yuni

A private AI platform that lets creatives, brands, and studios train models on their own work — building a personalized AI that generates in their voice, not the internet's. Your model, your outputs, your IP.

Client: Yuni · Role: AI Platform Engineer · Year: 2024 · hiyuni.co ↗
Multimodal AI · Fine-Tuning · LoRA · Creative Tools · Privacy-First · Generative AI
[Screenshots: Discover creations · Member canvas view]

The Problem

The generative AI boom created a paradox for professional creatives. The tools are astonishingly powerful — but they're trained on everyone's work, stylistically generic by design, and legally ambiguous. A fashion photographer who uses Midjourney gets results that look vaguely like every fashion photographer on the internet, not like their specific visual language built over a decade.

More critically: when a creative uploads their work to a commercial AI platform, they typically grant a license for that work to be used in future model training. For a brand or agency with proprietary visual assets, that's an IP exposure risk they can't accept.

Yuni's premise: creatives should own the model trained on their work, generate exclusively in their aesthetic, and retain full IP over every output. The platform provides the infrastructure for private fine-tuning and inference — you bring the training data, Yuni provides the compute and the workflow.

Platform Architecture

Custom Model Training Pipeline

Users upload a training set — typically 20–100 images for style fine-tuning, or structured text samples for voice/copy fine-tuning. The pipeline runs LoRA (Low-Rank Adaptation) fine-tuning on Stable Diffusion XL with DreamBooth-style subject binding for brand-specific assets. Training is fully tenant-isolated: model weights are stored in a private S3 prefix scoped to the user's account, never shared across tenants, and optionally encrypted with user-managed keys. The full training run typically completes in 15–45 minutes on an A100 instance, managed by a Celery task queue with progress webhooks to the frontend.
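The tenant-isolation scheme above hinges on every artifact landing under an account-scoped storage prefix. A minimal sketch of how that key layout might look — `TrainingJob`, `weights_prefix`, and the bucket name are hypothetical illustrations, not Yuni's actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingJob:
    """Hypothetical descriptor for one per-tenant fine-tuning run."""
    tenant_id: str
    job_id: str
    base_model: str = "stabilityai/stable-diffusion-xl-base-1.0"
    lora_rank: int = 32

def weights_prefix(job: TrainingJob, bucket: str = "yuni-models") -> str:
    """Build the tenant-scoped S3 key where LoRA weights are stored.

    Keeping tenant_id as the leading path component means a single IAM
    policy condition can restrict each account to its own prefix.
    """
    return f"s3://{bucket}/tenants/{job.tenant_id}/models/{job.job_id}/lora.safetensors"

job = TrainingJob(tenant_id="acct_42", job_id="run_001")
uri = weights_prefix(job)
```

With this layout, cross-tenant access is prevented by the storage policy itself rather than by application logic alone.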

Multimodal Creative Canvas

The generation interface is a canvas-based workspace, not a prompt box. Users can layer: text prompts, style references (pulled from their trained model or uploaded references), composition sketches (rough bounding boxes that constrain layout), and negative prompts. The canvas supports iterative refinement — take a generated image, select a region, and regenerate just that area with a local prompt. For text/copy generation, the canvas supports "voice stitching" — generating new content that blends the user's fine-tuned style model with a specific tone instruction (formal, playful, urgent). This is particularly valuable for brand teams maintaining consistency across large content volumes.
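One way to picture how those canvas layers collapse into a single generation request: a region selection switches the request into an inpainting mode while the rest of the layers carry over. This is an illustrative sketch with invented names (`CanvasRequest`, `to_conditioning`), not the platform's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BBox:
    # Normalized coordinates in [0, 1]
    x: float
    y: float
    w: float
    h: float

@dataclass
class CanvasRequest:
    prompt: str
    negative_prompt: str = ""
    style_model_id: Optional[str] = None      # user's fine-tuned LoRA
    composition: List[BBox] = field(default_factory=list)
    region: Optional[BBox] = None             # set for local regeneration

def to_conditioning(req: CanvasRequest) -> dict:
    """Flatten canvas layers into one conditioning payload."""
    cond = {"prompt": req.prompt, "negative_prompt": req.negative_prompt}
    if req.style_model_id:
        cond["lora_id"] = req.style_model_id
    if req.region:
        # Selecting a region turns the request into masked regeneration
        cond["mode"] = "inpaint"
        cond["mask_bbox"] = (req.region.x, req.region.y, req.region.w, req.region.h)
    else:
        cond["mode"] = "txt2img"
    return cond
```

The key design point is that "regenerate just that area" is ordinary inpainting under the hood: the region becomes a mask, and the local prompt replaces the global one for that pass.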

Collaborative Workspaces

Studio teams need to iterate together. Yuni's workspace model supports real-time collaboration via a CRDT-based state sync — multiple users can view, comment on, and fork generations in a shared board. Access controls are granular: a brand can invite a freelancer to a specific campaign workspace with view-and-comment permissions while keeping their base model private. Generated assets carry embedded provenance metadata — model ID, timestamp, prompt hash — so teams can reconstruct how any asset was generated even months later.

Privacy Guarantees by Architecture

Unlike public AI tools, Yuni's training jobs run in isolated compute environments with no persistent logging of training data beyond the user's own storage. Fine-tuned model weights are never used to update the base model. Inference runs against the user's private weights, not shared infrastructure. This lets enterprise customers satisfy their legal and compliance teams' requirements around third-party AI tools.

Key Engineering Challenges

Training stability at small dataset sizes

20–100 images is a tiny training set for fine-tuning. We tuned LoRA rank (r=16 to r=64), learning rate schedules, and regularization to prevent overfitting to the training images while preserving generalization capability, and added automatic early stopping based on FID and CLIP scores computed on a held-out validation set.
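The early-stopping logic amounts to a standard patience loop over the validation metric. A minimal sketch, assuming a lower-is-better score like FID (flip the comparison for CLIP similarity) — this is an illustrative helper, not Yuni's trainer:

```python
class EarlyStopper:
    """Stop training when the validation metric stops improving.

    Assumes lower is better (e.g. FID). `patience` is how many
    non-improving evaluations to tolerate; `min_delta` is the minimum
    improvement that resets the counter.
    """
    def __init__(self, patience: int = 3, min_delta: float = 0.5):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, score: float) -> bool:
        """Record one validation score; return True when training should stop."""
        if score < self.best - self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

On tiny datasets the validation curve is noisy, which is why a patience window matters more than stopping at the first uptick.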

Prompt-to-style coherence

Fine-tuned models drift when users combine strong style embeddings with complex compositional prompts. We built a prompt strength calibration step that automatically adjusts the style weight in the conditioning stack to prevent the fine-tuned aesthetic from overwhelming the intended composition.
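As a toy proxy for that calibration step: dial the style weight down as the prompt grows more compositionally complex, with a floor so the fine-tuned aesthetic never disappears entirely. Token count is a stand-in here — the function name, thresholds, and heuristic are all hypothetical:

```python
def calibrate_style_weight(prompt: str, base_weight: float = 1.0,
                           floor: float = 0.4, tokens_per_step: int = 12) -> float:
    """Reduce the LoRA/style weight for compositionally complex prompts.

    Uses word count as a crude complexity proxy: every `tokens_per_step`
    words shaves 0.1 off the style weight, down to `floor`.
    """
    n_tokens = len(prompt.split())
    steps = n_tokens // tokens_per_step
    return max(floor, base_weight - 0.1 * steps)
```

Short prompts keep the full style weight; long, layout-heavy prompts get more headroom for the composition to win.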

Cost-efficient inference

Private model weights mean we can't share inference infrastructure across tenants. We implemented model caching — keeping recently-used fine-tuned models hot in GPU VRAM with an LRU eviction policy — to amortize the per-inference cold start cost across a session.
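The caching policy described above is a classic LRU with an eviction hook. A minimal sketch where `load` and `unload` stand in for the real weight-loading and VRAM-freeing routines:

```python
from collections import OrderedDict
from typing import Callable

class ModelCache:
    """LRU cache for per-tenant fine-tuned weights.

    `load` loads a model's weights (the cold-start cost we amortize);
    `unload` is called on eviction to free GPU VRAM.
    """
    def __init__(self, capacity: int, load: Callable, unload: Callable):
        self.capacity = capacity
        self.load = load
        self.unload = unload
        self._cache: "OrderedDict[str, object]" = OrderedDict()

    def get(self, model_id: str):
        if model_id in self._cache:
            self._cache.move_to_end(model_id)   # mark as most recently used
            return self._cache[model_id]
        model = self.load(model_id)             # cold start: load weights
        self._cache[model_id] = model
        if len(self._cache) > self.capacity:
            _, evicted = self._cache.popitem(last=False)
            self.unload(evicted)                # free VRAM for the LRU entry
        return model
```

Because generation sessions tend to hammer one model repeatedly, even a small `capacity` keeps most requests on the hot path.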

Asset provenance at scale

Each generated image carries a C2PA-compliant content credential: model ID, generation parameters, timestamp. The credential travels with the asset through C2PA-aware tooling and supports auditability for brand compliance — you can prove an asset was AI-generated and trace it back to its source model.
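The shape of such a provenance record can be sketched as below. This is illustrative only — a real integration would sign and embed the manifest via the C2PA tooling rather than build a raw dict — and it adds one assumption not stated above: hashing the prompt so raw prompt text never ships inside the asset:

```python
import hashlib
import time

def provenance_record(model_id: str, prompt: str, params: dict) -> dict:
    """Build a C2PA-style provenance payload for a generated asset.

    Carries the fields named above (model ID, generation parameters,
    timestamp) plus a SHA-256 prompt hash, so the record identifies the
    generation without leaking the prompt itself.
    """
    return {
        "model_id": model_id,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "params": params,
        "generated_at": int(time.time()),
    }
```

Months later, matching an asset's `model_id` and `prompt_sha256` against generation logs is enough to reconstruct exactly how it was made.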

Stack

Python · FastAPI · Stable Diffusion XL · LoRA Fine-Tuning · DreamBooth · Hugging Face Diffusers · CLIP · PyTorch · AWS S3 · Celery + Redis · Next.js · TypeScript · tRPC · PostgreSQL