Arvind Narayan — Staff AI/ML Engineer

Hybrid RAG & agents

Enterprise AI research — Perplexity for scientific teams

I architected and shipped a production-grade hybrid RAG and agentic research platform that became the primary AI interface for lab workflows: an enterprise Perplexity for scientific teams that raised researcher productivity and became part of day-to-day work across the organization.

The retrieval stack combines BM25/TF-IDF with dense embeddings, late fusion, and cross-encoder reranking to keep recall high on domain-specific queries over large chemical and biological corpora. A LIGHT-style memory subsystem scales to millions of tokens of conversational history through episodic retrieval, structured working memory, and a compressed scratchpad, so multi-session research stays context-aware. Domain agentic tools (toxicity lookup, molecular property prediction, structure normalization, ChEMBL and PubChem connectors) let the LLM invoke specialized ML models and databases as callable actions. Inference runs on self-hosted Qwen 2.5 27B on AWS SageMaker, served with SGLang for high throughput and low latency. The serving path is modular enough to swap in larger models, and the stack includes evaluation harnesses and domain fine-tunes.

The daily AI research platform for scientific teams
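The description names late fusion without specifying the method. As a minimal sketch, assuming reciprocal rank fusion (RRF, with the conventional k = 60) as the fusion step, the snippet below ranks a toy corpus with Okapi BM25 and with cosine similarity over hand-written vectors that stand in for real dense embeddings, then fuses the two rankings. Cross-encoder reranking would run on the fused top-k and is omitted; the corpus, vectors, and function names are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(scores):
    """Doc indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc."""
    fused = Counter()
    for ranking in rankings:
        for r, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + r)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "aspirin inhibits cyclooxygenase".split(),
    "benzene toxicity in liver cells".split(),
    "aspirin toxicity and liver damage".split(),
]
# Toy vectors standing in for real dense embeddings (hypothetical values).
doc_vecs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.7]]
query = "aspirin liver toxicity".split()
query_vec = [0.7, 0.6]

sparse = rank(bm25_scores(query, docs))
dense = rank([cosine(query_vec, v) for v in doc_vecs])
fused = rrf([sparse, dense])
print(fused)
```

RRF only looks at ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on incompatible scales.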

Molecular ML & GNNs

Drug property prediction before the wet lab

I build graph neural networks that treat molecules as graphs, with atoms as nodes and bonds as edges, to predict whether a compound will be toxic, be absorbed correctly, or cross the blood–brain barrier long before it reaches a wet lab. These models reach near state-of-the-art accuracy on biomedical benchmarks.
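A minimal sketch of the molecules-as-graphs idea, assuming NumPy and a deliberately tiny featurization (one-hot over three atom types, heavy atoms only): each message-passing round lets an atom sum its bonded neighbours' states and mix them with its own, and a sum readout turns the atom states into one molecule vector scored by a sigmoid. The weights are random and untrained, so the output probability carries no meaning; the point is the data flow. Production ADMET/BBB models use far richer atom and bond features plus a real training loop, none of which is shown here.

```python
import numpy as np

ATOM_TYPES = ["C", "N", "O"]  # toy vocabulary; real featurizers encode much more


def featurize(atoms):
    """One-hot atom-type features, shape (n_atoms, len(ATOM_TYPES))."""
    x = np.zeros((len(atoms), len(ATOM_TYPES)))
    for i, symbol in enumerate(atoms):
        x[i, ATOM_TYPES.index(symbol)] = 1.0
    return x


def adjacency(n_atoms, bonds):
    """Symmetric adjacency matrix: bonds are undirected edges."""
    adj = np.zeros((n_atoms, n_atoms))
    for i, j in bonds:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def message_pass(h, adj, w_self, w_msg):
    """One round: each atom sums neighbour states, mixes with its own, applies ReLU."""
    messages = adj @ h
    return np.maximum(0.0, h @ w_self + messages @ w_msg)


def predict(atoms, bonds, params):
    """Forward pass: message passing, sum readout, sigmoid probability."""
    h = featurize(atoms)
    adj = adjacency(len(atoms), bonds)
    for w_self, w_msg in params["layers"]:
        h = message_pass(h, adj, w_self, w_msg)
    graph_vec = h.sum(axis=0)  # permutation-invariant readout
    logit = graph_vec @ params["w_out"]
    return float(1.0 / (1.0 + np.exp(-logit)))


# Untrained random weights, only to show shapes and data flow.
rng = np.random.default_rng(0)
hidden = 8
params = {
    "layers": [
        (rng.normal(size=(len(ATOM_TYPES), hidden)), rng.normal(size=(len(ATOM_TYPES), hidden))),
        (rng.normal(size=(hidden, hidden)), rng.normal(size=(hidden, hidden))),
    ],
    "w_out": rng.normal(size=hidden) * 0.1,
}

# Ethanol heavy atoms: C-C-O
p = predict(["C", "C", "O"], [(0, 1), (1, 2)], params)
print(f"toy probability: {p:.3f}")
```

Because both the neighbour aggregation and the readout are sums, relabelling the atoms (writing ethanol as O-C-C instead of C-C-O) gives the same prediction, which is the property that makes graph models a natural fit for molecules.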

The work spans the full molecular ML lifecycle: curating and featurizing chemical datasets, training message-passing GNNs for ADMET and BBB tasks, running ablations and evaluation against strong baselines, and deploying models behind cloud inference APIs that discovery teams can call from real workflows. The same judgment that shapes larger systems shows up here — reliable metrics, reproducible training, and serving that scientists actually use.

F1 ≈ 0.90 · AUROC ≈ 0.92 on ADMET / BBB tasks
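For reference, the two headline metrics can be computed as below on toy data (not the benchmark results above). AUROC is threshold-free: it is the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as half. F1 depends on a decision cutoff, assumed here to be 0.5. The pairwise AUROC loop is quadratic and meant only for illustration; library implementations sort instead.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """P(random positive outscores random negative); ties count as half a win."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 cutoff
print(round(f1_score(y_true, y_pred), 3), round(auroc(y_true, scores), 3))  # 0.667 0.889
```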

Clinical analytics

Finding why a Phase 3 trial “failed”

On a Phase 3 drug trial that looked like a miss on paper, I investigated why expected efficacy wasn’t showing up — and found that how patients spoke during clinic visits, not only what they reported on instruments, predicted who was responding to placebo.

That pointed to a clinic-habituation effect: longitudinal speech and visit features carried signal that standard endpoints were missing. The analysis delivered concrete recommendations for how later trials should account for placebo response — turning a confusing null result into actionable clinical research analytics for the VistaGen PAL-3 program.

Analytics shipped for VistaGen PAL-3

EdTech & retention ML

TikTok-style micro-learning that actually sticks

I conceived and built Shorts end-to-end — a TikTok-style micro-learning feed for upGrad — combining the SM-2 (SuperMemo-2) spaced-repetition algorithm with a feed-forward neural network that personalizes content sequencing and retention scheduling for each learner.
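SM-2 itself is public and compact. The sketch below follows its commonly cited formulation: a 0-5 recall grade resets the card on a lapse (grade below 3); otherwise the interval steps through 1 day, 6 days, then the previous interval times the easiness factor, and the easiness factor is nudged after every review with a floor of 1.3. The `Card` type and function name are illustrative, and the exact SM-2 variant used in Shorts is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0     # consecutive successful reviews
    interval: int = 0        # days until the next review
    easiness: float = 2.5    # SM-2 E-Factor, floored at 1.3


def sm2_review(card: Card, quality: int) -> Card:
    """Return the card's new state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality >= 3:  # successful recall: advance the schedule
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval * card.easiness)
        repetitions = card.repetitions + 1
    else:  # lapse: relearn from the start
        repetitions, interval = 0, 1
    easiness = card.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(repetitions, interval, max(1.3, easiness))


card = Card()
history = []
for q in [5, 4, 5, 2]:
    card = sm2_review(card, q)
    history.append(card)
print([c.interval for c in history])  # [1, 6, 16, 1]
```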

Vanilla SM-2 updates intervals from a fixed formula over self-graded recall, with no wider learner context. The neural layer adjusts those intervals using activity patterns, concept-level recall history, domain difficulty, and course progress, so cards surface when a learner is actually likely to engage. In parallel, I developed predictive models that use engagement, attendance, and social-interaction signals to flag students likely to fail or drop out months in advance, giving academic and growth teams time to intervene instead of finding out after the fact.

Personalized retention scheduling · early dropout risk signals
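How the neural layer modifies SM-2 is not specified beyond its inputs. A minimal sketch, assuming a one-hidden-layer network that maps four hypothetical learner features (recent activity, concept recall rate, domain difficulty, course progress, each scaled to 0..1) to a multiplier on the SM-2 interval. The output is squashed with 2**tanh(z) so the multiplier stays within [0.5, 2], and the output layer starts at zero so an untrained model reproduces vanilla SM-2 exactly. Training is omitted; the class and feature names are illustrative.

```python
import math
import random


class IntervalAdjuster:
    """Tiny feed-forward net: learner features -> multiplier on an SM-2 interval (hypothetical)."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_features)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # Zero-initialised output layer: z == 0, multiplier == 1, i.e. plain SM-2 until trained.
        self.w2 = [0.0] * n_hidden
        self.b2 = 0.0

    def multiplier(self, features):
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, features)) + b)  # ReLU hidden units
            for row, b in zip(self.w1, self.b1)
        ]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 2.0 ** math.tanh(z)  # within [0.5, 2]: at most halve or double the interval

    def adjust(self, base_interval, features):
        """Scale the SM-2 interval (days), never scheduling sooner than one day."""
        return max(1, round(base_interval * self.multiplier(features)))


# Hypothetical features: [recent_activity, concept_recall, domain_difficulty, course_progress]
learner = [0.8, 0.9, 0.3, 0.5]
adjuster = IntervalAdjuster(n_features=len(learner))
print(adjuster.adjust(16, learner))  # 16: untrained net leaves SM-2 untouched
```

Zero-initialising the output layer is a design choice of this sketch: the learned component can only move intervals away from SM-2 once it has been trained.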

LLMOps & serving

LLM infrastructure that survives production

I ship models into production with the boring parts done right — model registry and versioning, CI/CD for ML, canary and staged rollouts, monitoring, and drift checks — so inference stays reliable after the demo glow fades.
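The specific drift checks are not named. One common choice, assumed here, is the Population Stability Index (PSI), which compares the binned distribution of a feature or model score in live traffic against a reference window. The sketch bins on the reference range and floors empty bins at a small epsilon so the log term stays finite; the 0.1 and 0.25 alert bands are widely used rules of thumb, not values from this page.

```python
import math


def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index of `live` against `reference` (equal-width bins)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        return [max(c / len(values), eps) for c in counts]

    ref, cur = proportions(reference), proportions(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_status(score):
    # Rule-of-thumb bands (assumption): <0.1 stable, 0.1-0.25 watch, >0.25 drifted.
    if score < 0.1:
        return "stable"
    return "watch" if score <= 0.25 else "drifted"


reference = [i / 100 for i in range(100)]      # e.g. last month's model scores
shifted = [0.5 + i / 200 for i in range(100)]  # live scores bunched in the upper half
print(drift_status(psi(reference, reference)), drift_status(psi(reference, shifted)))  # stable drifted
```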

That same path extends to open-source and frontier models routed through LiteLLM and AWS Bedrock, with self-hosted high-throughput stacks (SageMaker, SGLang) where latency and cost matter. The architecture is modular on purpose: swap in a newer checkpoint or a larger model without rewriting how traffic is served, evaluated, or observed — the difference between a prototype and infrastructure teams can trust.

SageMaker · SGLang · LiteLLM · Bedrock
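To illustrate the swap-in path, here is a minimal sketch of a deterministic canary split, assuming traffic is keyed by a stable request or user ID. Hashing the key into 10,000 buckets keeps each user pinned to one checkpoint while a configurable share goes to the new one. The `Route` type and checkpoint names are hypothetical; in a real stack the chosen name would be handed to whichever gateway serves the models, with the canary percentage raised in stages while evaluation and monitoring stay healthy.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    stable: str        # checkpoint serving most traffic (hypothetical name)
    canary: str        # new checkpoint under evaluation (hypothetical name)
    canary_pct: float  # share of traffic sent to the canary, 0-100


def pick_model(route: Route, request_key: str) -> str:
    """Deterministically assign a request key to the stable or canary checkpoint."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return route.canary if bucket < route.canary_pct * 100 else route.stable


route = Route(stable="qwen-prod-v1", canary="qwen-prod-v2", canary_pct=5.0)
assignments = [pick_model(route, f"user-{i}") for i in range(10_000)]
share = assignments.count("qwen-prod-v2") / len(assignments)
print(f"canary share: {share:.3f}")  # roughly 0.05
```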

Hi,

I'm Arvind Narayan

Products & Case Studies

Cohort AI

Yuni

upGrad Shorts

TickerLens

upGrad LMS

Praana Foods

upGrad Lite

Subclarity

Production AI for science, learning, and scale.