Large Language Models


  • LLM Psychometrics
  • LLM Jailbreaking
  • LLM Interpretability
  • LLMs imitate human writing, but how human-like are they?
  • Can we hijack an LLM's persona and steer it adversarially using only conversational history? If so, what are the potential real-world risks?
  • LLM interpretability: studying persona through the lens of neuroscience.

LLM Psychometrics
  • How human-like are LLMs?
  • Context-aware personality evaluation framework
  • Role-playing agents and persona stability via psychometric evaluation
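
A minimal sketch of how such an evaluation can run: administer Likert-scale inventory items to the model, optionally prefixed with conversational context, and aggregate per-trait scores. The two items are illustrative (not a validated inventory), and `ask_model` is a hypothetical placeholder for any chat-completion call.

```python
# Minimal sketch: administering a Likert-scale personality inventory to an LLM.
# `ask_model` is a hypothetical stand-in for any chat-completion API call.

ITEMS = [
    # (item text, trait, reverse-scored?) -- illustrative, not a validated inventory
    ("I see myself as someone who is talkative.", "extraversion", False),
    ("I see myself as someone who is reserved.", "extraversion", True),
]

PROMPT = (
    "Rate how well the statement describes you on a scale from 1 "
    "(strongly disagree) to 5 (strongly agree). Reply with a single digit.\n\n{item}"
)

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def score_traits(context: str = "") -> dict[str, float]:
    """Average per-trait ratings; `context` enables context-aware evaluation."""
    totals: dict[str, list[int]] = {}
    for item, trait, reverse in ITEMS:
        reply = ask_model(context + PROMPT.format(item=item))
        rating = int(reply.strip()[0])   # parse the 1-5 digit
        if reverse:
            rating = 6 - rating          # flip reverse-keyed items
        totals.setdefault(trait, []).append(rating)
    return {t: sum(v) / len(v) for t, v in totals.items()}
```

Re-scoring with different `context` strings is also a natural way to quantify persona stability, including the history-induced drift studied in the jailbreaking work below.
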
LLM Persona Jailbreaking
  • Black-box persona editing
  • Persona Hijacking via Implicit Steering in History
  • A new LLM vulnerability that affects education, mental health, and customer support
LLM Interpretability
  • How do LLMs acquire a persona?
  • Mechanistic interpretability for synthesizing persona (see the steering sketch below)
  • Understanding mechanisms like in-context learning via neuro-inspired analysis
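
One concrete technique in this space is activation steering: adding a "persona direction" to a layer's hidden states at inference time. A minimal PyTorch sketch, assuming a HuggingFace-style causal LM whose decoder blocks live in `model.model.layers`; the layer index, scale, and difference-in-means recipe for the direction are illustrative assumptions, not the specific method of this work.

```python
import torch

# Minimal activation-steering sketch (one mechanistic-interpretability tool).
# Assumes a HuggingFace-style causal LM whose decoder layers live in
# `model.model.layers`; the layer index and scale are illustrative choices.

def make_steering_hook(direction: torch.Tensor, scale: float = 4.0):
    """Return a forward hook that adds a persona direction to the layer output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction / direction.norm()
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

def generate_with_persona(model, tokenizer, prompt, direction, layer_idx=12):
    handle = model.model.layers[layer_idx].register_forward_hook(
        make_steering_hook(direction)
    )
    try:
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=64)
    finally:
        handle.remove()                  # always detach the hook afterwards
    return tokenizer.decode(out[0], skip_special_tokens=True)

# `direction` is typically a difference-in-means vector: mean activations on
# persona-consistent prompts minus mean activations on neutral prompts.
```
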

AI for Social Good


  • AI for Healthcare
  • AI for Science
  • AI for Edge-devices
  • Build real-world AI applications that solve practical, high-impact problems where utility matters more than flashy demos.
  • Convert recent advances in LLMs into deployable tools that strengthen social infrastructure
  • Improve human-AI collaboration

AI for Healthcare
  • Rural India still faces limited access to healthcare.
  • Building assistive AI that scales human doctors without replacing them
  • Features: multilingual intake, symptom logging, clinical note drafting, etc.
AI for Science
  • Conference scale makes review guideline adherence difficult to maintain.
  • How can AI assist reviewers without replacing them?
  • Developing an OpenReview-integrated system that supports reviewers.
AI for Edge-devices
  • How can we make AI locally accessible for mobile users?
  • Developing multilingual, on-device SLM assistants for tasks like messaging, etc.
  • Efficiency, privacy-preserving personalization for mobile-first communities.

Representation Learning


  • Pretraining
  • Applications:
    • Cultural Linguistics
    • Image Caption Learning
  • Learning structured abstractions: Build latent spaces that capture the essential structure of data, whether syntax, semantics, vision, or multimodal cues.
  • Task-aligned embeddings: Shape representations to encode the properties that downstream tasks need, improving generalization and robustness (a minimal contrastive sketch follows this list).
  • Cross-domain transfer: Use shared representations to transfer knowledge across languages, modalities, or low-resource settings.
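
A minimal sketch of the contrastive objective commonly used to task-align embeddings: an InfoNCE loss that pulls task-equivalent pairs together and treats the other rows of the batch as negatives. The encoder and the choice of "equivalent views" are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor: torch.Tensor, positive: torch.Tensor, temp: float = 0.07):
    """InfoNCE over a batch: each anchor's positive is the same-index row;
    every other row in the batch serves as an in-batch negative."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / temp                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Usage sketch: `encoder` maps task-equivalent views (e.g., a sentence and its
# paraphrase, or the same concept in two languages) to vectors:
#   loss = info_nce_loss(encoder(view_a), encoder(view_b))
```
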

Pretraining
  • Can we design pretraining methods that don’t rely on massive corpora?
  • Crucial, because most of the world's 7,000+ languages lack data.
  • Yes! Lightweight, linguistically guided, task-specific pretraining works.
Cultural Linguistics
  • Can we algorithmically quantify cultural proximity among Indian languages?
  • It guides cross-lingual culture transfer in NLP.
  • Our phono-semantic framework quantifies the cultural distance.
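
The published framework is not reproduced here; as a toy sketch of the general idea, a cultural distance can mix a phonetic distance over romanized word forms with a semantic distance over embeddings. The word pairs, `embed` function, and mixing weight `alpha` are all hypothetical.

```python
from difflib import SequenceMatcher

def phonetic_distance(w1: str, w2: str) -> float:
    """Toy phonetic distance: 1 - similarity of romanized forms.
    (A real system would compare IPA or phoneme sequences.)"""
    return 1.0 - SequenceMatcher(None, w1, w2).ratio()

def cosine_distance(v1, v2) -> float:
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = sum(a * a for a in v1) ** 0.5
    n2 = sum(b * b for b in v2) ** 0.5
    return 1.0 - dot / (n1 * n2)

def phono_semantic_distance(pairs, embed, alpha: float = 0.5) -> float:
    """Average a weighted mix of phonetic and semantic distance over a
    bilingual list of culturally salient terms. `embed` and `alpha` are
    illustrative assumptions, not the published framework."""
    scores = [
        alpha * phonetic_distance(a, b)
        + (1 - alpha) * cosine_distance(embed(a), embed(b))
        for a, b in pairs
    ]
    return sum(scores) / len(scores)
```
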
Image Caption Learning
  • Can topic pretraining produce more robust image-caption evaluation metrics?
  • Lexical metrics fail under paraphrasing, multilinguality, and stylistic variation.
  • Solution: TAGSim, a topic-pretrained metric.
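
The intuition behind a topic-pretrained metric is that paraphrases share topics even when they share few words. A toy scikit-learn stand-in (not TAGSim itself); the three-caption corpus and hyperparameters are illustrative, and on a realistically sized corpus the topic model would be far more stable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Toy illustration of topic-space caption similarity: paraphrases should land
# closer in topic space than unrelated captions, even with little word overlap.
corpus = [
    "a dog runs across the grassy field",
    "a puppy sprints over the lawn",          # paraphrase of the first caption
    "a plate of pasta on a wooden table",     # unrelated caption
]

vec = CountVectorizer()
X = vec.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
topics = lda.transform(X)                     # per-caption topic distributions

print(cosine_similarity(topics[0:1], topics[1:2]))  # paraphrase pair
print(cosine_similarity(topics[0:1], topics[2:3]))  # unrelated pair
```
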

Sanskrit Computational Linguistics


  • Deep learning
  • Tokenization
  • Compound Identification
  • Dependency Parsing
  • Shloka Recommendation
  • Anvaya Generation
  • Machine Translation
  • Digitized manuscripts remain inaccessible due to language barriers and loss of nuance in standard translation systems.
  • We built deep-learning models that assist users in reading classical texts.
  • These models power SanskritShala, a web-based neural toolkit that preserves grammatical structure while automating analysis for scholars and learners.

Tokenization
  • Sanskrit word segmentation is hard due to sandhi, the phonetic fusion of adjacent words.
  • TransLIST combines linguistic cues with transformers.
  • Achieves strong gains over prior state of the art.
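
To see why sandhi makes segmentation hard: adjacent words fuse phonetically, so the surface string hides the word boundary. A toy sketch that undoes a single visarga rule ("-aḥ" + "a-" → "-o '") against a hypothetical mini-lexicon; TransLIST itself learns thousands of such interactions jointly rather than by hand-written rule.

```python
# Toy sandhi-splitting sketch: undo one visarga rule and check the candidate
# split against a tiny hypothetical lexicon. Real segmenters must resolve
# many competing analyses of the same fused surface string.
LEXICON = {"rāmaḥ", "asti", "rāmā", "sti"}

def unsandhi(surface: str):
    """Yield (word1, word2) splits licensed by the "-aḥ a-" -> "-o '" rule."""
    if "o'" in surface or "o '" in surface:
        left, right = surface.replace("'", " '").split(" '")
        cand = (left[:-1] + "aḥ", "a" + right.lstrip())
        if all(w in LEXICON for w in cand):
            yield cand

print(list(unsandhi("rāmo'sti")))  # [('rāmaḥ', 'asti')], i.e. "Rāma is"
```
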
Compound Identification
  • Multi-component compounds have nested semantics.
  • Proposed dependency-based framework.
  • Results: +13.1 F1 gain and 5× faster inference.
Dependency Parsing
  • Which low-resource strategies truly generalize across languages and why?
  • Systematically evaluate 5 low-resource strategies.
  • Proposed model surpasses the Sanskrit parsing state of the art.
Shloka Recommendation
  • Readers need related ślokas that share a similar essence.
  • Solution: Interactive śloka recommendation platform.
  • Features: Ranked verses, similarity rationale, and visual verse clusters.
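
A minimal sketch of the ranking step, using TF-IDF cosine similarity as a stand-in for whatever verse encoder the platform actually uses; the verse strings are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Minimal recommendation sketch: rank verses by similarity to a query verse.
# A TF-IDF baseline stands in for the platform's actual verse encoder.
verses = ["verse one text ...", "verse two text ...", "verse three text ..."]

vec = TfidfVectorizer()
V = vec.fit_transform(verses)

def recommend(query_idx: int, k: int = 2):
    """Return indices of the k verses most similar to the query verse."""
    sims = cosine_similarity(V[query_idx], V).ravel()
    sims[query_idx] = -1.0                    # exclude the query itself
    return sims.argsort()[::-1][:k]

print(recommend(0))
```
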
Anvaya Generation
  • Can LLMs outperform smaller seq2seq models on the anvaya task?
  • Compare LLMs with task-specific models.
  • Our fine-tuned ByT5-Sanskrit model outperforms general-purpose LLMs.
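
A sketch of how the task-specific side of such a comparison runs with HuggingFace transformers; the checkpoint name below is a placeholder, not the actual released model.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Sketch of seq2seq anvaya generation (reordering a verse into prose order).
# "your-org/byt5-sanskrit-anvaya" is a hypothetical checkpoint name.
name = "your-org/byt5-sanskrit-anvaya"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

verse = "..."  # input śloka in verse order
ids = tok(verse, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=128, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))  # predicted anvaya
```
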
Machine Translation
  • Google Translate underperforms on domain-specific Sanskrit texts.
  • We curate multi-domain data and fine-tune LLMs.
  • RAG-integrated, linguistically informed LLMs yield better translations.
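
A minimal sketch of the retrieve-then-translate loop: fetch in-domain parallel examples and condition the LLM on them. `retrieve` and `llm` are hypothetical stand-ins for the actual retriever and fine-tuned model.

```python
# Minimal RAG-for-translation sketch: retrieve in-domain parallel examples,
# then condition the LLM on them as few-shot context.

def retrieve(source: str, k: int = 3) -> list[tuple[str, str]]:
    """Return top-k (Sanskrit, English) pairs from a domain-specific corpus."""
    raise NotImplementedError("plug in your retriever here")

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your fine-tuned LLM here")

def translate(source: str) -> str:
    examples = retrieve(source)
    shots = "\n".join(f"Sanskrit: {s}\nEnglish: {e}" for s, e in examples)
    prompt = (
        "Translate the Sanskrit sentence into English, following the "
        f"domain conventions of these examples:\n{shots}\n\n"
        f"Sanskrit: {source}\nEnglish:"
    )
    return llm(prompt)
```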