Nihal Nayak

One-Page Resume | CV (Updated in Jan 2025)

I am a postdoctoral fellow at Harvard University (SEAS) working with David Alvarez-Melis. My research identifies the limitations of components in intelligent systems, such as large language models (LLMs), and improves them with data-centric solutions.

I completed my Ph.D. in Computer Science at Brown University, where I worked with Stephen Bach. During my Ph.D., I focused on building zero-shot systems, a class of intelligent systems that generalize to new classes, tasks, and environments without human annotations. I introduced principled approaches to building and evaluating these systems through synthetic datasets (Bonito), composition (CSP, CLIP Binding), and structured knowledge (ZSL-KG).

Email: nnayak [at] seas [dot] harvard [dot] edu

news

Jul 8, 2025	Our work on pre-training foundation models in academia was accepted to COLM 2025.
Jun 1, 2025	Started a new position as a Postdoctoral Fellow at Harvard University (SEAS).
May 16, 2025	Our work on predicting unobserved drug interactions using graph paths with large language models was accepted to KDD 2025.
Apr 25, 2025	Invited talks at Ai2 (Video), Netflix, and Snowflake on Data-Centric Approaches to Adapting Foundation Models.

selected publications

Findings

Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

Nihal V. Nayak, Yiyang Nan, Avi Trost, and Stephen H. Bach

In Findings of the Association for Computational Linguistics: ACL 2024, 2024

@inproceedings{bonito:aclfindings24,
  title = {Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation},
  author = {Nayak, Nihal V. and Nan, Yiyang and Trost, Avi and Bach, Stephen H.},
  year = {2024},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2024},
  selected = {true},
  preprint = {false},
  bibtex_show = {true},
  abbr = {Findings},
  pdf = {https://arxiv.org/abs/2402.18334},
  blog = {https://snorkel.ai/how-bonito-helps-fine-tune-specialized-llms-faster-than-ever/},
  code = {https://github.com/BatsResearch/nayak-arxiv24-code},
  project = {https://github.com/BatsResearch/bonito}
}

Findings

Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

Martha Lewis, Nihal V. Nayak, Peilin Yu, Qinan Yu, Jack Merullo, Stephen H. Bach, and Ellie Pavlick

In Findings of the Association for Computational Linguistics: EACL 2024, 2024

@inproceedings{lewis:eacl24,
  title = {Does CLIP Bind Concepts? Probing Compositionality in Large Image Models},
  author = {Lewis, Martha and Nayak, Nihal V. and Yu, Peilin and Yu, Qinan and Merullo, Jack and Bach, Stephen H. and Pavlick, Ellie},
  year = {2024},
  booktitle = {Findings of the Association for Computational Linguistics: EACL 2024},
  selected = {true},
  preprint = {false},
  bibtex_show = {true},
  abbr = {Findings},
  pdf = {https://arxiv.org/abs/2212.10537},
  code = {https://github.com/marthaflinderslewis/clip-binding}
}

ICLR

Learning to Compose Soft Prompts for Compositional Zero-Shot Learning

Nihal V. Nayak, Peilin Yu, and Stephen H. Bach

In International Conference on Learning Representations (ICLR), 2023

@inproceedings{csp:iclr23,
  title = {Learning to Compose Soft Prompts for Compositional Zero-Shot Learning},
  author = {Nayak, Nihal V. and Yu, Peilin and Bach, Stephen H.},
  year = {2023},
  booktitle = {International Conference on Learning Representations (ICLR)},
  abbr = {ICLR},
  bibtex_show = {true},
  selected = {true},
  pdf = {https://arxiv.org/abs/2204.03574},
  code = {https://github.com/BatsResearch/csp}
}

TMLR

Zero-Shot Learning with Common Sense Knowledge Graphs

Nihal V. Nayak and Stephen H. Bach

Transactions on Machine Learning Research, 2022

@article{nayak:tmlr22,
  title = {Zero-Shot Learning with Common Sense Knowledge Graphs},
  author = {Nayak, Nihal V. and Bach, Stephen H.},
  year = {2022},
  journal = {Transactions on Machine Learning Research},
  pdf = {https://arxiv.org/abs/2006.10713},
  project = {https://github.com/BatsResearch/zsl-kg},
  code = {https://github.com/BatsResearch/nayak-arxiv20-code},
  abbr = {TMLR},
  bibtex_show = {true},
  selected = {true}
}

ICLR

Multitask Prompted Training Enables Zero-Shot Task Generalization

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal V. Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, and Alexander M. Rush

In International Conference on Learning Representations (ICLR), 2022

@inproceedings{sanh:iclr22,
  title = {Multitask Prompted Training Enables Zero-Shot Task Generalization},
  author = {Sanh, Victor and Webson, Albert and Raffel, Colin and Bach, Stephen H. and Sutawika, Lintang and Alyafeai, Zaid and Chaffin, Antoine and Stiegler, Arnaud and Scao, Teven Le and Raja, Arun and Dey, Manan and Bari, M Saiful and Xu, Canwen and Thakker, Urmish and Sharma, Shanya Sharma and Szczechla, Eliza and Kim, Taewoon and Chhablani, Gunjan and Nayak, Nihal V. and Datta, Debajyoti and Chang, Jonathan and Jiang, Mike Tian-Jian and Wang, Han and Manica, Matteo and Shen, Sheng and Yong, Zheng Xin and Pandey, Harshit and Bawden, Rachel and Wang, Thomas and Neeraj, Trishala and Rozen, Jos and Sharma, Abheesht and Santilli, Andrea and Fevry, Thibault and Fries, Jason Alan and Teehan, Ryan and Biderman, Stella and Gao, Leo and Bers, Tali and Wolf, Thomas and Rush, Alexander M.},
  year = {2022},
  booktitle = {International Conference on Learning Representations (ICLR)},
  pdf = {https://arxiv.org/abs/2110.08207},
  project = {https://github.com/bigscience-workshop/promptsource},
  code = {https://github.com/bigscience-workshop/t-zero},
  abbr = {ICLR},
  bibtex_show = {true},
  selected = {true}
}