Best Paper Awards

  • Mission: Impossible Language Models
    Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
  • Semisupervised Neural Proto-Language Reconstruction
    Liang Lu, Peirong Xie, David R Mortensen
  • Why are Sensitive Functions Hard for Transformers?
    Michael Hahn, Mark Rofin
  • Natural Language Satisfiability: Exploring the Problem Distribution and Evaluating Transformer-based Language Models
    Tharindu Madusanka, Ian Pratt-Hartmann, Riza Batista-Navarro
  • Deciphering Oracle Bone Language with Diffusion Models
    Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu
  • Causal Estimation of Memorisation Profiles
    Pietro Lesci, Clara Meister, Thomas Hofmann, Andreas Vlachos, Tiago Pimentel
  • Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
    Ahmet Üstün, Viraat Aryabumi, Zheng Xin Yong, Wei-Yin Ko, Daniel D’souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker

Best Social Impact Paper Awards

  • How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
    Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi
  • DIALECTBENCH: An NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
    Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, David Chiang, Yulia Tsvetkov, Antonios Anastasopoulos
  • Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
    Tarek Naous, Michael J Ryan, Alan Ritter, Wei Xu

Best Resource Paper Awards

  • Latxa: An Open Language Model and Evaluation Suite for Basque
    Julen Etxaniz, Oscar Sainz, Naiara Perez Miguel, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
  • Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
    Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Evan Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
  • AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
    Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian

Best Theme Paper Awards

  • OLMo: Accelerating the Science of Language Models
    Dirk Groeneveld, Iz Beltagy, Evan Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, William H. Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi

Outstanding Papers

  • Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
    Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Qing Li, Yong Jiang, Zhihao Jia
  • L-Eval: Instituting Standardized Evaluation for Long Context Language Models
    Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu
  • Causal-Guided Active Learning for Debiasing Large Language Models
    Zhouhao Sun, Li Du, Xiao Ding, Yixuan Ma, Yang Zhao, Kaitao Qiu, Ting Liu, Bing Qin
  • CausalGym: Benchmarking causal interpretability methods on linguistic tasks
    Aryaman Arora, Dan Jurafsky, Christopher Potts
  • Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
    Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Vidhisha Balachandran, Yulia Tsvetkov
  • Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
    Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli
  • Must NLP be Extractive?
    Steven Bird
  • IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
    Indraneil Paul, Goran Glavaš, Iryna Gurevych
  • MultiLegalPile: A 689GB Multilingual Legal Corpus
    Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho
  • PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
    Zaibin Zhang, Yongting Zhang, Lijun Li, Jing Shao, Hongzhi Gao, Yu Qiao, Lijun Wang, Huchuan Lu, Feng Zhao
  • Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation
    Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo
  • Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
    Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schuetze, Dirk Hovy
  • Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
    Mosh Levy, Alon Jacoby, Yoav Goldberg
  • Do Llamas Work in English? On the Latent Language of Multilingual Transformers
    Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West
  • Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models
    Zachary Horvitz, Jingru Chen, Rahul Aditya, Harshvardhan Srivastava, Robert West, Zhou Yu, Kathleen McKeown
  • Estimating the Level of Dialectness Predicts Inter-annotator Agreement in Multi-dialect Arabic Datasets
    Amr Keleg, Walid Magdy, Sharon Goldwater
  • G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation
    Xingyuan Pan, Luyang Huang, Liyan Kang, Zhicheng Liu, Yu Lu, Shanbo Cheng
  • Media Framing: A typology and Survey of Computational Approaches Across Disciplines
    Yulia Otmakhova, Shima Khanehzar, Lea Frermann
  • SPZ: A Semantic Perturbation-based Data Augmentation Method with Zonal-Mixing for Alzheimer’s Disease Detection
    FangFang Li, Cheng Huang, PuZhen Su, Jie Yin
  • Greed is All You Need: An Evaluation of Tokenizer Inference Methods
    Omri Uzan, Craig W Schmidt, Chris Tanner, Yuval Pinter
  • Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t
    Chihiro Taguchi, David Chiang
  • Steering Llama 2 via Contrastive Activation Addition
    Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan J Hubinger, Alexander Matt Turner
  • EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities
    Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao
  • M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
    Wai-Chung Kwan, Xingshan Zeng, Yufei Wang, Yusen Sun, Liangyou Li, Yuxin Jiang, Lifeng Shang, Qun Liu, Kam-Fai Wong
  • CHECKWHY: Causal Fact Verification via Argument Structure
    Jiasheng Si, Yibo Zhao, Yingjie Zhu, Haiyang Zhu, Wenpeng Lu, Deyu Zhou
  • On Efficient and Statistical Quality Estimation for Data Annotation
    Jan-Christoph Klie, Juan Haladjian, Marc Kirchner, Rahul Nair
  • Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
    Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao
  • IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
    Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad B, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M Khapra
  • MultiPICo: Multilingual Perspectivist Irony Corpus
    Silvia Casola, Simona Frenda, Soda Marem Lo, Erhan Sezerer, Antonio Uva, Valerio Basile, Cristina Bosco, Alessandro Pedrani, Chiara Rubagotti, Viviana Patti, Davide Bernardi
  • MMToM-QA: Multimodal Theory of Mind Question Answering
    Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu
  • MAP’s not dead yet: Uncovering true language model modes by conditioning away degeneracy
    Davis Yoshida, Kartik Goyal, Kevin Gimpel
  • NounAtlas: Filling the Gap in Nominal Semantic Role Labeling
    Roberto Navigli, Marco Lo Pinto, Pasquale Silvestri, Dennis Rotondi, Simone Ciciliano, Alessandro Scirè
  • The Earth is Flat because…: Investigating LLMs’ Belief towards Misinformation via Persuasive Conversation
    Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu
  • Let’s Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
    Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeonghun Yeo, Yong Man Ro
  • Word Embeddings Are Steers for Language Models
    Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek F. Abdelzaher, Heng Ji

SAC Awards

  • Deciphering Oracle Bone Language with Diffusion Models
    Computational Social Science and Cultural Analytics
    Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu
  • Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understanding of Discourse Relations
    Discourse and Pragmatics
    Yisong Miao, Hongfu Liu, Wenqiang Lei, Nancy F. Chen, Min-Yen Kan
  • RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization
    Efficient/Low-Resource Methods for NLP
    Jaavid Aktar Husain J, Raj Dabre, Aswanth Kumar M, Jay Gala, Thanmay Jayakumar, Ratish Puduppully, Anoop Kunchukuttan
  • Steering Llama 2 via Contrastive Activation Addition
    Ethics, Bias, and Fairness
    Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan J Hubinger, Alexander Matt Turner
  • MAP’s not dead yet: Uncovering true language model modes by conditioning away degeneracy
    Generation
    Davis Yoshida, Kartik Goyal, Kevin Gimpel
  • Spiral of Silence: How is Large Language Model Killing Information Retrieval?—A Case Study on Open Domain Question Answering
    Information Retrieval and Text Mining
    Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, Yingfei Sun
  • CausalGym: Benchmarking causal interpretability methods on linguistic tasks
    Interpretability and Analysis of Models for NLP
    Aryaman Arora, Dan Jurafsky, Christopher Potts
  • COKE: A Cognitive Knowledge Graph for Machine Theory of Mind
    Linguistic theories, Cognitive Modeling and Psycholinguistics
    Jincenzi Wu, Zhuang Chen, Jiawen Deng, Sahand Sabour, Helen M. Meng, Minlie Huang
  • Why are Sensitive Functions Hard for Transformers?
    Machine Learning for NLP
    Michael Hahn, Mark Rofin
  • Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
    Machine Translation
    Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli
  • AI ‘News’ Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian
    Multilinguality and Language Diversity
    Giovanni Puccetti, Anna Rogers, Chiara Alzetta, Felice Dell’Orletta, Andrea Esuli
  • CaMML: Context-Aware Multimodal Learner for Large Models
    Multimodality and Language Grounding to Vision, Robotics and Beyond
    Yixin Chen, Shuai Zhang, Boran Han, Tong He, Bo Li
  • Greed is All You Need: An Evaluation of Tokenizer Inference Methods
    Phonology, Morphology and Word Segmentation
    Omri Uzan, Craig W Schmidt, Chris Tanner, Yuval Pinter
  • Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
    Question Answering
    Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Vidhisha Balachandran, Yulia Tsvetkov
  • VariErr NLI: Separating Annotation Error from Human Label Variation
    Resources and Evaluation
    Leon Weber-Genzel, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank
  • Distributional Inclusion Hypothesis and Quantifications: Probing for Hypernymy in Functional Distributional Semantics
    Semantics: Lexical
    Chun Hei Lo, Wai Lam, Hong Cheng, Guy Emerson
  • MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning
    Semantics: Sentence-level Semantics, Textual Inference and Other areas
    Inderjeet Jayakumar Nair, Lu Wang
  • CHECKWHY: Causal Fact Verification via Argument Structure
    Sentiment Analysis, Stylistic Analysis, and Argument Mining
    Jiasheng Si, Yibo Zhao, Yingjie Zhu, Haiyang Zhu, Wenpeng Lu, Deyu Zhou
  • Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t
    Speech recognition, text-to-speech and spoken language understanding
    Chihiro Taguchi, David Chiang
  • COSMIC: Mutual Information for Task-Agnostic Summarization Evaluation
    Summarization
    Maxime Darrin, Philippe Formont, Jackie CK Cheung, Pablo Piantanida
  • Tree-Averaging Algorithms for Ensemble-Based Unsupervised Discontinuous Constituency Parsing
    Syntax: Tagging, Chunking and Parsing
    Behzad Shayegh, Yuqiao Wen, Lili Mou