Hybrid-Mode Workshop at COLING 2022 in Gyeongju, Republic of Korea, October 17th, 2022
The CreativeSumm 2022 shared task is divided into four sub-tasks: summarization of novel chapters, movie scripts, primetime TV transcripts, and soap opera transcripts.
The training data for each sub-task comes from existing, well-established datasets. For three of the four sub-tasks, we will also provide new, unseen test inputs for evaluation (see below).
The novel chapter dataset pairs chapters of novels released through Project Gutenberg with corresponding summaries. For this shared task, we provide the novel chapters here (add link to new repo for this workshop). Unfortunately, we cannot provide the summaries themselves, because they come from study guide websites whose content is copyrighted. However, for each novel chapter, the provided data includes a link to the page where the summary text can be found.
For details on how the summaries were collected, please see the associated papers, Ladhak et al. (2020) and Kryściński et al. (2021).
For the novel chapter summarization task, please do not use the test splits of either NovelChapter or BookSum for training or development. We will use the BookSum test set as part of the final evaluation, and we may additionally provide new, unseen test inputs.
The movie script dataset (Gorinski and Lapata, 2015) pairs movie scripts with their corresponding Wikipedia summaries.
The primetime TV dataset, drawn from SummScreen (Chen et al., 2022), pairs transcripts of primetime shows with their corresponding Wikipedia summaries. The data can be downloaded from the shared task repository.
The soap opera dataset, also drawn from SummScreen (Chen et al., 2022), pairs soap opera transcripts with summaries written by TV MegaSite contributors. The data can be downloaded from the shared task repository.
Submitted summaries will be evaluated with standard automatic metrics, including ROUGE, METEOR, and metrics built on pretrained models.
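ROUGE rewards n-gram overlap between a system summary and a reference summary. The sketch below computes simplified ROUGE-N precision, recall, and F1 for one candidate/reference pair. It is not the organizers' scoring software: the function names are illustrative, and it assumes plain lowercase whitespace tokenization, without the stemming or other preprocessing that standard ROUGE implementations may apply.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Simplified ROUGE-N for a single candidate and a single reference.

    Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so repeated n-grams are only credited as often as they occur in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    cand = "the detective finds the letter"
    ref = "the detective discovers the hidden letter"
    print(rouge_n(cand, ref, n=1))  # 4 shared unigrams: P = 4/5, R = 4/6
    print(rouge_n(cand, ref, n=2))  # 1 shared bigram: P = 1/4, R = 1/5
```

Recall-oriented scores such as ROUGE favor summaries that cover the reference content, which is why they are usually reported alongside other metrics rather than alone.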
All deadlines are 11:59 pm UTC-12 (“anywhere on Earth”).
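Because UTC-12 is the last time zone on Earth to reach a given date, an “anywhere on Earth” deadline at 11:59 pm falls at 11:59 am UTC on the following day. The sketch below does this conversion with Python's standard `datetime` module; the function name and the July 18, 2022 date in the example are hypothetical and are not actual shared task deadlines.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) corresponds to a fixed offset of UTC-12.
AOE = timezone(timedelta(hours=-12))


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant matching 11:59 pm AoE on the given date (hypothetical helper)."""
    deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return deadline.astimezone(timezone.utc)


if __name__ == "__main__":
    # Hypothetical deadline date, for illustration only.
    print(aoe_deadline_in_utc(2022, 7, 18))  # 2022-07-19 11:59:00+00:00
```

Converting once to UTC and then to your own time zone avoids off-by-one-day mistakes when a deadline is stated in AoE.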
We have allowed extra time for submitting system output (your system summaries). The organizing team will score your system output with its own software and return the scores to you by July 25th, which leaves you a week to write your paper.
Divyansh Agarwal, divyansh.agarwal@salesforce.com
Bryan Li, bryanli@seas.upenn.edu
Sam Wiseman, swiseman@cs.duke.edu
Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel. 2022. SummScreen: A Dataset for Abstractive Screenplay Summarization. In ACL.
Philip John Gorinski, Mirella Lapata. 2015. Movie Script Summarization as Graph-based Scene Extraction. In NAACL.
Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, Dragomir Radev. 2021. BookSum: A Collection of Datasets for Long-form Narrative Summarization.
Faisal Ladhak, Bryan Li, Yaser Al-Onaizan, Kathleen McKeown. 2020. Exploring Content Selection in Summarization of Novel Chapters. In ACL.