
IST researchers earn best paper award at natural language generation conference

A paper authored by Penn State researchers received the Best Long Paper award at the 16th International Natural Language Generation Conference. Pictured, left to right, are co-authors Kenneth Huang, assistant professor of information sciences and technology (IST); Chieh-Yang Huang, a recent graduate of the informatics doctoral program in the College of IST; and Ting-Yao Hsu, a computer science doctoral student in the College of Engineering. Not pictured is co-author C. Lee Giles, David Reese Professor of Information Sciences and Technology. Credit: Jena Soult / Penn State. All Rights Reserved.

UNIVERSITY PARK, Pa. — A paper authored by students and faculty from the Penn State College of Information Sciences and Technology (IST) received the Best Long Paper award at the 16th International Natural Language Generation (INLG) Conference, which took place Sept. 11-15 in Prague, Czech Republic.

“Only one long paper award was given at this conference, and we were pleased to receive the honor,” said Ting-Hao “Kenneth” Huang, assistant professor of IST. Huang and co-author C. Lee Giles, David Reese Professor of Information Sciences and Technology, served as faculty co-advisers on the research project.

The paper, “Summaries as Captions: Generating Figure Captions for Scientific Documents With Automated Text Summarization,” was selected based on its originality, impact and contribution to the field of natural language generation, according to the conference website.

“Despite their importance, writing captions for figures in a scientific paper is not often a priority for researchers,” Kenneth Huang said. “A significant portion of the captions we reviewed in our corpus of published papers were terrible. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality.”

In their work, the researchers addressed the limitations of existing natural language processing (NLP) tools that approach automatic caption generation as a vision-to-language task. They aimed to show that using NLP tools to summarize a paper’s textual content would generate better figure captions than those created by the vision-based algorithms.

“A vision-to-language approach creates captions based on the image,” said co-author Ting-Yao Hsu, a computer science doctoral student in the College of Engineering. “We fine-tuned a pre-trained abstractive summarization model to specifically summarize paragraphs that reference figures — for example, ‘as shown in Figure 1’ — into captions.”
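The fine-tuning step Hsu describes can be sketched as a standard sequence-to-sequence training loop over paragraph-caption pairs. The snippet below is a minimal illustration using common open-source tooling; the model choice, field names and hyperparameters are assumptions for the sake of the example, not the team's exact configuration.

```python
# Illustrative sketch (not the authors' exact code): fine-tune a pretrained
# abstractive summarizer so that figure-mentioning paragraphs are "summarized"
# into figure captions. Model and hyperparameters are assumptions.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "facebook/bart-base"  # any pretrained seq2seq summarizer could stand in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical training pairs: paragraphs mentioning a figure -> its caption.
pairs = [
    {"paragraphs": "As shown in Figure 1, accuracy increases with model size ...",
     "caption": "Accuracy as a function of model size."},
]
dataset = Dataset.from_list(pairs)

def preprocess(example):
    inputs = tokenizer(example["paragraphs"], max_length=512, truncation=True)
    labels = tokenizer(text_target=example["caption"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(output_dir="caption-summarizer",
                                per_device_train_batch_size=8,
                                num_train_epochs=3)
collator = DataCollatorForSeq2Seq(tokenizer, model=model)
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=tokenized, data_collator=collator)
trainer.train()
```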

A good caption should help readers understand the complexity of a paper’s figures, such as bar charts, line charts or pie charts. Using context from the document’s full text makes sense, according to lead co-author Chieh-Yang Huang, who earned his doctorate in informatics from Penn State this month.

“Scientific papers typically include extensive text that can aid caption generation,” he said. “Our analysis showed that more than 75% of words in figure captions could be aligned with the words in the paragraphs mentioning those figures.”
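The overlap statistic Chieh-Yang Huang cites can be illustrated with a simple token-level check, shown below. This is a rough stand-in for the idea rather than the paper's actual word-alignment procedure.

```python
# Rough illustration: what fraction of a caption's words also appear in the
# paragraphs that mention the figure? Simplified token matching only.
import re

def word_overlap(caption: str, mentioning_paragraphs: str) -> float:
    caption_words = re.findall(r"[a-z0-9]+", caption.lower())
    paragraph_words = set(re.findall(r"[a-z0-9]+", mentioning_paragraphs.lower()))
    if not caption_words:
        return 0.0
    matched = sum(1 for word in caption_words if word in paragraph_words)
    return matched / len(caption_words)

print(word_overlap(
    "Accuracy as a function of model size.",
    "As shown in Figure 1, accuracy increases steadily with model size."))
```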

According to the researchers, automatic evaluation showed that summarizing the paragraphs that reference a figure produced better captions than vision-based methods did. Captions generated by the researchers’ model also outperformed vision-generated captions in a human evaluation by external domain experts.
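Automatic evaluation of generated captions is typically done by comparing them against reference text with metrics such as ROUGE. The sketch below shows what such a comparison looks like; the metric choice and example strings are illustrative, not the paper's reported setup or results.

```python
# Sketch of an automatic comparison between a generated caption and a
# reference caption using ROUGE, a standard summarization metric.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "Accuracy as a function of model size."
generated = "Accuracy versus model size on the test set."
scores = scorer.score(reference, generated)
for name, score in scores.items():
    print(f"{name}: F1 = {score.fmeasure:.3f}")
```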

The work began with SCICAP, a large-scale figure/caption dataset previously developed by the researchers. SCICAP contains more than 416,000 line charts and captions extracted from more than 290,000 computer science papers.

“SCICAP does not contain the paragraphs that mention the figure,” Kenneth Huang said. “To address this, we downloaded the original scientific papers and extracted all the mentions and associated paragraphs that referenced the papers’ figures.”
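The mention-extraction step Kenneth Huang describes can be approximated with a simple pattern match over a paper's body text. The snippet below is a hypothetical sketch of that idea, not the researchers' actual extraction pipeline.

```python
# Hypothetical sketch: collect paragraphs that mention each figure, e.g.
# paragraphs containing "Figure 1" or "Fig. 1". A simplified stand-in for
# the researchers' extraction pipeline.
import re
from collections import defaultdict

FIGURE_MENTION = re.compile(r"\b(?:Figure|Fig\.?)\s*(\d+)", re.IGNORECASE)

def paragraphs_by_figure(paper_text: str) -> dict[int, list[str]]:
    mentions = defaultdict(list)
    for paragraph in paper_text.split("\n\n"):
        for match in FIGURE_MENTION.finditer(paragraph):
            mentions[int(match.group(1))].append(paragraph.strip())
    return dict(mentions)

sample = ("We evaluate three models.\n\n"
          "As shown in Figure 1, accuracy increases with model size.\n\n"
          "Fig. 2 compares training time across datasets.")
print(paragraphs_by_figure(sample))
```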

The researchers used a summarization model, fine-tuned to the SCICAP dataset, to automatically generate figure captions. The work took two challenges into consideration: the common presence of low-quality, author-written captions and the lack of clear standards for what a good caption should be.

“More than half of the author-written captions in our sample were deemed unhelpful,” Hsu said. “This has implications for the design of future captioning systems and underscores the influence of data quality on captioning performance.”

This study lays the groundwork for further research, according to the researchers.

“Technology-wise, we are not presenting anything new but rather taking a new direction with existing tools to solve a problem motivated by human use,” Kenneth Huang said. “Effective caption generators will ultimately benefit both writers and readers of scientific papers, ideally across all disciplines.”

A seed grant from the College of IST and gift funds from Adobe Research supported this work. Collaborators from Adobe Research included Gromit Chan, Sungchul Kim, Eunyee Koh, Ani Nenkova and Ryan Rossi.
