Caring about data before it was cool – language data between computational linguistics and real-world applications

Abstract

Computational linguists have cared about data “before it was cool”. In the community of ML/AI practitioners, however, “model work” gets more love than “data work”. Small and medium-sized businesses, while not immune to the AI hype, often (1) do not have enough (representative) data to train their machine learning modules, (2) lack the in-house expertise and resources to collect realistic data, and (3) underestimate the effort needed to prevent data-related issues. I will present recent studies showing the importance of a more data-oriented approach when it comes to use-case-specific models. I will discuss how insufficient attention to data affects its quality and has ethical consequences, and argue that a data-centered and user-centered perspective is a missing link when transferring technologies out of academia and into industrial use cases.

Date
Jul 19, 2022
Location
online
Alessandra Zarcone
Professor of Language Technologies and Cognitive Assistants

Computational linguist with a background in NLP and in psycholinguistics, working on AI, NLP and human-machine interaction.