Caring about data before it was cool – language data between computational linguistics and real-world applications

Name: Caring about data before it was cool – language data between computational linguistics and real-world applications
Start: 2022-07-19T11:00:00Z
Location: online

Abstract

Computational linguists have cared about data “before it was cool”. In the community of ML/AI practitioners, however, “model work” gets more love than “data work”. Small and medium-sized businesses, while not immune to the AI hype, often (1) do not have enough (representative) data for training their machine learning modules, (2) lack the in-house expertise and the resources to collect realistic data, and (3) underestimate the effort needed to prevent data-related issues. I will present recent studies showing the importance of a more data-oriented approach for use-case-specific models. I will discuss how insufficient attention to data affects data quality and raises ethical concerns, and argue that a data-centered and user-centered perspective is a missing link when transferring technologies out of academia and into industrial use cases.

Date

Jul 19, 2022

Event

Invited GSCL Talk (Gesellschaft für Sprachtechnologie und Computerlinguistik)

Location

online

Alessandra Zarcone

Professor of Language Technologies and Cognitive Assistants