Tags

benchmark
conversational AI
crowdsourcing
data-centric
evaluation
NLU
semantic similarity
human upper bound