QALD is an open challenge aimed at all systems that mediate between a user (expressing his or her information need in natural language) and structured data, in particular RDF data. Our goal is to get a picture of the strengths and shortcomings of state-of-the-art systems, and to gain insight into how we can develop approaches that cope with the fact that the amount of available RDF data is huge, that this data is distributed across different datasets, and that it is heterogeneous, noisy and sometimes even inconsistent.

Task

The general task for participants is as follows: given one or several RDF datasets and a set of natural language questions, return either the correct answers or a SPARQL query that retrieves these answers.
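To make the input and output of the task concrete, here is a minimal sketch of what a system might return for one question. The `Response` type, the example question and the query are illustrative assumptions and are not taken from the benchmark files; in particular, the choice of the `dbo:leader` property is only a guess at how such a question could be mapped to DBpedia.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Response:
    """Hypothetical container for a system's output on one question.

    The task allows either the answers themselves or a SPARQL query
    that retrieves them, so at least one of the two must be present.
    """

    question: str
    answers: Optional[FrozenSet[str]] = None
    sparql: Optional[str] = None

    def __post_init__(self) -> None:
        if self.answers is None and self.sparql is None:
            raise ValueError("provide answers or a SPARQL query")


# Illustrative only: this question/query pair is an assumption,
# not an entry copied from a QALD question set.
EXAMPLE = Response(
    question="Who is the mayor of Berlin?",
    sparql=(
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX res: <http://dbpedia.org/resource/>\n"
        "SELECT DISTINCT ?uri WHERE { res:Berlin dbo:leader ?uri . }"
    ),
)
```

A system that resolves the answers itself would fill `answers` with the resulting resource URIs instead of, or in addition to, the query.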

Benchmark

Since QALD-3, each question set for multilingual question answering over DBpedia has a DOI. Please use the corresponding DOI when citing the data.

QALD-3: doi:10.4119/unibi/citec.2013.6

  • Reference: DBpedia 3.8
  • Languages: English, German, Spanish, Italian, French, Dutch
  • 200 questions (train: 1-100, test: 101-200)

QALD-4: doi:10.4119/unibi/2687439

  • Reference: DBpedia 3.9
  • Languages: English, German, Spanish, Italian, French, Dutch, Romanian
  • 250 questions (train: 1-200, test: 201-250)

QALD-5: doi:10.4119/unibi/2900686

  • Reference: DBpedia 2014
  • Languages: English, German, Spanish, Italian, French, Dutch, Romanian
  • 420 questions (multilingual QA - train: 1-340, test: 341-390, hybrid QA - train: 391-410, test: 411-420)
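The train/test ranges listed above can be captured in a small lookup table, which may be handy when loading the question sets. This is a sketch under the assumption that the numbers given are question ids and that the ranges are inclusive; the names `SPLITS` and `split_of` and the split labels are hypothetical.

```python
from typing import Dict, Optional

# Question-id ranges per edition, transcribed from the list above.
# Python ranges exclude the upper bound, hence the +1 on each end value.
SPLITS: Dict[str, Dict[str, range]] = {
    "QALD-3": {"train": range(1, 101), "test": range(101, 201)},
    "QALD-4": {"train": range(1, 201), "test": range(201, 251)},
    "QALD-5": {
        "multilingual-train": range(1, 341),
        "multilingual-test": range(341, 391),
        "hybrid-train": range(391, 411),
        "hybrid-test": range(411, 421),
    },
}


def split_of(edition: str, question_id: int) -> Optional[str]:
    """Return the split a question id falls into, or None if out of range."""
    for name, ids in SPLITS[edition].items():
        if question_id in ids:
            return name
    return None
```

For example, `split_of("QALD-5", 400)` yields `"hybrid-train"`, and summing the range lengths for each edition reproduces the totals of 200, 250 and 420 questions.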