Congratulations to Svitlana Vakulenko on her paper & talk at ISWC 2018!

Svitlana Vakulenko presented a paper that she wrote together with Michael Cochez, Maarten de Rijke, Axel Polleres and Vadim Savenkov at the International Semantic Web Conference 2018, held in October 2018 in Monterey, California. Here is an excerpt from Svitlana’s blog:

Imagine: you sit in a train in which many different conversations are going on at the same time. A couple next to the window is planning their honeymoon trip; the girls at the table are discussing their homework; a granny is on a call with her great-grandson. You close your eyes and try to recover who is talking to whom by paying attention to the content of the conversations and not to the origin of the sound waves:

— Mhm… and then we add the two numbers from Pythagoras’ equation?
— … but I am not quite sure about that hotel you booked online… Don’t you think we should stick with the one we found on Airbnb?
—  All right, sweetheart, kiss your mum for me! I will be back before the Disney movie starts, I promise.
—  I think it is the one we did on the blackboard on Monday, or was it the one with the Euclidean distance?
—  For me both options are really fine as long as it is on Bali.

It is relatively easy to tell the three conversations apart. We hypothesize that this is due to certain semantic relations between utterances from the same dialogue that make it meaningful, or coherent, which brought us to the following set of questions:

  1. What are the relations between the words in a dialogue (or rather between the concepts they represent) that make the dialogue semantically coherent, i.e. make sense? and
  2. Can we use available knowledge resources (e.g. a knowledge graph) to tell whether a dialogue makes sense?

The latter is particularly important for dialogue systems that need to correctly interpret the dialogue context and produce meaningful responses.

Illustration © zvisno


To study these two questions, we cast the semantic coherence measurement task as a classification problem. The objective is to learn to distinguish real (mostly coherent) dialogues from artificially generated dialogues, which are incoherent by design. Intuitively, the classifier is trained to assign a higher score to coherent dialogues and a lower score to incoherent (corrupted) dialogues, so that the output score reflects the degree of coherence of a dialogue.
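A scoring setup like this can be trained in more than one way; purely as an illustration (the function below is a common pairwise formulation, not necessarily the objective used in the paper), a margin-based loss that pushes real dialogues above corrupted ones looks like this:

```python
def margin_ranking_loss(score_real, score_corrupted, margin=1.0):
    """Zero loss once the real (coherent) dialogue is scored at least
    `margin` higher than its corrupted counterpart; otherwise a linear
    penalty proportional to how far the margin is violated."""
    return max(0.0, margin - (score_real - score_corrupted))

# A model scoring a real dialogue at 2.0 and a corrupted one at 0.5
# already satisfies the margin, so no loss is incurred:
print(margin_ranking_loss(2.0, 0.5))  # 0.0
# Equal scores violate the margin fully:
print(margin_ranking_loss(0.5, 0.5))  # 1.0
```

In practice the scores would come from a trained model rather than being set by hand; minimizing this loss over real/corrupted pairs drives the output score to act as a coherence measure.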

We extended the Ubuntu Dialogue Corpus, a large dataset containing almost 2M dialogues extracted from IRC (public chat) logs, with generated negative samples to provide an evaluation benchmark for the coherence measurement task. We came up with five different ways to generate negative samples, i.e. incoherent dialogues: a) by sampling the vocabulary (1. uniformly at random; 2. according to the corpus-specific distribution), and b) by permuting the original dialogues (3. shuffling the sequence of entities, or combining two different dialogues via 4. horizontal or 5. vertical splits).
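Two of these corruption strategies are simple enough to sketch. The function names and parameters below are illustrative rather than taken from the released benchmark, and which of the two split variants counts as “horizontal” or “vertical” is left open here:

```python
import random

def sample_vocab_dialogue(vocab, n_utterances=5, utterance_len=8, rng=random):
    """Vocabulary-sampling strategy: an 'utterance' is just words drawn
    uniformly at random from the vocabulary, so the resulting dialogue
    is incoherent by construction."""
    return [" ".join(rng.choices(vocab, k=utterance_len))
            for _ in range(n_utterances)]

def split_and_merge(dialogue_a, dialogue_b):
    """Split strategy (sketch): splice the first half of one dialogue onto
    the second half of another, breaking coherence at the seam."""
    return dialogue_a[:len(dialogue_a) // 2] + dialogue_b[len(dialogue_b) // 2:]
```

The remaining strategies follow the same pattern: draw words from the corpus-specific unigram distribution instead of uniformly, or shuffle the order of entity mentions within a single real dialogue.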

We also implemented and evaluated three different approaches on this benchmark.
Two of them are based on a neural network classifier (a Convolutional Neural Network) using word or, alternatively, Knowledge Graph embeddings; the third approach uses the original Knowledge Graph (Wikidata + DBpedia converted to HDT) to induce a semantic subgraph representation for each dialogue.

Read the full story in Svitlana’s blog!

Profiles & Data:Search Workshop at TheWebConf 2018

Vadim Savenkov became a co-organizer of the International Workshop on Profiling and Searching Data on the Web, co-located with TheWebConf 2018 (formerly known as the WWW Conference), which took place in Lyon, France.

The workshop attracted four full and two short submissions on different aspects of web data management (see the proceedings), and included two great keynote talks by Maarten de Rijke and Aidan Hogan, followed by a panel on Data Search with Paul Groth, Aidan Hogan, Jeni Tennison, Stefan Dietze and Natasha Noy.


Best Paper Award in the Societal Challenges Category at KESW 2017

The paper Ontology for Representing Human Needs by Soheil Human, Florian Kragulj, Florian Fahrenbach and Vadim Savenkov received a Best Paper award in the Societal Challenges category at KESW 2017, the Knowledge Engineering and Semantic Web conference, held in November 2017 in Szczecin, Poland.

The paper describes a new ontology for representing human needs and a needs analysis experiment conducted as part of the pilot project Expedition Stuwerviertel (description in German) with the help of the Bewextra methodology.

Position paper on Conversational Search

Svitlana Vakulenko presented the vision of conversational exploratory search, developed jointly with Ilya Markov and Maarten de Rijke, at the Search-Oriented Conversational AI 2018 Workshop, co-located with EMNLP 2018 in Amsterdam.

This position paper discusses research problems and possible connections for a novel search modality that combines traditional search requests with knowledge and data exploration, to be used in chatbot assistants.

The full text, Conversational exploratory search via interactive storytelling, is available on arXiv.