Thesis etd-03122021-104324

Thesis type

Master's thesis

Author

CAVECCHIA, MIRKO

URN

etd-03122021-104324

Title

Studio di tecniche di Word Embedding per Dati Strutturati finalizzato al Keyword Search

Title in English

Study of Word Embedding techniques for supporting Keyword Search in Structured Data

Department

Department of Engineering

Degree programme

Computer Engineering (D.M.270/04)

Committee

Committee member	Role
GUERRA FRANCESCO	Primary supervisor
DEL BUONO FRANCESCO	Co-supervisor
PAGANELLI MATTEO	Co-supervisor

Keywords

Evaluation
Keyword Search
NLP
Relational Database
Word Embedding

Defense session start date

2021-04-15

Availability

Accessible via web (all thesis files are accessible)

Abstract

One of the main challenges of the last few decades has been how to query relational databases without writing complex queries. In most cases, users know neither the schema of the relational data nor the complex syntax of a standardized language such as SQL. For this reason, the scientific community has proposed various techniques for keyword search on structured data, analogous to search on the Web. From a careful review of the literature, it appears that no one has so far used a distributed representation of words to perform keyword search on databases. Many existing systems build graphs from the relational schema or from the data, but none, to the best of our knowledge, uses embeddings to represent each attribute and each token of a database. The main contribution of this thesis is therefore to apply different word embedding techniques to structured data in order to create effective local embeddings, and to evaluate how well these vector representations retrieve relevant results for a user query.

File

Estimated download time (Hours:Minutes:Seconds)
File name	Size	28.8 Modem	56K Modem	ISDN (64 Kb)	ISDN (128 Kb)	More than 128 Kb
Master_Thesis_Mirko_Cavecchia.pdf	3.36 Mb	00:15:34	00:08:00	00:07:00	00:03:30	00:00:17