Tesi etd-02062013-153523

Tipo di tesi

Tesi di dottorato di ricerca

Autore

ROTA, SILVIA

URN

etd-02062013-153523

Titolo

Keyword Search su Database Relazionali: un approccio basato su Hidden Markov Models

Titolo in inglese

Keyword Search over Relational Databases: a Hidden Markov Model approach

Settore scientifico disciplinare

ING-INF/05 - SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI

Corso di studi

Scuola di D.R. in INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT)

Commissione

Nome Commissario	Qualifica
BERGAMASCHI SONIA	Primo relatore
GUERRA FRANCESCO	Correlatore
VITETTA GIORGIO MATTEO	Direttore Scuola di Dottorato

Parole chiave

Database
HMM
Metadata
Retrieval
Viterbi

Data inizio appello

2013-03-11

Disponibilità

Accessibile via web (tutti i file della tesi sono accessibili)

Riassunto analitico

La ricerca per keywords su database è diventata popolare in quanto permette ad utenti non esperti di formulare query senza la necessità di imparare un linguaggio strutturato e capire come i dati siano rappresentati all'interno di un database. Le tecniche attuali si basano sull'analizzare a priori il contenuto del database e sulla creazione di un indice che viene poi utilizzato in fase di esecuzione per identificare e recuperare le informazioni richieste dall'utente.
Questa tesi si concentra su un nuovo approccio che non assume alcuna conoscenza sull'istanza del database. Lo schema del database, o qualsiasi metadato che descriva la struttura del database, viene utilizzato per estrarne la semantica e per scoprire il significato delle query inserite dagli utenti. Le query vengono poi confrontate con la struttura del database, e combinate insieme per formulare una query strutturata. Poiché le query sono intrinsecamente ambigue, il processo di corrispondenza delle query con gli elementi del database, che è il nucleo dell'approccio, è probabilistico e prevede l'utilizzo di Hidden Markov Models.
Due diverse tecniche per utilizzare gli Hidden Markov Models sono presentate in questa tesi. Il primo approccio prevede l'utilizzo di regole euristiche per inizializzare i parametri del modello. Il secondo approccio, più sofisticato, presenta un nuovo algoritmo chiamato List Viterbi training algorithm, che è una versione del noto algoritmo Expectation Maximization, che utilizza l'algoritmo List Viterbi durante la fase di Expectation al posto dell'algoritmo forward-backward comunemente utilizzato.
Infine, questa tesi presenta un prototipo di motore di ricerca nominato KEYRY (dalle KEYwords alle queRY) che implementa le tecniche esposte. Il prototipo è stato utilizzato per testare le prestazioni degli algoritmi su database ben noti alla comunità scientifica.

Abstract

Keyword search over databases has become popular since it allows non-expert users to formulate queries without the need to learn a structured query language and understand how the data is represented inside a database. The existing techniques for keyword search over structured sources are based on a-priori instance-analysis that scans the database instance and constructs an index, or a similar structure, which is used at run-time to easily identify and retrieve the information requested by the user. This thesis focuses on a new approach that does not assume any knowledge of the database instance. In this approach the database schema, or any available metadata describing the database structure, is used to extract semantics and to discover the intended meaning of the keyword queries inserted by the users. The keyword queries are then matched against the underlying database structures, and combined together to form a structured query. Since keyword queries are intrinsically ambiguous, the matching process, which is the core of the approach, is probabilistic and involves the use of Hidden Markov Models. Two different techniques to employ Hidden Markov Models are presented in this thesis. The first approach involves the use of heuristic rules to initialize the model parameters. The second, more sophisticated approach, presents a new algorithm called List Viterbi training algorithm, which is a version of the well known Expectation-Maximization algorithm, that uses the List Viterbi algorithm during the Expectation step instead of the commonly used forward-backward algorithm. Finally, this thesis presents a prototype search engine named KEYRY (from KEYwords to queRY) that implements the techniques presented. The prototype has been utilized to test the performances of the algorithms on well known databases.

File

Nome file		Dimensione	Tempo di download stimato (Ore:Minuti:Secondi)
Nome file		Dimensione	28.8 Modem	56K Modem	ISDN (64 Kb)	ISDN (128 Kb)	piu' di 128 Kb
	Thesis.pdf	2.41 Mb	00:11:10	00:05:44	00:05:01	00:02:30	00:00:12
Contatta l'autore