Abstract
Entity Matching (EM) aims to identify records referring to the same real-world entity, a critical task in data integration. Deep learning models have demonstrated performance superior to traditional rule-based approaches, particularly in handling textual and noisy data. However, these models require extensive labeled training data, limiting their practical applicability. To address this challenge, this thesis investigates the use of Large Language Models (LLMs) for generating high-quality, hard-to-match synthetic records in a zero-shot setting. The proposed approach employs an LLM to produce alternative entity descriptions, followed by a two-step verification process: an LLM-based classifier ensures semantic consistency, while a retriever component evaluates distinctiveness. If a generated record fails either criterion, the pipeline iteratively refines the output based on feedback from previous attempts. We experiment with open-source LLMs and evaluate the impact of the generated matching pairs on EM model generalization across multiple benchmark datasets. Our findings reveal that training with LLM-augmented data yields results competitive with, and sometimes superior to, training with the original data.