Abstract
With the growing adoption of Large Language Models (LLMs) in industrial applications, the need for reliable evaluation methodologies has become crucial. This thesis explores the key metrics, tools, and tasks involved in assessing the performance of LLM-based systems. The evaluation spans multiple dimensions of natural language generation (NLG), covering both technical requirements (accuracy, calibration, robustness) and social requirements (fairness, bias, toxicity), as well as more advanced tasks such as retrieval-augmented generation (RAG). Traditional metrics, whose performance is measured as the Pearson correlation with human-written references, are reviewed, and newer LLM-as-a-judge approaches are also explored. In addition, a benchmarking toolkit has been developed to facilitate the systematic evaluation of LLM performance across various tasks. This toolkit integrates multiple evaluation methodologies, enabling users to test, compare, and analyze LLM-based systems efficiently. The findings of this study provide insights into best practices for LLM evaluation, aiming to enhance the reliability and applicability of such systems in industrial contexts.