Abstract
Since 2004, the BootCaT software has been developed to help linguists quickly build disposable corpora for translation, terminological databases and machine-learning tasks by automatically collecting web pages based on user-defined keywords. The present work attempts to use the software to create comparable and diachronic web corpora in three languages (English, German and Italian). It reports how the standard BootCaT procedure was adapted and supplemented for this purpose and discusses the quality and usefulness of the resulting corpora. In light of these results, it recommends adjustments for future research in this direction and outlines desirable developments in the automation of text collection and analysis for linguistics.