Historic UK Newspaper Datasets
Resource type
Web page
Title
Historic UK Newspaper Datasets
Abstract
This dataset contains text extracted at the article level from historic digitised newspapers from the Heritage Made Digital newspaper digitisation program at the British Library. The newspapers in the dataset were published between 1800 and 1896. This dataset contains ~2.5 billion tokens and 3,065,408 articles.
The text was generated by Optical Character Recognition (OCR) software run on digitised newspaper pages. The dataset includes the plain OCR text alongside minimal metadata about the newspaper from which the text is derived and the confidence scores reported by the OCR software.
Date
2024-01-30
Accessed
30/01/2024 09:47
Citation
Historic UK Newspaper Datasets. (2024, January 30). https://huggingface.co/datasets/biglam/hmd_newspapers