Statistical Study on a Literary Romanian Corpus for the Beginning and Ending of the Words.
Details
Serval ID
serval:BIB_5E1C9E528CF8
Type
Inproceedings: an article in a conference proceedings.
Collection
Publications
Title
Statistical Study on a Literary Romanian Corpus for the Beginning and Ending of the Words.
Title of the conference
IEEE International Conference on Communications (COMM)
Publisher
IEEE
ISBN
978-1-4673-2573-8
Publication state
Published
Issued date
06/2012
Pages
81-84
Language
English
Abstract
The paper investigates the statistical structure of the letters and letter digrams with which words begin and end, as well as of the trigrams that link two successive words. The investigation is carried out on a printed Romanian literary corpus of about 12.5 million words. The impact of orthography and punctuation marks on the language model assigned to the beginning and ending of words is also considered.
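The kind of counting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the tokenization rule, the function names `boundary_counts` and `relative_frequencies`, and the reading of a "linking trigram" as last letter, word gap, first letter (written here with `_` for the gap) are all assumptions made for illustration. In particular, this sketch treats punctuation only as a separator, whereas the paper studies its impact explicitly.

```python
import re
from collections import Counter

# Assumption: a word is a maximal run of Unicode letters. Hyphens, apostrophes
# and punctuation act only as separators here; the paper may treat them differently.
WORD_RE = re.compile(r"[^\W\d_]+")


def boundary_counts(text):
    """Count letters and digrams at word boundaries, plus linking trigrams."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return {
        "first_letters": Counter(w[0] for w in words),
        "last_letters": Counter(w[-1] for w in words),
        # Digrams are only defined for words of at least two letters.
        "first_digrams": Counter(w[:2] for w in words if len(w) >= 2),
        "last_digrams": Counter(w[-2:] for w in words if len(w) >= 2),
        # Hypothetical encoding: last letter + "_" (word gap) + first letter.
        "linking_trigrams": Counter(
            a[-1] + "_" + b[0] for a, b in zip(words, words[1:])
        ),
    }


def relative_frequencies(counter):
    """Turn raw counts into relative frequencies (empty dict if no data)."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


if __name__ == "__main__":
    counts = boundary_counts("Ana are mere si pere.")
    print(counts["last_digrams"].most_common(3))
    print(relative_frequencies(counts["last_letters"]))
```

Because the sketch ignores punctuation, it also links the last word of one sentence to the first word of the next; a study that models punctuation marks would need to keep them as tokens instead.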
Create date
17/11/2021 16:13
Last modification date
26/07/2023 7:01