The cross-linguistic performance of word segmentation models over time.

08:00 EDT 1st November 2019 | BioPortfolio

Summary of "The cross-linguistic performance of word segmentation models over time."

We select three word segmentation models with psycholinguistic foundations (transitional probabilities, the diphone-based segmenter, and PUDDLE), which track phoneme co-occurrence and positional frequencies in input strings and, in the case of PUDDLE, build lexical and diphone inventories. The models are evaluated on caregiver utterances in 132 CHILDES corpora representing 28 languages and 11.9 million words. PUDDLE shows the best performance overall, albeit with wide cross-linguistic variation. We explore the reasons for this variation by fitting regression models to performance scores, using linguistic properties that capture lexico-phonological characteristics of the input: word length, utterance length, diversity in the lexicon, the frequency of one-word utterances, the regularity of phoneme patterns at word boundaries, and the distribution of diphones in each language. Together, these properties explain four-tenths of the observed variation in segmentation performance, a strong outcome and a solid foundation for studying further variables that make the segmentation task difficult.
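To make the first of these models concrete, here is a minimal Python sketch of segmentation by transitional probabilities. It is an illustration under stated assumptions, not the authors' implementation: the function names `train_tp` and `segment`, the toy corpus, the one-character-per-phoneme encoding, and the choice of a local-minimum (relative threshold) boundary rule are all hypothetical choices made for this example.

```python
from collections import Counter

def train_tp(utterances):
    """Estimate forward transitional probabilities P(b | a) between phonemes.

    Each utterance is a list of words and each word is a string with one
    character per phoneme (an assumption of this sketch). Word boundaries
    are discarded before counting, as in unsegmented input.
    """
    first_counts = Counter()   # how often phoneme a starts a pair
    pair_counts = Counter()    # how often the pair (a, b) occurs
    for utterance in utterances:
        phones = "".join(utterance)
        for a, b in zip(phones, phones[1:]):
            first_counts[a] += 1
            pair_counts[(a, b)] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(phones, tp):
    """Place a boundary wherever the transitional probability is a local minimum.

    Only interior transitions, which have a neighbour on both sides, can
    become boundaries. Unseen phoneme pairs get probability 0.0.
    """
    tps = [tp.get((a, b), 0.0) for a, b in zip(phones, phones[1:])]
    words, start = [], 0
    for i in range(1, len(tps) - 1):
        if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]:
            words.append(phones[start:i + 1])
            start = i + 1
    words.append(phones[start:])
    return words

# Toy corpus (hypothetical): three "words" in every two-word order.
corpus = [["ba", "di"], ["di", "ku"], ["ku", "ba"],
          ["ba", "ku"], ["di", "ba"], ["ku", "di"]]
tp = train_tp(corpus)
print(segment("badiku", tp))
```

In this toy corpus the within-word transitions (such as b to a) have probability 1.0 while the between-word transitions have probability 0.5, so the dips fall exactly at the word boundaries. Real child-directed speech is far noisier, which is one reason performance varies across languages.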


Journal Details

This article was published in the following journal.

Name: Journal of Child Language
ISSN: 1469-7602
Pages: 1169-1201

