Predicting age groups of Twitter users based on language and metadata features.

08:00 EDT 29th August 2017 | BioPortfolio

Summary of "Predicting age groups of Twitter users based on language and metadata features."

Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fit for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was the least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately.
These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for informing public health surveillance and evaluation research.
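
The evaluation quantities named in the summary (precision, recall, F1, and Cohen's d) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels, and the toy feature values below are hypothetical, and the summary does not say how per-group scores were averaged or which variant of Cohen's d was used. The sketch assumes one-vs-rest scores per age group and the pooled-standard-deviation form of Cohen's d.

```python
from math import sqrt
from statistics import mean, variance


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1) for one age group, one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical toy labels, not data from the study.
y_true = ["youth", "youth", "young_adult", "young_adult", "adult", "adult"]
y_pred = ["youth", "young_adult", "young_adult", "young_adult", "adult", "youth"]

for group in ("youth", "young_adult", "adult"):
    p, r, f = per_class_metrics(y_true, y_pred, group)
    print(f"{group}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# Hypothetical per-user counts of one term in two age groups.
d = cohens_d([2, 4, 6], [1, 2, 3])
print(f"Cohen's d = {d:.3f}")
```

Under these assumptions, a larger absolute d for a feature indicates a bigger standardized difference in that feature between two age groups, which is one way to rank features by relative importance.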


Journal Details

This article was published in the following journal.

Name: PloS one
ISSN: 1932-6203
Pages: e0183537


PubMed Articles [26392 Associated PubMed Articles listed on BioPortfolio]

Exploring online communication about cigarette smoking among Twitter users who self-identify as having schizophrenia.

Novel approaches are needed to address elevated tobacco use among people with schizophrenia. This exploratory study examined the frequency, timing, and type of communication about tobacco-related cont...

Users' participation and social influence during information spreading on Twitter.

Online Social Networks generate a prodigious wealth of real-time information at an incessant rate. In this paper we study empirical data crawled from Twitter to describe the topology and info...

A study on real-time low-quality content detection on Twitter from the users' perspective.

Detection techniques of malicious content such as spam and phishing on Online Social Networks (OSN) are common with little attention paid to other types of low-quality content which actually impacts u...

Characterizing Blunt Use Among Twitter Users: Racial/Ethnic Differences in Use Patterns and Characteristics.

Young adult Twitter users are exposed to and often participate in tweets that promote risky behaviors, such as blunt use. Blunts are hollowed out cigars or cigarillos that are filled with marijuana.

Building a profile of subjective well-being for social media users.

Subjective well-being includes 'affect' and 'satisfaction with life' (SWL). This study proposes a unified approach to construct a profile of subjective well-being based on social media language in Fac...

Clinical Trials [5820 Associated Clinical Trials listed on BioPortfolio]

Retrospective Study Assessing Molecular Features Predicting Response to Cetuximab

The primary objective is to identify molecular features predicting response or resistance to cetuximab

Effectiveness of Early Parent-Based Language Intervention

The purpose of the study is to examine the effectiveness of a highly-structured parent-based language intervention group program for two-year-old children with language delay.

RCT of Parent-based Intervention for Language Delayed 2 to 3 Year Olds

The aim of the study is to evaluate the impact of parent based intervention on the language of 2 to 3 year old children from socially disadvantaged populations with a clinical diagnosis of...

Language Therapy in British Sign Language

In the United Kingdom, the language of the Deaf community is British Sign Language (BSL). A small proportion of Deaf young people who use BSL as their first or dominant language have speci...

The Use of a Language Toolkit for Toddlers

To investigate whether young children with isolated expressive language delay benefit from early intervention with a simple language toolkit and brief instructions provided to their caregi...

Medical and Biotech [MESH] Definitions

Organized groups of users of goods and services.

Tests designed to assess language behavior and abilities. They include tests of vocabulary, comprehension, grammar and functional use of language, e.g., Development Sentence Scoring, Receptive-Expressive Emergent Language Scale, Parsons Language Sample, Utah Test of Language Development, Michigan Language Inventory and Verbal Language Development Scale, Illinois Test of Psycholinguistic Abilities, Northwestern Syntax Screening Test, Peabody Picture Vocabulary Test, Ammons Full-Range Picture Vocabulary Test, and Assessment of Children's Language Comprehension.

A cognitive disorder marked by an impaired ability to comprehend or express language in its written or spoken form. This condition is caused by diseases which affect the language areas of the dominant hemisphere. Clinical features are used to classify the various subtypes of this condition. General categories include receptive, expressive, and mixed forms of aphasia.

Rehabilitation of persons with language disorders or training of children with language development disorders.

People who take drugs for a non-therapeutic or non-medical effect. The drugs may be legal or illegal, but their use often results in adverse medical, legal, or social consequences for the users.
