This page contains resources to help other researchers use social media data for public health.


We post social media trends to [Link]. Check out this tutorial on how to use that site.


We publish open source software to support social media analysis.

Carmen [Java, Python]
Carmen is a library for geolocating tweets. Given a tweet, Carmen returns Location objects that represent a physical location. Carmen uses both coordinates and other information in a tweet to make geolocation decisions. It is not perfect, but it greatly increases the number of geolocated tweets beyond what Twitter provides.

The Python and Java versions don’t give exactly the same results due to differences in the dependencies. Going forward, our development will focus on the Python version. If you use Carmen, please cite:
Mark Dredze, Michael J Paul, Shane Bergsma, Hieu Tran. Carmen: A Twitter Geolocation System with Applications to Public Health. AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI), 2013.
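To illustrate the idea of combining coordinates with other information in a tweet, here is a minimal sketch. It is not Carmen's code or API: the function name guess_location and the order of the fallbacks are assumptions made for this example, and the field names (coordinates, place, user.location) are assumed to follow Twitter's v1.1 tweet JSON. Unlike Carmen, the sketch does not resolve anything to a known physical location; it only shows which piece of evidence would be used.

```python
from typing import Any, Dict, Optional


def guess_location(tweet: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the most specific location evidence found in a tweet, or None.

    Hypothetical helper for illustration only; not part of Carmen.
    """
    # 1. Exact GPS point, stored as GeoJSON: [longitude, latitude].
    coords = tweet.get("coordinates")
    if coords and coords.get("coordinates"):
        lon, lat = coords["coordinates"]
        return {"source": "coordinates", "latitude": lat, "longitude": lon}

    # 2. A place tagged on the tweet by the user.
    place = tweet.get("place")
    if place and place.get("full_name"):
        return {"source": "place", "name": place["full_name"]}

    # 3. Free-text location from the author's profile (least reliable).
    profile = ((tweet.get("user") or {}).get("location") or "").strip()
    if profile:
        return {"source": "profile", "name": profile}

    return None


example = {"coordinates": None, "user": {"location": "Baltimore, MD"}}
result = guess_location(example)
```

In this sketch most tweets would fall through to the profile field, which is why using information beyond coordinates increases the number of tweets that can be located.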

Twitter Stream Downloader [Link]
This small software package provides code for automating the downloading of data from the Twitter streaming API.

Demographer: Gender Identification for Social Media [Link]
Demographer is a Python package that identifies demographic characteristics based on a name. It is designed for Twitter, where it takes a user's name and returns information about that user's likely demographics.


As part of our research we collect and annotate social media datasets.

Format: Each dataset is encoded in JSON format, with one JSON record per line. Each record contains the following fields: id (the tweet id) and label (a dictionary of annotations for this tweet, where each key is the name of an annotation and each value is its label). Each record also has either a text field (the text of the tweet) or a tweet field (the full tweet object from Twitter).
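A minimal sketch of reading this format in Python follows. The function name iter_records, the sample records, and the annotation name "relevant" are invented for illustration. The sketch also assumes that the full tweet object stores its text under a "text" key, as in Twitter's v1.1 tweet JSON.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[Any, Dict[str, Any], str]]:
    """Yield (tweet id, annotations, tweet text) for each JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if "text" in record:
            text = record["text"]
        else:
            # Full tweet object; the "text" key inside it is an assumption.
            text = record["tweet"].get("text", "")
        yield record["id"], record["label"], text


# Invented sample lines, one with a text field and one with a tweet field.
sample = [
    '{"id": 1, "label": {"relevant": "yes"}, "text": "Got my flu shot today"}',
    '{"id": 2, "label": {"relevant": "no"}, "tweet": {"id": 2, "text": "Nice weather"}}',
]
records = list(iter_records(sample))
```

Because a file object is an iterable of lines, the same function can be applied to a downloaded dataset by passing it an open file handle.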

Flu Vaccination Tweets [Link]

This dataset contains annotations for whether a tweet is relevant to the topic of flu vaccination and whether the author intends to receive a flu vaccine. Analysis of this dataset was published in:

Xiaolei Huang, Michael C. Smith, Michael Paul, Dmytro Ryzhkov, Sandra Quinn, David Broniatowski, Mark Dredze. Examining Patterns of Influenza Vaccination in Social Media. AAAI Joint Workshop on Health Intelligence (W3PHIAI), 2017.

Vaccination Sentiment and Relevance Tweets [Link]

This dataset contains annotations for whether a tweet is relevant to the topic of vaccinations and whether the author expresses a positive or negative view about vaccines. Analysis of this dataset was published in:

Michael Smith, David A. Broniatowski, Mark Dredze. Using Twitter to Examine Social Rationales for Vaccine Refusal. International Engineering Systems Symposium (CESUN), 2016.

Mark Dredze, David A. Broniatowski, Michael Smith, Karen M. Hilyard. Understanding Vaccine Refusal: Why We Need Social Media Now. American Journal of Preventive Medicine, 2015.

Zika Conspiracy Tweets [Link]

This dataset contains annotations for whether a tweet about Zika contains pseudo-scientific information. Analysis of this dataset was published in:

Mark Dredze, David A Broniatowski, Karen M Hilyard. Zika Vaccine Misconceptions: A social media analysis. Vaccine, 2016.

Data Visualization Tools

Our team has developed prototypes for users to explore trends in vaccine-related topics on two major social media platforms, Twitter [Link] and Reddit [Link]. Both tools are still under development.

For illustration, these tools can be used to compare trends in autism-related Reddit messages over time, stratified by user gender:

[Image: Autism visualization]

They can also be used to compare geographic differences in vaccine-related topics in Twitter messages:

[Image: Vaccine topic map]

Other Resources

Blogtrackers [Link] is a socio-computational tool for tracking, monitoring, and understanding blogs, bloggers, and real-time world events. The tool also lets users explore trends and leading narratives and conduct sentiment analysis on a wide range of subjects.

Focal Structure Analysis [Link] is an algorithm to identify key sets of individuals called focal structures in a social network. These are individuals who collectively organize events, movements, and campaigns, among other complex social processes. The FSA platform allows users to identify focal structures, observe their interactions within the network, and measure their power.

At Flu Near You [Link], epidemiologists at Harvard analyze thousands of user-provided health reports and map them to generate local and national views of influenza-like illness.

GW Social Feed Manager [Link] is software to support research about social media on platforms including Twitter, Tumblr, Flickr, and Sina Weibo.

ORA [Link] is software for social network analysis and visualization available through Netanomics.

Map My Health [Link] is a healthcare company based in the United Kingdom that develops evidence-based, disease-specific digital therapeutics.

Sickweather [Link] is an app that scans social networks for indicators of illness, allowing individuals to check for the chance of sickness in their area.

TweetTracker [Link] is a tool from Arizona State University that can help you track, analyze, and understand activity on Twitter.

Vaccine Sentimeter [Link] is a dashboard that provides real-time surveillance and trend analyses of vaccination conversations in mainstream and social media.

YouTubeTracker [Link] is a tool that can track, monitor, and identify influential YouTube groups and content. It allows users to gain insights into the content engagement behaviors of individuals via likes, dislikes, comments, replies, shares, etc. Through visual analytics, YouTubeTracker can help identify trends, opinions, communities, and anomalous behaviors such as bots, spam, or trolls. Users can also visualize networks among YouTubers, commenters, and content.