This page collects links to resources for academics and other investigators, including websites of our collaborators and academic partners, and datasets pertinent to the COVID-19 pandemic.

Contact info@socialmediaforpublichealth.org to add resources to this list.

Related Initiatives

The Pandemic Project
The study of people during COVID-19
https://utpsyc.org/covid19/

The Johns Hopkins Coronavirus Resource Center
Johns Hopkins experts in global public health, infectious disease, and emergency preparedness have been at the forefront of the international response to COVID-19.
https://coronavirus.jhu.edu

Crowdfight COVID-19
An initiative from the scientific community to put all available resources at the service of the fight against COVID-19
https://crowdfightcovid19.org

COVID Act Now
Created by a team of data scientists, engineers, and designers in partnership with epidemiologists, public health officials, and political leaders to help communities understand how the COVID-19 pandemic will affect their region.
https://covidactnow.org

COVID-19 Social Science Research Tracker
This international list tracks new social science research about COVID-19, including published findings, pre-prints, projects underway, and projects at least at the proposal stage.
https://github.com/natematias/covid-19-social-science-research/

COVID-19 Open Research Dataset Challenge (CORD-19)
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19).
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Awesome Coronavirus19 Dataset
A repository containing resources related to COVID-19.
https://github.com/bigheiniu/awesome-coronavirus19-dataset

CORD-19 Information Aggregator
A tool for browsing answers that the scientific literature may provide to various questions about the novel coronavirus and COVID-19.
http://phontron.com/misc/cord19_report/

Center for Informed Democracy & Social-cybersecurity (IDeaS) – Coronavirus Misinformation
A list of identified misinformation regarding coronavirus.
https://www.cmu.edu/ideas-social-cybersecurity/research/coronavirus.html

Coronavirus Tech Handbook
The Coronavirus Tech Handbook provides a library for technologists, civic organisations, public and private institutions, researchers, educators and specialists of all kinds to collaborate on an agile and sophisticated response to the coronavirus outbreak and sequential impacts.
https://coronavirustechhandbook.com/home

COVIDSearch: Making Sense of [Lots of] Open Data Related to COVID-19
A TREC-style shared task with the goal of evaluating search algorithms and systems for helping scientists, clinicians, policy makers, and others manage the existing and rapidly growing corpus of scientific literature related to COVID-19.
https://dmice.ohsu.edu/hersh/COVIDSearch.html

Data Against COVID-19
A clearinghouse for matching requests for cleaning COVID-19 datasets with volunteers willing to perform this cleaning.
https://www.data-against-covid.org/

COVID-19 Data Collaboratives
A list of COVID-19 related data projects.
https://docs.google.com/document/d/1JWeD1AaIGKMPry_EN8GjIqwX4J4KLQIAqP09exZ-ENI/mobilebasic


Twitter Data

The following is a list of datasets containing social media and web data related to COVID-19.

COVID-19: The First Public Coronavirus Twitter Dataset
A multilingual coronavirus Twitter dataset starting on January 22, 2020.
https://arxiv.org/abs/2003.07372
https://medium.com/@isiminds/twitter-dataset-related-to-covid-19-coronavirus-released-b0610c718910
https://medium.com/@isiminds/twitter-covid-19-preliminary-geo-analysis-83f43fb4e0c3

GWU Libraries Dataverse: Coronavirus Tweet Ids
This dataset contains the tweet ids of 51,798,932 tweets related to Coronavirus or COVID-19. They were collected between March 3, 2020 and March 19, 2020 (midnight UTC-0) from the Twitter API using Social Feed Manager.
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/LW0BTB
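
Because this dataset distributes tweet ids rather than full tweets, the ids generally need to be looked up ("hydrated") through the Twitter API before analysis. As a minimal sketch, assuming the ids are available as plain text with one id per line (an assumption about the file layout, not something stated above), the following Python groups them into fixed-size batches ready for lookup. The function name `batch_tweet_ids`, the batch size, and the sample ids are all illustrative, and the lookup call itself is left out.

```python
from typing import Iterable, Iterator, List


def batch_tweet_ids(lines: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield lists of tweet ids, skipping blank lines (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id:
            continue  # ignore blank lines
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly partial, batch


# Made-up ids standing in for lines read from an id file (assumed layout).
sample = ["1234567890\n", "\n", "2345678901\n", "3456789012\n"]
batches = list(batch_tweet_ids(sample, batch_size=2))
print(batches)
```

Ids are kept as strings so that large values are not altered by numeric conversion; each batch can then be passed to whichever hydration tool is in use.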

TweetSets from GWU
Twitter datasets for research and archiving. Create your own Twitter dataset from existing datasets.
https://tweetsets.library.gwu.edu/

Crowdbreaks Twitter Dataset
The data relates to COVID-19 and vaccines and has been collected through the Twitter filter stream API using a list of keywords and languages.
https://www.crowdbreaks.org/en/data_sharing


Data from Natural Language Processing

Knowledge Extraction to Assist Scientific Discovery from Corona Virus Literature
Knowledge extraction NLP algorithms applied to the CORD-19 dataset of coronavirus-related scientific papers.
http://blender.cs.illinois.edu/covid19/

Projects with Search Data

Tracking COVID-19 using online search
This work develops an unsupervised model for COVID-19 using search data
https://github.com/vlampos/covid-19-online-search

Coronavirus Google Searches Could Save Lives
An analysis of search data for COVID-19 based on location.
https://onezero.medium.com/google-needs-to-share-the-data-from-coronavirus-searches-62e6f60cc363

Text Data

COVID-19 Open Research Dataset (CORD-19)
Over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses
https://pages.semanticscholar.org/coronavirus-research

Social Mobility Data

Unacast Social Distancing Scoreboard
A tool to provide organizations fighting COVID-19 with an understanding of the efficacy of social distancing initiatives — currently seen as the most effective way of slowing the spread of the virus.
https://www.unacast.com/covid19/social-distancing-scoreboard

Google COVID-19 Community Mobility Reports
Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19.
https://www.google.com/covid19/mobility/

Other Data

New York Times COVID-19 Dataset
An ongoing repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
https://github.com/nytimes/covid-19-data
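
Case counts in repositories of this kind are commonly published as running totals, so a frequent first step is converting them to daily increases. The following is a minimal sketch in Python, assuming a state-level CSV with the columns `date,state,fips,cases,deaths` and cumulative values (both are assumptions to verify against the repository's own documentation). The rows in `SAMPLE_CSV` are made up for illustration, and `daily_new_cases` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the assumed column layout: date,state,fips,cases,deaths
SAMPLE_CSV = """date,state,fips,cases,deaths
2020-03-01,Washington,53,10,1
2020-03-02,Washington,53,15,2
2020-03-03,Washington,53,27,2
2020-03-02,New York,36,1,0
2020-03-03,New York,36,4,0
"""


def daily_new_cases(csv_text):
    """Return {state: [(date, new_cases), ...]} from cumulative case counts."""
    cumulative = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cumulative[row["state"]].append((row["date"], int(row["cases"])))

    result = {}
    for state, series in cumulative.items():
        series.sort()  # ISO dates sort chronologically as strings
        previous = 0
        daily = []
        for date, total in series:
            daily.append((date, total - previous))  # increase since prior row
            previous = total
        result[state] = daily
    return result


new_cases = daily_new_cases(SAMPLE_CSV)
print(new_cases["Washington"])
```

The sketch treats the first reported total for each state as that day's increase and does not handle missing days or downward corrections, which real data may contain.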

COVID-19 Dataset Clearinghouse
This is a repository for public data sets relating to the COVID-19 pandemic.
http://michaelnielsen.org/polymath1/index.php?title=COVID-19_dataset_clearinghouse
https://terrytao.wordpress.com/2020/03/25/polymath-proposal-clearinghouse-for-crowdsourcing-covid-19-data-and-data-cleaning-requests/

County-level Socioeconomic Data for Predictive Modeling of Epidemiological Effects
We aim to gather a machine-readable dataset related to socioeconomic factors that may affect the spread and/or consequences of epidemiological outbreaks, particularly the novel coronavirus (COVID-19).
https://github.com/JieYingWu/COVID-19_US_County-level_Summaries

COVID-19 Epidemiological Data Repository by Johns Hopkins University Center for Systems Science & Engineering (JHU CSSE)
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
https://github.com/CSSEGISandData/COVID-19

European Centre for Disease Prevention and Control (ECDC) – COVID-19 Epidemiological Data
Data on the geographic distribution of COVID-19 cases worldwide
https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide

U.S. Hospital Capacity Estimates (Harvard Global Health Institute)
Regionalized hospital capacity estimates from HGHI, counting every bed in every major hospital market to inform community response.
https://globalepidemics.org/2020/03/17/caring-for-covid-19-patients/

U.S. State COVID-19 Testing Data: The COVID Tracking Project
The COVID Tracking Project collects information from 50 US states, the District of Columbia, and 5 other US territories to provide the most comprehensive testing data we can collect for the novel coronavirus, SARS-CoV-2. We attempt to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data.
https://covidtracking.com/

Italy COVID-19 Data
COVID-19 Italia – Monitoraggio situazione (COVID-19 Italy – Situation monitoring)
https://github.com/pcm-dpc/COVID-19

WHO COVID-19 Data – Cases & Deaths in China (by province) and other countries
Coronavirus COVID-19 cumulative cases and deaths by province for China and aggregated by country for the rest of the world.
https://data.humdata.org/dataset/coronavirus-covid-19-cases-data-for-china-and-the-rest-of-the-world

ACAPS COVID-19: Government Measures Dataset
The COVID-19 Government Measures Dataset puts together all the measures implemented by governments worldwide in response to the coronavirus pandemic. Data collection includes secondary data review. The information falls into five categories: social distancing, movement restrictions, public health measures, social and economic measures, and human rights implications. Each category is broken down into several types of measures. ACAPS consulted sources from governments, media, the United Nations, and other organisations.
https://data.humdata.org/dataset/acaps-covid19-government-measures-dataset

World Bank Indicators of Interest to the COVID-19 Outbreak
A collection in the World Bank data catalog containing datasets that may be useful for analysis, response, or modelling.
https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

GenBank COVID-19 Genetic Sequences
SARS-CoV-2 (Severe acute respiratory syndrome coronavirus 2) Sequences.
https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/

Nextstrain COVID-19 Genomics Database
Genomic epidemiology of novel coronavirus
https://nextstrain.org/ncov

India COVID-19 Tracker
A tracker for the spread of COVID-19 in India
https://www.covid19india.org/