CLAWS:

Racism is a Virus: Anti-Asian Hate and Counterhate in
Social Media during the COVID-19 Crisis

Summary

The spread of COVID-19 has sparked racism, hate, and xenophobia in social media targeted at the Chinese and the broader Asian communities. However, little is known about how racial hate spreads during a pandemic and the role of counterhate speech in mitigating the spread. Here we study the evolution and spread of anti-Asian hate speech through the lens of Twitter. To do so, we create COVID-HATE, the largest dataset of anti-Asian hate and counterhate spanning three months (i.e., from January 15, 2020 to April 17, 2020), containing over 30 million tweets. Using a novel hand-labeled dataset of 2,400 tweets, we train a text classifier to identify hate and counterhate tweets, and with it we identify 891,204 hate and 200,198 counterhate tweets in COVID-HATE. Using this data, we present a comprehensive overview of anti-Asian hate and counterhate speech on Twitter during COVID-19.

More details can be found in our paper.

Source (Citation)

Please cite the following paper if you use this resource or dataset:

@article{ziems2020racism,
title={Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis},
author={Ziems, Caleb and He, Bing and Soni, Sandeep and Kumar, Srijan},
journal={arXiv preprint arXiv:2005.12423},
year={2020}
}

Dataset

Download

You can use the following links to download the data:

Dataset statistics

The overall dataset statistics are as follows:

Property                                Statistic
Duration                                Jan 15–Apr 17, 2020
Number of tweets                        30,929,269
Number of (frac.) hateful tweets        891,204 (2.88%)
Number of (frac.) counterhate tweets    200,198 (0.65%)
Number of (frac.) neutral tweets        26,837,429 (86.77%)
Number of users                         7,833,194
Number of (frac.) hateful users         393,897 (5.03%)
Number of (frac.) counterhate users     136,154 (1.74%)
Number of (frac.) neutral users         6,812,695 (86.97%)
Number of nodes in the network          87,851,137
Number of edges in the network          717,087,317
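
The fractions above appear to be computed against the total number of tweets and the total number of users, respectively. The short check below recomputes them from the listed counts; the variable and function names are illustrative and not part of the released data.

```python
# Counts copied from the dataset statistics above.
TOTAL_TWEETS = 30_929_269
TOTAL_USERS = 7_833_194

tweet_counts = {"hate": 891_204, "counterhate": 200_198, "neutral": 26_837_429}
user_counts = {"hate": 393_897, "counterhate": 136_154, "neutral": 6_812_695}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Each fraction is taken over the full population, not over labeled items only.
tweet_fractions = {k: percent(v, TOTAL_TWEETS) for k, v in tweet_counts.items()}
user_fractions = {k: percent(v, TOTAL_USERS) for k, v in user_counts.items()}

print(tweet_fractions)
print(user_fractions)
```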

Annotated hate, counterhate, and neutral tweets (Download)

This is a manually annotated dataset of 2,319 tweets about COVID-19 related racial hate, each assigned to one of four categories.

Please see the paper for the precise definitions of all categories.

All hate, counterhate, and neutral COVID-19 tweets (Download)

This dataset contains 30,929,269 COVID-19 tweets. All tweets are classified into hate, counterhate, and neutral categories using a machine learning classifier developed using the hand-labeled dataset.

  • Hate: 891,204 tweets. These tweets are classified as hateful by the classifier.
  • Counterhate: 200,198 tweets. These tweets are classified as counterhate.
  • Neutral: 26,837,429 tweets. These tweets are classified as neutral.
  • Other: 3,000,439 tweets. These tweets were classified into more than one of the above categories.

Tweet location information (Download)

This file contains the inferred location of tweets in the dataset. Each tweet is associated with a city, county, state, and country, whenever available. Locations were inferred using OpenStreetMap.
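
As a rough illustration of how such location records can be aggregated to the state level, here is a minimal sketch. It assumes each record is already loaded as a dictionary; the field names (`tweet_id`, `city`, `county`, `state`, `country`) and the sample values are hypothetical, so check the Readme file for the actual format.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def tweets_per_state(
    records: Iterable[Mapping[str, Optional[str]]], country: str
) -> Counter:
    """Count tweets per state for one country, skipping records with no state."""
    counts: Counter = Counter()
    for rec in records:
        if rec.get("country") == country and rec.get("state"):
            counts[rec["state"]] += 1
    return counts


# Made-up records; field names are assumptions, not the released file format.
sample = [
    {"tweet_id": "1", "city": "Atlanta", "county": "Fulton County",
     "state": "Georgia", "country": "United States"},
    {"tweet_id": "2", "city": None, "county": None,
     "state": "Georgia", "country": "United States"},
    {"tweet_id": "3", "city": None, "county": None,
     "state": None, "country": "United States"},  # location only partly known
    {"tweet_id": "4", "city": "Mumbai", "county": None,
     "state": "Maharashtra", "country": "India"},
]

us_counts = tweets_per_state(sample, "United States")
print(us_counts.most_common())
```

Records without a state are skipped rather than guessed, since location fields are only present "whenever available".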

Twitter social network dataset (Download)

This dataset contains the ego-networks of nodes active in the anti-Asian hate and counterhate Twitter discussions. This dataset has a total of 489,011 ego-networks, containing 87,851,137 nodes and 717,087,317 edges.
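
For readers unfamiliar with the term, an ego-network is commonly defined as a focal node (the ego), its direct neighbors, and the edges among them. The sketch below builds one from a toy edge list. It treats edges as undirected for simplicity, which is an assumption; the released network files may be directed and use a different layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_adjacency(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (u, v) pairs."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def ego_network(adj: Dict[str, Set[str]], ego: str) -> Tuple[Set[str], Set[Edge]]:
    """Return the ego, its neighbors, and all edges among those nodes."""
    nodes = {ego} | adj.get(ego, set())
    # Keep each undirected edge once, as a pair with u < v.
    edges = {
        (u, v)
        for u in nodes
        for v in adj.get(u, set())
        if v in nodes and u < v
    }
    return nodes, edges


# Toy graph with made-up node names.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
adjacency = build_adjacency(toy_edges)
ego_nodes, ego_edges = ego_network(adjacency, "a")
print(sorted(ego_nodes), sorted(ego_edges))
```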

Hate classifier (Download)

This classifier categorizes tweets into hate, counterhate, and neutral categories. The model is trained on the annotated dataset.

Readme file (Download)

This file has the complete description of the dataset and its format.

Note that the Twitter terms and conditions restrict sharing of tweet text, so we have released only the tweet IDs. To retrieve the tweet text, you first need to apply for Twitter API access and then use a third-party tool (e.g., twython) to fetch the tweets by ID.
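
A minimal sketch of retrieving tweet text from released IDs is shown below. It assumes the Twitter v1.1 `statuses/lookup` endpoint, which accepts up to 100 IDs per request, accessed through twython's `lookup_status` method. The helper names (`batches`, `hydrate`, `make_twython_lookup`) and the credential parameters are illustrative, not part of this release.

```python
from typing import Callable, Dict, Iterable, Iterator, List

BATCH_SIZE = 100  # statuses/lookup (API v1.1) accepts up to 100 IDs per call


def batches(ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def hydrate(ids: Iterable[str], lookup: Callable[[str], List[dict]]) -> Dict[str, str]:
    """Map tweet ID to text; IDs missing from the API response are left out."""
    texts: Dict[str, str] = {}
    for batch in batches(ids):
        for status in lookup(",".join(batch)):
            texts[status["id_str"]] = status.get("full_text") or status.get("text", "")
    return texts


def make_twython_lookup(app_key: str, app_secret: str,
                        oauth_token: str, oauth_token_secret: str) -> Callable[[str], List[dict]]:
    """Wrap twython's lookup_status; requires your own API credentials."""
    from twython import Twython  # imported lazily so the helpers above work without it

    client = Twython(app_key, app_secret, oauth_token, oauth_token_secret)
    return lambda id_csv: client.lookup_status(id=id_csv, tweet_mode="extended")
```

With valid credentials, usage would look like `texts = hydrate(tweet_ids, make_twython_lookup(...))`. Deleted or protected tweets are not returned by the lookup endpoint, so the result may contain fewer entries than the ID list.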

Key Findings

The COVID-HATE social network containing hate (orange), counterhate (blue), and neutral (gray) nodes.
Hate begets more hate in the neighbors. More details in the paper.
Counterhate discourages neighbors from turning hateful. More details in the paper.

From January 15, 2020 to April 17, 2020, a total of 891,204 hate tweets and 200,198 counterhate tweets were posted. Over this period, the volume of hate tweets always exceeds the volume of counterhate tweets.

Nationally relevant events sparked nationwide hate. In the USA, there is a large spike shortly after President Trump’s ‘Chinese Virus’ tweet, and in India, a spike is observed after the nationwide lockdown was announced.

Bots make up 10.4% of all hateful users, but they are more active and more hateful than hateful non-bots.
The state-level hatemap in the US by individual count.
The state-level hatemap in the US by cumulative count.