Hate Speech Reach on Twitter Down More Than Expected, Independent Assessment Shows

The reach of hate speech on Twitter has declined more than expected under Elon Musk, according to the results of a new independent assessment.

Twitter’s safety team shared the findings in a thread posted to the platform on March 21, noting that Twitter had recently partnered with software firm Sprinklr to conduct the independent assessment.

According to Sprinklr, an “AI-based Toxicity Model” was used to analyze publicly available digital data on the site to detect the presence of “toxicity” and measure how often posts containing such “toxic” language, or hate speech, are seen.

“Sprinklr’s AI-powered model found that the reach of hate speech on Twitter is even lower than our own model quantified,” Twitter’s safety team wrote.

Sprinklr said it defines “hate speech” more narrowly by evaluating “slurs in the nuanced context of their use,” noting that Twitter has, up until now, “taken a broader view of the potential toxicity of slur usage.”

To quantify hate speech, Twitter provided Sprinklr with a list of 300 English-language “slur” words designed to capture hateful slurs and language that “targets marginalized and minority voices.” Neither Twitter nor Sprinklr identified what the slur words were.

Sprinklr then analyzed every English-language public tweet on the platform between January and February 2023, including how often they were seen, and identified 550,000 tweets that included at least one word from the list Twitter provided.

Assessment Findings

Sprinklr also found that, compared with non-toxic tweets in the dataset containing slur keywords, toxic tweets received roughly one-third as many views, or impressions, on average.

Around 15 percent of the tweets identified in the data set containing slur keywords were toxic, according to Sprinklr, which noted that although all of the identified tweets contained a slur word, the majority used it in “non-toxic contexts” like reclaimed speech or casual greetings.

“Our focal metric is hate speech impressions, not the number of Tweets containing slurs,” Twitter’s safety team wrote. “Most slur usage is not hate speech, but when it is, we work to reduce its reach. Sprinklr’s analysis found that hate speech receives 67 percent fewer impressions per Tweet than non-toxic slur Tweets.”
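As a rough arithmetic check on the reported figures (an illustrative sketch, not part of Sprinklr's or Twitter's analysis; the function names are hypothetical and the inputs are the rounded numbers quoted in this article): about 15 percent of 550,000 slur-containing tweets is roughly 82,500 toxic tweets, and 67 percent fewer impressions per tweet corresponds to roughly a threefold gap in views.

```python
def estimated_toxic_tweets(slur_tweets: int, toxic_percent: int) -> int:
    """Approximate count of toxic tweets from a rounded percentage."""
    # Integer arithmetic avoids floating-point noise on round inputs.
    return slur_tweets * toxic_percent // 100


def view_gap(percent_fewer: int) -> float:
    """How many times more views the comparison group gets per tweet."""
    remaining_percent = 100 - percent_fewer  # share of views that remains
    return 100 / remaining_percent


# Rounded figures as reported in the article.
toxic = estimated_toxic_tweets(550_000, 15)
gap = view_gap(67)

print(f"Estimated toxic tweets: {toxic:,}")
print(f"Non-toxic slur tweets get about {gap:.1f}x the views per tweet")
```

Because the article gives "around 15 percent" rather than an exact share, the tweet count here is only an estimate.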

However, Twitter noted that “no model is ever perfect” and said more work still needs to be done to combat hate speech on the platform and improve data collection on such speech, such as incorporating other languages, new terms, and “more precise methodologies.”

Twitter CEO Elon Musk also weighed in on the findings, noting the different methods used to assess hate speech on the platform.

“This is a critical distinction,” Musk wrote. “It’s obviously trivial for a single person to create 10k [sic] bot accounts on one computer, each of which is tweeting slurs that are written to avoid text string detection. What matters is whether those tweets are actually shown to real users.”

Musk has previously said that hate speech on Twitter will not be tolerated and that the platform cannot become a “free-for-all hellscape.”

‘Plausibly Antisemitic’ Posts on Twitter

In November, the billionaire businessman also said that hate speech impressions on the platform were down by one-third from “pre-spike” levels seen a month prior, shortly after he took over the company.

Also in November, Musk announced that negative or hate tweets on the site would be demonetized and deboosted.

Commenting on the release of Tuesday’s independent assessment, Michael O’Herlihy, Director of Product for Trust & Safety at Twitter, said the results show that “the reach of toxic content is actually lower than Twitter’s own first-party estimates.” However, he noted that the platform’s approach to reducing such speech on the site still needs “refining.”

The results of the assessment came shortly after a separate study was published on Tuesday by the Institute for Strategic Dialogue and CASM Technology, which claimed that antisemitic posts on Twitter more than doubled in the months following Musk’s takeover and have remained high since.

That study found that from Oct. 27, 2022, until Feb. 9, 2023, there was an average of 12,762 tweets deemed “plausibly antisemitic” on the site and a total of 325,739 antisemitic tweets in English in the period from June 2022 to February 2023.

The term “plausibly antisemitic” used in that study is based on the International Holocaust Remembrance Alliance’s definition of antisemitism as “a certain perception of Jews, which may be expressed as hatred towards Jews.”

However, the study authors noted that the machine-learning model used to identify the tweets made a correct decision an estimated 75 percent of the time.