Study Finds YouTube Punishes Videos Labeled "Gay" or "Lesbian"

Submitted by Take Out on Tue, 10/15/2019 - 05:46

A new investigation from a coalition of YouTube creators and researchers is accusing YouTube of relying on a system of “bigoted bots” to determine whether certain content should be demonetized, specifically LGBTQ videos.

The investigation was conducted by three people: Sealow, the CEO of research firm Ocelot AI; YouTube creator Andrew, who operates the YouTube Analyzed channel; and Een of the popular YouTube commentary and investigative channel Nerd City.

The investigation was spurred by an interest in seeing which words YouTube’s machine learning bots automatically demonetize, as concerns over transparency between executives and YouTubers grew within the creator community. Andrew manually tested 15,300 words between June 2nd and July 5th, 2019, using the most common terms in Webster’s Dictionary, UrbanDictionary, and Google search results. The second round of the experiment ran between July 6th and July 21st and covered 14,000 words, which Sealow tested automatically using YouTube’s data API. Een collaborated with his own sources and helped produce the main video.

Andrew, Sealow, and Een each released their own videos about the findings, alongside an Excel sheet listing all of the words they used and a white paper analyzing the results. The words were used to test what YouTube’s bots automatically deem inappropriate for monetization. The team found that if words like “gay” and “lesbian” were changed to random words like “happy,” the “status of the video changed to advertiser friendly” every time, Een says in his video.
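The word-swap comparison the team describes can be sketched roughly as follows. This is only an illustration of the test's logic, not the researchers' actual tooling: `swap_test` and `toy_classifier` are hypothetical names, and the toy classifier is a stand-in with a hard-coded word set so the example runs. It is not a model of YouTube's system, which the company says does not use a word list.

```python
from typing import Callable, Dict


def swap_test(title: str, term: str, neutral: str,
              classify: Callable[[str], str]) -> Dict[str, object]:
    """Classify a title before and after swapping one term for a neutral word."""
    swapped_title = title.replace(term, neutral)
    before = classify(title)
    after = classify(swapped_title)
    return {
        "original_status": before,
        "swapped_status": after,
        # True when the swapped term alone changes the outcome.
        "term_flips_status": before != after,
    }


def toy_classifier(title: str) -> str:
    """Hypothetical stand-in classifier, for demonstration only."""
    flagged = {"gay", "lesbian"}  # assumption made purely so the example runs
    words = title.lower().split()
    return "limited" if any(w in flagged for w in words) else "monetized"


result = swap_test("gay couple vlog", "gay", "happy", toy_classifier)
print(result)
```

Holding everything else constant and changing a single term is what lets a test like this attribute a status change to that term rather than to the rest of the video.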

Reached by The Verge, a YouTube spokesperson denied that there is a list of LGBTQ words that trigger demonetization, despite the investigation’s findings. The spokesperson added that the company is “constantly evaluating our systems to help ensure that they are reflecting our policies without unfair bias.”

“We’re proud of the incredible LGBTQ+ voices on our platform and take concerns like these very seriously,” the spokesperson said. “We use machine learning to evaluate content against our advertiser guidelines. Sometimes our systems get it wrong, which is why we’ve encouraged creators to appeal. Successful appeals ensure that our systems are updated to get better and better.”

YouTube’s systems for automated demonetization are based on many signals, but there is no specific list that’s built into the company’s machine learning system, according to the company. The company confirmed that it tests samples of videos from LGBTQ creators whenever new monetization classifiers are introduced to ensure that LGBTQ videos aren’t more likely to get demonetized. The company also claims that its current review system, which is used by the human moderators who oversee appeals, properly reflects the company’s policies surrounding LGBTQ terms.

But the researchers’ findings suggest that there’s significant bias at work before the human moderators get involved. Their research led them to conclude that the machine learning bots YouTube uses to decide whether a video is eligible for monetization rely on a “hidden confidence level ranging from 0 to 1.” Videos scoring closer to zero are approved for monetization, while those closer to one are demonetized. Effectively, if a video’s score is above YouTube’s threshold, it’s immediately demonetized and has to undergo manual review.

“Youtube’s classifiers have been trained to try and predict how likely a video is to be demonetized based on the training data (based on prior manual review results),” Sealow told The Verge. “So a score of 1 is 100 percent confident that it should be demonetized while 0.5 is 50 percent and so on. Youtube have had to set a certain acceptable threshold — let’s say ‘35 percent confidence’ where any video that is over the 0.35 score will be demonetized and will require a manual review before being approved for monetization.”
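The thresholding Sealow describes can be illustrated with a minimal sketch. The 0.35 value is the hypothetical figure from his quote ("let's say"), not a number YouTube has confirmed, and the function and status names are invented for this example.

```python
# Illustrative threshold taken from Sealow's "let's say 35 percent" example.
# This is an assumption, not a confirmed YouTube setting.
DEMONETIZATION_THRESHOLD = 0.35


def monetization_status(score: float,
                        threshold: float = DEMONETIZATION_THRESHOLD) -> str:
    """Map a classifier confidence score in [0, 1] to a monetization outcome."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Scores over the threshold are demonetized until a manual review clears them.
    if score > threshold:
        return "demonetized_pending_review"
    return "monetized"


for score in (0.10, 0.35, 0.50, 1.00):
    print(score, monetization_status(score))
```

Under this reading, a video does not need to be judged "probably unsuitable" to lose ads; it only needs a score above whatever cutoff is chosen, which is why a low threshold would sweep in many borderline videos.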

In the analysis of their findings, Sealow states that the “list is best interpreted as a list of negatively charged keywords as certain words are deemed to be more severe than others.”

Every video uploaded for testing purposes ran between one and two seconds and “featured no visual or audio content that could trigger demonetization,” the report reads. The waiting period for monetization approval or denial was around two hours. Words associated with the LGBTQ community or terms used in commentary such as “democrat” or “liberal” are “likely negatively charged due to their use in political commentary that is often deemed to be non advertiser friendly,” the report reads.

“The exact same videos are monetized without the LGBTQ terminology,” Sealow says in his video. “This is not a matter of LGBTQ personalities being demonetized for something that everyone else would also be demonetized for, such as sex or tragedy. This is LGBTQ terminology like ‘gay’ and ‘lesbian’ being the sole reason a video is demonetized despite the context.”

Allegations made in the video aren’t new, but the study is the most extensive. YouTube executives, including CEO Susan Wojcicki and chief product officer Neal Mohan, have spoken about concerns that certain keywords in metadata and titles lead to automatic demonetization. It’s an especially prevalent concern within the LGBTQ community. YouTube has categorically denied that there are policies “that say ‘If you put certain words in a title that will be demonetized,’” as Wojcicki told YouTuber Alfie Deyes in a lengthy interview back in August.

“We work incredibly hard to make sure that when our machines learn something — because a lot of our decisions are made algorithmically — that our machines are fair,” Wojcicki added. “There shouldn’t be [any automatic demonetization].”

That hasn’t stopped creators from using secret language in their videos and including Google Documents in their comments section to communicate with viewers. YouTuber Petty Paige will flash the infamous yellow dollar sign image — a sign that both creators and audiences know means that a video is demonetized — meaning that her fans should read the document linked below to understand why she’s using specific words. She theorized, like many other LGBTQ personalities, that using words like “lesbian” or “transgender” could result in demonetization. Swapping out those terms for other random words seemingly didn’t.

“It’s just as discriminatory if you never say this, and even more exploitive if you do,” Een said.

Earlier this summer, a number of LGBTQ creators filed a lawsuit against YouTube for alleged discriminatory practices, including unfairly demonetizing content that included LGBTQ-friendly terms. The lawsuit also alleges that YouTube actively hurts their channels’ viewership numbers by placing videos in restricted mode, which the company has previously apologized for, and therefore limiting their ability to earn money. The lawsuit claims that “YouTube is engaged in discriminatory, anticompetitive, and unlawful conduct that harms a protected class of persons under California law.”

“We’re tired of being placated with clear lies and hollow promises that they’ve either fixed it or they’re going to fix it,” Chris Knight, who co-hosts an LGBTQ YouTube news show, GNews!, told The Verge at the time. “It’s clearly broken. There’s clearly a bias with their AI, their policies. What we really want is for them to change.”

Sealow and Een state that they don’t believe that YouTube or Wojcicki are homophobic or purposely employ alleged homophobic practices. They add that the bias they found doesn’t stem from particular YouTube policies or “a lack of programs in place to mitigate algorithmic discrimination.”

“It’s simply the result of the probabilistic nature of the machine learning classifiers used by the demonetization bot,” Sealow’s report adds.