Reverse Engineering the YouTube Algorithm

Published as: Parametric Algorithmic Transformer Based Weighted YouTube Video Analysis

Python · Graph Analysis · ForceAtlas2 · LLM-as-Judge · Research · 2023 / 2026

Original research & paper: 2023 - manual pipeline with Gephi, OpenAI GPT-4, and custom Python scripts.
Automated live implementation: 2026 - fully automated end-to-end with Claude Code.

View on ResearchGate | Download Paper

Crawl YouTube's recommendation graph for any search query, detect community clusters with ForceAtlas2 + Louvain, then use an LLM judge to compare your video's transcript against the top recommended videos, weighted by engagement metrics.

Enter a search query, paste your video URL, pick an LLM, and get a full parametric breakdown of how your content stacks up, plus actionable recommendations on what to improve.

How It Works

Crawl

Search-based graph construction

Cluster

ForceAtlas2 + Louvain modularity

Analyze

LLM scores 12 transcript parameters

Score

Weighted normalization via engagement
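The crawl step can be sketched as a breadth-first expansion from the search results. This is a minimal illustration rather than the project's code: `crawl_graph` and the `search` and `related` callables are hypothetical stand-ins for whatever YouTube Data API calls the pipeline actually makes, and the toy data below is invented for the example.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_graph(
    query: str,
    search: Callable[[str], Iterable[str]],
    related: Callable[[str], Iterable[str]],
    max_depth: int = 2,
) -> dict[str, set[str]]:
    """Breadth-first crawl; returns an adjacency map video_id -> linked video ids."""
    seeds = list(search(query))
    graph: dict[str, set[str]] = {vid: set() for vid in seeds}
    queue = deque((vid, 1) for vid in seeds)  # search results sit at depth 1
    while queue:
        vid, depth = queue.popleft()
        if depth >= max_depth:
            continue  # deepest level: kept as a node but not expanded further
        for nxt in related(vid):
            graph[vid].add(nxt)
            if nxt not in graph:
                graph[nxt] = set()
                queue.append((nxt, depth + 1))
    return graph


# Toy stand-ins for the API calls (hypothetical data).
toy_search = {"python tutorial": ["a", "b"]}
toy_related = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}

graph = crawl_graph(
    "python tutorial",
    lambda q: toy_search.get(q, []),
    lambda v: toy_related.get(v, []),
)
```

The resulting adjacency map is the kind of structure a layout and community-detection step (ForceAtlas2 and Louvain in this project) would take as input.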

Live Analysis

Requires a YouTube Data API key and one LLM API key (set as server environment variables or entered below)

Heads up: YouTube blocks transcript fetching from cloud server IPs, so live runs may fail on this hosted demo. The sample analysis is pre-cached and always works. For live analysis on your own video, clone the repo and run it locally.

Search Query

The topic you want to optimize your video for

Your Video URL

The video you want to compare against recommendations

LLM Judge

Crawl Depth

Depth 2: ~200 videos, balanced

Analysis Parameters

Parameters that map to YouTube algorithm signals

Estimated cost: ~$0.38

~12 LLM calls via Claude Sonnet 4.6 (76k in / 9.8k out tokens) - ~2100 YouTube API quota units

Estimate only - actual usage varies
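The estimate above can be reproduced with simple arithmetic. The sketch below assumes prices of $3 per million input tokens and $15 per million output tokens; those prices are an assumption for illustration and should be checked against the provider's current pricing. The function name `estimate_llm_cost` is hypothetical.

```python
# Assumed prices in USD per million tokens (not taken from the page).
PRICE_IN_PER_M = 3
PRICE_OUT_PER_M = 15


def estimate_llm_cost(tokens_in: int, tokens_out: int) -> float:
    """Return the estimated LLM cost in USD for the given token counts."""
    return (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000


cost = estimate_llm_cost(76_000, 9_800)
print(f"${cost:.3f}")  # $0.375
```

Under these assumed prices the result is consistent with the ~$0.38 shown.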

Methodology

Eq. 1: Weight Allocation

W = (Popularity × Engagement × Sentiment) + Consistency

Popularity = Views × Subscribers

Engagement = (Likes + Comments) / (2 × Views)

Sentiment = Likes / Views

Consistency = (avgLikes + avgComments + avgViews) / avgViews
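A direct transcription of Eq. 1 into Python might look like the following. The `VideoStats` container and its field names are hypothetical, the sample numbers are invented, and the sketch assumes `views` and `avg_views` are greater than zero.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    views: int
    likes: int
    comments: int
    subscribers: int
    avg_views: float     # channel averages used for the consistency term
    avg_likes: float
    avg_comments: float


def weight(v: VideoStats) -> float:
    """W = (Popularity x Engagement x Sentiment) + Consistency, as in Eq. 1."""
    popularity = v.views * v.subscribers
    engagement = (v.likes + v.comments) / (2 * v.views)
    sentiment = v.likes / v.views
    consistency = (v.avg_likes + v.avg_comments + v.avg_views) / v.avg_views
    return popularity * engagement * sentiment + consistency


example = VideoStats(
    views=1000, likes=100, comments=20, subscribers=500,
    avg_views=800, avg_likes=80, avg_comments=20,
)
w = weight(example)
```

Because Popularity is a product of raw counts, W grows very quickly for large channels, which is presumably why the weights are rescaled later in the methodology.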

Eq. 9: Parametric Value Generation

P_i = Σ (T1_j − T2_j) × W_j

For each parameter i, the difference between your video's score (T1) and each comparison video's score (T2) is multiplied by that video's weight W_j and summed over the comparison videos j. The result is then standardized via min-max normalization.
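As a rough sketch of Eq. 9 for a single parameter, the weighted differences can be summed as below. The function name and the (score, weight) tuple layout are assumptions made for illustration, and the numbers are invented.

```python
def parametric_value(own_score: float, comparisons: list[tuple[float, float]]) -> float:
    """Sum of (own score - comparison score) * comparison weight.

    comparisons holds one (score, weight) pair per comparison video.
    """
    return sum((own_score - score) * w for score, w in comparisons)


# Your video scores 7 on some parameter; three comparison videos follow.
p = parametric_value(7.0, [(8.0, 2.0), (5.0, 1.5), (6.0, 1.0)])
print(p)  # 2.0
```

A positive value means your video outscores the weighted comparison set on that parameter; a negative value means it trails.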

Eq. 10: Min-Max Standardization

f(x) = (x − min) / (max − min)

Normalizes the weighted parametric differences to the 0-1 range, which is then scaled to 0-100 for interpretability.
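A minimal sketch of min-max scaling to a 0-100 range follows. The handling of the degenerate case where all values are equal is an assumption; the source does not say what happens there.

```python
def min_max_scale(values: list[float], scale: float = 100.0) -> list[float]:
    """Map values linearly so the minimum becomes 0 and the maximum becomes `scale`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread, report every value as 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * scale for v in values]


scaled = min_max_scale([-4.0, 2.0, 8.0])
print(scaled)  # [0.0, 50.0, 100.0]
```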

Eq. 11: Flooring Function

f(x) = −1/x

Floors weight values to maintain a similar baseline and deviation, ensuring proportional comparison.
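How the paper applies this function is not spelled out here, so the following is only an illustration of the transform itself, assuming strictly positive weights. The function name `floor_weight` and the sample weights are hypothetical.

```python
def floor_weight(w: float) -> float:
    """Apply f(x) = -1/x; assumes w > 0."""
    if w <= 0:
        raise ValueError("weight must be positive")
    return -1 / w


weights = [1, 10, 1000]
floored = [floor_weight(w) for w in weights]
print(floored)  # [-1.0, -0.1, -0.001]
```

For positive inputs the transform preserves ordering while compressing a range spanning three orders of magnitude into the interval from -1 to 0, which fits the stated goal of keeping weights on a similar baseline.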

Parameter Presets

YouTube Optimization

Parameters that map to YouTube algorithm signals

1. Hook Strength

How engaging and attention-grabbing the opening 30 seconds are - does it create curiosity or promise value immediately

2. Information Density

Amount of useful, actionable information per minute of content - high density keeps watch time up

3. Narrative Structure

Clarity of story arc - setup, development, payoff - does it have a logical flow that drives completion

4. Retention Language

Use of curiosity gaps, teasers, open loops, pattern interrupts, and 'but wait' moments that keep viewers watching

5. SEO & Keyword Usage

How well the spoken content covers search-relevant keywords and phrases that match the title topic

6. Emotional Engagement

Emotional peaks and valleys throughout - humor, surprise, empathy, excitement - drives likes and shares

7. Clarity of Explanation

How well complex concepts are broken down - use of analogies, examples, step-by-step - affects satisfaction

8. Pacing

Speed of content delivery - is it well-paced or does it drag/rush - affects audience retention curve

9. Call to Action

Presence and quality of subscribe, like, comment prompts - drives engagement metrics YouTube tracks

10. Shareability

Would someone send this to a friend - unique insights, surprising facts, quotable moments

11. Authority & Credibility

Does the speaker sound knowledgeable, cite sources, demonstrate expertise - affects trust signals

12. Script Quality

Tightness of script - minimal filler words, repetition, tangents - polished vs rambling delivery
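One way an LLM judge could be asked to score a transcript on this preset is sketched below. Everything here is hypothetical: the prompt wording, the 0-10 scale, the JSON reply format, and the function names are assumptions for illustration, not the project's actual prompt or parser.

```python
import json

YOUTUBE_PRESET = [
    "Hook Strength", "Information Density", "Narrative Structure",
    "Retention Language", "SEO & Keyword Usage", "Emotional Engagement",
    "Clarity of Explanation", "Pacing", "Call to Action",
    "Shareability", "Authority & Credibility", "Script Quality",
]


def build_judge_prompt(transcript: str, parameters: list[str]) -> str:
    """Assemble a scoring prompt listing every parameter by number."""
    listing = "\n".join(f"{i}. {name}" for i, name in enumerate(parameters, 1))
    return (
        "Score the transcript on each parameter from 0 to 10.\n"  # assumed scale
        "Reply with a JSON object mapping parameter name to score.\n\n"
        f"Parameters:\n{listing}\n\nTranscript:\n{transcript}"
    )


def parse_scores(reply: str, parameters: list[str]) -> dict[str, float]:
    """Parse the judge's JSON reply and check that every parameter was scored."""
    data = json.loads(reply)
    missing = [p for p in parameters if p not in data]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return {p: float(data[p]) for p in parameters}


prompt = build_judge_prompt("Welcome back to the channel...", YOUTUBE_PRESET)
fake_reply = json.dumps({name: 5 for name in YOUTUBE_PRESET})  # stand-in for an LLM reply
scores = parse_scores(fake_reply, YOUTUBE_PRESET)
```

Validating that every parameter is present before using the scores guards against partial replies, which would otherwise skew the weighted comparison.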

Original Research (2023)

The 12 linguistic parameters from the published paper

1. Readability

Overall readability score

2. Flesch-Kincaid Grade

Flesch-Kincaid grade level - education level needed to understand the text

3. Coleman-Liau Index

Coleman-Liau reading difficulty metric based on sentence and word structure

4. Lexical Density

Percentage of content-bearing words vs total words

5. Cohesion

Flow and connectivity of ideas throughout the transcript

6. Sentiment Analysis

Emotional positivity and tone of the content

7. Keyword Frequency

Density and frequency of topic-relevant keywords

8. Relevance to Title

How well the spoken content matches the video title

9. Ease of Understanding

How easy the content is to follow and comprehend for a general audience

10. Technicality

Level of technical depth and specialized knowledge required

11. Jargon Usage

Amount of domain-specific or specialized terminology used