How to Build a Reddit Sentiment Scanner in Python (Full Pipeline)
Published: March 2026 · Reading time: ~12 minutes
Tags: Python, Reddit API, sentiment analysis, PRAW, financial NLP, quant trading
If you’ve ever watched a stock move 15% on a Monday morning and then spent the weekend reading r/wallstreetbets wondering if the chatter had anything to do with it, you’re already halfway to understanding why Reddit sentiment data is worth building a pipeline for.
The problem isn’t getting the data. The Reddit API is free and Python makes it trivially easy. The problem is that raw Reddit posts are noise. Most of what gets posted on financial subreddits is low-quality, off-topic, or sarcastic, and a generic sentiment model trained on product reviews doesn’t know the difference between someone posting “🚀🚀🚀” ironically and someone posting it as a genuine bull signal.
This post walks through the full pipeline: collecting posts from Reddit with PRAW, classifying them with a finance-tuned model, and filtering down to the posts that are actually worth paying attention to. By the end you’ll have a script that runs in under a minute, processes 100 posts per subreddit, and returns a clean list of high-confidence signals.
What you’re building
A Python script that:
- Pulls the top posts from one or more financial subreddits using the Reddit API
- Sends them to a classification API in a single batch call
- Filters the results by quality, directional conviction, and relevance to a specific ticker
- Prints (or saves) only the posts worth looking at
The output looks like this:
```
NVDA → bullish [relevant | relevance: 0.89 | confidence: 0.74]
"Blackwell demand is insane. Hyperscalers not slowing capex. Long into earnings."

NVDA → bullish [relevant | relevance: 0.91 | confidence: 0.61]
"DD: Why I think NVDA prints to $200 by EOY – full breakdown inside"

---
2 signals from 100 posts
```

Clean, structured, filterable. Let’s build it.
Prerequisites
You need three things:
- Python 3.8+
- A Reddit developer app (free, takes two minutes; instructions below)
- A FinSignals API key (the free tier gives you 1,000 credits/month, enough for roughly a dozen 100-post scans)
Install the dependencies:
```
pip install praw requests
```

If you have the FinSignals Python SDK:

```
pip install finsignals-api
```

The examples below use the SDK. If you prefer raw HTTP, there’s a requests version at the end of the post.
Step 1: Create a Reddit developer app
Go to reddit.com/prefs/apps and click “are you a developer? create an app.”
- Name: anything (e.g. `sentiment-pipeline`)
- Type: select “script”
- Redirect URI: `http://localhost:8080` (doesn’t matter for script apps)
After you create it, you’ll see a client ID (the short string under the app name) and a client secret. Save both.
You’ll also need your Reddit username and password.
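If you’d rather not hardcode credentials in the script later, a small sketch that reads them from environment variables instead. The variable names here are my own convention, not anything Reddit requires:

```python
import os

def load_credentials():
    """Read Reddit app credentials from the environment.

    Raises KeyError with the missing variable's name if one isn't set,
    which is a clearer failure than an auth error deep inside PRAW.
    """
    return {
        "client_id": os.environ["REDDIT_CLIENT_ID"],
        "client_secret": os.environ["REDDIT_CLIENT_SECRET"],
        "username": os.environ["REDDIT_USERNAME"],
        "password": os.environ["REDDIT_PASSWORD"],
    }
```

This also keeps secrets out of version control if you ever commit the script.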
Step 2: Fetch posts with PRAW
PRAW (Python Reddit API Wrapper) handles all the OAuth and rate-limiting for you. Here’s a minimal fetch:
```python
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    username="YOUR_REDDIT_USERNAME",
    password="YOUR_REDDIT_PASSWORD",
    user_agent="finsignals-scanner/1.0 by YOUR_REDDIT_USERNAME",
)
```

To pull the top 100 posts from r/wallstreetbets right now:

```python
subreddit = reddit.subreddit("wallstreetbets")
posts = list(subreddit.hot(limit=100))
print(f"Fetched {len(posts)} posts")
```

The hot feed returns posts ranked by current momentum. You can also use `new` (most recent), `top` with a time filter (`day`, `week`, `month`), or `rising` for posts gaining traction fast. For a real-time scanner, `hot` or `rising` tend to give the most actionable data.
Each post object has `.title`, `.selftext` (the body), `.score` (upvotes), `.num_comments`, and a bunch of other metadata you can use later for additional filtering.
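Since the hot and rising feeds can overlap, a dedup helper is handy if you decide to fetch both. This is a sketch that assumes only that each post object has an `.id` attribute (PRAW submissions do):

```python
def dedupe_posts(*feeds):
    """Merge iterables of posts, keeping the first occurrence of each id.

    Prevents the same post from being classified (and billed) twice
    when it appears in more than one feed.
    """
    seen = set()
    merged = []
    for feed in feeds:
        for post in feed:
            if post.id not in seen:
                seen.add(post.id)
                merged.append(post)
    return merged

# Usage (requires an authenticated reddit instance):
# sub = reddit.subreddit("wallstreetbets")
# posts = dedupe_posts(sub.hot(limit=100), sub.rising(limit=50))
```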
Step 3: Prepare the batch payload
The classification API accepts a list of items, each with optional ticker, title, and body fields. The more context you give it, the better the relevance scoring works.
```python
def prepare_items(posts, ticker=None):
    items = []
    for post in posts:
        item = {
            "title": post.title,
            "body": post.selftext[:1500] if post.selftext else "",
        }
        if ticker:
            item["ticker"] = ticker
        items.append(item)
    return items
```

A few things worth knowing here:

- Truncate long posts. Most of the signal is in the first 1,500 characters. Sending the full body of a 10,000-word DD post doesn’t improve classification meaningfully and burns more processing time.
- Include the ticker when you have it. Supplying a ticker changes how the `relevance_score` is computed: it measures how on-topic the post is for that specific symbol, not just for financial content generally. A post about Apple earnings gets a high relevance score if you’re scanning for AAPL, and a low one if you’re scanning for NVDA.
- Title-only posts are fine. Many Reddit posts have no body text. The model classifies them on title alone; just pass an empty string for `body`.
Step 4: Classify the batch
With the SDK, this is a single call:
```python
import finsignals

client = finsignals.Client("fs_your_key_here")

items = prepare_items(posts, ticker="NVDA")
results = client.classify_batch(items)
print(f"Credits charged: {results.credits_charged}")
```

A batch of 100 items costs 1.00 + 99 × 0.70 = 70.3 credits. At 1,000 free credits per month, that’s about 14 full scans on the free tier. Upgrade to Starter ($29/mo, 100,000 credits) and you can run this every few minutes all month without thinking about it.
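If you want to budget credits ahead of time, the pricing rule above (1.00 credits for the first item in a batch, 0.70 for each additional) is easy to encode. The numbers come from this post, so double-check them against the current pricing page before relying on them:

```python
def batch_cost(n_items):
    """Credits for one batch: 1.00 for the first item, 0.70 each after."""
    if n_items <= 0:
        return 0.0
    return 1.00 + (n_items - 1) * 0.70

def scans_per_month(credits, items_per_scan):
    """How many full scans a monthly credit budget covers."""
    return int(credits // batch_cost(items_per_scan))

# batch_cost(100) -> 70.3, matching the arithmetic above
# scans_per_month(1000, 100) -> 14 on the free tier
```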
The response comes back as a list of output objects in the same order as your input items. Each output has:
- `sentiment` – `positive`, `negative`, or `neutral`, with probabilities
- `directionality` – `bullish`, `bearish`, or `neutral_direction`
- `quality` – `relevant`, `noise`, or `spam`
- `post_type` – `dd`, `news_reaction`, `technical_analysis`, `fundamentals`, `question`, or `general`
- `relevance_score` – a float in [0, 1]
- `author_confidence` – a float in [0, 1]
- `sarcasm` – a boolean
Step 5: Filter for signal
Raw classification isn’t the endpoint โ filtering is. Most of the posts in any financial subreddit are noise, off-topic, or low-quality. The classification output gives you the levers to filter them out programmatically.
Here’s a filter function that returns only the posts worth reading:
```python
def filter_signals(posts, outputs, direction=None, min_relevance=0.65):
    signals = []
    for post, output in zip(posts, outputs):
        # Skip noise and spam
        if output.quality.label != "relevant":
            continue
        # Skip low relevance (off-topic for the ticker)
        if output.relevance_score < min_relevance:
            continue
        # Skip sarcasm: inverted sentiment is worse than no signal
        if output.sarcasm:
            continue
        # Optional: filter by direction
        if direction and output.directionality.label != direction:
            continue
        signals.append({
            "post": post,
            "output": output,
        })
    return signals
```

The thresholds here are a starting point. `min_relevance=0.65` is conservative: lower it to 0.5 if you’re running a broader scan and want more volume; raise it to 0.80 if you want only the posts that are unambiguously about the ticker you care about.
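One way to pick `min_relevance` empirically rather than guessing: sweep a few thresholds over a batch you’ve already classified and see how many posts survive at each level. A minimal sketch:

```python
def sweep_relevance(scores, thresholds=(0.5, 0.65, 0.8)):
    """Count how many relevance scores clear each candidate threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}

# After a classify_batch call:
# scores = [o.relevance_score for o in results.outputs]
# print(sweep_relevance(scores))
```

If the count barely changes between 0.65 and 0.80, the batch is cleanly separated and the stricter threshold costs you little; if it drops sharply, you are cutting into genuinely on-topic posts.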
The sarcasm filter is worth paying attention to. Reddit financial communities use heavy irony – “oh yeah definitely buy the top, brilliant strategy /s” – and a naive sentiment model reads it as positive. The sarcasm head flags these for removal. It’s marked as experimental in the docs, so don’t bet the farm on it, but in practice it catches the most egregious cases.
Step 6: Display the results
```python
def print_signals(signals):
    if not signals:
        print("No signals found.")
        return
    for s in signals:
        post = s["post"]
        out = s["output"]
        direction = out.directionality.label
        relevance = round(out.relevance_score, 2)
        confidence = round(out.author_confidence, 2)
        post_type = out.post_type.label
        print(f"\n{direction.upper()} [{post_type} | relevance: {relevance} | confidence: {confidence}]")
        print(f'"{post.title}"')
        print(f"  r/{post.subreddit.display_name} · {post.score} upvotes · {post.num_comments} comments")
        print(f"  https://reddit.com{post.permalink}")
```

The full script
Here it is end to end:
```python
import praw
import finsignals

# ── Config ─────────────────────────────────────────────────────────────────
REDDIT_CLIENT_ID = "YOUR_CLIENT_ID"
REDDIT_CLIENT_SECRET = "YOUR_CLIENT_SECRET"
REDDIT_USERNAME = "YOUR_REDDIT_USERNAME"
REDDIT_PASSWORD = "YOUR_REDDIT_PASSWORD"
FINSIGNALS_API_KEY = "fs_your_key_here"

SUBREDDITS = ["wallstreetbets", "investing", "stocks"]
TICKER = "NVDA"        # Set to None to scan without ticker context
DIRECTION = "bullish"  # "bullish", "bearish", or None for all directions
POSTS_LIMIT = 100      # Posts to fetch per subreddit
MIN_RELEVANCE = 0.65   # Minimum relevance score (0–1)
# ───────────────────────────────────────────────────────────────────────────


def fetch_posts(reddit, subreddit_name, limit):
    sub = reddit.subreddit(subreddit_name)
    return list(sub.hot(limit=limit))


def prepare_items(posts, ticker=None):
    items = []
    for post in posts:
        item = {
            "title": post.title,
            "body": post.selftext[:1500] if post.selftext else "",
        }
        if ticker:
            item["ticker"] = ticker
        items.append(item)
    return items


def filter_signals(posts, outputs, direction=None, min_relevance=0.65):
    signals = []
    for post, output in zip(posts, outputs):
        if output.quality.label != "relevant":
            continue
        if output.relevance_score < min_relevance:
            continue
        if output.sarcasm:
            continue
        if direction and output.directionality.label != direction:
            continue
        signals.append({"post": post, "output": output})
    return signals


def print_signals(signals, subreddit_name, total_posts):
    print(f"\n── r/{subreddit_name} ── {len(signals)} signals from {total_posts} posts ──")
    if not signals:
        print("  No signals matched your filters.")
        return
    for s in signals:
        post = s["post"]
        out = s["output"]
        print(
            f"\n  {out.directionality.label.upper()} "
            f"[{out.post_type.label} | "
            f"relevance: {round(out.relevance_score, 2)} | "
            f"confidence: {round(out.author_confidence, 2)}]"
        )
        print(f'  "{post.title}"')
        print(f"  {post.score} pts · {post.num_comments} comments · https://reddit.com{post.permalink}")


def main():
    reddit = praw.Reddit(
        client_id=REDDIT_CLIENT_ID,
        client_secret=REDDIT_CLIENT_SECRET,
        username=REDDIT_USERNAME,
        password=REDDIT_PASSWORD,
        user_agent="finsignals-scanner/1.0",
    )
    client = finsignals.Client(FINSIGNALS_API_KEY)
    for sub_name in SUBREDDITS:
        posts = fetch_posts(reddit, sub_name, POSTS_LIMIT)
        items = prepare_items(posts, ticker=TICKER)
        results = client.classify_batch(items)
        signals = filter_signals(
            posts,
            results.outputs,
            direction=DIRECTION,
            min_relevance=MIN_RELEVANCE,
        )
        print_signals(signals, sub_name, len(posts))
    print("\nDone. Total credits charged across all subreddits: check your dashboard.")


if __name__ == "__main__":
    main()
```

Copy that into scanner.py, fill in your credentials in the config block at the top, and run it:

```
python scanner.py
```

You’ll get output like:
```
── r/wallstreetbets ── 3 signals from 100 posts ──

  BULLISH [dd | relevance: 0.91 | confidence: 0.69]
  "NVDA DD: Blackwell demand still accelerating, hyperscaler capex not slowing"
  847 pts · 213 comments · https://reddit.com/r/wallstreetbets/comments/...

  BULLISH [news_reaction | relevance: 0.88 | confidence: 0.55]
  "NVDA up pre-market on Taiwan Semi data – what this means"
  312 pts · 89 comments · https://reddit.com/r/wallstreetbets/comments/...

  BULLISH [technical_analysis | relevance: 0.72 | confidence: 0.61]
  "NVDA holding the 200-day, looking for a bounce entry"
  204 pts · 56 comments · https://reddit.com/r/wallstreetbets/comments/...

── r/investing ── 1 signal from 100 posts ──

  BULLISH [fundamentals | relevance: 0.83 | confidence: 0.74]
  "The case for NVDA long-term: not just AI, the data center transition"
  1,204 pts · 341 comments · https://reddit.com/r/investing/comments/...
```

Extending it
A few ways to take this further once the basic pipeline is working.
Schedule it to run at market open
Add a cron job (or use the `schedule` library in Python) to run the scanner at 9:15 AM Eastern on weekdays:

```
# crontab -e (note: cron uses the server's local time)
15 9 * * 1-5 /usr/bin/python3 /path/to/scanner.py >> /var/log/scanner.log 2>&1
```

Save output to a CSV or database
Replace print_signals with a function that writes to a CSV or SQLite database, and you have a growing dataset of labeled Reddit posts tied to specific tickers and timestamps. This is genuinely useful for backtesting โ you can look back at the sentiment distribution for a ticker in the days before it moved and start building an intuition for what the signal distribution looks like ahead of a catalyst.
```python
import csv
from datetime import datetime

def save_signals(signals, filename="signals.csv"):
    with open(filename, "a", newline="") as f:
        writer = csv.writer(f)
        for s in signals:
            post = s["post"]
            out = s["output"]
            writer.writerow([
                datetime.utcnow().isoformat(),
                post.subreddit.display_name,
                post.id,
                post.title[:200],
                out.directionality.label,
                out.quality.label,
                round(out.relevance_score, 4),
                round(out.author_confidence, 4),
                out.sarcasm,
                out.post_type.label,
                post.score,
                post.num_comments,
                f"https://reddit.com{post.permalink}",
            ])
```

Scan multiple tickers
Change the config to loop over a watchlist:
```python
WATCHLIST = ["NVDA", "AAPL", "TSLA", "META", "AMD"]

for ticker in WATCHLIST:
    for sub_name in SUBREDDITS:
        posts = fetch_posts(reddit, sub_name, 50)
        items = prepare_items(posts, ticker=ticker)
        results = client.classify_batch(items)
        signals = filter_signals(posts, results.outputs, direction=None, min_relevance=0.70)
        print_signals(signals, sub_name, len(posts))
```

Note: a full watchlist scan is three subreddits × 50 posts each = 150 posts per ticker. At 5 tickers that’s 750 posts per run, costing about 530 credits at batch pricing (15 batches of 1.00 + 49 × 0.70 = 35.3 credits each). At that rate, the Starter plan ($29/mo, 100,000 credits) comfortably covers hourly scans during market hours all month.
Filter by post engagement
The Reddit post object includes .score and .num_comments. You can add an engagement floor to ignore posts nobody’s reading:
```python
posts = [p for p in raw_posts if p.score > 50 or p.num_comments > 20]
```

High-engagement posts aren’t inherently more accurate as signals, but they do represent the market’s attention – which is at least as relevant as whether the underlying thesis is correct.
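If you’d rather rank by attention than hard-filter, one option is to weight relevance by log-scaled engagement so heavily discussed posts sort first. The weighting scheme below is an illustration of the idea, not part of the API:

```python
import math

def attention_score(relevance, upvotes, comments):
    """Relevance weighted by log-scaled engagement.

    log1p dampens huge upvote counts; comments are weighted higher
    than upvotes on the assumption that they cost more effort.
    """
    engagement = math.log1p(upvotes + 2 * comments)
    return relevance * engagement

# Sort surviving signals so the most-read, most-relevant posts come first:
# signals.sort(
#     key=lambda s: attention_score(
#         s["output"].relevance_score, s["post"].score, s["post"].num_comments
#     ),
#     reverse=True,
# )
```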
Using requests instead of the SDK
If you prefer not to install the SDK, the REST call is straightforward:
```python
import requests

def classify_batch(items, api_key):
    response = requests.post(
        "https://api.finsignals.ai/v1/classify/batch",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json={"items": items},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Usage:
data = classify_batch(items, FINSIGNALS_API_KEY)
for post, output in zip(posts, data["outputs"]):
    print(post.title, "->", output["directionality"]["label"])
```

The response schema is documented in full at finsignals.ai/docs.
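One thing worth adding around any raw HTTP version is a retry wrapper for transient failures. This generic sketch retries on `OSError` (which `requests`’ connection errors subclass) with exponential backoff; the retry policy here is my own choice, not something the API prescribes:

```python
import time

def with_retries(fn, attempts=3, backoff=1.0, retryable=(OSError,)):
    """Call fn(); on a retryable exception, wait and try again.

    Backoff doubles each attempt (backoff, 2*backoff, ...). The last
    failure is re-raised so callers still see the real error.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))

# requests.ConnectionError subclasses OSError, so this covers it:
# data = with_retries(lambda: classify_batch(items, FINSIGNALS_API_KEY))
```

For HTTP-level errors (429, 5xx) you would check `response.status_code` inside the wrapped function and raise to trigger a retry; that behavior depends on how the API signals rate limits, so consult the docs.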
A word on what this isn’t
This pipeline surfaces posts worth reading. It doesn’t tell you what to trade, and the people running serious quant strategies treat social sentiment as one feature in a much larger model, not a standalone signal.
That said, the classification layer does something that a raw feed of Reddit posts can’t: it separates the posts that are directionally committed, high-quality, and on-topic from everything else. Whether you use that to trade, to research, or to build something on top of, that’s the part that’s actually hard to do well with a general-purpose model. A post that says “I wouldn’t touch this stock with a ten-foot pole” is negative sentiment. The same sentence at the end of three paragraphs of bullish analysis might be sarcasm. These distinctions are why domain-specific fine-tuning exists.
What to read next
- API documentation – full endpoint reference, batch pricing, and response schema
- API Tester – try classification without writing any code first
- Pricing – the free tier is 1,000 credits/month, Starter is $29/mo for 100,000
If you build something with this – a dashboard, a scanner, an alert system, anything – the contact page is open. Always interested in what people are building.