Building a Stock Screener in Python
A stock screener filters thousands of securities down to a shortlist matching your criteria — market cap, volume, technical signals, valuation metrics. Commercial screeners exist, but building your own in Python gives you complete control over the logic and the ability to add custom signals like insider trading activity. This tutorial builds a working screener from scratch using yfinance.
1. Prerequisites and Setup
You need Python 3.8 or later and the following packages:
pip install yfinance pandas numpy lxml
The lxml package is needed by pandas.read_html(), which Section 2 uses to scrape the S&P 500 constituent list from Wikipedia.
yfinance is an open-source Python library that downloads market data from Yahoo Finance. It provides price history, fundamental data (market cap, P/E ratio, sector), and corporate actions. It is free and does not require an API key, though it is subject to Yahoo Finance’s rate limits and terms of service.
Yahoo Finance may throttle or block requests if you make too many in rapid succession. When fetching .info for individual tickers, add a time.sleep(0.5) delay between calls. For price history, use yf.download() with multiple tickers in a single call, which is much faster and less likely to be throttled.
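yfinance itself does not retry failed requests, so a small wrapper helps when the occasional call is throttled. The sketch below is one way you might structure this, not part of the yfinance API; `fetch_fn` stands for any zero-argument fetch such as `lambda: yf.Ticker("AAPL").info`.

```python
import time

def fetch_with_backoff(fetch_fn, max_retries=3, base_delay=1.0):
    """Retry fetch_fn with exponential backoff on any exception.

    fetch_fn is any zero-argument callable, e.g.
    lambda: yf.Ticker("AAPL").info.
    """
    for attempt in range(max_retries):
        try:
            return fetch_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```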
2. Defining the Stock Universe
A screener needs a universe of tickers to filter. The most common starting point is the S&P 500, which represents approximately 500 large-cap US stocks. You can obtain the current S&P 500 constituents by scraping the Wikipedia page that maintains the list, or by using a static list.
Option A: Scrape from Wikipedia
import pandas as pd
def get_sp500_tickers():
"""Fetch current S&P 500 tickers from Wikipedia."""
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
tables = pd.read_html(url)
df = tables[0] # First table on the page
tickers = df["Symbol"].tolist()
# Clean: replace dots with hyphens (e.g., BRK.B -> BRK-B for yfinance)
tickers = [t.replace(".", "-") for t in tickers]
return tickers
tickers = get_sp500_tickers()
print(f"Universe: {len(tickers)} tickers")
The Wikipedia table contains the ticker symbol, company name, GICS sector, GICS sub-industry, date added to the index, and other metadata. The pd.read_html() function parses all HTML tables on the page and returns them as a list of DataFrames. The S&P 500 constituents table is the first one.
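The same table can also narrow the universe before any data is fetched. As a sketch (it assumes the Wikipedia column names "Symbol" and "GICS Sector" described above), you could restrict the screener to a few sectors:

```python
def filter_universe_by_sector(constituents, sectors):
    """Keep only tickers whose GICS sector is in `sectors`.

    `constituents` is the DataFrame returned by pd.read_html() for the
    Wikipedia page, with "Symbol" and "GICS Sector" columns.
    """
    mask = constituents["GICS Sector"].isin(sectors)
    # Same dot-to-hyphen cleanup as in get_sp500_tickers()
    return [s.replace(".", "-") for s in constituents.loc[mask, "Symbol"]]
```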
Option B: Static List
For reproducibility or offline use, save the ticker list to a file:
import json
# Save once
with open("sp500_tickers.json", "w") as f:
json.dump(tickers, f)
# Load later
with open("sp500_tickers.json") as f:
tickers = json.load(f)
3. Fetching Price Data in Bulk
The fastest way to get price data for many tickers is yf.download(), which fetches data for all tickers in a single batch request. This is dramatically faster than calling yf.Ticker(sym).history() for each ticker individually.
import yfinance as yf
def fetch_price_data(tickers, period="1y"):
"""Download OHLCV data for all tickers in one batch."""
data = yf.download(
tickers,
period=period,
group_by="ticker",
auto_adjust=True,
threads=True
)
return data
price_data = fetch_price_data(tickers, period="1y")
print(f"Downloaded price data: {price_data.shape}")
The group_by="ticker" parameter organizes the result so you can access each ticker’s data as price_data["AAPL"], price_data["MSFT"], etc. Each has columns for Open, High, Low, Close, and Volume. The auto_adjust=True parameter adjusts prices for splits and dividends, giving you clean adjusted prices.
The threads=True parameter enables multithreaded downloading, which significantly speeds up the batch request. For 500 tickers with one year of daily data, this typically completes in 30–90 seconds depending on your connection.
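One quirk worth handling: with several tickers and group_by="ticker" the result's columns form a MultiIndex keyed by symbol, while a single-ticker download yields flat columns. A small defensive accessor (a sketch; exact layouts vary across yfinance versions) smooths this over:

```python
import pandas as pd

def get_close(price_data, sym):
    """Extract one ticker's Close series from a yf.download() result."""
    if isinstance(price_data.columns, pd.MultiIndex):
        # Multi-ticker download with group_by="ticker"
        return price_data[sym]["Close"].dropna()
    # Single-ticker download: flat columns
    return price_data["Close"].dropna()
```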
4. Fetching Fundamental Data
Fundamental data (market cap, P/E ratio, sector, earnings growth) requires individual yf.Ticker(sym).info calls. This is the slow part of the screener — each call takes 0.5–2 seconds, and with 500 tickers you are looking at 4–15 minutes.
import time
def fetch_fundamentals(tickers):
"""Fetch fundamental data for each ticker. Slow: ~0.5-2s per ticker."""
results = []
for i, sym in enumerate(tickers):
try:
tk = yf.Ticker(sym)
info = tk.info
results.append({
"symbol": sym,
"name": info.get("shortName", ""),
"sector": info.get("sector", ""),
"market_cap": info.get("marketCap", 0),
"pe_ratio": info.get("trailingPE"),
"forward_pe": info.get("forwardPE"),
"earnings_growth": info.get("earningsGrowth"),
"avg_volume": info.get("averageDailyVolume10Day", 0),
"fifty_two_week_high": info.get("fiftyTwoWeekHigh"),
"fifty_two_week_low": info.get("fiftyTwoWeekLow"),
"current_price": info.get("currentPrice",
info.get("regularMarketPrice")),
})
except Exception as e:
print(f" Error fetching {sym}: {e}")
if (i + 1) % 50 == 0:
print(f" Fetched {i + 1}/{len(tickers)} tickers")
time.sleep(0.5) # Rate limiting
return pd.DataFrame(results)
fundamentals = fetch_fundamentals(tickers)
The time.sleep(0.5) call is essential. Without it, Yahoo Finance will start returning errors or empty responses after a few dozen requests. If you see frequent errors, increase the delay to 1.0 seconds.
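A related safeguard: if a run does get blocked halfway through 500 tickers, you lose everything fetched so far. One way to make the loop resumable (a sketch; the checkpoint filename and structure are illustrative) is to persist results after each successful fetch and skip already-fetched symbols on restart:

```python
import json
import os

def fetch_with_checkpoint(tickers, fetch_one, path="fundamentals_partial.json"):
    """Fetch per-ticker rows, persisting progress so a blocked run can resume.

    fetch_one(sym) returns a dict or None. Already-fetched symbols are
    skipped when the function is run again.
    """
    done = {}
    if os.path.exists(path):
        with open(path) as f:
            done = json.load(f)
    for sym in tickers:
        if sym in done:
            continue
        row = fetch_one(sym)
        if row is not None:
            done[sym] = row
            with open(path, "w") as f:  # rewrite checkpoint after each success
                json.dump(done, f)
    return list(done.values())
```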
5. Computing Technical Indicators
With price data downloaded, you can compute technical indicators for each ticker. The three most useful for screening are: RSI-14 (Relative Strength Index), price relative to the 200-day moving average, and proximity to the 52-week high.
RSI-14 (Relative Strength Index)
RSI was developed by J. Welles Wilder Jr. and published in his 1978 book New Concepts in Technical Trading Systems. It measures the speed and magnitude of recent price changes on a scale from 0 to 100. Conventionally, RSI below 30 indicates an oversold condition (potential buy), and RSI above 70 indicates overbought (potential sell).
def compute_rsi(series, period=14):
    """Compute RSI-14 using Wilder's smoothing method."""
    delta = series.diff()
    gain = delta.where(delta > 0, 0.0)
    loss = -delta.where(delta < 0, 0.0)
    # Wilder's smoothing: recursive EMA with alpha = 1/period
    # (adjust=False makes pandas use the recursive form directly)
    avg_gain = gain.ewm(alpha=1/period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1/period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    return rsi
Wilder’s original smoothing method is an exponential moving average with alpha = 1/period. This differs from both a simple moving average and the default EMA settings in most charting packages. In pandas, ewm(alpha=1/period, adjust=False) follows Wilder’s recursive formula directly; with the default adjust=True the early values are weighted slightly differently, but the two converge after the first few dozen observations.
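To convince yourself of the equivalence, you can compare pandas against the recursion written out by hand. With adjust=False, pandas computes exactly y_t = (1 − alpha)·y_{t−1} + alpha·x_t, seeded with the first observation:

```python
def ewm_recursive(values, alpha):
    """The recursive EMA that Series.ewm(alpha=alpha, adjust=False) computes."""
    out = [values[0]]  # seeded with the first observation
    for v in values[1:]:
        out.append((1 - alpha) * out[-1] + alpha * v)
    return out
```

Wilder’s own tables seed the average with a simple mean of the first `period` values instead, so the earliest outputs differ slightly; the two series converge quickly after that.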
200-Day Moving Average
The 200-day simple moving average (200 MA) is one of the most widely followed trend indicators. When a stock’s price is above its 200 MA, the long-term trend is considered bullish. When below, bearish. This is not a predictive signal with strong academic backing, but it is a widely used filter to avoid trading against the primary trend.
def compute_ma(series, window=200):
"""Compute simple moving average."""
return series.rolling(window=window).mean()
52-Week High Proximity
Stocks trading near their 52-week high tend to exhibit positive momentum. George and Hwang found in their 2004 study (Journal of Finance) that nearness to the 52-week high is a significant predictor of future returns. This is related to the anchoring bias: investors use the 52-week high as a reference point, and stocks approaching it attract attention and buying pressure.
def high_proximity(current_price, fifty_two_week_high):
"""How close is the current price to the 52-week high (0 to 1)."""
if fifty_two_week_high and fifty_two_week_high > 0:
return current_price / fifty_two_week_high
return None
6. Computing Technical Signals for All Tickers
Now combine the price data and technical computations:
def compute_technicals(price_data, tickers):
"""Compute RSI-14 and 200 MA for each ticker."""
technicals = {}
for sym in tickers:
try:
close = price_data[sym]["Close"].dropna()
if len(close) < 200:
continue
rsi = compute_rsi(close, 14)
ma200 = compute_ma(close, 200)
technicals[sym] = {
"rsi_14": round(rsi.iloc[-1], 2),
"ma_200": round(ma200.iloc[-1], 2),
"last_close": round(close.iloc[-1], 2),
"above_200ma": close.iloc[-1] > ma200.iloc[-1],
}
except Exception:
continue
return pd.DataFrame.from_dict(technicals, orient="index")
tech_df = compute_technicals(price_data, tickers)
tech_df.index.name = "symbol"
tech_df = tech_df.reset_index()
7. Applying Filters
Now merge the fundamental and technical data and apply your screening criteria. The filters below are a reasonable starting point for identifying liquid, large-cap stocks with favorable technical and valuation characteristics.
def apply_filters(fundamentals, technicals):
"""Merge data and apply screening filters."""
df = fundamentals.merge(technicals, on="symbol", how="inner")
# === Liquidity filters ===
df = df[df["market_cap"] > 1_000_000_000] # Market cap > $1B
df = df[df["avg_volume"] > 500_000] # Avg volume > 500K shares
df = df[df["current_price"] > 5.0] # Price > $5 (avoid penny stocks)
# === Technical filters ===
# RSI: not overbought (RSI < 70)
df = df[df["rsi_14"] < 70]
# Above 200 MA (long-term uptrend)
    df = df[df["above_200ma"]]
# === Fundamental filters ===
# P/E ratio: reasonable range (avoid negative earnings and extreme valuations)
df = df[df["pe_ratio"].notna()]
df = df[(df["pe_ratio"] > 10) & (df["pe_ratio"] < 25)]
return df
screened = apply_filters(fundamentals, tech_df)
print(f"Passed filters: {len(screened)} stocks")
Why These Specific Filters?
- Market cap > $1B: Excludes micro-caps and small-caps where data quality is lower, liquidity is thinner, and bid-ask spreads are wider. For most retail and institutional investors, large and mid caps provide sufficient opportunity.
- Average volume > 500K shares: Ensures you can enter and exit positions without significant market impact. Illiquid stocks can have wide spreads and slippage that erode returns.
- Price > $5: Many institutional investors and brokerages have minimum price thresholds. Stocks below $5 are often considered penny stocks with higher manipulation risk.
- RSI < 70: Avoids entering stocks that are already overbought. An RSI above 70 suggests the stock has risen rapidly and may be due for a pullback.
- Above 200 MA: Confirms the long-term trend is bullish. Trading with the trend has historically been more profitable than fighting it.
- P/E between 10 and 25: Filters out companies with negative earnings (P/E is meaningless for unprofitable companies) and extremely expensive growth stocks. This is a value-oriented filter that can be adjusted based on your strategy.
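Since each of these thresholds encodes a strategy choice, it can help to lift them into a config dict so that variants are easy to swap and compare. The names and values below are illustrative, mirroring the filters above:

```python
# Illustrative thresholds; each one encodes a strategy choice.
FILTERS = {
    "min_market_cap": 1_000_000_000,
    "min_avg_volume": 500_000,
    "min_price": 5.0,
    "max_rsi": 70,
    "pe_range": (10, 25),
}

def apply_filters_config(df, cfg=FILTERS):
    """Apply the screening criteria driven by a config dict."""
    df = df[df["market_cap"] > cfg["min_market_cap"]]
    df = df[df["avg_volume"] > cfg["min_avg_volume"]]
    df = df[df["current_price"] > cfg["min_price"]]
    df = df[df["rsi_14"] < cfg["max_rsi"]]
    df = df[df["above_200ma"]]
    lo, hi = cfg["pe_range"]
    return df[df["pe_ratio"].notna() & (df["pe_ratio"] > lo) & (df["pe_ratio"] < hi)]
```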
8. Scoring and Ranking
After filtering, you typically have a few dozen stocks that pass all criteria. A composite score ranks them by desirability. Assign points for each characteristic:
def compute_score(df):
"""Compute a composite score for ranking screened stocks."""
df = df.copy()
df["score"] = 0.0
# RSI score: lower RSI = more oversold = higher score
# RSI 30 -> 5 points, RSI 50 -> 2.5 points, RSI 70 -> 0 points
df["rsi_score"] = ((70 - df["rsi_14"]) / 40 * 5).clip(0, 5)
# 52-week high proximity: closer to high = more momentum = higher score
df["high_prox"] = df.apply(
lambda r: high_proximity(r["current_price"],
r["fifty_two_week_high"]),
axis=1
)
    # Linear in proximity: at the 52-week high -> 5 points, 10% below -> 4.5
    df["high_score"] = (df["high_prox"] * 5).clip(0, 5).fillna(0)
# P/E score: prefer lower P/E within range (value tilt)
# P/E 10 -> 5 points, P/E 17.5 -> 2.5 points, P/E 25 -> 0 points
df["pe_score"] = ((25 - df["pe_ratio"]) / 15 * 5).clip(0, 5)
# Earnings growth bonus: positive growth gets extra points
df["growth_score"] = df["earnings_growth"].apply(
lambda x: min(3.0, x * 10) if pd.notna(x) and x > 0 else 0
)
# Volume score: higher relative volume = more institutional interest
vol_median = df["avg_volume"].median()
df["vol_score"] = (df["avg_volume"] / vol_median).clip(0, 3)
# Composite
df["score"] = (
df["rsi_score"] +
df["high_score"] +
df["pe_score"] +
df["growth_score"] +
df["vol_score"]
)
return df.sort_values("score", ascending=False)
ranked = compute_score(screened)
print(ranked[["symbol", "name", "sector", "current_price", "pe_ratio",
"rsi_14", "score"]].head(20).to_string())
The scoring system above combines value (low P/E), momentum (52-week high proximity), mean reversion (low RSI), growth (earnings growth), and liquidity (volume). You can adjust the weights by multiplying each sub-score by a coefficient that reflects your investment philosophy.
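One way to make those coefficients explicit is a weights dict (the values below are purely illustrative, not a recommendation):

```python
# Hypothetical weights expressing one investment philosophy; tune freely.
WEIGHTS = {"rsi_score": 1.0, "high_score": 1.5,
           "pe_score": 1.0, "growth_score": 0.5, "vol_score": 0.5}

def weighted_score(df, weights=WEIGHTS):
    """Combine the sub-scores into one composite using explicit weights."""
    df = df.copy()
    df["score"] = sum(df[col] * w for col, w in weights.items())
    return df.sort_values("score", ascending=False)
```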
9. Caching Results
Fetching data for 500 tickers takes several minutes. To avoid repeating this during development or when running the screener multiple times per day, save the results to disk:
import json
from datetime import datetime
def save_results(df, filename="screener_results.json"):
"""Save screener results to JSON."""
output = {
"timestamp": datetime.now().isoformat(),
"count": len(df),
"results": df.to_dict(orient="records"),
}
with open(filename, "w") as f:
json.dump(output, f, indent=2, default=str)
    print(f"Saved {len(df)} results to {filename}")
save_results(ranked)
For the fundamental data (which changes slowly), you can cache it for an entire trading day. Price data and technical indicators should be refreshed at least daily, or intraday if you need current signals.
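A small freshness check implements that daily-cache policy. This sketch assumes the "timestamp" field written by save_results() above; the 24-hour default is arbitrary:

```python
import json
import os
from datetime import datetime

def load_cached(path, max_age_hours=24):
    """Return cached results if the file exists and is fresh, else None."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        payload = json.load(f)
    saved = datetime.fromisoformat(payload["timestamp"])
    age_hours = (datetime.now() - saved).total_seconds() / 3600
    return payload["results"] if age_hours <= max_age_hours else None
```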
10. Performance Optimization
The bottleneck in this screener is the individual .info calls for fundamental data. Here are strategies to speed things up:
Batch Price Downloads
Always use yf.download(tickers, ...) for price data rather than looping through individual tickers. The batch method makes a single HTTP request for all tickers, which is 10–50x faster than individual calls.
Parallel Fundamental Fetching
You can use Python’s concurrent.futures.ThreadPoolExecutor to fetch fundamental data in parallel. Be conservative with the number of threads to avoid getting blocked:
from concurrent.futures import ThreadPoolExecutor, as_completed
def fetch_single_info(sym):
"""Fetch .info for one ticker with error handling."""
try:
info = yf.Ticker(sym).info
return {
"symbol": sym,
"market_cap": info.get("marketCap", 0),
"pe_ratio": info.get("trailingPE"),
"sector": info.get("sector", ""),
"avg_volume": info.get("averageDailyVolume10Day", 0),
"current_price": info.get("currentPrice",
info.get("regularMarketPrice")),
"fifty_two_week_high": info.get("fiftyTwoWeekHigh"),
"earnings_growth": info.get("earningsGrowth"),
"name": info.get("shortName", ""),
}
except Exception:
return None
def fetch_fundamentals_parallel(tickers, max_workers=5):
"""Fetch fundamentals using thread pool. Keep max_workers low."""
results = []
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {executor.submit(fetch_single_info, sym): sym
for sym in tickers}
for future in as_completed(futures):
result = future.result()
if result:
results.append(result)
return pd.DataFrame(results)
With max_workers=5, the fundamental fetch for 500 tickers drops to roughly one fifth of the sequential time — on the order of one to three minutes instead of the four to fifteen estimated earlier. Do not set max_workers higher than 10 or you risk being rate-limited.
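If you want parallelism plus an explicit cap on request rate, a shared limiter can enforce a minimum interval between requests across all worker threads. This is a sketch of one way to do it, not something yfinance provides:

```python
import threading
import time

class RateLimiter:
    """Enforce a minimum interval between calls across worker threads."""

    def __init__(self, min_interval=0.2):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last = 0.0

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        with self._lock:
            now = time.monotonic()
            delay = self._last + self.min_interval - now
            if delay > 0:
                time.sleep(delay)
            self._last = time.monotonic()
```

Each worker would call limiter.wait() immediately before its yf.Ticker(sym).info request, so five threads collectively stay under one global rate.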
11. Adding an Insider Trading Filter
One powerful extension is filtering for stocks with recent insider buying. The yfinance library provides access to insider transaction data through the ticker.insider_transactions attribute:
def check_insider_buying(sym, days=90):
"""Check if insiders have been net buyers in the last N days."""
try:
tk = yf.Ticker(sym)
txns = tk.insider_transactions
if txns is None or txns.empty:
return 0
# Filter to recent transactions
cutoff = pd.Timestamp.now() - pd.Timedelta(days=days)
if "Start Date" in txns.columns:
txns = txns[pd.to_datetime(txns["Start Date"]) >= cutoff]
# Count purchases vs sales
buys = txns[txns["Text"].str.contains("Purchase", case=False, na=False)]
sells = txns[txns["Text"].str.contains("Sale", case=False, na=False)]
return len(buys) - len(sells)
except Exception:
return 0
A positive insider net buy count (more purchases than sales in the past 90 days) can be used as an additional scoring factor or a hard filter. Academic research has consistently shown that insider purchases are informative: Lakonishok and Lee (2001) in the Review of Financial Studies found that insider purchases predict future stock returns, especially in smaller firms.
For production-quality insider trading analysis, yfinance’s insider data is limited. The SEC EDGAR system provides comprehensive Form 4 filings with exact transaction dates, dollar amounts, and insider roles. Alpha Suite processes these filings directly from EDGAR for institutional-grade signal generation.
12. The Complete Script
Here is the full runnable script that ties everything together:
#!/usr/bin/env python3
"""
Stock Screener - filters S&P 500 stocks by technical
and fundamental criteria, ranks by composite score.
Usage: python screener.py
Output: screener_results.json
"""
import time
import json
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor, as_completed
import numpy as np
import pandas as pd
import yfinance as yf
# --- Universe ---
def get_sp500_tickers():
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
df = pd.read_html(url)[0]
return [t.replace(".", "-") for t in df["Symbol"].tolist()]
# --- Data Fetching ---
def fetch_price_data(tickers, period="1y"):
return yf.download(tickers, period=period,
group_by="ticker", auto_adjust=True, threads=True)
def fetch_single_info(sym):
try:
info = yf.Ticker(sym).info
return {
"symbol": sym,
"name": info.get("shortName", ""),
"sector": info.get("sector", ""),
"market_cap": info.get("marketCap", 0),
"pe_ratio": info.get("trailingPE"),
"forward_pe": info.get("forwardPE"),
"earnings_growth": info.get("earningsGrowth"),
"avg_volume": info.get("averageDailyVolume10Day", 0),
"fifty_two_week_high": info.get("fiftyTwoWeekHigh"),
"fifty_two_week_low": info.get("fiftyTwoWeekLow"),
"current_price": info.get("currentPrice",
info.get("regularMarketPrice")),
}
except Exception:
return None
def fetch_fundamentals(tickers, max_workers=5):
results = []
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {executor.submit(fetch_single_info, s): s
for s in tickers}
done = 0
for future in as_completed(futures):
result = future.result()
if result:
results.append(result)
done += 1
if done % 100 == 0:
print(f" Fundamentals: {done}/{len(tickers)}")
return pd.DataFrame(results)
# --- Technical Indicators ---
def compute_rsi(series, period=14):
    delta = series.diff()
    gain = delta.where(delta > 0, 0.0)
    loss = -delta.where(delta < 0, 0.0)
    # adjust=False follows Wilder's recursive smoothing form
    avg_gain = gain.ewm(alpha=1/period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1/period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - (100 / (1 + rs))
def compute_technicals(price_data, tickers):
rows = {}
for sym in tickers:
try:
close = price_data[sym]["Close"].dropna()
if len(close) < 200:
continue
rsi = compute_rsi(close, 14)
ma200 = close.rolling(200).mean()
rows[sym] = {
"rsi_14": round(float(rsi.iloc[-1]), 2),
"ma_200": round(float(ma200.iloc[-1]), 2),
"last_close": round(float(close.iloc[-1]), 2),
"above_200ma": bool(close.iloc[-1] > ma200.iloc[-1]),
}
except Exception:
continue
df = pd.DataFrame.from_dict(rows, orient="index")
df.index.name = "symbol"
return df.reset_index()
# --- Filtering & Scoring ---
def apply_filters(fund_df, tech_df):
df = fund_df.merge(tech_df, on="symbol", how="inner")
df = df[df["market_cap"] > 1_000_000_000]
df = df[df["avg_volume"] > 500_000]
df = df[df["current_price"] > 5.0]
df = df[df["rsi_14"] < 70]
    df = df[df["above_200ma"]]
df = df[df["pe_ratio"].notna()]
df = df[(df["pe_ratio"] > 10) & (df["pe_ratio"] < 25)]
return df
def compute_score(df):
df = df.copy()
df["rsi_score"] = ((70 - df["rsi_14"]) / 40 * 5).clip(0, 5)
prox = np.where(
df["fifty_two_week_high"] > 0,
df["current_price"] / df["fifty_two_week_high"],
0
)
df["high_score"] = np.clip(prox * 5, 0, 5)
df["pe_score"] = ((25 - df["pe_ratio"]) / 15 * 5).clip(0, 5)
df["growth_score"] = df["earnings_growth"].apply(
lambda x: min(3.0, x * 10) if pd.notna(x) and x > 0 else 0
)
med_vol = df["avg_volume"].median()
df["vol_score"] = (df["avg_volume"] / med_vol).clip(0, 3)
df["score"] = (df["rsi_score"] + df["high_score"] +
df["pe_score"] + df["growth_score"] + df["vol_score"])
return df.sort_values("score", ascending=False)
# --- Main ---
if __name__ == "__main__":
print("=== Stock Screener ===")
print("1. Fetching S&P 500 tickers...")
tickers = get_sp500_tickers()
print(f" Universe: {len(tickers)} tickers")
print("2. Downloading price data (1 year)...")
prices = fetch_price_data(tickers, period="1y")
print("3. Fetching fundamental data...")
fund_df = fetch_fundamentals(tickers, max_workers=5)
print(f" Got fundamentals for {len(fund_df)} tickers")
print("4. Computing technical indicators...")
tech_df = compute_technicals(prices, tickers)
print(f" Computed technicals for {len(tech_df)} tickers")
print("5. Applying filters...")
screened = apply_filters(fund_df, tech_df)
print(f" Passed filters: {len(screened)} stocks")
print("6. Scoring and ranking...")
ranked = compute_score(screened)
cols = ["symbol", "name", "sector", "current_price",
"pe_ratio", "rsi_14", "score"]
print("\nTop 20 stocks:")
print(ranked[cols].head(20).to_string(index=False))
output = {
"timestamp": datetime.now().isoformat(),
"count": len(ranked),
"results": ranked[cols].to_dict(orient="records"),
}
with open("screener_results.json", "w") as f:
json.dump(output, f, indent=2, default=str)
print(f"\nSaved {len(ranked)} results to screener_results.json")
13. Extension Ideas
Once you have the basic screener working, here are directions to extend it:
- Sector rotation filter: Compute relative strength for each GICS sector over the trailing 1–3 months. Overweight stocks from the top-performing sectors. This captures the sector rotation effect documented in academic research.
- Earnings date proximity: Flag stocks reporting earnings in the next 7–14 days. Some traders avoid entering positions right before earnings (binary event risk); others specifically target pre-earnings setups.
- Insider trading overlay: Add the insider buying filter from Section 11. Stocks with both strong technicals and recent insider purchases represent a confluence of signals.
- Multi-timeframe confirmation: Require the stock to be above the 200 MA on the daily chart AND above the 50 MA on the weekly chart. Multi-timeframe agreement reduces false signals.
- Volatility filter: Compute ATR (Average True Range) as a percentage of price. Filter for stocks in a desirable volatility range — not so volatile that stops are constantly triggered, not so quiet that there is no opportunity.
- Scheduled execution: Run the screener automatically each evening after market close using cron (Linux/macOS) or Task Scheduler (Windows). Save results to a database for tracking how screening results evolve over time.
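As a starting point for the volatility filter idea above, here is one way to sketch ATR as a percentage of price, using the same Wilder smoothing as the RSI (the function name and default period are assumptions, not a standard API):

```python
import pandas as pd

def compute_atr_pct(high, low, close, period=14):
    """Average True Range as a percentage of price (Wilder smoothing).

    True range is the largest of: high - low, |high - prev close|,
    and |low - prev close|.
    """
    prev_close = close.shift(1)
    tr = pd.concat([high - low,
                    (high - prev_close).abs(),
                    (low - prev_close).abs()], axis=1).max(axis=1)
    atr = tr.ewm(alpha=1/period, adjust=False).mean()
    return (atr / close) * 100
```

You could then filter for, say, stocks whose ATR sits between 1% and 4% of price, the "not too wild, not too quiet" band described above.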
Alpha Suite takes this concept much further: instead of a general-purpose screener, it focuses specifically on insider trading signals from SEC Form 4 filings. The system processes thousands of filings daily, applies conviction scoring based on insider role, transaction size, clustering, and timing, overlays technical indicators (RSI, moving averages, ATR, relative strength), and generates quantitative signals with take-profit targets, stop-loss levels, and Kelly-criterion position sizes. The screener you built in this tutorial is the starting point; Alpha Suite is what happens when you productionize and specialize that concept.