Congressional Trading Data for AI Models — Claude, GPT-4 & Python
The GovGreed API returns structured JSON covering Triple Signals, bill ML scores, executive pre-vote buys, and committee markup calendars. This guide shows how to pipe that data into the Claude API, GPT-4, LangChain, scikit-learn, and custom ML pipelines.
What Is Congressional Trading Data — and Why Does It Work in AI Models?
Congressional trading data is a category of alternative data — non-traditional financial information derived from public government disclosures rather than market feeds. Members of Congress must disclose stock trades within 45 days under the STOCK Act (2012). Corporate executives must disclose trades within 2 business days under Section 16(a) of the Securities Exchange Act. Lobbyists file quarterly reports under the Lobbying Disclosure Act (LDA). Campaign contributions are reported to the FEC.
None of this data appears in Bloomberg, Refinitiv, or standard market data feeds. That information gap is precisely the edge.
Why it trains well: Congressional trading data has the rare property of being labeled. You know the outcome — did the bill pass? Did the stock move? This creates a supervised learning setup: train on historical signal patterns, validate on bill passage outcomes, deploy forward.
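The supervised setup described above can be sketched in a few lines of pandas. The field names (investability_score, insider_count, enacted) mirror the API examples later in this guide, and the sample rows are illustrative, not real API output:

```python
# Minimal sketch of the supervised-learning framing: historical bills
# become (features X, binary label y) pairs, where the label is the
# known outcome — did the bill pass?
import pandas as pd

def build_training_frame(bills: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Split historical bill records into features X and a binary label y."""
    feature_cols = ["investability_score", "insider_count"]
    X = bills[feature_cols].fillna(0)
    y = (bills["enacted"] == True).astype(int)  # label: the bill outcome
    return X, y

# Illustrative historical rows (not real data)
hist = pd.DataFrame([
    {"investability_score": 84.2, "insider_count": 5, "enacted": True},
    {"investability_score": 31.0, "insider_count": 0, "enacted": False},
])
X, y = build_training_frame(hist)
print(y.tolist())  # [1, 0]
```

With labels in hand, any standard classifier slots in — the scikit-learn section below trains one on the full feature set.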
The four signal types available via API
| Signal | API Endpoint | Why It's Useful for AI/ML |
|---|---|---|
| Triple Signal | `rpc/get_triple_signals` | High-precision alert: committee overlap + trade + contribution. Structured, categorical, binary flag + score. |
| Bill Investability Score | `bills?investability_score=gte.70` | 25-feature ML score (0–100). Use as a feature in your own model or as a pre-filter for the signal universe. |
| Exec Pre-Vote Buy | `rpc/exec_timing_signals` | Time-series feature: days before vote, position size, officer flag. Natural fit for LSTM/temporal models. |
| Committee Markup Calendar | `upcoming_markups` | Event-driven signal: scheduled markup = catalyst. Combine with bill score for timed entry. |
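The "combine with bill score for timed entry" idea from the last row can be sketched as a simple join: filter upcoming markups to bills that clear the investability threshold. Column names (bill_id, markup_date) are assumptions for illustration, not the exact API schema:

```python
# Join scheduled markups (the catalyst) against high-scoring bills
# (the filter), so each event carries its ML score. Sample rows are
# illustrative stand-ins for API responses.
import pandas as pd

bills = pd.DataFrame([
    {"bill_id": "HR.7530", "investability_score": 84.2},
    {"bill_id": "S.1234", "investability_score": 42.0},
])
markups = pd.DataFrame([
    {"bill_id": "HR.7530", "markup_date": "2025-04-10"},
])

# Keep only scheduled markups whose bill clears the score threshold
timed = markups.merge(bills, on="bill_id")
timed = timed[timed["investability_score"] >= 70]
print(timed["bill_id"].tolist())  # ['HR.7530']
```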
Fetching Data from the GovGreed API
The GovGreed API is a standard PostgREST REST interface. No SDK required — plain HTTP GET with your API key in the header.
```python
# pip install requests pandas
import requests
import pandas as pd

BASE_URL = "https://api.govgreed.com"
HEADERS = {
    "apikey": "YOUR_GOVGREED_API_KEY",
    "Authorization": "Bearer YOUR_GOVGREED_API_KEY",
    "Accept": "application/json",
}

def get_triple_signals(min_score=60, limit=50):
    """Get Triple Signals ranked by score. These are bills where a committee
    member overlaps with a stock trade AND a campaign contribution."""
    resp = requests.get(
        f"{BASE_URL}/rest/v1/rpc/get_triple_signals",
        params={"min_score": min_score, "limit": limit},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

def get_high_investability_bills(min_score=70):
    """Bills scoring ≥70 on investability (5.4× more likely to pass)."""
    resp = requests.get(
        f"{BASE_URL}/rest/v1/bills",
        params={
            "congress": "eq.119",
            "investability_score": f"gte.{min_score}",
            "order": "investability_score.desc",
            "select": "id,bill_number,title,investability_score,hot_score,committee_name",
        },
        headers=HEADERS,
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

def get_exec_timing_signals(min_score=5.0):
    """Exec buys with high timing scores — officer bought before bill vote."""
    resp = requests.get(
        f"{BASE_URL}/rest/v1/exec_timing_signals_best",
        params={
            "timing_score": f"gte.{min_score}",
            "transaction_type": "eq.Purchase",
            "order": "timing_score.desc",
            "limit": "30",
        },
        headers=HEADERS,
    )
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

# Usage
signals_df = get_triple_signals(min_score=70)
bills_df = get_high_investability_bills()
print(f"Triple signals: {len(signals_df)} | High bills: {len(bills_df)}")
```
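PostgREST interfaces support limit/offset query parameters, so large result sets can be paged with a simple loop. A sketch of the pattern — fetch_page here is a stand-in for the requests.get calls above, so the paging logic is visible on its own:

```python
# Generic offset/limit pagination: keep requesting pages until a short
# page signals the end of the result set. fetch_page is any callable
# that accepts limit and offset and returns a list of rows.
def paginate(fetch_page, page_size=100):
    """Yield rows across pages until a short page signals the end."""
    offset = 0
    while True:
        rows = fetch_page(limit=page_size, offset=offset)
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size

# Stub data source: 5 rows total, paged 2 at a time
data = [{"id": i} for i in range(5)]
fake_fetch = lambda limit, offset: data[offset:offset + limit]
all_rows = list(paginate(fake_fetch, page_size=2))
print(len(all_rows))  # 5
```

In production, swap fake_fetch for a wrapper that passes `limit` and `offset` into the `params` dict of the requests calls above.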
Using Claude API to Analyze Congressional Signals
Claude is well-suited for congressional signal analysis because it can reason about conflicting multi-factor patterns, generate plain-English investment thesis summaries, and flag which signals are most statistically unusual given historical context.
```python
# pip install anthropic
import anthropic
import json

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_KEY")

def analyze_signal_with_claude(signal_data: dict) -> str:
    """Pass a GovGreed Triple Signal to Claude for analysis.
    Returns an investment thesis summary."""
    prompt = f"""You are analyzing a congressional insider trading signal.

Here is the signal data in JSON:
{json.dumps(signal_data, indent=2)}

Fields explanation:
- ticker: stock symbol affected by the bill
- bill_number: the bill being voted on
- investability_score: ML score 0-100 (≥70 = high signal)
- committee_member: name of the politician with oversight
- trade_amount: dollar value of their stock trade
- days_before_vote: how far before the vote they traded
- exec_buys: number of corporate execs also buying this stock
- campaign_contributions: industry money received by committee member

Provide:
1. A 2-sentence investment thesis (what the signal suggests)
2. The top risk factor for this signal
3. Confidence level (Low/Medium/High) with brief justification
4. Suggested action: WATCH / ENTER / AVOID

Be specific and analytical. Reference the data."""

    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

# Example usage with a real signal
sample_signal = {
    "ticker": "NVDA",
    "bill_number": "HR.7530",
    "bill_title": "CHIPS and Science Act Expansion",
    "investability_score": 84.2,
    "committee_member": "[Senator Name]",
    "committee": "Senate Commerce, Science, and Transportation",
    "trade_amount": 485000,
    "days_before_vote": 38,
    "exec_buys": 3,
    "campaign_contributions": 125000,
    "triple_signal": True,
}

thesis = analyze_signal_with_claude(sample_signal)
print(thesis)
```
Using OpenAI GPT-4 for Batch Signal Ranking
GPT-4 with structured outputs is useful for batch-ranking multiple signals and returning machine-readable JSON for downstream processing.
```python
# pip install openai
from openai import OpenAI
import json

client = OpenAI(api_key="YOUR_OPENAI_KEY")

def rank_signals_gpt4(signals: list[dict]) -> list:
    """Rank a list of Triple Signals using GPT-4.
    Returns signals with an AI confidence score added."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": "You are a quantitative analyst ranking congressional trading signals. Return JSON with ranked_signals array.",
            },
            {
                "role": "user",
                "content": f"Rank these signals by expected alpha. Add ai_confidence (0-1) and ai_rank. Signals: {json.dumps(signals)}",
            },
        ],
    )
    result = json.loads(resp.choices[0].message.content)
    return result.get("ranked_signals", [])
```
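JSON mode guarantees syntactically valid JSON, but not that the model followed the schema you asked for, so a defensive validation pass before downstream use is cheap insurance. A sketch — the field names (ai_confidence, ai_rank) match the prompt above, and the sample input is illustrative:

```python
# Validate GPT-ranked signals before feeding them downstream: drop
# entries whose confidence is missing, non-numeric, or out of range,
# then sort the survivors by the model's rank.
def validate_ranked(ranked: list[dict]) -> list[dict]:
    """Keep only entries with a usable confidence and rank."""
    out = []
    for s in ranked:
        conf = s.get("ai_confidence")
        if isinstance(conf, (int, float)) and 0 <= conf <= 1 and "ai_rank" in s:
            out.append(s)
    return sorted(out, key=lambda s: s["ai_rank"])

# Illustrative model output, including one malformed entry
sample = [
    {"ticker": "NVDA", "ai_confidence": 0.82, "ai_rank": 1},
    {"ticker": "XYZ", "ai_confidence": "high", "ai_rank": 2},  # non-numeric, dropped
]
clean = validate_ranked(sample)
print([s["ticker"] for s in clean])  # ['NVDA']
```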
LangChain + Vector DB: Build a Congressional Signal RAG System
For more sophisticated AI analysis, store GovGreed data in a vector database (Pinecone, Qdrant, Supabase Vector) and use LangChain to build a retrieval-augmented system. Ask questions like "Which bills in the semiconductor sector have the highest triple signal count?" or "Show me historical patterns for defense sector signals in election years."
```python
# pip install langchain langchain-openai langchain-community faiss-cpu
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.docstore.document import Document

# Convert bill signals to documents for vector storage
def signals_to_documents(bills_df):
    docs = []
    for _, row in bills_df.iterrows():
        content = f"""Bill: {row['bill_number']} — {row['title']}
Committee: {row['committee_name']}
Investability Score: {row['investability_score']:.1f}/100
Triple Signal: {row.get('triple_signal', False)}
Tickers Affected: {row.get('tickers_affected', 'N/A')}
Exec Buys: {row.get('exec_buy_count', 0)}"""
        docs.append(Document(
            page_content=content,
            metadata={"bill_number": row["bill_number"], "score": row["investability_score"]},
        ))
    return docs

# Build retriever and QA chain
embeddings = OpenAIEmbeddings()
docs = signals_to_documents(bills_df)
vectorstore = FAISS.from_documents(docs, embeddings)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
)

# Natural language queries over congressional data
answer = qa.invoke("Which semiconductor bills have triple signals?")
print(answer["result"])
```
Scikit-Learn: Train Your Own Congressional Alpha Model
The GovGreed API exposes the raw features behind the investability score. You can pull those features and train your own model — either to reproduce the score or to add your own signals.
```python
# pip install scikit-learn lightgbm pandas
import pandas as pd
import requests
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Fetch bill features from the GovGreed API
# (BASE_URL and HEADERS are defined in the fetching section above)
resp = requests.get(
    f"{BASE_URL}/rest/v1/bill_features",
    params={"congress": "in.(117,118)"},  # historical congresses for training
    headers=HEADERS,
)
features_df = pd.DataFrame(resp.json())

# Key features available
FEATURE_COLS = [
    "insider_count", "insider_trade_value", "has_triple_signal",
    "sector_count", "impacted_ticker_count", "related_contributions",
    "exec_ahead_vote_count", "exec_officer_buy_count", "markup_scheduled",
    "sponsor_party_d", "cosponsors_count", "committee_seniority_avg",
]

X = features_df[FEATURE_COLS].fillna(0)
y = (features_df["enacted"] == True).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.3f}")  # baseline ~0.71 with GovGreed features
```
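Once trained, it is worth inspecting which congressional features the model actually leans on. LightGBM exposes per-feature split counts via `model.feature_importances_`; pairing them with the column names makes the ranking readable. The importance values below are illustrative stand-ins, since the real numbers depend on your training run:

```python
# Rank features by importance. In practice, replace the hard-coded list
# with model.feature_importances_ from the trained LGBMClassifier above;
# the values here are made up for illustration.
FEATURE_COLS = ["insider_count", "has_triple_signal", "exec_officer_buy_count"]
importances = [120, 310, 95]  # stand-in for model.feature_importances_

ranked = sorted(zip(FEATURE_COLS, importances), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score}")
print(ranked[0][0])  # has_triple_signal
```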
Frequently Asked Questions
Which Python libraries do I need?
For API calls: requests or httpx. For signal processing: pandas and numpy. For ML: scikit-learn, lightgbm, or xgboost. For AI analysis: the anthropic or openai SDK. For LLM pipelines: langchain. For execution: alpaca-trade-api or ibapi. For backtesting: backtrader or zipline-reloaded.

How do I handle large result sets?
Use the offset/limit parameters to paginate through large datasets.