Integration Guide
How to integrate AI-powered lawsuit summarization into the LELU workflow.
Overview
This tool automates the extraction of structured data from NYPD misconduct lawsuit complaint PDFs. It's designed to match the LELU (Law Enforcement Look Up) database taxonomy exactly, making it easy to import extracted data into Airtable.
What It Does
- OCR scanned complaint PDFs using vision AI models
- Extract narrative summaries in ProPublica style
- Classify using exact LELU taxonomy (allegations, force types, themes, etc.)
- Extract officer names, badge numbers, precincts, and addresses
- Provide source provenance (exact text + page number) for every extraction
How It Works
Complaint PDF → Qwen3 VL / DeepSeek (OCR) → Llama 4 Scout (extraction) → Structured Data
Extraction Modes
| Mode | OCR | Extraction | Cost/Page | Best For |
|---|---|---|---|---|
| DeepSeek | Self-hosted (EC2) | Llama 4 | ~$0.0002 | Large batches, lowest cost |
| Two-Step | Qwen3 VL (Bedrock) | Llama 4 | ~$0.003 | Balance of cost and quality |
| Unified | Qwen3 VL (single pass) | Qwen3 VL (same pass) | ~$0.004 | Simplest pipeline |
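To compare modes for a given document volume, the per-page figures in the table can be multiplied out. The following is a minimal sketch, not part of the tool: the dictionary keys `deepseek` and `unified` are assumed names (only `two-step` appears as a CLI mode in this guide), the page counts are hypothetical, and the costs are the table's approximations. The table does not say whether the DeepSeek figure includes EC2 instance time.

```python
# Approximate per-page costs in USD, copied from the table above.
# The keys "deepseek" and "unified" are assumed labels for illustration.
COST_PER_PAGE = {
    "deepseek": 0.0002,
    "two-step": 0.003,
    "unified": 0.004,
}


def estimate_cost(num_pages: int, mode: str) -> float:
    """Return the approximate extraction cost in USD for num_pages pages."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if mode not in COST_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    return num_pages * COST_PER_PAGE[mode]


if __name__ == "__main__":
    # Hypothetical volume: 1,000 complaints averaging 15 pages each.
    pages = 1000 * 15
    for mode in COST_PER_PAGE:
        print(f"{mode:>8}: ~${estimate_cost(pages, mode):,.2f}")
```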
Workflow Options
Option 1: Web Interface
Use the Demo page to review individual cases with side-by-side comparison of AI extraction vs. human summaries.
Option 2: Command Line Batch
    # Activate virtual environment
    source venv/bin/activate

    # Run extraction on all PDFs in a folder
    python run_vision_extraction.py --mode two-step --input-dir ./complaints/

    # Export results to CSV
    python export_results.py --format csv --output lelu_import.csv
Option 3: API Integration
    # JSON API endpoint
    GET /api/export/json?method=two_step

    # CSV export
    GET /api/export/csv?method=two_step
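A small client for these two endpoints might look like the sketch below, which uses only the Python standard library. The host and port in `BASE_URL` are assumptions (this guide does not say where the app is served), the function names are illustrative, and nothing is assumed about the shape of the JSON response beyond it being valid JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the app is served locally; replace with the real host.
BASE_URL = "http://localhost:5000"


def export_url(fmt: str, method: str = "two_step", base_url: str = BASE_URL) -> str:
    """Build the export URL for fmt ('json' or 'csv')."""
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{base_url}/api/export/{fmt}?{urlencode({'method': method})}"


def fetch_json(method: str = "two_step", base_url: str = BASE_URL):
    """GET the JSON export and decode it; the response shape is not assumed."""
    with urlopen(export_url("json", method, base_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def download_csv(path: str, method: str = "two_step", base_url: str = BASE_URL) -> None:
    """GET the CSV export and write the raw bytes to path."""
    with urlopen(export_url("csv", method, base_url), timeout=30) as resp:
        with open(path, "wb") as out:
            out.write(resp.read())
```

For example, `download_csv("lelu_import.csv")` would save the two-step export next to the script, assuming the server is reachable at `BASE_URL`.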
Output Formats
CSV/TSV for Airtable
Exports are available as CSV or TSV. Multi-value fields are pipe-separated, so the file is ready for Airtable import.
| Column | Example |
|---|---|
| case_id | owens_troy |
| summary | On December 9, 2015, Troy Owens was arrested... |
| allegations | Excessive force\|False arrest\|Malicious prosecution |
| force_types | Non-weapon physical force\|Tight handcuffs |
| officers | John Doe I\|John Doe II |
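When reading an export back in Python, for instance to spot-check it before importing into Airtable, the pipe-separated fields can be split into lists. This is a minimal sketch that assumes the CSV header row uses the column names shown in the table. The set of multi-value columns is inferred from the examples and may not be complete, and the sample row is assembled from the example values above.

```python
import csv
import io

# Columns assumed to hold pipe-separated values (inferred from the table).
MULTI_VALUE_COLUMNS = ("allegations", "force_types", "officers")


def parse_rows(csv_text: str) -> list[dict]:
    """Parse exported CSV text, splitting multi-value columns into lists."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in MULTI_VALUE_COLUMNS:
            raw = row.get(col) or ""
            # Drop empty pieces so a blank cell becomes an empty list.
            row[col] = [v.strip() for v in raw.split("|") if v.strip()]
        rows.append(row)
    return rows


# Sample built from the example values in the table above.
SAMPLE = (
    "case_id,summary,allegations,force_types,officers\n"
    'owens_troy,"On December 9, 2015, Troy Owens was arrested...",'
    "Excessive force|False arrest|Malicious prosecution,"
    "Non-weapon physical force|Tight handcuffs,John Doe I|John Doe II\n"
)

rows = parse_rows(SAMPLE)
print(rows[0]["case_id"], rows[0]["allegations"])
```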
JSON with Provenance
Full JSON includes source text and page numbers for every extracted field.
    {
      "allegations": [
        {
          "type": "Excessive force/assault and battery",
          "provenance": {
            "source_text": "defendants... wrongfully touched, assaulted and battered...",
            "page_number": 3,
            "paragraph": "Second Cause of Action, Paragraph 10"
          }
        }
      ]
    }
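A reviewer checking extractions against the source PDF could walk this structure as sketched below. The sample is the JSON shown above, and `provenance_rows` is an illustrative helper, not part of the tool. Only the `allegations` field is documented here, so treating other fields as having the same shape would be an assumption.

```python
import json

# The example record from this section.
SAMPLE = """
{
  "allegations": [
    {
      "type": "Excessive force/assault and battery",
      "provenance": {
        "source_text": "defendants... wrongfully touched, assaulted and battered...",
        "page_number": 3,
        "paragraph": "Second Cause of Action, Paragraph 10"
      }
    }
  ]
}
"""


def provenance_rows(record: dict, field: str = "allegations") -> list[tuple]:
    """Return (type, page_number, paragraph, source_text) for each entry in field."""
    rows = []
    for item in record.get(field, []):
        prov = item.get("provenance") or {}  # tolerate a missing provenance block
        rows.append((
            item.get("type"),
            prov.get("page_number"),
            prov.get("paragraph"),
            prov.get("source_text"),
        ))
    return rows


record = json.loads(SAMPLE)
for label, page, paragraph, text in provenance_rows(record):
    print(f"{label} (p. {page}, {paragraph}): {text}")
```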
LELU Taxonomy Coverage
The extraction prompt uses the exact LELU taxonomy from Airtable.
Allegations (18)
- Abuse of process
- ADA/disability claim
- Conversion
- Excessive force/assault and battery
- Excessive pre-arraignment detention
- Failure to intervene
- False arrest/False imprisonment
- Indifference/Denial of medical care
- Malicious prosecution
- Monell claim
- Municipal liability
- Negligence
- Retaliation for exercise of constitutional rights
- Strip search
- Unconstitutional condition of confinement/cruel and unusual punishment
- Unlawful search and seizure
- Violation of NY Civil Rights Law §50-a
- Wrongful conviction/imprisonment
Force Types (12)
- Baton/asps/object
- Canine
- Chokehold
- Gun pointed (not fired)
- Mace/Pepper spray
- Non-weapon physical force
- Shooting/Discharge of firearm
- Sonic weapon/LRAD
- Taser
- Tight handcuffs
- Unknown
- Vehicle/Collision
Themes (10)
- Disability
- Homelessness
- Immigration
- LGBTQ+ identity
- Mental health
- Police assisting EMS
- Race and/or ethnicity
- Religion
- Sex/gender
- Youth
Locations (17)
- At residence but not inside (i.e. Lobby/stairwell/yard/etc)
- Bike
- Bus/Subway
- Commercial
- Hospital/Health Clinic
- Inside residence
- Motor vehicle
- NYCHA buildings and/or grounds
- Other/Unknown
- Police vehicle
- Precinct
- Prison/Jail
- Public/Open space
- School
- Street
- Social Services
- Workplace
Defendant Types (7)
- City of New York
- Correction Officer
- Department of Corrections
- Individual Police Officer
- NYPD
- Other City Agency
- Other Individual
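Because the import depends on labels matching the taxonomy exactly, it can help to validate extracted values before loading them into Airtable. The sketch below copies two of the lists above into sets and flags any value that is not an exact, case-sensitive match. The field key `defendant_types` and the helper name are assumptions for illustration. The other categories could be added the same way.

```python
# Labels copied verbatim from the Force Types and Defendant Types lists above.
FORCE_TYPES = {
    "Baton/asps/object", "Canine", "Chokehold", "Gun pointed (not fired)",
    "Mace/Pepper spray", "Non-weapon physical force",
    "Shooting/Discharge of firearm", "Sonic weapon/LRAD", "Taser",
    "Tight handcuffs", "Unknown", "Vehicle/Collision",
}
DEFENDANT_TYPES = {
    "City of New York", "Correction Officer", "Department of Corrections",
    "Individual Police Officer", "NYPD", "Other City Agency", "Other Individual",
}

# "defendant_types" is an assumed field name; "force_types" is a CSV column above.
TAXONOMY = {"force_types": FORCE_TYPES, "defendant_types": DEFENDANT_TYPES}


def invalid_values(field: str, values: list[str]) -> list[str]:
    """Return the values that are not exact taxonomy labels for field."""
    allowed = TAXONOMY[field]
    return [v for v in values if v not in allowed]


# "tight handcuffs" differs only in case, so it is still reported.
print(invalid_values("force_types", ["Taser", "tight handcuffs"]))
```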
Technical Requirements
- AWS Account with Bedrock access (Qwen3 VL, Llama 4)
- Python 3.10+ with boto3, PyMuPDF
- Optional: EC2 g5.xlarge for self-hosted DeepSeek OCR
Quick Setup
    git clone https://github.com/tiwhi/cap-summary-prototype
    cd cap-summary-prototype
    python -m venv venv && source venv/bin/activate
    pip install -r requirements.txt

    # Configure AWS credentials
    aws configure

    # Test extraction
    python run_vision_extraction.py --mode two-step --pdf sample.pdf
Next Steps
- Review demo cases to see extraction quality
- Compare accuracy metrics across extraction modes
- Estimate costs for your document volume
- Export sample data for testing Airtable import