LangExtract uses large language models to transform unstructured text into structured data with precise source grounding. No training required - just provide examples and let AI do the rest.
Built for production-scale text processing with enterprise-grade reliability and precision.
Leverages state-of-the-art language models such as Gemini to extract structured information with high accuracy.
Enforces JSON schemas on model outputs, so extracted data follows a consistent, well-defined structure.
Every extraction is mapped to its exact location in the source text, down to the character, for complete traceability.
Define new extraction tasks instantly with prompts and examples - no model training or labeled data required.
Works across multiple languages seamlessly, powered by Google's multilingual language models.
Intelligent chunking and parallel processing handle long documents efficiently.
Define your extraction task with natural language prompts and examples. LangExtract handles the complex LLM orchestration behind the scenes.
import langextract as lx
# Define extraction task with examples
instructions = """
Extract person details from text:
- Full name
- Job title
- Key action performed
"""
example = lx.data.ExampleData(
    text="Dr. Sarah Johnson, the lead researcher, discovered a new compound.",
    extractions=[
        lx.data.Extraction(
            extraction_class="person",
            extraction_text="Dr. Sarah Johnson",
            attributes={
                "title": "lead researcher",
                "action": "discovered a new compound",
            },
        )
    ],
)
# Extract from new text
result = lx.extract(
    text_or_documents="Engineer Alice Williams designed the software architecture.",
    prompt_description=instructions,
    examples=[example],
    model_id="gemini-2.5-flash",
)
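# Access structured results with source grounding
for extraction in result.extractions:
    print(f"{extraction.extraction_class}: {extraction.extraction_text}")
    print(f"Attributes: {extraction.attributes}")
    # char_interval may be None if the extraction could not be aligned to the source text
    interval = extraction.char_interval
    if interval is not None:
        print(f"Source position: {interval.start_pos}-{interval.end_pos}")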
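Source grounding means each extraction comes with character offsets into the original text, so a reported span can be checked by slicing the source. The sketch below illustrates that idea in plain Python with no LangExtract calls. The helper `locate_span` is a hypothetical name introduced here, and exact substring search is a simplifying assumption, not a description of how LangExtract aligns extractions internally.

```python
from typing import Optional, Tuple


def locate_span(source: str, extraction_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of extraction_text in source.

    Hypothetical helper for illustration only: it uses exact substring
    search and returns None when the text does not appear verbatim.
    """
    start = source.find(extraction_text)
    if start == -1:
        return None
    # end is exclusive, so source[start:end] reproduces extraction_text
    return start, start + len(extraction_text)


source = "Engineer Alice Williams designed the software architecture."
span = locate_span(source, "Alice Williams")
if span is not None:
    start, end = span
    # Slicing the source with the offsets recovers the extracted text
    print(f"{start}-{end}: {source[start:end]!r}")
```

The same slice check can be applied to the offsets attached to a real extraction: if the slice does not match the extracted text, the span is not a verbatim quote from the source.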
From content moderation to business intelligence, LangExtract adapts to your specific needs.
Extract and categorize policy violations, PII, and harmful content with precise source attribution.
Understand user intent and extract structured query parameters from natural language input.
Structure clinical notes, extract medications, and process radiology reports for EHR systems.
Parse reports, emails, and documents to extract metrics, actions, and key business insights.
Install LangExtract and start extracting structured data from your text in minutes.
pip install langextract
Comprehensive guides and API reference
Real-world code examples and tutorials
Community support and issue tracking