Companies in AI Data Stack
12 active companies tracked.
L6
Chroma
Developer-first OSS vector DB; default in LangChain/LlamaIndex tutorials.
L6
Gretel
Synthetic-data API for LLM fine-tuning. Acquired by NVIDIA March 2025 (~$320M).
L6
Labelbox
Self-serve labeling platform + Alignerr expert network for LLM evals.
L6
LlamaIndex
RAG/data-framework leader; document parsing + indexing for enterprise LLM apps.
L6
MOSTLY AI
Synthetic tabular data for finance/healthcare where real data is PII-locked.
L6
Parallel Domain
Synthetic data for AV/embodied AI. Toyota, Woven, Waabi customers.
L6
Pinecone
Managed vector DB category leader. Notion AI / Shopify / Gong customers.
L6
Qdrant
Rust-based OSS vector DB; X (Twitter), Bayer, Disney customers.
L6
Scale AI
Largest RLHF/data-labeling vendor. Meta $14.3B investment June 2025.
L6
Surge AI
Premium-labeling competitor to Scale; PhD-level RLHF for Anthropic/OpenAI.
L6
Unstructured
PDF/HTML/PPTX → LLM-ready chunks; ETL for unstructured data.
L6
Weaviate
Open-source vector DB w/ managed cloud; hybrid search + multi-modal.