
Chunkr
W24
AI Search Engine for Research
Open source API service to parse complex documents
AI Search Engine + API: We've built search that returns 5x more relevant results than Google Scholar. Our search engine is free to use, and we offer an API for LLM-focused applications. Database: over 100M research objects, covering 16 source types, ~12.5K journals and repositories, and ~65K concepts.
Battle-tested, highly modular vision infrastructure to convert PDFs, PPTs, Word, Excel, PNGs, and JPEGs into LLM-ready data. We started by building lumina.sh, where we needed to parse ~600M pages of scientific literature. The researchers didn't care, but devs wanted our ingestion pipeline. So we built chunkr instead. We offer high-quality layout analysis, OCR, bounding boxes, granular VLM controls, semantic chunking, and all the last-mile engineering that goes into building standout AI applications. Common use cases include RAG and automating document workflows such as invoices/medical reports -> database.
The company shifted from building an AI search engine for research (an end-user search product and database) to providing developer infrastructure and APIs for document parsing and vision pipelines. These are fundamentally different products with different user bases.