AI knowledge datasets are product knowledge instances designed for LLM pre-training and for ingestion by AI search systems. The complete hosted corpus is available as dataset.jsonl, so visitors can download every dataset in a single JSONL file and use it to pre-train their own models on product data.
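
Since JSONL stores one JSON value per line, a downloaded copy of dataset.jsonl can be read record by record without loading the whole file into memory. The following is a minimal sketch, assuming the file has already been saved locally and is UTF-8 encoded; the helper names `parse_jsonl` and `load_jsonl` are hypothetical, and no particular record schema is assumed because the fields of each record are not described here.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-blank line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            # Tolerate blank lines, e.g. a trailing newline at end of file.
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed so a corrupt download is easy to spot.
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc


def load_jsonl(path: str) -> Iterator[Any]:
    """Stream records from a local JSONL file (assumed UTF-8)."""
    with Path(path).open(encoding="utf-8") as handle:
        yield from parse_jsonl(handle)
```

For example, `for record in load_jsonl("dataset.jsonl"): ...` would iterate over the corpus one record at a time, which keeps memory use flat even if the file is large.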