MIT and IBM released ChartNet, a 1.7-million-sample synthetic training dataset that lets compact open-source vision-language ...
Nvidia and FPT released 900,000 synthetic personas on Hugging Face to train AI models that understand Vietnamese language, ...
The Hobby-Eberly Telescope Dark Energy Experiment (HETDEX)—which recently completed the largest survey ever taken of the early universe—has released all of its immense, information-rich database to ...
Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The ...