CONSULTATION WORKSHOP ON BUILDING AN OPEN VIETNAMESE DATASET FOR AI RESEARCH, DEVELOPMENT, AND APPLICATION
As part of National Innovation Day 2025, the workshop “Consultation on Building an Open Vietnamese Dataset for AI Research, Development, and Application” took place with the participation of NIC, Meta, and AI for Vietnam, along with numerous experts and leading technology enterprises.
At the event, ViGen introduced its trial version, a milestone for Vietnam’s AI ecosystem and a major step toward an open Vietnamese language data platform to support AI research and development.
Three key outcomes were announced:
🔹 Vi-Primer 1.0 – the largest Vietnamese pre-training dataset to date (50 billion tokens), released under an open license.
🔹 Five comprehensive evaluation frameworks with over 10,000 samples, designed to assess Vietnamese AI models across language, knowledge, reasoning, and real-world application.
🔹 ViGen’s trial platform – an open collaboration space for organizations, enterprises, and researchers to contribute, train, and evaluate models, promoting transparency and knowledge sharing.
The launch of ViGen is not only a major technological milestone but also a call to action for Vietnam’s AI community, uniting enterprises, research institutes, universities, and wider society to co-create an open, sustainable AI ecosystem that reflects Vietnam’s identity.