Open-source Mandarin Chinese speech recognition datasets, as of 2024.01.02

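| Dataset | Duration (h) | Speakers | Annotation accuracy | Download link | License | Notes |
|---|---|---|---|---|---|---|
| thchs30 | 30 | 40 | - | openslr.org | Apache License v.2.0 | - |
| Primewords_set1 | 100 | 296 | >98% | openslr.org | CC BY-NC-ND 4.0 | - |
| aishell1 | 178 | 400 | >95% | openslr.org | Apache License v.2.0 | - |
| ST-CMDS | 122 | 855 | - | openslr.org | CC BY-NC-ND 4.0 | - |
| aishell2 | 1000 | 1991 | >96% | AISHELL (希尔贝壳) website | - | Application required |
| aidatatang_200zh | 200 | 600 | >98% | openslr.org | CC BY-NC-ND 4.0 | - |
| aidatatang_1505zh | 1505 | 6408 | >98% | Datatang (数据堂) website | CC BY-NC-ND 4.0 | Application required |
| Speechocean | 10.33 | 20 | >98% | openslr.org | CC BY-NC-ND 4.0 | - |
| MAGICDATA | 755 | 1080 | >98% | openslr.org | CC BY-NC-ND 4.0 | - |
| Common Voice | 70 | 3333 | - | Common Voice | CC-0 | MP3 format |
| aishell3 | 85 | 218 | >98% | openslr.org | Apache License v.2.0 | |
| TAL_ASR | 100 | 80+ | | TAL AI Open Platform, Datasets (100tal.com) | | Download available after registration |
| WenetSpeech | 10000 | | ≥95% | WenetSpeech (wenet-e2e.github.io) | CC BY 4.0 | Download after submitting a form and passing review |
| MAGICDATA Conversational | 180 | 663 | | openslr.org | CC BY-NC-ND 4.0 | |
| SHALCAS22A | | 60 | | openslr.org | CC BY-NC-ND 4.0 | |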
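
Since the licenses above differ (CC BY-NC-ND forbids commercial use and derivatives, while Apache 2.0, CC BY and CC-0 do not carry a non-commercial clause), it can help to total the hours by license type before choosing training data. The following is only a rough sketch: the hour figures are read from the list above, the row split for each dataset is an assumption, rows without both an hour count and a stated license are left out, and the names `DATASETS`, `is_noncommercial` and `hours_by_terms` are hypothetical helpers made up for illustration. Always check each dataset's actual license text before use.

```python
# (name, hours, license) as read from the list above.
# Rows lacking an hour count or a stated license are omitted.
DATASETS = [
    ("thchs30", 30, "Apache License v.2.0"),
    ("Primewords_set1", 100, "CC BY-NC-ND 4.0"),
    ("aishell1", 178, "Apache License v.2.0"),
    ("ST-CMDS", 122, "CC BY-NC-ND 4.0"),
    ("aidatatang_200zh", 200, "CC BY-NC-ND 4.0"),
    ("aidatatang_1505zh", 1505, "CC BY-NC-ND 4.0"),
    ("Speechocean", 10.33, "CC BY-NC-ND 4.0"),
    ("MAGICDATA", 755, "CC BY-NC-ND 4.0"),
    ("Common Voice", 70, "CC-0"),
    ("aishell3", 85, "Apache License v.2.0"),
    ("WenetSpeech", 10000, "CC BY 4.0"),
    ("MAGICDATA Conversational", 180, "CC BY-NC-ND 4.0"),
]


def is_noncommercial(license_name: str) -> bool:
    """Return True if the license string carries a Creative Commons 'NC' element."""
    return "NC" in license_name.replace("-", " ").split()


def hours_by_terms(rows):
    """Sum hours separately for non-commercial licenses and all other licenses."""
    totals = {"noncommercial": 0, "other": 0}
    for _name, hours, license_name in rows:
        key = "noncommercial" if is_noncommercial(license_name) else "other"
        totals[key] += hours
    return totals


if __name__ == "__main__":
    for terms, hours in hours_by_terms(DATASETS).items():
        print(f"{terms}: {hours:g} h")
```

Splitting on the `NC` element is a deliberately crude heuristic; it only distinguishes the license strings that appear in this list.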
