An artificial intelligence (AI) voice recognition system developed by Alibaba that can monitor and identify pornographic content entered open beta last Sunday.
Using voiceprint recognition technology, the Alibaba-developed system can recognize multiple languages, including Chinese, Japanese, English and Russian, as well as Chinese dialects from provinces such as Hunan, Hubei, Henan, Sichuan and Guangdong.
After converting speech to text, the system compares the text with keywords in its lexicon and with its anti-spam audio models to determine whether the content is pornographic.
According to Alibaba, the lexicon and anti-spam audio models cover tens of thousands of pornographic words with the same or similar pronunciations.
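The keyword-matching step described above can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the function names, the placeholder lexicon, and the toy pronunciation rules are hypothetical and do not come from Alibaba's system, which has not been published in detail. The sketch only shows the general idea of reducing each transcribed word to a pronunciation key so that words that sound alike match the same lexicon entry.

```python
# Hypothetical sketch: none of these names or rules come from Alibaba's system.

def pronunciation_key(word: str) -> str:
    """Toy stand-in for a phonetic encoder.

    Lowercases the word, keeps letters only, and collapses a few
    similar-sounding spellings so that sound-alike words share one key.
    """
    key = "".join(ch for ch in word.lower() if ch.isalpha())
    for src, dst in (("ph", "f"), ("ck", "k"), ("z", "s")):
        key = key.replace(src, dst)
    return key


# Placeholder lexicon; a real one would hold the flagged vocabulary.
LEXICON = {"blockedterm", "phonyword"}
LEXICON_KEYS = {pronunciation_key(w) for w in LEXICON}


def flag_transcript(transcript: str) -> list[str]:
    """Return the transcript words whose pronunciation key is in the lexicon."""
    return [w for w in transcript.split() if pronunciation_key(w) in LEXICON_KEYS]


def is_flagged(transcript: str) -> bool:
    """True if at least one word in the transcript matches the lexicon."""
    return bool(flag_transcript(transcript))
```

Calling `flag_transcript` on a transcript returns the matching words, so a variant spelling such as "fonyword" is caught by the "phonyword" entry. A production system would presumably use a proper phonetic representation (for Chinese, something like pinyin) rather than these toy substitutions, and would combine the result with the audio models.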
The system can monitor both online and offline voice files. It can also adapt and "learn" through continued use. For example, its Cantonese recognition was trained by having it watch TV series.
The system is scheduled to go into operation in September this year.