Now Get 50% OFF* on Conversational AI Off-the-Shelf Datasets
Speech & Audio dataset for chatbots, voice assistants, speech-enabled devices.
*Limited Period Offer
Trusted by Industry Leaders
Description | Keyword | Off-the-shelf Language Dataset | Call Center Conversations 8 kHz* | Generic Conversations 8 kHz* | Media & Podcasts 16 kHz* | Utterance/Scripted Monologue 16 kHz* | Total Volume in Hours | Dialects Covered | Audio Format | Text Transcription Format | Use Case | Source | CTA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Afrikaans | Afrikaans Audio Dataset | 600 | 900 | 1500 | Afrikaans spoken in Africa | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Arabic | Arabic Audio Dataset | 800 | 1500 | 2300 | Arabic from Gulf countries | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Chinese | Chinese Audio Dataset | 2000 | 2000 | Chinese from China | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
Danish | Danish Audio Dataset | 400 | 600 | 2000 | 3000 | Danish from Denmark | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Dutch | Dutch Audio Dataset | 2000 | 2000 | Dutch from the Netherlands | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
English - AAVE Accent | English - AAVE (African American Vernacular English) Audio Dataset | 500 | 500 | 1000 | Covers both the vernacular variety (sometimes known as AAVE, typically spoken by the vast majority of working- and middle-class African Americans) and the more standard variety (typically spoken by middle-class African Americans in formal and public situations), with a stronger emphasis on the vernacular. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
English - Boston/New York Accent | English - Boston/New York Audio Dataset | 225 | 225 | 350 | 800 | A collection of several regional accents spoken in and around the cities of Boston, New York, and Philadelphia. These accents may sound similar to non-locals, but they are distinct from other American accents. Despite some local vocabulary that differs from other parts of the English-speaking world, these accents are mutually intelligible with English spoken elsewhere. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
English - Chinese Accent | English - Chinese Accented Audio Dataset | 150 | 300 | 450 | Speakers who speak Chinese as their first language, moved or immigrated to the United States as teenagers or adults, and learned English as their second language. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
English - Deep South Accent | English - Deep South Audio Dataset | 275 | 275 | 450 | 1000 | Speakers from (i) Texas; (ii) North Carolina, South Carolina, Georgia; (iii) New Orleans; (iv) Florida panhandle; (v) Tennessee, Arkansas, Michigan. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
English - Hispanic Accent | English - Hispanic Accented Audio Dataset | 400 | 400 | 800 | Hispanic English refers to the varieties of US English spoken by Hispanic Americans of diverse national heritage. The main focus was on Mexican Americans, with speakers of different national origins (e.g. Mexico, Puerto Rico, Dominican Republic, Ecuador, Cuba, etc.) and from different regions (e.g. California, New York, Florida) included as well. Speakers included those who speak Spanish as a first language as well as speakers of Hispanic origin who speak Spanish as a heritage language. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
English - New Zealand Accent | English - New Zealand Audio Dataset | 250 | 750 | 1000 | Speakers on both islands, including a mix of younger speakers (<40 years old) and older speakers (>40 years old) in equal proportions. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
English - Singapore Accent | English - Singapore Audio Dataset | 400 | 600 | 1000 | Both Standard Singapore English and Colloquial Singapore English. Singaporeans of different ethnic backgrounds (e.g. Chinese, Malay, Indian, etc.) and of different educational levels. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
English - South Africa Accent | English - South Africa Audio Dataset | 400 | 600 | 1000 | Representatives from various socioeconomic classes and ethnic backgrounds (e.g. South Africans of European, African, Indian, or mixed background). | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
English - Irish Accent | English - Irish Audio Dataset | 500 | 500 | English spoken in Ireland | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
English - Scottish Accent | English - Scottish Audio Dataset | 800 | 800 | English spoken by Scottish speakers | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
English - Welsh Accent | English - Welsh Audio Dataset | 800 | 800 | Welsh English | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
French Canadian | French Canadian Audio Dataset | 1000 | 1000 | Canadian French | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
Hebrew | Hebrew Audio Dataset | 750 | 750 | 1500 | Hebrew in Israel | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Indonesian | Indonesian Audio Dataset | 1000 | 1000 | 2000 | Bahasa Indonesia | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Japanese | Japanese Audio Dataset | 2000 | 2000 | Japanese from Japan | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
Korean | Korean Audio Dataset | 100 | 200 | 1500 | 1800 | Speakers spread throughout South Korea. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Malay | Malay Audio Dataset | 500 | 500 | 1000 | Malay in Malaysia | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Mexican Spanish | Mexican Spanish Audio Dataset | 1250 | 1250 | Mexican Spanish from Mexico | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
Polish | Polish Audio Dataset | 250 | 2000 | 2250 | Polish from Poland | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Russian | Russian Audio Dataset | 2000 | 2000 | Russian from Russia | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
Swahili | Swahili Audio Dataset | 350 | 650 | 1000 | South African and Kenyan Swahili | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Swedish | Swedish Audio Dataset | 350 | 650 | 1000 | Swedish in Sweden | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Taiwan Chinese | Taiwan Chinese Audio Dataset | 1000 | 1000 | Chinese from Taiwan | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
Thai | Thai Audio Dataset | 350 | 450 | 800 | An informal register used between friends, | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Turkish | Turkish Audio Dataset | 2000 | 2000 | Turkish from Turkey | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||||
Vietnamese | Vietnamese Audio Dataset | 600 | 400 | 1000 | Northern (e.g., Hanoi), Central, and Southern (e.g., Ho Chi Minh City). | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Hindi | Hindi Audio Dataset | 800 | 2000 | 2800 | Hindi in India, specifically in the North, East, and West regions | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
Hinglish | Indian English Audio Dataset | 300 | 500 | 800 | Collected from urban Indian cities that are financial hubs of the country due to growing economic opportunities. Such places include Noida, Delhi, Dehradun, Chandigarh, Mumbai, Kolkata, Bangalore, Pune, Chennai, Hyderabad, etc. | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||
English | English Audio Dataset | 700 | 700 | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | |||||
Kannada | Kannada Audio Dataset | 60 | 100 | 40 | 200 | Kannada from Karnataka, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Malayalam | Malayalam Audio Dataset | 60 | 100 | 40 | 200 | Malayalam from Kerala, Lakshadweep and Puducherry | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Oriya | Oriya Audio Dataset | 60 | 100 | 40 | 200 | Oriya from parts of Odisha, West Bengal, Jharkhand and Chhattisgarh | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Punjabi | Punjabi Audio Dataset | 60 | 100 | 40 | 200 | Punjabi from Punjab, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Tamil | Tamil Audio Dataset | 60 | 100 | 240 | 400 | Tamil from Tamil Nadu, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Telugu | Telugu Audio Dataset | 100 | 950 | 950 | 2000 | Telugu from Andhra Pradesh, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Bengali | Bengali Audio Dataset | 60 | 100 | 40 | 200 | Bengali from West Bengal, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Gujarati | Gujarati Audio Dataset | 60 | 100 | 40 | 200 | Gujarati from Gujarat, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Marathi | Marathi Audio Dataset | 60 | 100 | 40 | 200 | Marathi from Maharashtra, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us | ||
Assamese | Assamese Audio Dataset | 60 | 100 | 40 | 200 | Assamese from Assam, India | .wav | .json | ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling | Shaip | Contact us |
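
The table groups audio into 8 kHz and 16 kHz categories. As a minimal sketch, assuming the audio is delivered as standard PCM WAV files (the page does not describe the file layout), a buyer could confirm the sample rate of a received file with Python's standard `wave` module. The function names and the file path below are hypothetical and are not part of any Shaip tooling.

```python
import wave


def wav_summary(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), frames / rate


def matches_expected_rate(path, expected_hz):
    """Check a file against the sample rate its category is listed under."""
    rate, _, _ = wav_summary(path)
    return rate == expected_hz


if __name__ == "__main__":
    # Hypothetical path; replace with a real file from a delivered dataset.
    sample = "call_center_sample.wav"
    try:
        print(wav_summary(sample), matches_expected_rate(sample, 8000))
    except FileNotFoundError:
        print(f"{sample} not found; point this at an actual WAV file.")
```

A check like this only reads the WAV header, so it is cheap to run across a whole delivery before training begins.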
Deep Expertise in Conversational AI
Conversational AI, chatbots, and virtual or digital assistants are only as smart as the technology and data behind them. At Shaip, we offer a broad set of diverse audio collections for Natural Language Processing (NLP) that mimic conversations with real people, helping you bring your AI to life. With our deep understanding, we help you build and customize AI-enabled speech models with greater accuracy, using rich, structured datasets in multiple languages from around the world. We offer multilingual audio collection, audio transcription, and audio annotation services based on your requirements, while fully customizing the intents, utterances, and demographic distribution you need.
Speech Collection
Spontaneous Speech Collection
Audio Data Transcription
Data Labeling and Annotation
Shaip lets you accurately train your Conversational AI platform so that it can:
- Talk, text, and chat seamlessly across multiple channels.
- Learn from existing interactions such as chats, voice transcripts, and transactions, then make suggestions and converse based on that learning.
- Understand the intent of human speech and resolve ambiguity in understanding human language.
- Interact with you one-on-one, and be trained to recognize users and remember past conversations.
Global Leader in Conversational AI Training Data
Hours of audio data in 100+ languages - Sourced, Transcribed, and Annotated
Speech Data Licensing
20k+ hours of speech data in 40+ languages and dialects, covering 55+ topics from different domains such as call centers, debates, general conversations, speeches, and podcasts.
Speech Data Collection
Collect audio and speech data (monologue, two-person conversations, human chat) in more than 100 languages from around the world, customized to your AI requirements.
Speech Data Transcription
Cost-effective audio transcription or audio annotation through a strong workforce of 30,000 collaborators, with guaranteed turnaround time (TAT), accuracy, and savings.
Accelerate the development of your Conversational AI application with audio collection and audio annotation services
The Shaip Advantage
Scale
We can source, scale, and deliver audio data from around the world in multiple languages and dialects based on your requirements.
Expertise
We have the right expertise in accurate data collection, transcription, and gold-standard annotation.
Network
A network of 30,000 qualified contributors who can be quickly assigned data collection tasks to build AI training models and scale up services.
Technology
We have an AI-based platform with proprietary tools and processes to support workflow management around the clock (24/7).
Agility
We adapt quickly to changes in customer requirements and help accelerate AI development with quality speech data 5-10x faster than the competition.
Security
We place great importance on data security and privacy, and we are certified to handle highly regulated sensitive data.
What We Do Best
Training Data
Get the highest-quality labeled data in the shortest time. It is gold-standard, reliable, and ready to train your AI and ML models to reach the highest levels of performance.
Data Collection, Labeling, and Annotation
With Shaip you get 15+ years of expertise in collecting, transcribing, and annotating quality data. With our global workforce we can collect data from around the world, then provide labeling and annotation services with the full range of skills and expertise your data requires.
Data Catalogs and Licensing
With our large catalog of millions of datasets, you can collect and organize data as needed. We can then license that quality data for your specific AI and ML use cases. Moreover, this data is available at a fraction of the cost of creating it yourself.
Would you like to build your own dataset?
Contact us now to learn how we can collect a custom dataset for your unique AI solution.