Now Get 50% OFF* on Conversational AI Off-the-Shelf Datasets

Speech & Audio dataset for chatbots, voice assistants, speech-enabled devices.

*Limited Period Offer

Trusted by Industry Leaders

All datasets below are speech data from Shaip. Audio format: .wave (WAV). Text transcription format: .json. Use cases: ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modeling.

Volume categories: Call Center Conversations 8 kHz*, Generic Conversations 8 kHz*, Media & Podcasts 16 kHz*, Utterance/Scripted Monologue 16 kHz*. Each row lists the category volumes available for that dataset, followed by the total.

| Language | Off-the-shelf Language Dataset | Category volumes (hours) | Total volume (hours) | Dialects covered |
|---|---|---|---|---|
| Afrikaans | Afrikaans Audio Dataset | 600, 900 | 1500 | Afrikaans spoken in Africa. |
| Arabic | Arabic Audio Dataset | 800, 1500 | 2300 | Arabic from Gulf countries. |
| Chinese | Chinese Audio Dataset | 2000 | 2000 | Chinese from China. |
| Danish | Danish Audio Dataset | 400, 600, 2000 | 3000 | Danish from Denmark. |
| Dutch | Dutch Audio Dataset | 2000 | 2000 | Dutch from the Netherlands. |
| English - AAVE Accent | English - AAVE (African American Vernacular English) Audio Dataset | 500, 500 | 1000 | The vernacular variety (sometimes known as AAVE, typically spoken by the vast majority of working- and middle-class African Americans) and the more standard variety (typically spoken by middle-class African Americans in formal and public situations), with a stronger emphasis on the vernacular. |
| English - Boston/New York Accent | English - Boston/New York Audio Dataset | 225, 225, 350 | 800 | A collection of several regional accents spoken in and around the cities of Boston, New York, and Philadelphia. These accents might sound similar to non-locals but are distinct from other American accents. Despite some local vocabulary that differs from other parts of the English-speaking world, they are mutually intelligible with English spoken elsewhere. |
| English - Chinese Accent | English - Chinese Accented Audio Dataset | 150, 300 | 450 | Speakers whose first language is Chinese, who moved or immigrated to the United States as teenagers or adults and learned English as a second language. |
| English - Deep South Accent | English - Deep South Audio Dataset | 275, 275, 450 | 1000 | Speakers from (i) Texas; (ii) North Carolina, South Carolina, Georgia; (iii) New Orleans; (iv) Florida panhandle; (v) Tennessee, Arkansas, Michigan. |
| English - Hispanic Accent | English - Hispanic Accented Audio Dataset | 400, 400 | 800 | Hispanic English refers to the varieties of US English spoken by Hispanic Americans of diverse national heritage. The main focus was on Mexican Americans, along with speakers of different national origins (e.g. Mexico, Puerto Rico, Dominican Republic, Ecuador, Cuba) and from different regions (e.g. California, New York, Florida). Speakers included those who speak Spanish as a first language as well as speakers of Hispanic origin who speak Spanish as a heritage language. |
| English - New Zealand Accent | English - New Zealand Audio Dataset | 250, 750 | 1000 | Speakers on both islands, including a mix of younger speakers (<40 years old) and older speakers (>40 years old) in equal proportions. |
| English - Singapore Accent | English - Singapore Audio Dataset | 400, 600 | 1000 | Both Standard Singapore English and Colloquial Singapore English. Singaporeans of different ethnic backgrounds (e.g. Chinese, Malay, Indian) and of different educational levels. |
| English - South Africa Accent | English - South Africa Audio Dataset | 400, 600 | 1000 | Representatives from various socioeconomic classes and ethnological backgrounds (e.g. South Africans of European, African, Indian, or mixed background). |
| English - Irish Accent | English - Irish Audio Dataset | 500 | 500 | English spoken in Ireland. |
| English - Scottish Accent | English - Scottish Audio Dataset | 800 | 800 | English spoken by Scottish speakers. |
| English - Welsh Accent | English - Welsh Audio Dataset | 800 | 800 | Welsh English. |
| French Canadian | French Canadian Audio Dataset | 1000 | 1000 | Canadian French. |
| Hebrew | Hebrew Audio Dataset | 750, 750 | 1500 | Hebrew in Israel. |
| Indonesian | Indonesian Audio Dataset | 1000, 1000 | 2000 | Bahasa Indonesia. |
| Japanese | Japanese Audio Dataset | 2000 | 2000 | Japanese from Japan. |
| Korean | Korean Audio Dataset | 100, 200, 1500 | 1800 | Speakers spread throughout South Korea. |
| Malay | Malay Audio Dataset | 500, 500 | 1000 | Malay in Malaysia. |
| Mexican Spanish | Mexican Spanish Audio Dataset | 1250 | 1250 | Mexican Spanish from Mexico. |
| Polish | Polish Audio Dataset | 250, 2000 | 2250 | Polish from Poland. |
| Russian | Russian Audio Dataset | 2000 | 2000 | Russian from Russia. |
| Swahili | Swahili Audio Dataset | 350, 650 | 1000 | South African and Kenyan Swahili. |
| Swedish | Swedish Audio Dataset | 350, 650 | 1000 | Swedish in Sweden. |
| Taiwan Chinese | Taiwan Chinese Audio Dataset | 1000 | 1000 | Chinese from Taiwan. |
| Thai | Thai Audio Dataset | 350, 450 | 800 | An informal register used between friends. |
| Turkish | Turkish Audio Dataset | 2000 | 2000 | Turkish from Turkey. |
| Vietnamese | Vietnamese Audio Dataset | 600, 400 | 1000 | Northern (e.g. Hanoi), Central, and Southern (e.g. Ho Chi Minh City). |
| Hindi | Hindi Audio Dataset | 800, 2000 | 2800 | Hindi in India, specifically in the North, East, and West regions. |
| Hinglish | Indian English Audio Dataset | 300, 500 | 800 | Collected from urban Indian cities that are financial hubs of the country due to growing economic opportunities, such as Noida, Delhi, Dehradun, Chandigarh, Mumbai, Kolkata, Bangalore, Pune, Chennai, Hyderabad, etc. |
| English | English Audio Dataset | 700 | 700 | |
| Kannada | Kannada Audio Dataset | 60, 100, 40 | 200 | Kannada from Karnataka, India. |
| Malayalam | Malayalam Audio Dataset | 60, 100, 40 | 200 | Malayalam from Kerala, Lakshadweep, and Puducherry. |
| Oriya | Oriya Audio Dataset | 60, 100, 40 | 200 | Oriya from parts of Odisha, West Bengal, Jharkhand, and Chhattisgarh. |
| Punjabi | Punjabi Audio Dataset | 60, 100, 40 | 200 | Punjabi from Punjab, India. |
| Tamil | Tamil Audio Dataset | 60, 100, 240 | 400 | Tamil from Tamil Nadu, India. |
| Telugu | Telugu Audio Dataset | 100, 950, 950 | 2000 | Telugu from Andhra Pradesh, India. |
| Bengali | Bengali Audio Dataset | 60, 100, 40 | 200 | Bengali from West Bengal, India. |
| Gujarati | Gujarati Audio Dataset | 60, 100, 40 | 200 | Gujarati from Gujarat, India. |
| Marathi | Marathi Audio Dataset | 60, 100, 40 | 200 | Marathi from Maharashtra, India. |
| Assamese | Assamese Audio Dataset | 60, 100, 40 | 200 | Assamese from Assam, India. |

Deep Expertise in Conversational AI

Conversational AI, chatbots, and virtual or digital assistants are only as smart as the technology and data behind them. At Shaip, we offer a broad set of diverse audio datasets for Natural Language Processing (NLP) that mimic conversations with real people and help you bring your AI to life. With our deep understanding, we help you build and customize AI-enabled speech models more accurately, using rich, structured datasets in multiple languages from around the world. We provide multilingual audio collection, audio transcription, and audio annotation services based on your requirements, while fully customizing the intent, utterances, and demographic distribution you want.

Speech Collection

Spontaneous Speech Collection

Audio Data Transcription

Data Labeling and Annotation

Shaip lets you accurately train your Conversational AI platform so that it can:

  • Speak, text, and chat seamlessly across multiple channels.
  • Learn from existing interactions such as chats, voice transcripts, and transactions, then make recommendations and converse based on that learning.
  • Understand the intent of human speech and remove ambiguity in understanding human language.
  • Interact with you one-on-one, and be trained to recognize users and remember past conversations.

Global Leader in Conversational AI Training Data

Hours of audio data in 100+ languages - Sourced, Transcribed, and Annotated

Speech Data Licensing

20k+ hours of speech data in 40+ languages and dialects, covering 55+ topics from domains such as call center, debates, general conversations, speeches, and podcasts.

Speech Data Collection

Collect audio and speech data (monologue, two-person conversation, human chat) in more than 100 languages from around the world, customized to your AI requirements.

Speech Data Transcription

Cost-effective audio transcription or audio annotation through a strong workforce of 30,000 collaborators, with guaranteed turnaround time (TAT), accuracy, and savings.

Accelerate your Conversational AI application development with Audio Collection and Audio Annotation Services

The Shaip Advantage

Scale

We can source, scale, and deliver audio data from around the world in multiple languages and dialects based on your requirements.

Expertise

We have the right expertise in accurate data collection, transcription, and gold-standard annotation.

Network

A network of 30,000 qualified contributors who can be quickly assigned data collection tasks to build AI training models and scale up services.

Technology

We have an AI-based platform with proprietary tools and processes to power round-the-clock (24x7) workflow management.

Agility

We adapt quickly to changes in customer requirements and help accelerate AI development with quality speech data, 5-10x faster than the competition.

Security

We give utmost importance to data security and privacy, and we are certified to handle highly regulated sensitive data.

What We Do Best

Training Data

Get the highest-quality labeled data in a short time. It is gold-standard, reliable, and ready to train your AI and ML models to reach the highest levels of performance.

Data Collection, Labeling, and Annotation

With Shaip you get 15+ years of expertise in collecting, transcribing, and annotating quality data. With our global workforce, we can collect data from around the world, then provide labeling and annotation services with the full level of skill and expertise your data requires.

Data Catalogs and Licensing

With our large catalog of millions of datasets, you can collect and organize data as needed. We can then license that quality data for your specific AI and ML use cases. Moreover, this data is available at a fraction of the cost of building it yourself.

Would you like to build your own dataset?

Contact us now to learn how we can collect a custom dataset for your unique AI solution.