Intelsense AI

Interviewed by Syed Md. Rakeen, Team MBR

Intelsense AI is a leading voice technology company based in Bangladesh that aims to empower businesses to reach their full potential by offering state-of-the-art speech recognition and natural language processing (NLP) solutions. It has developed diverse speech recognition and NLP solutions that can be applied in different settings to address various business needs. The company is committed to leveraging artificial intelligence (AI) and machine learning to provide cutting-edge solutions that help businesses and individuals overcome language barriers and streamline data-handling processes. Team MBR was in a conversation with Mr. Rumman Arefin, Founder and CEO, Intelsense AI, to learn about his inspirations and vision behind the company.

Syed Md. Rakeen: Since being founded in 2018, Intelsense AI has been working on developing diversified AI-powered products in the field of voice and language processing. Would you kindly share with us how you came up with this idea?

Rumman Arefin: The inception of Intelsense AI can be traced back to my childhood aspirations. As a child, I always wanted to build something that learns intuitively and evolves into something smarter than me. I needed something more than a calculator: something that could solve all the problems I could not find answers to. With time, I grew up to become a software professional and began working as a software designer for various national and international organisations. I was inspired by Mitsuku, an exceptional AI chatbot, and realised the potential of AI-powered conversational agents. I wondered why such technologies were not prevalent in Bangladesh, and I realised that if we want machines to be intelligent, they must recognise speech, video, handwriting, faces, and all kinds of things, thereby generating all kinds of useful by-products. My passion for overcoming communication barriers led me to assemble a team and found Intelsense AI in 2018, with a focus on language and voice processing.

Our co-founders, Dr. Arif Ahmad, Rafi Nizamee, Jubaer Hossain, Zaowad Rahabin, Kaiser Hamid, and Dr. Mahnuma Moumi, share the pursuit of innovation. Together, we have created a collaborative environment to revolutionise communication with AI-powered technologies. Our exceptional team, some of whom have worked in the speech processing industry worldwide, is working tirelessly to shape the future of voice and language processing, fostering a world where technologies bring us closer together, transcend boundaries, and empower connections.

Our inspiration to create SenseVoice came from experiencing frustration with traditional note-taking methods during meetings and lectures. We saw an opportunity to use AI for a voice-powered note-taking app that transcribes conversations in real time. Popular among professionals, educators, and students, SenseVoice has led us to plan expansions into speaker identification and translation while continuing to innovate in voice and language processing.

Syed Md. Rakeen: Intelsense AI has recently launched its AI-powered transcription platform, named SenseVoice. In what industries or fields could SenseVoice be most useful, and how might it revolutionise the way humans work with audio and video content?

Rumman Arefin: SenseVoice, our recently launched AI-powered transcription platform, has the potential to revolutionise how audio and video content are managed across various industries. By capturing, transcribing, and analysing information in real time, it aims to significantly enhance efficiency, accuracy, and collaboration in diverse sectors.

In the world of business and meetings, SenseVoice can foster improved collaboration and communication by transcribing meetings and conferences, allowing attendees to easily review and share notes while preserving important information. Journalism can benefit from SenseVoice, as journalists can transcribe interviews and press conferences for accurate and comprehensive articles, saving time by eliminating manual transcription. In education, SenseVoice becomes a game-changer by transcribing lectures and class discussions, enabling students to review material at their own pace, improving learning outcomes, and ensuring equal access for students with hearing impairments.

The media and entertainment industries can leverage SenseVoice to transcribe and analyse audio and video content, helping professionals identify trends and patterns while improving the accuracy of subtitles and closed captions for an enhanced viewer experience. Lastly, the legal profession can benefit from SenseVoice by transcribing depositions and court hearings, making it more convenient to review and analyse information, improving legal proceedings’ accuracy, and reducing error risks. SenseVoice can cater to numerous other use cases, ensuring a more streamlined and productive work environment for all.

Syed Md. Rakeen: Different accents and dialects can be difficult to detect when it comes to AI-powered voice and language processing technologies. How does Intelsense AI tackle the issue and ensure accuracy across different accents and dialects?

Rumman Arefin: Intelsense AI is fully aware of the complexities involved in detecting and processing different accents and dialects in AI-powered voice and language technologies. To ensure accuracy across various accents and dialects, we adopt a multifaceted approach that makes our technologies more versatile and adaptable. One of our essential strategies is using diverse training data. We train our language models on comprehensive datasets that include different accents and dialects, enabling them to learn and recognise specific language patterns. We also utilise phonetic transcriptions, which are standardised representations of language sounds. This helps teach our language models to recognise the sounds of different accents and dialects more effectively.

Fine-tuning our language models for specific tasks and datasets is another vital part of our process. For example, we might fine-tune a language model on a dataset of spoken language that contains a wide range of accents and dialects, significantly improving the model’s performance in tasks that require recognising various accents and dialects. In addition to these techniques, we leverage speaker adaptation, which involves training the language models on data from specific speakers, allowing the models to recognise their speech patterns more accurately. Furthermore, our language models are designed and optimised for continuous learning. By constantly learning from new data, our models can adapt to changes in language patterns and improve their accuracy over time.
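The phonetic-transcription idea described above can be illustrated with a classic, much simpler phonetic key: Soundex, which maps words that sound alike to the same short code. This is a toy sketch for illustration only, not Intelsense AI's actual pipeline:

```python
def soundex(word: str) -> str:
    """Simplified Soundex: map a word to a 4-character phonetic key,
    so spellings that sound alike collapse to the same code."""
    codes = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    key = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")  # vowels and h/w/y carry no digit
        if digit and digit != prev:
            key += digit
        prev = digit  # a vowel resets prev, so repeated sounds across vowels count again
    return (key + "000")[:4]

# Spelling variants that sound alike share a key:
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

Production accent handling relies on learned acoustic models rather than hand-built rules like this, but the principle is the same: normalise surface variation down to the underlying sounds.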

Syed Md. Rakeen: Intelsense AI’s HIA is an AI-powered voice-based financial assistant. Would you kindly share with us how it is bringing changes to the traditional way of banking?

Rumman Arefin: Intelsense AI’s HIA, an AI-powered voice-based financial assistant, is bringing about significant changes to the traditional way of banking and transforming the customer experience in several ways. One of the most notable aspects of HIA is its ability to offer personalised assistance. By understanding customers’ financial needs, it provides tailored advice that enables them to manage their finances more effectively and make better-informed decisions. This ultimately contributes to improved financial well-being for users. Convenience is another key aspect of HIA. Customers can access their account information and perform transactions using simple voice commands, eliminating the need to visit a physical branch or use a computer. With HIA, banking becomes a seamless and hassle-free experience that is available to customers anytime and anywhere. HIA’s voice-based interface makes banking more accessible to customers with disabilities or those who face difficulties using traditional banking methods. By bridging this gap, HIA ensures that a broader range of customers can enjoy the benefits of modern banking services.

HIA represents innovation in the banking industry. By leveraging cutting-edge AI and voice technologies, HIA places itself at the forefront of innovation, setting it apart from competitors and attracting new customers who value a technologically advanced banking experience. In summary, HIA is revolutionising traditional banking through personalisation, convenience, efficiency, accessibility, and innovation, ultimately providing a superior customer experience.

 

Syed Md. Rakeen: Sentiment analysis technologies allow machines to comprehend the feelings and attitudes of users while communicating with voice assistants. How do the product offerings of Intelsense AI handle the differences in human emotions and other non-literal language elements?

Rumman Arefin: Sentiment analysis is an important aspect of voice assistants in today’s world, as it involves understanding human emotions and other non-literal language elements. At Intelsense AI, we take this challenge seriously and implement various techniques to make our product offerings more sensitive to these nuances in human communication. One key technique we use is contextual analysis. Our sentiment analysis technologies consider the context of a sentence or conversation to determine the speaker’s intent and emotions. By analysing the surrounding words and phrases, our systems can better understand the underlying meaning of a sentence and non-literal language elements. We also employ machine learning and deep learning techniques to enhance our sentiment analysis capabilities. These techniques analyse vast amounts of text and speech data to identify patterns and relationships between language elements and emotions. As our systems process more data, they continuously learn and improve their accuracy in understanding human emotions.

In addition to these methods, our offerings incorporate emotion detection techniques, which involve analysing the tones, pitches, and other acoustic features of speech to determine the speakers’ emotions. This allows our voice assistants to be more perceptive and respond appropriately to users’ emotions.
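As a rough illustration of how acoustic features such as energy and pitch can hint at a speaker's emotional state, here is a deliberately simplified sketch (real emotion detection uses learned models over many more features; the thresholds below are invented for the toy example):

```python
import math

def features(samples):
    """Two crude acoustic features: average energy and zero-crossing rate
    (ZCR rises with pitch, energy with loudness)."""
    energy = sum(s * s for s in samples) / len(samples)
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / len(samples)
    return energy, zcr

def classify(samples):
    """Toy rule: loud and high-pitched speech reads as 'aroused', else 'calm'."""
    energy, zcr = features(samples)
    return "aroused" if energy > 0.1 and zcr > 0.02 else "calm"

def tone(freq_hz, amplitude, sr=8000, secs=0.5):
    """Synthetic sine wave standing in for a voice recording."""
    n = int(sr * secs)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]

print(classify(tone(300, 0.8)))  # aroused (loud, high-pitched)
print(classify(tone(80, 0.2)))   # calm (quiet, low-pitched)
```

In practice these hand-set thresholds would be replaced by a model trained on labelled emotional speech, but the feature-extraction step looks much the same.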

Syed Md. Rakeen: Though AI-powered voice and language processing technologies offer immense benefits, Bangladesh has yet to explore their potential on a larger scale. In your opinion, what are some of the steps required to increase the adoption of AI-powered voice and language processing technologies in business operations in Bangladesh?

Rumman Arefin: Bangladesh is an emerging market with great potential for adopting AI-powered voice and language processing technologies in business operations. However, to increase their adoption on a larger scale, several steps need to be taken, which I believe can pave the way for a successful implementation of these technologies in the country.

Firstly, awareness needs to be raised. Many businesses in Bangladesh may not be fully aware of the potential benefits of AI-powered voice and language processing technologies, and organising workshops, seminars, and other outreach programmes can help disseminate knowledge about these technologies and their advantages. Secondly, building local capacity is essential for the long-term success of AI adoption in Bangladesh. This involves developing, through training and education, a skilled workforce that can create, implement, and maintain AI-powered technologies. Thirdly, fostering collaboration between academia, industry, and government can accelerate the adoption of AI-powered voice and language processing technologies in the country. This collaboration can facilitate the exchange of knowledge, skills, and resources necessary for successful implementation.

Finally, providing access to funding and resources can enable small and medium-sized businesses in Bangladesh to adopt AI-powered voice and language processing technologies and stay competitive. Financial support can help businesses overcome the initial costs of adopting these innovative technologies. By focusing on these steps, Bangladesh can foster a conducive environment for the growth and integration of AI-powered voice and language processing technologies in its business operations, ultimately contributing to the country's economic development.

Syed Md. Rakeen: One of the concerns regarding AI-powered voice and language processing technologies is the risk associated with manipulation and misrepresentation by mimicking an individual’s voice with the intention of misusing the technologies. What safeguards can be put in place to ensure ethical and responsible usage of AI-powered voice and language processing technologies?

Rumman Arefin: Addressing the concerns about the potential misuse of AI-powered voice and language processing technologies is essential, and several safeguards can be put in place to ensure ethical and responsible usage. First, authentication protocols must be integrated into the technologies. Biometric verification and two-factor authentication can help ensure that only authorised individuals can access and use a voice assistant, keeping sensitive user data safe.
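To make the second-factor idea concrete, here is a minimal time-based one-time password (TOTP, RFC 6238) generator using only the Python standard library. This is a generic sketch of how a common second factor works, not Intelsense AI's implementation:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, for_time=None, step=30, digits=6):
    """Generate an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    key = base64.b32decode(secret_b32)
    t = time.time() if for_time is None else for_time
    counter = struct.pack(">Q", int(t // step))           # 30-second time window
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                            # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: ASCII secret "12345678901234567890" at T = 59 s
secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(secret, for_time=59, digits=8))  # 94287082
```

Because the code changes every 30 seconds and is derived from a shared secret, a stolen or cloned voice sample alone is not enough to pass authentication.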

Transparency about how the technologies work and the potential risks involved is crucial for building trust with users and ensuring that they can make informed decisions about using the technologies. Consent is also vital, requiring service providers to obtain explicit consent from users for the specific usage of their voice data and allowing users to withdraw consent and delete their data easily. Data security should be a priority for any organisation dealing with AI-powered voice and language processing technologies. Protecting user data through encryption, restricted access, and regular security audits is paramount.

Holding service providers accountable for any misuse or abuse of voice and language processing technologies is essential, with processes in place for reporting and investigating incidents of misuse and taking corrective action when necessary.

By integrating these safeguards, we can create a robust framework for the ethical and responsible use of AI-powered voice and language processing technologies, allowing us to harness their full potential while minimising the risks.

Syed Md. Rakeen: Developers around the world are constantly creating wonders by coming up with new AI-powered technologies. Would you kindly share some of the most exciting advancements you expect to see in the field of voice and language processing technologies in the upcoming years?

Rumman Arefin: In the upcoming years, we can expect a multitude of exciting advancements in the field of voice and language processing technologies. As conversational AI systems evolve, they will become more intelligent, adaptive, and versatile, engaging in human-like conversations and handling increasingly complex tasks. The advancement of multilingual voice processing will enable AI systems to effortlessly switch between languages and dialects, bridging the gap between cultures and fostering global collaboration. Future AI systems will also possess an enhanced contextual understanding, enabling them to comprehend idiomatic expressions, metaphors, and sarcasm, making interactions more engaging and natural. Emotion detection will play a significant role in the future of AI-powered voice and language processing, with AI systems analysing speech patterns and tones to detect and respond to emotions. This will lead to more personalised and empathetic interactions across various aspects of our lives.

Despite ethical concerns, responsible use of voice cloning technology holds immense potential. Personalised voice assistants, improved text-to-speech capabilities, and voice restoration for those who have lost their ability to speak are just a few of the possibilities this technology can unlock.

 

Intelsense AI is a passionate voice tech company specialising in NLP and voice-assistant based products. To learn more, please visit http://www.intelsense.ai/