A New Era for ChatGPT: Voice and Image Prompts Arrive

OpenAI, the company that introduced us to ChatGPT, has unveiled a set of new features for Plus and Enterprise users of ChatGPT. The features will roll out on the chatbot in the next week or so.

The big news is that ChatGPT will now support voice and image prompts. This is welcome news, especially for those who have long wanted these capabilities from the chatbot.

Ever since ChatGPT was introduced, people have been thinking about ways to make it more useful. Today, more than ever, professionals keep ChatGPT open in one of their tabs alongside Google. This is a big achievement for OpenAI, because it now competes directly with the search engine giant.

In the past, people often wondered how the chatbot would work if it accepted voice and image prompts. Now, that idea is turning into reality.

Once the update has settled in for paid users, these features will be rolled out to everyone.

The Reason to be Thrilled

With the arrival of voice and image prompts, a new chapter opens in the history of ChatGPT. Voice and images have the potential to make ChatGPT far more useful than the current version.

Think about it: you are traveling to a new destination. You snap a photo of a landmark and have a live conversation about what is interesting about that place.

When you are at home, snap a photo of your fridge and pantry to find out what you can prepare for dinner. You can then ask follow-up questions to get a step-by-step recipe.

Once dinner is over, help your child with a math problem by taking a picture, circling the problem set, and having the chatbot provide hints for solving it.

These features will be available to Plus and Enterprise users. Voice is coming to iOS and Android (you can opt in through your settings), and images will be available on all platforms.

Initiate a Conversation with ChatGPT

This is the perfect time for Plus and Enterprise users to experiment and use voice to hold a back-and-forth conversation with ChatGPT. Speak with it as you travel. You can even ask the chatbot for assistance with Microsoft Data and AI Consulting services.

Just imagine how the voice prompt feature of ChatGPT could transform the way your business works.

Here are the steps to get started with voice on ChatGPT.

Go to Settings > New Features on your mobile app.
Opt into voice conversations.
Next, tap the headphone button in the top-right corner of the home screen.
Then, select your preferred voice.

Note: You can choose from five distinct voices.

What makes this feature even more worthwhile is that a brand-new text-to-speech model powers it. The model is capable of generating human-like audio from just text and a few seconds of sample speech.

The company worked with professional voice actors to create each of the voices. It also uses Whisper, its open-source speech recognition system, to transcribe the words you speak into text.

Listening to Voice Sample

The underlying text-to-speech model can take a short voice sample and produce speech that closely resembles it. Technology like this can come in handy when, for example, you wish to create a tutorial video.

Dialog About Images

This feature enables you to show ChatGPT one or more images. For example, you can troubleshoot why your Microsoft Data and AI Partner setup is not working or why your dashboard is not displaying colored text.

To focus the chatbot on a specific part of the image, use the drawing tool in the mobile app.

Show One or More Images

To begin, tap the photo button to capture or select an image. On iOS or Android, tap the plus button first. You can also discuss multiple images or use the drawing tool to guide the chatbot.

Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, including photographs, screenshots, and documents containing both text and images.

Gradual Deployment of Image and Voice Capabilities

The ultimate objective of OpenAI is to develop AGI that is safe and beneficial. The company believes in introducing new tools gradually. This allows it to make improvements and refine risk mitigations over time.

It also prepares everyone for more powerful systems in the future. This strategy becomes even more important with advanced models that involve voice and vision.

Voice

The new voice technology is capable of creating realistic synthetic voices from just a few seconds of real speech. This opens the door to many creative and accessibility-focused applications.

However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.

We discuss this in further detail below.

Image Input

Vision-based models also come with their own set of challenges. These range from hallucinations about people to reliance on the model's interpretation of images in high-stakes domains.

Like other ChatGPT features, vision is about helping you in your everyday life, and it does that best when it can see what you see. With this in mind, OpenAI worked with Be My Eyes, a free mobile app for blind and low-vision people, to understand the uses and limitations of the feature.

Users told the company that they find it useful to have general conversations about images that happen to include people in the background, for example when someone appears on TV while you are trying to figure out your remote control settings.

The Reaction of People After Hearing This News

Reactions on X, formerly Twitter, have been mixed. While some users have welcomed the update with open arms, others have raised concerns.

Although the integrated functions might make the chatbot more intuitive, some research suggests that complex interfaces that fail to mimic human conversation can feel strange, which might make the technology harder to use.

Some users have pointed to the recent lawsuits accusing OpenAI of copyright violations and infringement of intellectual property rights. They are advising others not to use the chatbot.

Others worry that the updates might displace smaller AI startups, software engineers, and even educators in the coming years.

AI-generated voices have also raised the alarm about voice scams, deepfakes, and identity theft. Concern about AI voice generators is growing. In a typical scam, AI mimics the voice of a real person and calls their relatives to ask for money.

According to a McAfee report, 77% of people targeted by an AI voice scam lost money as a result.

On top of this, according to Joel Fischer, who studies human-computer interaction at the University of Nottingham in the UK, the integration of voice recognition might make the feature less accessible to people who do not speak with mainstream accents.

People are also concerned that, since the image function enables the AI to interpret images, the bot might be able to bypass image-based CAPTCHA tests on websites.

These tests restrict access by asking users to prove that they are not bots, for example by transcribing distorted text or recognizing images.

A recent study, yet to be peer-reviewed, suggests that AI bots can solve CAPTCHA tests faster and more accurately than humans.

How OpenAI Plans to Mitigate These Risks

To counter the risk of voice impersonation, the company is using the voice technology to power one specific use case: voice chat, with voices created with voice actors the company worked with directly.

The company also acknowledges the limitations of using images in AI, such as image hallucinations, where the AI produces false information about an image.

To counter this, OpenAI has taken technical measures to significantly limit the ability of ChatGPT to analyze and make direct statements about people.

The reason is that ChatGPT is not always accurate, and these systems should respect the privacy of individuals.

With the help of real-world use and feedback, the company will try to make the tool even more useful.

Final Thoughts

It remains to be seen how users adapt to the voice and image prompt features, which could open new avenues for Enterprise and Plus users of the chatbot.

Given the reception so far, OpenAI is in for some heat in the coming days. But if the company takes these reactions constructively, it could very well make history in the way searches are performed on the internet.

Once the features roll out to free users, it will become clearer which direction this update is going to take the chatbot.

The future of search looks exciting, to say the least!
