Google Cloud on LinkedIn: Start building in a flash with Gemini 2.0 Flash—packed with new features…

Start building in a flash with Gemini 2.0 Flash—packed with new features and improvements ⚡ Starting today, developers can test Gemini 2.0 Flash Experimental via the Gemini API in Google AI Studio and Vertex AI, taking advantage of improved spatial understanding, multimodal and reasoning performance. And with the new Multimodal Live API users can create interactive experiences that respond in real time, like in the demo below. Learn more → https://goo.gle/4gsj9W3

Transcript

Okay, this is a quick demo of multimodal live streaming in Gemini 2.0, and we are casting a tab into AI Studio.
[user] "Can you see my screen?"
[model] "Yes, I see your screen. It's showing a document with the title 'demo'. It also contains bullet points and a Google Meet window with a person in the video. Is there anything in particular you would like to talk about on the screen?"
[user] "Yeah. Can you start the demo by just reading the text I highlighted?"
[model] "Sure, I can do that. The text you highlighted reads 'the Multimodal Live API lets you build real-time multimodal applications powered by Gemini 2.0 Flash. It features tool use like Google Search and code execution, function calling, and more.' Would you like me to read the following lines too?"
[user] "No, can you actually just tell me what this word means though?"
[model] "I can do that. Multimodal refers to the ability to process and understand different types of data like text, images, and audio all together. Does that help?"
[user] "Yeah. I need to demo the interruptions feature next. Can you tell me a really boring kind of once upon a time sort of story?"
[model] "Okay, I can do that. Once upon a time, in a small village nestled beside a quiet forest, there was a very unremarkable rock. This rock was gray, and it was round, and it sat there-"
[user] "Sorry, I'll interrupt you."
[model] "No problem. I understand that interruptions are part of the demo. What should I do now?"
[user] "So we're going to demo memory. Can you just summarize everything we've seen or heard so far?"
[model] "I can do that. So far, we've discussed a demo of the Multimodal Live API, which lets you build real-time multimodal apps. And I defined what multimodal means. I was also asked to tell a very boring once upon a time story, but I was interrupted. Is there anything else?"
[user] "No. I think we're done. Do you want to read this ending card, or should I?"
[model] "I can read it if you would like. The ending card says, 'start building with Gemini 2.0 at aistudio.google.com.' Is that correct?"
[user] "Yeah. That's great. All right, thanks, talk to you later."

Yariv Adan

Early stage AI startups investor - reach out to me with opportunities! | Ex Senior Director @ Google AI, Google Zurich Site Lead

Ok, I must give it to Google: this one launch is equal to 12 days of launches... 😇🤷♂️ Amazing responsive video and audio interaction! A killer feature in the browser, sending quite a few startups back to square one. Truly multimodal input and output: text and visual input and output are truly one modality now. A very impressive demonstration by Google of what they are best at, building amazing AI technology. If only they knew how to package and deliver it in an easy-to-use product, they would be unstoppable.

CDAS360

Gemini 2.0 Flash is a major step forward! 🙌 The improvements in spatial understanding and multimodal performance are impressive. Excited to dive into this and see how it opens up new possibilities for developers. Can't wait to explore more!

FluentData

Incredible leap forward! 🚀 These spatial and multimodal capabilities will transform how we build AI solutions. Let's go! ⚡

Vikas Gupta ☁️

Having 22+ yr exp.- Sales & BDM in DC, Cloud & Cybersecurity,IMS ,Prof. Services HySc - AWS/Azure/GCP Staff Aug.(GCC -L1/L2/L3) /NOC, Cymmetri-IAM/PAM l Forcepoint-DLP/SASE/ZTNA l Seceon's MSOC (aiSIEM/aiXDR) l SOAR

Gemini 2.0 Flash is built on Trillium, the sixth-generation TPUs, which powered 100% of it. As of today it is available to all Gemini users, so they can build with it too, along with the Deep Research feature. It supports multimodal output, such as natively generated images mixed with text and steerable multilingual text-to-speech (TTS) audio.

Evluate ✨

Google - Gemini 2.0 is delivering an abundance of features to users. Excited to learn about their experiences firsthand!

황인하

Hello world

Free live multimodal model usage is crazy good. My previous project can be revived thanks to this low-latency audio-to-text model :D I hope there will be more WebSocket examples in the cookbook.

Rons

#Gemini 2.0 is a game-changer for builders and innovators. Google Cloud’s commitment to empowering creativity with cutting-edge AI tools is unmatched. Can’t wait to see how this transforms projects across industries. Much appreciation to the team for making big ideas a reality.

Jerome A. White

Senior Project Manager | Digital Transformation | Growth & Strategy Leader: Market Development and Value Creation (Cloud, AI, Agile)

Heard about this today at the AI Summit in NY. Amazing work, Google Cloud. So now I need neither LiveKit nor ElevenLabs to drive my user experience.

Ankit Pareek

Director of Technology Generative AI, Conversational AI, NLP, Intelligent Automation

Does it have a built-in audio splitter, or does one need to bring a custom one?

Imran Bashir

GenAI, MLOps, IoT | Architect | Engineer | Consultant ◈ I Partner with Tech Leaders To Solve IoT, GenAI & Video Analytics Complexities, Architect Advanced Systems, Cut Costs and Fast-Track Deployments - Let's Talk !

Beautiful. Can't wait to try out the streaming audio.
