Google Just Dropped a Bombshell! The Upgraded Multimodal Capabilities Are Insanely Powerful
Hi everyone, I’m luckySnail. Yesterday, Google officially released the native multimodal image generation feature for Gemini 2.0 Flash Experimental. After playing with it for a few hours, I couldn’t wait to share — it’s incredibly powerful! Let’s take a look at what it can do:
- Image generation
- Image editing
- Creating image stories
- Designing birthday cards
Let’s first check out the results, and then I’ll explain how to try it. If you want to jump right in, you can scroll down to the “Tutorial” section.
Image Generation
Here’s an image of my favorite SpongeBob and Patrick, generated all at once by Gemini — ready to use right away!

Now we can use our imagination and generate a picture of Tom and Jerry shaking hands!

Image Editing
The text placement in the Tom and Jerry image above might not be ideal — we can adjust it through conversation:

We can also remix images from the web:

Creating Image Stories
This is the most promising feature. Let’s use it to generate a story of a food delivery rider picking up an order and delivering it to the customer:

Here’s an example of generating a game character from scratch:
Amazing!
Designing Birthday Cards
I think this is a very practical feature. Beyond birthday cards, one idea I have right now is to use it to design wedding invitations. Here is the prompt I used:
Design a Chinese-style wedding invitation card. Use Chinese red as the theme color, with large text reading: "Xiao Zhang ❤️ Xiao Wang's Wedding Invitation"
What do you think of the result?
Tutorial
At the time of writing, Google still offers it for free with no usage limit that I have run into. What a generous move from a tech giant! First, open this URL in your browser: https://aistudio.google.com/ (you’ll need a VPN if AI Studio isn’t reachable from your region).
You’ll see this:

When you first enter, I suggest trying out the three sample cards provided by the official site to understand how it works. Then you can let your imagination run wild! I really envy those creative minds now.
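If you would rather script it than click around in AI Studio, here is a minimal sketch of how the same kind of request might look over the Gemini REST API. I only used the web UI for this post, so treat the details as assumptions: the model ID `gemini-2.0-flash-exp`, the `v1beta` endpoint, and the `responseModalities` setting are my best understanding of the public API, and the helper names (`build_request`, `split_parts`, `generate`) are made up for illustration. You would also need your own API key from AI Studio.

```python
import base64
import json
import urllib.request

# Assumptions, not taken from the AI Studio walkthrough above:
# the endpoint version and the model ID may differ for your account.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-2.0-flash-exp"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request asking for text and images."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        # Ask the model to return image parts alongside text parts.
        "generationConfig": {"responseModalities": ["Text", "Image"]},
    }
    return urllib.request.Request(
        f"{API_BASE}/models/{MODEL_ID}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def split_parts(response: dict) -> tuple[list[str], list[bytes]]:
    """Separate a response into text snippets and decoded image bytes."""
    texts, images = [], []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                texts.append(part["text"])
            elif "inlineData" in part:
                # Images are assumed to arrive as base64-encoded inline data.
                images.append(base64.b64decode(part["inlineData"]["data"]))
    return texts, images


def generate(prompt: str, api_key: str) -> tuple[list[str], list[bytes]]:
    """Send the request and return (texts, images). Needs network access and a valid key."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        return split_parts(json.load(resp))


# Example usage (not run here, since it needs a real key):
#   texts, images = generate("Design a Chinese-style wedding invitation card.", "YOUR_API_KEY")
#   for i, data in enumerate(images):
#       open(f"card_{i}.png", "wb").write(data)
```

Because text and image parts are collected in the order they come back, the same helper should also work for the image story case, where the reply alternates between narration and pictures.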
Summary
After using it for a few hours, I feel the generation capability is already very strong, though sometimes it returns a ⚠️ warning instead of a result. For fine-tuning images, it feels good enough for real production work. Right now, the image story generation feature seems the most exciting: it can generate images with narrative context in one go, which is perfect for content creation. If you think plain text is too dry, give it a try. It’s really useful!
I recently launched my own product: https://www.svgshow.cn. It’s a website that helps you quickly turn content into beautiful images, with online editing capabilities. The cover image for this post was generated with it.