TLDR:
OpenAI just showed off the capabilities of a new text-to-video tool that can generate videos of up to 60 seconds, and it is incredible: easily 10x better than what we had before.
Long Story:
As predicted, this year will be all about text-to-video. We've learned how to make high-quality pictures (Midjourney is just on another level), and now we're expanding our capabilities and pushing our hardware to the next level.
This week, OpenAI presented their fresh-out-of-the-oven tool called Sora, and frankly it's just mind-blowing 🤯. Why? Simply because the quality of the video it produces is unbelievable. The shapes, the lines, the motion, the fingers, those still go off the rails sometimes, but other than that it's really good.
It is currently in a closed alpha, where only some influencers have access to it. If you spread the word about me, I might be one of them at some point in life 🤞.
Anyways, the tech is still the same: take a text, make a picture, then generate another (60 seconds × 60 pictures per second) 3,600 pictures, put them one after another, and voilà, you have a video. The thing is, though, it's sooo much smoother than what Pika does (check this post for context). It has a story line, and it actually looks like a video, not just a weird cartoon. Either there are some clever algorithms behind the scenes, or AI has a budget of $7 trillion and can afford more computing power – regardless, it makes a massive difference.
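For the curious, here's roughly what that naive mental model looks like in code. This is just a sketch of the "generate 3,600 pictures and stitch them together" idea, not how Sora actually works internally; generate_frame is a hypothetical stand-in for whatever model produces each image.

```python
# A minimal sketch of the naive "text -> frames -> video" idea above.
# generate_frame() is a hypothetical placeholder, NOT OpenAI's actual API;
# real systems like Sora don't literally render one frame at a time.
import numpy as np
import imageio.v2 as imageio  # pip install imageio imageio-ffmpeg

DURATION_S = 60                  # clip length in seconds
FPS = 60                         # frames per second assumed in the post
TOTAL_FRAMES = DURATION_S * FPS  # 60 * 60 = 3600 pictures

def generate_frame(prompt: str, t: int) -> np.ndarray:
    """Hypothetical stand-in: returns a blank 200x200 RGB frame."""
    return np.zeros((200, 200, 3), dtype=np.uint8)

writer = imageio.get_writer("clip.mp4", fps=FPS)
for t in range(TOTAL_FRAMES):
    writer.append_data(generate_frame("a wolf howling at the moon", t))
writer.close()
```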
Based on the numbers alone, we can assume that video generation is going to be expensive and slow. I believe the workflow will revert to what we had about 50 years ago: first we make a 200x200px video and check it in low res to make sure we've got the right result, and then we upscale it like crazy to 4K.
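To make that concrete, here's a tiny sketch of the "check in low res, then upscale" step, using plain image resizing as a stand-in for a real AI upscaler; the upscale_frame helper and the file names are made up for illustration.

```python
# A rough sketch of the "preview small, then upscale" workflow above.
# Plain LANCZOS resizing stands in for a real learned super-resolution model.
from PIL import Image  # pip install pillow

def upscale_frame(frame_path: str, target=(3840, 2160)) -> Image.Image:
    """Blow a low-res preview frame up to 4K (hypothetical helper)."""
    small = Image.open(frame_path)  # e.g. a 200x200 preview frame
    return small.resize(target, Image.LANCZOS)

# Once the 200x200 preview looks right, upscale every frame to 4K.
upscale_frame("preview_frame_0001.png").save("final_frame_0001.png")
```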
Check out the demos from OpenAI's website.
Prompt: A beautiful silhouette animation shows a wolf howling at the moon, feeling lonely, until it finds its pack.
Prompt: A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in snow.
Well, aren't they cute? Awww 🥰.
It still has its flaws: the physics are often all over the place, it sometimes lacks context, and it doesn't really know how to interact with objects. For example, if somebody bites a cookie in a generated video, the bite mark won't show up; the cookie will still look brand new. Just take a look at the examples below:
Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.
Prompt: Basketball through hoop then explodes.
You can clearly see that it has problems with physics and with the way it interacts with objects. It's still essentially imitating existing videos rather than modelling reality.
It is clearly flawed, but this is just the beginning. It's only 1 year old. Just think what it's going to be able to do by the age of 5.