Today I want to talk about Gemini, for a specific reason: its in-depth integration with the Google suite. Since Google surrounds all my work, I can use it across different products.
Gemini can read my emails and summarize them, it can watch YouTube videos, it can do research, it can search through my Google Drive. It can do so many things.
Gemini has the biggest context window, meaning it can take in more documents and content at the same time and give you an answer based on all of it, without relying on RAG systems or anything else.
What I’m trying to say is that Google has a serious edge over everybody else when it comes to the user ecosystem.
I decided to play around with it, and the UI is pretty sweet. For example, I found an old Google Sheet where I did some estimations for client proposals, and asked Gemini to summarize it.
Now that Gemini knew what the project was about, I asked it for recommendations. Naturally it proposed some, and they were clearly tasks to be delegated to an AI. Here the lies began ☹️
That made me real sad. I think Gemini was just hallucinating: instead of acting on its promises, it imagined it could do things when in fact it cannot 😢
But Gemini 2.0 isn’t only about text. It can now also process images, video, audio, etc., so I gave it a try.
I gave it a PDF with about 30 pages of content, mainly tables, and when I asked it to sum up the values it failed miserably. It said it needed time to re-evaluate its decision, but when I asked when I would get the answer, it stalled and then said it didn’t know what I was talking about.
To be frank, LLMs always struggle with precision. Adding numbers is a hard task for an approximation algorithm; ultimately it’s not really intelligent, it’s just approximating values. So a task like this would require some extra input and functionality to be coded separately.
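To make "coded separately" a bit more concrete, here is a minimal Python sketch of the idea: let something else pull the table rows out of the PDF, and do the actual adding in plain code. Everything in it is my own assumption for illustration: the function names `parse_amount` and `sum_column` are made up, the sample rows are invented, and I assume the rows have already been extracted as lists of strings by whatever tool you prefer.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def parse_amount(cell: str) -> Optional[Decimal]:
    """Turn a cell like '1,250.50' into a Decimal, or None if it is not a number."""
    # Assumption: commas are thousands separators, dots are decimal points.
    cleaned = cell.strip().replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Skip special values such as NaN or Infinity.
    return value if value.is_finite() else None


def sum_column(rows: List[List[str]], column: int) -> Decimal:
    """Add up every numeric cell in one column, skipping headers and blanks."""
    total = Decimal("0")
    for row in rows:
        if column >= len(row):
            continue  # short row, nothing in this column
        amount = parse_amount(row[column])
        if amount is not None:
            total += amount
    return total


# Hypothetical rows, standing in for a table extracted from the PDF.
rows = [
    ["Item", "Amount"],
    ["Design", "1,250.50"],
    ["Development", "3,000"],
    ["Notes", ""],
]
print(sum_column(rows, 1))
```

The point of the split is that the model only has to find and read the numbers, which is the approximate part, while the arithmetic is exact and repeatable.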
Since approximation is the name of the game, I went that way with the next task. I uploaded the following image and asked about the ceramic paintings. To make sure it was talking about the right place, I also asked it to point me to the location. Given that LLMs are good at approximations, it should point me in the right direction.
And it was correct. It offered me two options and highlighted the first one, the Chapel of Souls in Porto, Portugal. The second option was Porto Cathedral, which is also close enough given the lack of context in the image. That’s a cool one. This is how you can cheat at GeoGuessr. 😉
I didn’t try video and the other stuff, because the concept is pretty clear to me: it approximates really well, but it struggles to compute hard facts. There’s still some way to go, and depending on the use case, some things are better computed with alternative solutions.
Final words for today: the magic is still in the works, and right now we get to watch it happen. Today is the worst it will ever be, and the future is going to be astonishing. 🚀