Image Understanding in Local LLMs with Opera

June 13th, 2024

A couple of weeks ago, we introduced Image Understanding to Aria, our browser’s built-in AI assistant. Today we’re taking that capability to the next level by bringing it directly to your device: you can now ask questions about images using local LLMs. We’re introducing this feature for the Llava and bakllava families of local LLMs in Opera Developer. This is our latest Feature Drop, and it enhances some of the more than 2000 local LLMs that you can download and try out in the browser.

We understand that many of you want to use certain AI features and functionalities without having to rely on cloud server computing. We’re committed to making the on-device AI experience as complete as possible, which is why Image Understanding is making its way to local LLMs in the developer stream of Opera.

How to get on-device Image Understanding?

Getting this feature working takes just a few easy steps, which I’ll walk you through here:

First, download Opera Developer.

Go to the sidebar panel and click on the Aria icon.

Click on ‘choose local AI model’ at the top of the interface, and then ‘go to settings’ in the drop-down menu that appears.

Then, you’ll need to download a local LLM from the Llava or bakllava families. These are multi-modal LLMs, which means that they can take input in more forms than just text, such as images.

Don’t worry: it’s not rocket science. Our browser UI makes it simple for you, and there’s this blog post that explains how to get local LLMs working properly.

Animated GIF showing how to access local LLMs in Opera Developer.

Once downloaded, choose the supported model to start a new chat. As an example, let’s use the following model: llava:7b-v1.5-q4_K_M

Next, you’ll find a ‘+’ button to the left of the chat input box. Click it to upload images to the LLM.

'+' icon where you can upload images and ask queries about them.

Choose the image you want to upload and write a prompt asking something about it.

Example of Image Understanding using local LLMs in Opera Developer.

Now you’re ready to use Image Understanding in the same way as with Aria, but this time everything happens on your own device, giving you enhanced privacy! How cool is that? Local LLMs are becoming more popular by the day, and we’re committed to making our browser your gateway to AI features and functionalities.
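If you’re curious about what the example model name used in the steps above, llava:7b-v1.5-q4_K_M, actually encodes, here is a minimal Python sketch that splits it apart. It assumes the name follows a family:size-version-quantization layout, which is our reading of this particular example rather than a documented rule, and the names ModelTag and parse_model_tag are hypothetical helpers made up for this illustration.

```python
from typing import NamedTuple


class ModelTag(NamedTuple):
    """Pieces of a model name, under the assumed layout described above."""
    family: str
    size: str
    version: str
    quantization: str


def parse_model_tag(tag: str) -> ModelTag:
    """Split a name like 'llava:7b-v1.5-q4_K_M' into its parts.

    Assumption: the name looks like 'family:size-version-quantization'.
    Names that don't match this layout raise ValueError instead of being guessed at.
    """
    family, separator, variant = tag.partition(":")
    parts = variant.split("-")
    if not family or not separator or len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected model name layout: {tag!r}")
    size, version, quantization = parts
    return ModelTag(family, size, version, quantization)


if __name__ == "__main__":
    print(parse_model_tag("llava:7b-v1.5-q4_K_M"))
    # ModelTag(family='llava', size='7b', version='v1.5', quantization='q4_K_M')
```

Read this way, the example is a Llava model with roughly 7 billion parameters, version 1.5, stored with a 4-bit style of compression (quantization), which is what lets it fit on everyday hardware.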

The nitty-gritty about local LLMs

As with any new technology, local LLMs are improving constantly, and that means there are still obstacles to overcome. One of AI’s biggest challenges is avoiding hallucinations, which happen when you get an answer that doesn’t fully represent reality. Take the following image as an example:

Example of a hallucination from the on-device AI model.

The overall Image Understanding in this case is good: the local model has properly captured the essence of the image. However, there are a couple of hallucinations here, and they need to be identified when using this kind of AI tool. For instance, the VR headset is there, but it’s not an Oculus Rift; it’s actually the Apple Vision Pro. The local model also states that there are three dogs in total, but in reality there’s just one. What’s not a hallucination is the fact that this dalmatian is very cute and friendly, even though it’s actually just one dog.

One way to reduce hallucinations is to use a stronger, and therefore larger, model. However, there’s always a limitation regarding the hardware at your disposal and the models it can run. That’s why the top-notch models, like Aria, run on supercomputers and rely on cloud computing to provide you access to them.

If you’re wondering whether your PC can run a local AI model and don’t know how to find out, don’t worry! We’ve just released a device Benchmarking tool for local AI that will answer that question for you. Bear in mind that while a larger model might be more powerful and accurate, it’s also more resource-hungry and will require better hardware.
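To get a feel for why larger models need better hardware, here is a back-of-the-envelope Python sketch that estimates how much memory a model’s weights alone take up. It is not how the Benchmarking tool works. The function name estimate_weight_memory_gb is hypothetical, and the 4 bits per weight used for a q4-style model is a nominal assumption: real model files mix precisions, and running a model needs extra memory on top of the weights, so treat the result as a rough lower bound.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in gigabytes (1 GB = 1e9 bytes).

    Formula: parameters * bits per weight / 8 bits per byte.
    Assumption: every weight is stored at the same precision, which real
    quantized models only approximate.
    """
    if params_billions <= 0 or bits_per_weight <= 0:
        raise ValueError("both arguments must be positive")
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal figures, not measurements: a 7B and a 13B model at ~4 bits,
    # and the same 7B model at 16 bits for comparison.
    for label, params, bits in [("7B @ 4-bit", 7, 4), ("13B @ 4-bit", 13, 4), ("7B @ 16-bit", 7, 16)]:
        print(f"{label}: about {estimate_weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, a 7-billion-parameter model at 4 bits needs about 3.5 GB for its weights, while a 13-billion-parameter model at the same precision needs about 6.5 GB, which illustrates how quickly requirements grow with model size.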

Try Image Understanding with Opera today

Download Opera Developer and get access to over 2000 local LLMs as well as the most advanced AI features with Aria. We’re continuously working on releasing new AI Feature Drops. Subscribe to our newsletter by scrolling down to the sign-up box below, and be the first to know our latest news.

________________________________________________________________________________________________________________

Note: Users must comply with applicable laws and regulations when using open-source local AI models. Opera does not assume any responsibility for their use.