Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager
We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.
In this post, we’ll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we’ll share some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.
App flow
The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

Gemini and image validation
To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.
Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person and that the person is in focus, and by assessing image safety, including whether the image contains abusive content.
// Schema of the JSON object the model must return: a required flag and an optional reason.
val jsonSchema = Schema.obj(
    properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
    optionalProperties = listOf("error"),
)

val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-preview-04-17",
        generationConfig = generationConfig {
            responseMimeType = "application/json"
            responseSchema = jsonSchema
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
        ),
    )

val response = generativeModel.generateContent(
    content {
        text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)")
        image(image)
    },
)

// response.text is nullable, so fail fast if the model returned no text.
val responseText = checkNotNull(response.text) { "The model returned no text" }
val jsonResponse = Json.parseToJsonElement(responseText)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content
In the snippet above, we’re leveraging the structured output capabilities of the model by defining the schema of the response. We’re passing a Schema object via the responseSchema param in the generationConfig.
We want to validate that the image has enough information to generate a nice Android avatar. So we ask the model to return a JSON object with success = true/false and an optional error message explaining why the image doesn't have enough information.
Structured output is a powerful feature that enables smoother integration of LLMs into your app by controlling the format of their output, similar to an API response.
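Because the response shape is fixed, the two fields can be turned into a typed result before they reach the UI. The following is a minimal sketch of that idea, not the code used in Androidify: ValidationResult, toValidationResult, and the fallback message are hypothetical names introduced for illustration.

```kotlin
// Hypothetical result type; these names are illustrative and not taken from the Androidify source.
sealed interface ValidationResult {
    object Valid : ValidationResult
    data class Invalid(val reason: String) : ValidationResult
}

// Maps the two fields of the structured response to a typed result.
// A missing or false "success" flag is treated as a failed validation.
fun toValidationResult(success: Boolean?, error: String?): ValidationResult =
    if (success == true) {
        ValidationResult.Valid
    } else {
        // Fall back to a generic message when the model gave no usable reason.
        val reason = error?.takeIf { it.isNotBlank() } ?: "The image could not be validated."
        ValidationResult.Invalid(reason)
    }
```

With the values parsed in the snippet above, this could be called as toValidationResult(isSuccess, error). Treating a missing flag as a failure keeps the app on the safe side if the model ever omits the field.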
Image captioning with Gemini Flash
Once it's established that the image contains sufficient information to generate an Android avatar, it’s captioned using Gemini 2.5 Flash with structured output.
val jsonSchema = Schema.obj(
    properties = mapOf(
        "success" to Schema.boolean(),
        "user_description" to Schema.string(),
    ),
    optionalProperties = listOf("user_description"),
)
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
    content {
        text(prompt)
        image(image)
    })

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content
The other option in the app is to start with a text prompt. You can enter details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.
Android generation via Imagen
We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details around what we would like to generate and include the bot color selection as part of this too, including the skin tone selected by the user.
val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and straightforward, facing directly forward [...] The bot looks as follows $userDescription [...]"
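One way to keep this enrichment step easy to test is to build the prompt in a small pure function. The sketch below assumes a hypothetical buildImagenPrompt helper; the parameter names and the wording of the color sentence are illustrative and are not the text used by the app.

```kotlin
// Hypothetical helper: combines the fixed scene text with the per-user details.
// The sentence wording below is illustrative, not the prompt used by Androidify.
fun buildImagenPrompt(
    scenePrompt: String,
    userDescription: String,
    botColor: String,
): String {
    val description = userDescription.trim()
    // Without a description there is nothing to personalize, so reject the call early.
    require(description.isNotEmpty()) { "A description is needed to generate a bot" }
    return listOf(
        scenePrompt.trim(),
        "The bot looks as follows: $description",
        "The bot color is $botColor.",
    ).joinToString(separator = " ")
}
```

Keeping the function free of SDK types means the prompt can be unit tested without calling the model.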
We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:
// we supply our own fine-tuned model here but you can use "imagen-3.0-generate-002"
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
    "imagen-3.0-generate-002",
    safetySettings = ImagenSafetySettings(
        ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
        personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
    ),
)

val response = generativeModel.generateImages(imagenPrompt)

val image = response.images.first().asBitmap()
And that’s it! The Imagen model generates a bitmap that we can display on the user’s screen.
Fine-tuning the Imagen model
The Imagen 3 model was fine-tuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models. Instead of updating the entire model, LoRA adds smaller, trainable “adapters” that make small changes to the model’s behavior. We ran a fine-tuning pipeline on the generally available Imagen 3 model with Android bot assets of different color combinations and different assets for enhanced cuteness and fun. We generated text captions for the training images, and the image-text pairs were used to fine-tune the model effectively.
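To make the adapter idea concrete, here is a small sketch of the standard LoRA formulation, in which a frozen weight matrix W of size d x k receives a trainable update B * A, with B of size d x r and A of size r x k. It says nothing about how Imagen was actually fine-tuned: the function names are hypothetical, and the optional scaling factor that many LoRA implementations apply is left out for brevity.

```kotlin
// Number of weights updated when fine-tuning a full d x k weight matrix.
fun fullFineTuneParams(d: Int, k: Int): Long = d.toLong() * k

// Number of weights trained by a rank-r adapter: B is d x r and A is r x k.
fun loraParams(d: Int, k: Int, r: Int): Long = r.toLong() * (d + k)

// Applies the low-rank update W' = W + B * A to a small dense matrix.
// W stays frozen during training; only B and A would be learned.
fun applyLora(
    w: Array<DoubleArray>,
    b: Array<DoubleArray>,
    a: Array<DoubleArray>,
): Array<DoubleArray> {
    val rank = a.size
    return Array(w.size) { i ->
        DoubleArray(w[i].size) { j ->
            var delta = 0.0
            for (p in 0 until rank) delta += b[i][p] * a[p][j]
            w[i][j] + delta
        }
    }
}
```

For an illustrative 4096 x 4096 layer, fullFineTuneParams(4096, 4096) is 16,777,216, while loraParams(4096, 4096, 8) is 65,536, which is 256 times fewer trainable weights.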
The current sample app uses a standard Imagen model, so the results might look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of the Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year, and we are also planning on adding support for fine-tuned models to the Firebase AI Logic SDK later in the year.

ML Kit
The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which triggers the capture button and adds visual indicators.
To do this, we add the SDK to the app and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detectedLandmarks that are in the streaming image coming from the camera, and we set _uiState.detectedPose to true if a nose and shoulders are visible:
private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    // Collect the types of the landmarks that are likely to be inside the frame.
    val detectedLandmarks = mutableListOf<Int>()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}
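The landmark check itself is plain filtering logic, so it can be exercised without a camera or the ML Kit SDK. The sketch below mirrors the 0.7 threshold from the snippet above but uses stand-in types: LandmarkType, Landmark, and isPersonInFrame are hypothetical names and are not part of ML Kit.

```kotlin
// Stand-ins for ML Kit's landmark types so the check can run without the SDK.
enum class LandmarkType { NOSE, LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP }

data class Landmark(val type: LandmarkType, val inFrameLikelihood: Float)

// Returns true only when the nose and both shoulders are likely to be inside the frame.
fun isPersonInFrame(landmarks: List<Landmark>, threshold: Float = 0.7f): Boolean {
    val confidentTypes = landmarks
        .filter { it.inFrameLikelihood > threshold }
        .map { it.type }
        .toSet()
    return confidentTypes.containsAll(
        setOf(LandmarkType.NOSE, LandmarkType.LEFT_SHOULDER, LandmarkType.RIGHT_SHOULDER),
    )
}
```

A frame where one shoulder falls below the threshold returns false, which in the app would keep the capture button disabled.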

Get started with AI on Android
The Androidify app makes extensive use of Gemini 2.5 Flash to validate the image and to generate the detailed description used for image generation. It also leverages a specially fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, the ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.
To get started with AI on Android, go to the Gemini and Imagen documentation for Android.
Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.