
On-device GenAI APIs, part of ML Kit, help you easily build with Gemini Nano



Posted by Caren Chang – Developer Relations Engineer, Chengji Yan – Software Engineer, Taj Darra – Product Manager

We’re excited to announce a set of on-device GenAI APIs, as part of ML Kit, to help you integrate Gemini Nano in your Android apps.

To start, we’re releasing 4 new APIs:

    • Summarization: to summarize articles and conversations
    • Proofreading: to polish short text
    • Rewriting: to reword text in different styles
    • Image Description: to provide short descriptions for images

Key benefits of GenAI APIs

GenAI APIs are high-level APIs that allow for easy integration, similar to existing ML Kit APIs. This means you can expect quality results out of the box without extra effort for prompt engineering or fine-tuning for specific use cases.

GenAI APIs run on-device and thus provide the following benefits:

    • Input, inference, and output data is processed locally
    • Functionality remains the same without a reliable internet connection
    • No additional cost incurred for each API call

To prevent misuse, we also added safety protections at multiple layers, including base model training, safety-aware LoRA fine-tuning, input and output classifiers, and safety evaluations.

How GenAI APIs are built

There are four main components that make up each of the GenAI APIs.

  1. Gemini Nano is the base model, serving as the foundation shared by all APIs.
  2. Small API-specific LoRA adapter models are trained and deployed on top of the base model to further improve the quality for each API.
  3. Optimized inference parameters (e.g. prompt, temperature, topK, batch size) are tuned for each API to guide the model in returning the best results.
  4. An evaluation pipeline ensures quality across various datasets and attributes. This pipeline consists of: LLM raters, statistical metrics, and human raters.

Together, these components make up the high-level GenAI APIs that simplify the effort needed to integrate Gemini Nano in your Android app.

Evaluating quality of GenAI APIs

For each API, we formulate a benchmark score based on the evaluation pipeline mentioned above. This score is based on attributes specific to a task. For example, when evaluating the summarization task, one of the attributes we look at is “grounding” (i.e. factual consistency of the generated summary with the source content).

To provide out-of-box quality for GenAI APIs, we applied feature-specific fine-tuning on top of the Gemini Nano base model. This resulted in an increase in the benchmark score of each API, as shown below:

Use case (in English) | Gemini Nano base model | ML Kit GenAI API
Summarization | 77.2 | 92.1
Proofreading | 84.3 | 90.2
Rewriting | 79.5 | 84.1
Image Description | 86.9 | 92.3

In addition, here is a quick reference of how the APIs perform on a Pixel 9 Pro (a rough latency estimate based on these numbers follows the table):

Modality | Prefix speed (input processing rate) | Decode speed (output generation rate)
Text-to-text | 510 tokens/second | 11 tokens/second
Image-to-text | 510 tokens/second + 0.8 seconds for image encoding | 11 tokens/second
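
To get a feel for what these numbers mean in practice, here is a rough back-of-the-envelope estimate of end-to-end latency based on the table above. The helper function and the example token counts are illustrative assumptions, not part of the ML Kit API, and real latency will vary by device and load.

// Rough latency estimate using the Pixel 9 Pro numbers above
// (assumed: 510 tokens/s prefix, 11 tokens/s decode, 0.8 s image encoding).
fun estimateLatencySeconds(
    inputTokens: Int,
    outputTokens: Int,
    includesImage: Boolean = false
): Double {
    val prefixSeconds = inputTokens / 510.0   // input processing
    val decodeSeconds = outputTokens / 11.0   // output generation
    val imageSeconds = if (includesImage) 0.8 else 0.0
    return prefixSeconds + decodeSeconds + imageSeconds
}

fun main() {
    // e.g. summarizing a ~1,000-token article into a ~50-token summary
    println("Approx. %.1f seconds".format(estimateLatencySeconds(1000, 50)))
}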

Sample usage

This is an example of implementing the GenAI Summarization API to get a one-bullet summary of an article:

val articleToSummarize = "We are excited to announce a set of on-device generative AI APIs..."

// Define task with desired input and output format
val summarizerOptions = SummarizerOptions.builder(context)
    .setInputType(InputType.ARTICLE)
    .setOutputType(OutputType.ONE_BULLET)
    .setLanguage(Language.ENGLISH)
    .build()
val summarizer = Summarization.getClient(summarizerOptions)

suspend fun prepareAndStartSummarization(context: Context) {
    // Check feature availability. Status will be one of the following:
    // UNAVAILABLE, DOWNLOADABLE, DOWNLOADING, AVAILABLE
    val featureStatus = summarizer.checkFeatureStatus().await()

    if (featureStatus == FeatureStatus.DOWNLOADABLE) {
        // Download feature if necessary.
        // If downloadFeature is not called, the first inference request will
        // also trigger the feature to be downloaded if it's not already
        // downloaded.
        summarizer.downloadFeature(object : DownloadCallback {
            override fun onDownloadStarted(bytesToDownload: Long) { }

            override fun onDownloadFailed(e: GenAiException) { }

            override fun onDownloadProgress(totalBytesDownloaded: Long) {}

            override fun onDownloadCompleted() {
                startSummarizationRequest(articleToSummarize, summarizer)
            }
        })
    } else if (featureStatus == FeatureStatus.DOWNLOADING) {
        // Inference request will automatically run once the feature is
        // downloaded.
        // If Gemini Nano is already downloaded on the device, the
        // feature-specific LoRA adapter model will be downloaded very
        // quickly. However, if Gemini Nano is not already downloaded,
        // the download process may take longer.
        startSummarizationRequest(articleToSummarize, summarizer)
    } else if (featureStatus == FeatureStatus.AVAILABLE) {
        startSummarizationRequest(articleToSummarize, summarizer)
    }
}

fun startSummarizationRequest(text: String, summarizer: Summarizer) {
    // Create task request
    val summarizationRequest = SummarizationRequest.builder(text).build()

    // Start summarization request with streaming response
    summarizer.runInference(summarizationRequest) { newText ->
        // Show new text in UI
    }

    // You can also get a non-streaming response from the request
    // val summarizationResult = summarizer.runInference(summarizationRequest)
    // val summary = summarizationResult.get().summary
}

// Be sure to release the resource when it's no longer needed
// For example, on viewModel.onCleared() or activity.onDestroy()
summarizer.close()

For more examples of implementing the GenAI APIs, check out the official documentation and samples on GitHub.

Use cases

Here is some guidance on how to best use the current GenAI APIs:

For Summarization, consider:

    • Conversation messages or transcripts that involve 2 or more users
    • Articles or documents under 4,000 tokens (or about 3,000 English words). Using the first few paragraphs for summarization is usually sufficient to capture the most important information; a minimal truncation sketch follows this list.
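
If an article might exceed that limit, one option is to summarize only its opening paragraphs. The snippet below is a hypothetical helper, not part of the ML Kit API, that trims input to an assumed word budget before it is passed to SummarizationRequest.builder(); adjust the cut-off to your own content.

// Hypothetical helper (assumption: ~3,000 English words stays under the
// ~4,000-token limit mentioned above). Keeps whole leading paragraphs only.
fun truncateForSummarization(article: String, maxWords: Int = 3000): String {
    val paragraphs = article.split("\n\n")
    val selected = StringBuilder()
    var wordCount = 0
    for (paragraph in paragraphs) {
        val words = paragraph.trim().split(Regex("\\s+")).size
        // Stop once the budget is reached, but always keep at least one paragraph
        if (wordCount + words > maxWords && wordCount > 0) break
        selected.append(paragraph).append("\n\n")
        wordCount += words
    }
    return selected.toString().trim()
}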

For the Proofreading and Rewriting APIs, consider using them during the content creation process for short content under 256 tokens, to help with tasks such as the following (a hedged Rewriting sketch follows the list):

    • Refining messages in a particular tone, such as more formal or more casual
    • Polishing personal notes for easier consumption later
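
The Rewriting API follows the same client/options/request pattern as the Summarization sample above. The sketch below shows how rewriting a note in a more professional tone might look; the option, request, and result names are assumptions modeled on that sample, so check the official ML Kit GenAI documentation for the exact API surface before relying on it.

// Hedged sketch: rewriting a short note in a different tone.
// RewriterOptions, OutputType.PROFESSIONAL, RewritingRequest, and the result
// shape are assumptions modeled on the Summarization sample above.
suspend fun rewriteMoreFormally(context: Context, note: String): String? {
    val rewriterOptions = RewriterOptions.builder(context)
        .setOutputType(OutputType.PROFESSIONAL)  // assumed tone option
        .setLanguage(Language.ENGLISH)
        .build()
    val rewriter = Rewriting.getClient(rewriterOptions)

    // Assumes the feature is already AVAILABLE; reuse the availability and
    // download flow from the Summarization sample in production code.
    val request = RewritingRequest.builder(note).build()
    val result = rewriter.runInference(request).await()
    rewriter.close()
    return result.results.firstOrNull()?.text  // assumed result shape
}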

For the Image Description API, consider it for the following (a hedged sketch follows the list):

    • Generating titles for images
    • Generating metadata for image search
    • Using descriptions of images in use cases where the images themselves can’t be displayed, such as within a list of chat messages
    • Generating alternative text to help visually impaired users better understand content as a whole
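
As with Rewriting, the sketch below shows how generating alt text with the Image Description API might look; the class and result names are assumptions modeled on the Summarization sample above, so verify them against the official documentation.

// Hedged sketch: generating alt text for an image, e.g. in a chat list.
// ImageDescriberOptions, ImageDescription, ImageDescriptionRequest, and the
// result shape are assumptions modeled on the Summarization sample above.
suspend fun describeImageForAltText(context: Context, bitmap: Bitmap): String? {
    val options = ImageDescriberOptions.builder(context).build()
    val imageDescriber = ImageDescription.getClient(options)

    // Assumes the feature is already AVAILABLE; reuse the availability and
    // download flow from the Summarization sample in production code.
    val request = ImageDescriptionRequest.builder(bitmap).build()
    val description = imageDescriber.runInference(request).await().description
    imageDescriber.close()
    return description
}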

GenAI APIs in production

Envision is an app that verbalizes the visual world to help people who are blind or have low vision lead more independent lives. A common use case in the app is for users to take a picture and have a document read out loud. Using the GenAI Summarization API, Envision is now able to get a concise summary of a captured document. This significantly enhances the user experience by allowing them to quickly grasp the main points of documents and determine whether a more detailed reading is desired, saving them time and effort.

Side-by-side images of a mobile device: a document on a table on the left, and the scanned document’s results on the right, showing the what, when, and where as written in the document.

Supported devices

GenAI APIs are available on Android devices using optimized MediaTek Dimensity, Qualcomm Snapdragon, and Google Tensor platforms through AICore. For a comprehensive list of devices that support GenAI APIs, refer to our official documentation.

Learn more

Start implementing GenAI APIs in your Android apps today with guidance from our official documentation and samples on GitHub: AI Catalog GenAI API Samples with Compose, ML Kit GenAI APIs Quickstart.

The future of energy: How innovation and infrastructure are needed to respond to AI growth


In 2024, the International Energy Agency (IEA) estimated that data centers accounted for roughly 1.5% of global electricity demand. That number is expected to more than double by 2030, driven largely by the rise in AI infrastructure. To put this into perspective, that increase would be equivalent to Japan’s total electricity consumption today.

How do we meet this growing need for energy, not only from artificial intelligence (AI) but also from other digital technologies and the electrification of industries such as transportation and buildings?

While this challenge may seem daunting, there is reason for optimism. We are seeing numerous innovations and technologies paving the way forward, including advancements in more energy-efficient data centers, breakthroughs in liquid cooling, improved software models, increased energy security, and the exploration of alternative energy sources.

I recently had the opportunity to discuss the future of energy with Mary de Wysocki, SVP and Chief Sustainability Officer at Cisco; Adele Trombetta, SVP & GM, CX EMEA at Cisco; and Christopher Wellise, VP of Sustainability at Equinix, a Cisco customer that provides global digital infrastructure and colocation services. Here are some of the highlights from our conversation.

Q: How are customers adjusting their strategies in response to the energy picture right now?

Adele: Our customers and partners are driving strong demand for AI deployment to unlock its benefits and stay competitive. This trend spans industries in both the public and private sectors. The pressure is undeniable, but the energy impact of AI, particularly generative AI, is also on our customers’ minds, especially as large language models (LLMs) are being deployed and trained at scale. Many of our customers across Europe, the Middle East and Africa (EMEA) have net-zero targets, so they need to manage the accelerating adoption of AI carefully. They’re approaching this with sustainability in mind, making it a core part of strategy development rather than an afterthought. It’s about adopting AI while simultaneously managing its energy impact.

Mary: Interestingly, perspectives can vary depending on who you’re speaking with, whether it’s customers, partners, or suppliers. A common theme emerging, particularly in light of global dynamics, is resiliency. There is a clear focus on proactive investments to secure the energy needed for the future. It’s also about finding ways to move forward collaboratively and identifying opportunities for co-investment. The priorities are clear: we need growth, resilience, and a more sustainable approach.

Chris: We’re seeing several key themes across the customer landscape. Resiliency and reliability are top priorities, with customers focused on ensuring their applications run smoothly. Regulatory compliance is another major concern, especially in regions like the European Union (EU) with directives such as the EU Energy Efficiency Directive. Another request is for end-to-end solutions that optimize operations across the entire value chain as well as support sustainability reporting and regulatory requirements. As customers adopt hybrid multi-cloud environments, they’re keen to optimize energy use across platforms and regions. Finally, partnerships are critical. Customers recognize the need to collaborate with suppliers, energy providers, and others to meet their goals and optimize energy use. For example, in the Cisco-Equinix partnership, 70% of devices connected to the Equinix fabric run on Cisco technology.

Q: We know data centers are the foundation for supporting the AI boom and managing its related energy needs. What are some technological advancements happening in the data center?

Mary: Designing products with energy efficiency in mind is a critical first step in delivering business outcomes and addressing sustainability. For example, Cisco’s Silicon One chip is engineered to be both energy-efficient and optimized for AI workloads, enabling customers to reduce power consumption while meeting the growing demands of modern networks and data-intensive applications. In addition to that, a foundational innovation for customers, partners, and suppliers is our Sustainability Data Foundation (SDF). It provides a single source of truth, offering the data needed to manage carbon footprints and progress toward net-zero targets. This information empowers technology leaders with the tools to better manage energy and drive sustainability.

Chris: Designing for efficiency is so important. Since 2021, we have required all new build sites to pursue LEED or an equivalent green building certification to demonstrate adherence to recognized sustainability best practices in design and construction. Data centers, built to last 20 to 30 years, require optimization in both design and operations. Innovation in cooling is especially important because cooling often accounts for over half of energy consumption. Over 100 Equinix data centers are now enabled with access to liquid cooling technology, such as heat exchangers or direct-to-chip cooling. In the latter, a copper plate, fluid, and closed-loop system remove heat directly from the chip while using chemicals to prevent erosion and bio slimes. From a sustainability perspective, this concentrated heat becomes highly usable. For example, in Helsinki, Finland, heat from data centers warms over 10,000 homes, and during the last Summer Olympics, the aquatic center pools were heated by an Equinix data center. Additionally, AI-powered advanced software can create digital twins to optimize cooling parameters and reduce the energy consumed for cooling.

Adele: Cisco’s new products now integrate both sustainability and security into the design process. Customers increasingly want to understand how we deploy, monitor, and optimize technology to manage energy consumption, performance, and AI. According to a Gartner study, “By 2030, more than 70% of data centers will monitor sustainability metrics, up from roughly 10% today.” (source: Gartner®). Collaboration across the partner and customer ecosystem is key to modernization and efficient resource use. Coordinating diverse data, ranging from Cisco networking equipment to grid data, weather, location, and IT/OT systems, presents a complex but exciting challenge. With Splunk, we can streamline this process and generate the insights needed for effective information flow and optimization.

Q: How well is the global electricity infrastructure equipped to handle growing electricity demand?

Chris: Our primary challenge lies more in distribution than in supply, and the reasons for this vary by region. In the United States, aging infrastructure and complex policy and regulatory environments play a role. In Europe, while there is rapid growth in renewable energy, integrating it effectively into the grid remains a challenge. In Asia, the situation is more diverse, with both rapid renewable energy expansion and a continued heavy reliance on fossil fuels. To address these issues effectively, it’s crucial to manage both distribution and supply simultaneously.

Mary: Generative AI requires significant energy, prompting the question: how do we ensure reliable access to the grid? In New York City, I see both opportunity and challenge in the grid. The U.S. grid, built mostly in the ’60s and ’70s, lacks reliability and resilience, with 70% of transmission lines over 25 years old. We see the potential of AI to help address major challenges, but its success depends on modernizing the grid and data centers. Industrial IoT can play a key role in creating smart, safer grids that maximize available energy, support diverse energy sources, and enable predictive maintenance.

Adele: We’re partnering with customers eager for digital transformation, including energy companies supporting critical national infrastructure. Leveraging this shift, we’re focusing on creating more sustainable solutions and building smart grids that prioritize efficiency. While AI is still in its early stages, ongoing collaboration and partnership with utilities is vital to ensure flexibility and adaptability for their evolving needs.

Interested in learning more about the future of energy and the influence of AI? Join me in person as I lead a discussion on this topic at Cisco Live US in San Diego from June 8-12. The session will take place on Tuesday, June 10 from 2 to 2:30pm PT, and you can register here.


Source: Gartner, “10 Performance Metrics to Improve Data Center Sustainability,” Henrique Cecci, Autumn Stanish, 14 February 2025.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.


DriveNets extends AI networking fabric with multi-site capabilities for distributed GPU clusters



“We use the same physical architecture as anyone with top of rack and then leaf and spine switch,” Dudy Cohen, vice president of product marketing at DriveNets, told Network World. “But what happens between our top of rack, which is the switch that connects NICs (network interface cards) into the servers, and the rest of the network isn’t based on Clos Ethernet architecture, but rather on a very specific cell-based protocol. [It’s] the same protocol, by the way, that’s used in the backplane of the chassis.”

Cohen explained that any data packet that comes into an ingress switch from the NIC is cut into evenly sized cells, sprayed across the entire fabric, and then reassembled on the other side. This approach distinguishes DriveNets from other solutions that may require specialized components such as Nvidia BlueField DPUs (data processing units) at the endpoints.

“The fabric links between the top of rack and the spine are fully load balanced,” he said. “We don’t use any hashing mechanism… and that is why we can contain all of the congestion avoidance within the fabric and don’t need any external assistance.”

Multi-site implementation for distributed GPU clusters

The multi-site capability allows organizations to overcome power constraints in a single data center by spreading GPU clusters across locations.

This is not designed as a backup or failover mechanism. Lasser-Raab emphasized that it is a single cluster in two locations that can be up to 80 kilometers apart, which allows for connection to different power grids.

The physical implementation typically uses high-bandwidth connections between sites. Cohen explained that there is either dark fiber or DWDM (Dense Wavelength Division Multiplexing) fiber-optic connectivity between the sites. Typically the connections are bundles of four 800 Gigabit Ethernet links, acting as a single 3.2 terabit per second connection.

Anchore SBOM, Komodor integrates into IDPs, and Shopify’s new dev tools – SD Times Daily Digest


Anchore is enabling “Bring Your Own SBOMs” with the release of Anchore SBOM, which provides a centralized place to view, manage, and analyze SBOMs created internally and from third-party software.

SBOMs can be imported if they are in SPDX versions 2.1–2.3, CycloneDX versions 1.0–1.6, or Syft native formats.

“We built Anchore Enterprise to be embedded into the CI/CD pipeline – it analyzes OSS risks, enforces policy gates throughout delivery, and scans continuously thereafter. SBOMs are at the core of how we establish trust in the delivery pipeline and therefore in the software you’re delivering,” said Neil Levine, SVP of product at Anchore.

Komodor integrates into IDPs

Komodor is known for its day-2 Kubernetes operations management, spanning monitoring, troubleshooting, performance optimization, and cost management. With new support for Backstage and Port (and more to come), the company is bringing these management capabilities into developer workflows.

Key capabilities of the integration include the ability to view the real-time status of deployed services, step-by-step troubleshooting instructions, performance monitoring, role-based access control, and fleet management for platform teams.

“Internal developer platforms have emerged to simplify software delivery, but Kubernetes remains a bottleneck that’s complex, opaque, and disconnected from the developer experience,” said Itiel Shwartz, co-founder and CTO of Komodor. “By embedding Komodor into Backstage and Port, we’re giving developers a secure and easy way to see, understand, and fix issues in their services, right from the portal. It’s the missing piece that makes IDPs truly self-service for addressing K8s issues.”

Shopify releases new developer tools

Shopify is launching a new unified developer platform that integrates the Dev Dashboard and CLI and offers AI-powered code generation. Developers can also now create “dev stores” where they can preview apps in test environments, a feature that was previously only available on Plus plans and is now available to all developers.

Other new features announced today include declarative custom data definitions, a unified Polaris UI toolkit, and Storefront MCP, which allows developers to build AI agents that can act as shopping assistants for stores.

Peacock built adaptively on Android to deliver great experiences across screens



Posted by Sa-ryong Kang and Miguel Montemayor – Developer Relations Engineers


Peacock is NBCUniversal’s streaming service app available in the US, offering culture-defining entertainment including live sports, exclusive original content, TV shows, and blockbuster movies. The app continues to evolve, becoming more than just a platform to watch content, but a hub of entertainment.

Today’s users are consuming entertainment on an increasingly wide array of device sizes and types, and in particular are moving towards mobile devices. Peacock has adopted Jetpack Compose to help with its journey in adapting to more screens and meeting users where they are.

Disclaimer: Peacock is available in the US only. This video will only be viewable to US viewers.

Adapting to more flexible form factors

The Peacock development team is focused on bringing the best experience to users, no matter what device they’re using or when they want to consume content. With an emerging trend of app users watching more on mobile devices and large screens like foldables, the Peacock app needs to be able to adapt to different screen sizes. As more devices are released, the team needed to explore new solutions that make the most of each unique display permutation.

The goal was to have the Peacock app adapt to these new displays while continually offering high-quality entertainment without interruptions, like the stream reloading or visual errors. While thinking ahead, they also wanted to prepare and build a solution that was ready for Android XR, as the entertainment landscape shifts towards including more immersive experiences.

Quote card featuring a headshot of Diego Valente, Head of Mobile, Peacock & Global Streaming: “Thinking adaptively isn’t just about supporting tablets or large screens – it’s about future proofing your app. Investing in adaptability helps you meet users’ expectations of having seamless experiences across all their devices and sets you up for what’s next.”

Building a future-proof experience with Jetpack Compose

In order to build a scalable solution that would help the Peacock app continue to evolve, the app was migrated to Jetpack Compose, Android’s toolkit for building scalable UI. One of the essential tools they used was the WindowSizeClass API, which helps developers create and test UI layouts for different size ranges. This API then allows the app to seamlessly switch between pre-set layouts as it reaches established viewport breakpoints for different window sizes.

The API was used in conjunction with Kotlin Coroutines and Flows to keep the UI state responsive as the window size changed. To test their work and fine-tune edge-case devices, Peacock used the Android Studio emulator to simulate a wide range of Android-based devices.
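
As a minimal illustration of this approach (not Peacock’s actual code), the sketch below branches between layouts using the material3 WindowSizeClass API; the layout composables are placeholders, and depending on the library version an opt-in annotation may be required for calculateWindowSizeClass.

// Illustrative only: switch layouts based on the current window width class.
import androidx.activity.ComponentActivity
import androidx.compose.material3.windowsizeclass.WindowWidthSizeClass
import androidx.compose.material3.windowsizeclass.calculateWindowSizeClass
import androidx.compose.runtime.Composable

@Composable
fun AdaptiveHomeScreen(activity: ComponentActivity) {
    val windowSizeClass = calculateWindowSizeClass(activity)
    when (windowSizeClass.widthSizeClass) {
        WindowWidthSizeClass.Compact -> CompactLayout()   // phones
        WindowWidthSizeClass.Medium -> MediumLayout()     // unfolded foldables, small tablets
        else -> ExpandedLayout()                          // large tablets, desktop windows
    }
}

@Composable fun CompactLayout() { /* single-pane UI */ }
@Composable fun MediumLayout() { /* navigation rail + content */ }
@Composable fun ExpandedLayout() { /* two-pane UI */ }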

Jetpack Compose allowed the team to build adaptively, so now the Peacock app responds to a wide variety of screens while offering a seamless experience to Android users. “The app feels more native, more fluid, and more intuitive across all form factors,” said Diego Valente, Head of Mobile, Peacock and Global Streaming. “That means users can start watching on a smaller screen and continue instantly on a larger one when they unfold the device: no reloads, no friction. It just works.”

Preparing for immersive entertainment experiences

In building adaptive apps on Android, John Jelley, Senior Vice President, Product & UX, Peacock and Global Streaming, says Peacock has also laid the groundwork to quickly adapt to the Android XR platform: “Android XR builds on the same large-screen principles; our investment here naturally extends to those emerging experiences with less developmental work.”

The team is excited about the prospect of features unlocked by Android XR, like Multiview for sports and TV, which enables users to watch multiple games or camera angles at once. By tailoring spatial windows to the user’s environment, the app could offer new ways for users to interact with contextual metadata like sports stats or actor information, all without ever interrupting their experience.

Build adaptive apps

Learn how to unlock your app’s full potential on phones, tablets, foldables, and beyond.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.