CHROME

Update #1

Built-in AI Early Preview Program

Welcome and heads-up about the Prompt API

Authors

Kenji Baheux

Thomas Steiner

Alexandra Klepper

Contact

See this section

Last-updated

Jul 25, 2024

See changelog.

Latest news 📯

Intro

Welcome, and thank you for participating in our Early Preview Program for built-in AI capabilities (article, talk at Google I/O 2024). Your involvement is invaluable as we explore opportunities to improve or augment web experiences with AI!

The Built-in AI Early Preview Program has several goals:

📣

Know of other folks who would love to join this program? Or perhaps you got access to this document from a friend?

Sign up to get the latest updates directly in your inbox.

In this first update, we're excited to provide details about the upcoming exploratory Prompt API, designed to facilitate the discovery of AI use cases through local prototyping. More concretely, this API will let you interact with Gemini Nano, on-device, directly in your development environment.

📝

Exploratory APIs are intended for local prototyping only. With these exploratory APIs, we intend to solicit feedback, confirm assumptions, and determine what task APIs (such as a Translation API) we build in the future. Consequently, the exploratory APIs may never launch.

As we explore the possibilities of this technology together, it’s critical that we prioritize responsible AI development. To help guide our efforts, take a moment to review Google's Generative AI Prohibited Uses Policy. This policy outlines some key considerations for ethical and safe deployment of AI.

Let's learn and build together ✨!

Prompt API

Purpose

The Prompt API is provided for local experimentation, to facilitate the discovery of use cases for built-in AI. With this API, you can send natural language instructions to an instance of Gemini Nano in Chrome.

While the Prompt API offers the most flexibility, it won't necessarily deliver the best results and may not deliver sufficient quality in some cases. That's why we believe that task-specific APIs (such as a Translation API), paired with fine-tuning or expert models, will deliver significantly better results. Our hope is that the Prompt API helps accelerate the discovery of compelling use cases to inform a roadmap of task-specific APIs.

Timing

The Prompt API is available behind an experimental flag from Chrome 127+ on desktop.

Requirements

Our Built-in AI program is currently focused on desktop platforms. In addition, the following conditions are required for Chrome to download and run Gemini Nano.

  • OS version: Windows 10 or 11; macOS 13 (Ventura) or newer; Linux: not specified.
  • Storage: at least 22 GB free on the volume that contains your Chrome profile. The model itself requires far less storage; the requirement exists to leave an ample storage margin.
  • GPU: integrated GPU, or discrete GPU (e.g. video card).
  • Video RAM: 4 GB (minimum).
  • Network connection: a non-metered connection.

🚧

These are not necessarily the final requirements for Gemini Nano in Chrome.

We are interested in feedback about performance so we can further improve the user experience and adjust the requirements.

Not yet supported: 

  • Chrome for Android
  • Chrome for iOS
  • Chrome for ChromeOS

Setup

Prerequisites

  1. Acknowledge Google’s Generative AI Prohibited Uses Policy.
  2. Download Chrome Dev channel (or Canary channel), and confirm that your version is equal to or newer than 128.0.6545.0.
  3. Check that your device meets the requirements.

Enable Gemini Nano and the Prompt API

Follow these steps to enable Gemini Nano and the Prompt API flags for local experimentation:

  1. Open a new tab in Chrome, go to chrome://flags/#optimization-guide-on-device-model
  2. Select Enabled BypassPerfRequirement
  3. Go to chrome://flags/#prompt-api-for-gemini-nano
  4. Select Enabled
  5. Relaunch Chrome.

Confirm availability of Gemini Nano

  1. Open DevTools and send await window.ai.canCreateTextSession(); in the console. If this returns “readily”, then you are all set.
  2. Otherwise, open a new tab in Chrome and go to chrome://components
  3. Confirm that Gemini Nano is either available or is being downloaded
  4. Once the model has downloaded and has reached a version greater than shown above, open DevTools and send await window.ai.canCreateTextSession(); in the console again. If this returns “readily”, then you are all set.

API overview

Sample code

At once version

// Start by checking if it's possible to create a session, based on the
// availability of the model and the characteristics of the device.
const canCreate = await window.ai.canCreateTextSession();

// canCreate will be one of the following:
// * "readily": the model is available on-device, so creating a session will happen quickly.
// * "after-download": the model is not available on-device, but the device is capable,
//   so creating the session will start the download process (which can take a while).
// * "no": the model is not available for this device.

if (canCreate !== "no") {
  const session = await window.ai.createTextSession();

  // Prompt the model and wait for the whole result to come back.
  const result = await session.prompt("Write me a poem");
  console.log(result);
}

Streaming version

const canCreate = await window.ai.canCreateTextSession();

if (canCreate !== "no") {
  const session = await window.ai.createTextSession();

  // Prompt the model and stream the result:
  const stream = session.promptStreaming("Write me an extra-long poem");
  for await (const chunk of stream) {
    console.log(chunk);
  }
}

Session options

Each session can be customized with topK and temperature. The default values for these parameters are returned by window.ai.defaultTextSessionOptions().

const defaults = await window.ai.defaultTextSessionOptions();

const session = await window.ai.createTextSession({
  temperature: 0.6,
  topK: defaults.topK
});

Terminating a session

Call destroy() to free resources if you no longer need a session. When a session is destroyed, it can no longer be used, and any ongoing execution will be aborted. You may want to keep the session around if you intend to prompt the model often since creating a session can take some time.

await session.prompt(`
  You are a friendly, helpful assistant specialized in clothing choices.
`);

session.destroy();

// The promise will be rejected with an error explaining that the session is destroyed.
await session.prompt(`
  What should I wear today? It's sunny and I'm unsure between a t-shirt and a polo.
`);

Control sequence

The sequence of characters <ctrl23> is Gemini Nano’s control sequence. It helps the model understand rounds of simulated interactions and when it’s supposed to respond. The sequence is meant to be used directly in the prompt (see the Prompt 101 section for details).

💡

Currently, the control sequence <ctrl23> is automatically added by the Prompt API at the end of the prompt.

However, for one shot* or few shots** prompting, it’s better to add the control sequence at the end of every simulated round of interaction (shot). See these examples.

*One shot: a prompt where the model is provided with a single example to learn from and generate a response.

**Few shots: a prompt where the model is provided with a few examples, typically ranging from two to five, to help it generate a more accurate response.

Exceptions

The Prompt API may receive errors from the AI runtime. See this section for a list of possible errors, and how they are mapped into DOMExceptions.

Caveats

canCreateTextSession()’s “after-download” state

The “after-download” state and behavior is not supported. The API doesn’t trigger a download of the model. Instead, Chrome will trigger the download, either as part of the chrome://flags state change, or because of another on-device AI feature.
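Because “after-download” doesn’t trigger a download today, one defensive pattern is to re-check availability until the model becomes ready. A minimal sketch, where window.ai is passed in as ai so the helper can be tested outside the browser; the name waitUntilReady and its polling parameters are illustrative, not part of the API:

```javascript
// Illustrative helper: poll canCreateTextSession() until the model is
// "readily" available, since "after-download" does not start a download.
// `ai` is expected to be window.ai, injected so the helper is testable.
async function waitUntilReady(ai, { retries = 10, delayMs = 5000 } = {}) {
  for (let i = 0; i < retries; i += 1) {
    const state = await ai.canCreateTextSession();
    if (state === "readily") return ai.createTextSession();
    if (state === "no") throw new Error("Gemini Nano is not available on this device");
    // "after-download": wait and re-check; Chrome may start the download
    // in the background (e.g. as part of the chrome://flags state change).
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Model still not ready after polling");
}
```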

Streaming

Currently, promptStreaming() returns a ReadableStream whose chunks successively build on each other.

For example, the following code logs a sequence, such as "Hello," "Hello world," "Hello world I am," "Hello world I am an AI."

for await (const chunk of stream) {
  console.log(chunk);
}

This is not the desired behavior. We intend to align with other streaming APIs on the platform, where the chunks are successive pieces of a single long stream. This means the output would be a sequence like "Hello", " world", " I am", " an AI".

For now, to achieve the intended behavior, you can implement the following:

let result = '';
let previousLength = 0;

for await (const chunk of stream) {
  const newContent = chunk.slice(previousLength);
  console.log(newContent);
  previousLength = chunk.length;
  result += newContent;
}

console.log(result);

Session persistence and cloning

🆕

New: session persistence

Chrome version: 128.0.6606.0 and above

From Chrome 128.0.6606.0, information sent to the model during a session is now retained.

  • You no longer need to re-send the whole conversation each time. In other words, each call to prompt() or promptStreaming() is part of a continuous interaction.
  • Be aware that the context window will keep filling up with each prompt, and will eventually start evicting the oldest tokens.
  • If you want to start from scratch, you may want to call destroy() and/or create a new session.
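With persistence in place, a follow-up prompt can rely on earlier turns without resending them. A small sketch; the session is passed in so the helper can be exercised outside the browser, but in practice it would come from window.ai.createTextSession():

```javascript
// Sketch of a multi-turn exchange that relies on session persistence
// (Chrome 128.0.6606.0 and above): the second prompt does not repeat
// the first one, yet the model still has it in context.
async function askFollowUp(session) {
  await session.prompt("My favorite color is teal.");
  // Persistence carries the earlier turn over; no need to resend it.
  return session.prompt("What is my favorite color?");
}
```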

Session cloning is not yet implemented; stay tuned.

Only for Chrome versions strictly prior to 128.0.6606.0:

Although there is an object called AITextSession, persistence has not yet been implemented. Any information sent to the model is not retained, which means you must re-send the whole conversation each time. In other words, each call to prompt() or promptStreaming() is independent.

Once persistence is implemented, we intend to allow session cloning. This means it will be possible to set a session with a baseline context and prompt the model without needing to reiterate.
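On those older versions, you can emulate continuity by keeping a transcript yourself and resending it with each prompt. A minimal sketch; the Transcript class and ask helper are illustrative names, not part of the API:

```javascript
// Illustrative workaround for Chrome versions before 128.0.6606.0, where
// sessions retain no state: accumulate the conversation and resend it.
class Transcript {
  constructor() {
    this.turns = [];
  }
  // Render the whole conversation so far, plus the next user message.
  render(nextUserText) {
    return [...this.turns, nextUserText].join("\n");
  }
  record(userText, modelText) {
    this.turns.push(userText, modelText);
  }
}

async function ask(session, transcript, userText) {
  const reply = await session.prompt(transcript.render(userText));
  transcript.record(userText, reply);
  return reply;
}
```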

Incognito and guest mode

Our implementation of the Prompt API doesn't currently support Incognito mode or guest mode. This is due to a dependency in the AI runtime layer, and is not intended to be a permanent limitation.

Enterprise

This API will not work if GenAILocalFoundationalModelSettings is set to “Do not download model”.

Workers

As of Jun 18, 2024, the Prompt API is available in Dedicated Workers, Shared Workers, and Service Workers. For the last two types of workers, you will need a recent Chrome Canary or Dev version.

Prompt 101

To write the best prompts, we recommend reading the following:

In addition, here are our recommendations for Gemini Nano in Chrome.

DOs

  • Include examples to guide your prompt. One example (one-shot prompting) is better than no examples. Few-shot prompting is best at guiding the model to return your intended results.
  • Add rules. You can improve the quality of the output by adding rules in your prompt. Examples: “Respond in a friendly tone”, “Use the category ‘ambiguous’ if it’s unclear”, “Do not provide any explanation.”
  • Add a persona. This can help the model return something more aligned with your needs. Example: “You just bought a product, and are writing a review for said product. Please rephrase [...]”
  • (Non-English) Specify the output language. If you prompt the model in English but need a non-English response, specify the output language in the prompt. Example: “Please rephrase the following sentence in Spanish. Hola, cómo estás”.

DON'Ts

  • Avoid use cases with right or wrong answers. The model might be too small to correctly answer knowledge questions, and may also struggle with tasks that depend on getting the perfect answer. Design your feature or UX with these imperfections in mind.
  • Avoid use cases that bypass the user. LLMs may not always have a satisfying response, so it’s better to position your AI feature as a tool that supports the user in their tasks (e.g. generating a list of potential keywords that the user can quickly review and adjust).
  • Avoid customizing the parameters. While it’s possible to change the parameters (such as temperature and topK), we strongly recommend keeping the default values unless you’ve run out of ideas with the prompt or you need the model to behave differently.

Finally, use AI capabilities responsibly. Generative AI models can help you or your users be more productive, and can enhance creativity and learning. We expect you to use and engage with this technology in accordance with Google’s Generative AI Prohibited Uses Policy.

How to use the control sequence <ctrl23>

The control sequence helps the model understand rounds of simulated interactions and when it’s supposed to respond. It consists of the regular characters < c t r l 2 3 > . It’s not a special code that needs to be inserted by hitting various modifier keys.

One shot prompt example

For example, let’s say you want to retrieve a summary of an article. You can share an example of the full text of one article and the expected output, along with the control sequence. Then, include your full article and the requested output:

[Full text of example article]

A paraphrased summary of this article is: [example summary]<ctrl23>

[Full text of article to be summarized]

A paraphrased summary of this article is:

Few shots prompt

With more examples, it’s likely the model can share a result closer to your expectations. Here is a few shot example to summarize an article:

[Full text of example article #1]

A paraphrased summary of this article is: [example summary #1]<ctrl23>

[Full text of example article #2]

A paraphrased summary of this article is: [example summary #2]<ctrl23>

[Full text of example article #3]

A paraphrased summary of this article is: [example summary #3]<ctrl23>

[Full text of article to be summarized]

A paraphrased summary of this article is:
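The few-shot layout above can be assembled programmatically. A sketch; buildFewShotPrompt is an illustrative helper name, not part of the API:

```javascript
// Assemble a few-shot summarization prompt, terminating each simulated
// round (shot) with the <ctrl23> control sequence, as described above.
const CTRL = "<ctrl23>";

function buildFewShotPrompt(examples, articleText) {
  const shots = examples
    .map(({ article, summary }) =>
      `${article}\nA paraphrased summary of this article is: ${summary}${CTRL}`)
    .join("\n");
  return `${shots}\n${articleText}\nA paraphrased summary of this article is:`;
}
```

The result can then be sent with session.prompt(buildFewShotPrompt(examples, article)).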

Share your feedback

Surveys

We’ll send surveys on an ongoing basis to get a sense of how the APIs and the task-specific-API approach are working, and to collect signals about popular use cases, issues, etc.

Feedback form for quality or technical issues

If you experience quality or technical issues, consider sharing details. Your reports will help us refine and improve our models, APIs, and components in the AI runtime layer, to ensure safety and responsible use.

Feedback about Chrome’s behavior / implementation of the Prompt API

If you want to report bugs or other issues related to Chrome’s behavior / implementation of the Prompt API, provide as many details as possible (e.g. repro steps) in a public Chromium bug report.

Feedback about the APIs

If you want to report ergonomic issues or other problems related to one of the built-in AI APIs, first check whether there is a related issue; if not, file a public spec issue:

Other feedback

For other questions or issues, reach out directly by sending an email to the mailing list owners (chrome-ai-dev-preview+owners@chromium.org). We’ll do our best to be as responsive as possible or update existing documents when more appropriate (such as adding to the FAQ section).

FAQ

Participation in the Early Preview Program

Opt-out and unsubscribe

To opt-out from the Early Preview Program, simply send an email to:

Opt-in

If you know someone who would like to join the program, ask them to fill out this form and to communicate their eagerness to provide feedback when answering the last question of the survey!

Compatibility issue on macOS

The use of Rosetta to run the x64 version of Chromium on ARM is neither tested nor maintained, and unexpected behavior will likely result. Please check that all tools that spawn Chromium are ARM-native.

Output quality

The current implementation of the Prompt API is primarily for experimentation and may not reflect the final output quality when integrated with task APIs we intend to ship.

That said, if the model generates harmful content or problematic responses, we encourage you to share feedback on output quality. Your reports are invaluable in helping us refine and improve our models, APIs, and components in the AI runtime layer, to ensure safety and responsible use.

Is there a way to know the token length of the input prompt?

Not at the moment. We acknowledge that this is inconvenient. For Latin-script languages, consider using this rule of thumb: one token is about four characters long.
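Until the API exposes token counts, that rule of thumb can be wrapped in a tiny helper. This is only a rough approximation for Latin-script text; the helper name is illustrative:

```javascript
// Rough token estimate using the rule of thumb above:
// one token is about four characters for Latin-script languages.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}
```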

What is the length of the context window?

The default context window is set to 1024 tokens. For Gemini Nano, the theoretical maximum is 32k, but there is a tradeoff between context size and performance.

What happens when the number of tokens in the prompt exceeds the context window?

The current design only considers the last N tokens of a given input. Consequently, providing too much text in the prompt may result in the model ignoring its beginning. For example, if a prompt begins with "Translate the following text to English:...", followed by thousands of lines of text, the model may never see the instruction and thus fail to provide a translation.
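One illustrative workaround, given this eviction behavior, is to trim the front of the source text and place the instruction at the end so it survives. The sketch below assumes the four-characters-per-token rule of thumb and the default 1024-token window; fitPrompt is a hypothetical helper, not part of the API:

```javascript
// Keep a prompt within the context window: trim the *front* of the text
// (the oldest tokens are the ones evicted) and put the instruction last.
// Assumes ~4 characters per token and the default 1024-token window.
function fitPrompt(instruction, text, contextTokens = 1024) {
  const budgetChars = contextTokens * 4 - instruction.length - 1;
  const trimmed = text.length > budgetChars ? text.slice(-budgetChars) : text;
  return `${trimmed}\n${instruction}`;
}
```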

Some prompts stop when using streaming execution.

Let us know if you can reproduce the issue: prompt, session options, details about the device, etc. In the meantime, restart Chrome and try again.

Last resort troubleshooting

Alternative steps

Some participants have reported that the following steps helped them get the component to show up:

Model download delay

The browser may not start downloading the model right away. If your computer meets all the requirements but you don't see the model download start on chrome://components after calling await window.ai.createTextSession(), and Optimization Guide On Device Model shows version 0.0.0.0 / New, leave the browser open for a few minutes so the scheduler can start the download.

Debug logs

If everything fails:

  1. Open a new tab
  2. Go to chrome://gpu
  3. Download the GPU report
  4. Go to chrome://histograms/#OptimizationGuide.ModelExecution.OnDeviceModelInstallCriteria.AtRegistration.DiskSpace
  5. Download the histograms report
  6. Share both reports (in full) with the Early Preview Program coordinators.

Crash logs

If you encounter a crash error message such as “the model process crashed too many times for this version”, we’ll need a crash ID to investigate the root cause.

  1. Ensure that you have enabled crash reporting.
  2. Reproduce the issue.
  3. Go to chrome://crashes
  4. Find the most recent crash, and hit Send now if necessary.
  5. Wait and reload until the Status line changes from Not uploaded to Uploaded. Note that this could take a while.
  6. Copy the ID next to the Uploaded Crash Report ID line.
  7. Share the ID with the Early Preview Program coordinators.

In the meantime, you’ll need to wait for a new Chrome version to get another chance of trying the Prompt API.

Appendix

Full API surface

The full API surface is described below. See Web IDL for details on the language.

partial interface WindowOrWorkerGlobalScope {
  readonly attribute AI ai;
};

[Exposed=(Window,Worker)]
interface AI {
  Promise<AIModelAvailability> canCreateTextSession();
  Promise<AITextSession> createTextSession(
      optional AITextSessionOptions options = {});
  Promise<AITextSessionOptions> defaultTextSessionOptions();
};

[Exposed=(Window,Worker)]
interface AITextSession {
  Promise<DOMString> prompt(DOMString input);
  ReadableStream promptStreaming(DOMString input);
  undefined destroy();
  AITextSession clone(); // Not yet implemented (see persistence and cloning for context)
};

dictionary AITextSessionOptions {
  [EnforceRange] unsigned long topK;
  float temperature;
};

enum AIModelAvailability { "readily", "after-download", "no" };

Full list of exceptions

All methods

  • InvalidStateError: “The execution context is not valid.” The JS context is invalid (e.g. a detached iframe). Remedy: ensure the API is called from a valid JS context.

createTextSession, defaultTextSessionOptions

  • OperationError: “Model execution service is not available.” Remedy: try again, possibly after relaunching Chrome.

When streaming a response

  • UnknownError: “An unknown error occurred: <error code>” Remedy: retry, possibly after relaunching Chrome. Report a technical issue if you are stuck and/or can easily reproduce the problem.
  • NotSupportedError: “The request was invalid.”
  • UnknownError: “Some other generic failure occurred.”
  • NotReadableError: “The response was disabled.”
  • AbortError: “The request was canceled.”

prompt, promptStreaming

  • InvalidStateError: “The model execution session has been destroyed.” This happens when calling prompt or promptStreaming after the session has been destroyed. Remedy: create a new session.

createTextSession

  • NotSupportedError: “Initializing a new session must either specify both topK and temperature, or neither of them.” This happens when the optional parameters are partially specified when calling createTextSession. Remedy: specify both topK and temperature, or neither of them; use the values from defaultTextSessionOptions if you only want to change one parameter.
  • InvalidStateError: “The session cannot be created.” If canCreateTextSession returned “readily”, this should not happen. Remedy: retry, possibly after relaunching Chrome. Report a technical issue if you are stuck and/or can easily reproduce the problem.

Other updates

Links to all previous updates and surveys we’ve sent can be found in The Context Index, also available via goo.gle/chrome-ai-dev-preview-index.

Changelog

Date

Changes

Jun 28, 2024

  • Added screenshot for OptimizationGuide.ModelExecution.OnDeviceModelInstallCriteria.AtRegistration.DiskSpace histogram and deeplinked it.

May 29, 2024

May 30, 2024

  • Added a note about storage footprint.
  • Added a note about Worker support being limited to Dedicated Workers.

May 31, 2024

  • Added a note about setting the system language to English (US) in the setup section (temporary requirement).
  • Minor editorial changes to the flags URLs, the name of the Bypass Perf Requirements option, and the name of the Prompt API flag.
  • Changed the target date to May 31st to reflect the latest estimate, although it’s still May 30th in Baker Islands as of the time of writing.
  • Changed “yes” to “readily” to reflect the actual implementation.
  • Added an explanation for BypassPerfRequirements.

Jun 1, 2024

  • In the setup steps, clarified that the version can be equal or newer than 128.0.6545.0.

Jun 3, 2024

  • Added chrome://histograms step in the troubleshooting guide.
  • Added a reminder that prerequisites must be met, in particular the 22 GB storage space requirement.

Jun 6, 2024

  • Added a note about guest mode not being supported.
  • Bumped the minimum storage requirements to account for some OS reporting the remaining storage space in gibibytes.

Jun 9, 2024

  • Splitting the sample code into the at-once version, and the streaming version.
  • Added a code sample for setting the temperature and topK.
  • Added a note about why keeping a session around is a good idea when the model is expected to be prompted often.

Jun 18, 2024

  • Struck out the OS / Chrome language requirement.
  • Updated support information for workers.
  • Added a known issue in the troubleshooting section.
  • Added a regression sidebar in the setup section.

Jun 25, 2024

  • Updated ETA for regression fix making it to the dev channel: Jun 26, 2024

Jun 26, 2024

  • Updated latest status for the regression (fixed in latest Canary and Dev channels).

Jun 27, 2024

  • Updated minimal version to be greater than the 128 release with the regression fix.
  • Removed the regression sidebar table since it’s now resolved in the latest 128 version for both Dev and Canary.

Jun 28, 2024

  • Added a requirement for the network connection type.
  • Updated the steps to take into account a recent change requiring the use of await window.ai.createTextSession(); to register the desire to use the Prompt API as a condition to trigger the model download.

Jun 29, 2024

  • Added a troubleshooting sub-section for reporting crash issues.

Jul 1, 2024

  • Added alternative steps in the troubleshooting section based on a report by a participant.

Jul 9, 2024

  • Added a note that the disk storage requirement is tied to where the user profile lives.

Jul 11, 2024

  • Added “Other updates” section with a link to an index document pointing to all docs and surveys we’ve sent thus far.
  • Added a note about what to do upon encountering the “too many crashes for this version” error.

Jul 19, 2024

  • Added a note about support for session persistence from Chrome 128.0.6602.0.
  • Added links for reporting implementation issues, and spec issues.

Jul 25, 2024

  • Updated status for session persistence.
  • Noted a fix for the “BAD MESSAGE” crasher.