Get started using Gemini chat models in LangChain.
Access Google’s Generative AI models, including the Gemini family, directly via the Gemini API, or experiment rapidly using Google AI Studio. This is often the best starting point for individual developers. For information on the latest models, model IDs, their features, context windows, and more, head to the Google AI docs.
API Reference
For detailed documentation of all features and configuration options, head to the ChatGoogleGenerativeAI API reference.
To access Google AI models you’ll need to create a Google Account, get a Google AI API key, and install the langchain-google-genai integration package.
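A minimal setup sketch (the model name is illustrative; by default the integration reads your key from the GOOGLE_API_KEY environment variable):

```python
# pip install -U langchain-google-genai
import os

from langchain_google_genai import ChatGoogleGenerativeAI

# Assumes your Google AI API key is available; set it here or export it in your shell
os.environ["GOOGLE_API_KEY"] = "your-api-key"

model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")
```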
```python
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = model.invoke(messages)
ai_msg
```
```
[{'type': 'text', 'text': "J'adore la programmation.", 'extras': {'signature': '...'}}]
```
Message content shape
Gemini 3 series models always return a list of content blocks to capture thought signatures. Use the .text property to recover string content.
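For example, continuing from the ai_msg above:

```python
# Recover the plain string from the content block list
ai_msg.text
# -> "J'adore la programmation."
```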
Certain models (such as gemini-2.5-flash-preview-tts) can generate audio files. See the Gemini API docs for details.
```python
from langchain_google_genai import ChatGoogleGenerativeAI, Modality

model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-preview-tts")

response = model.invoke(
    "Please say The quick brown fox jumps over the lazy dog",
    generation_config=dict(response_modalities=[Modality.AUDIO]),
)

# Base64 encoded binary data of the audio
wav_data = response.additional_kwargs.get("audio")
with open("output.wav", "wb") as f:
    f.write(wav_data)
```
```python
from langchain.tools import tool
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the tool
@tool(description="Get the current weather in a given location")
def get_weather(location: str) -> str:
    return "It's sunny."


# Initialize and bind (potentially multiple) tools to the model
model_with_tools = ChatGoogleGenerativeAI(model="gemini-3-pro-preview").bind_tools([get_weather])

# Step 1: Model generates tool calls
messages = [HumanMessage("What's the weather in Boston?")]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)

# Check the tool calls in the response
print(ai_msg.tool_calls)

# Step 2: Execute tools and collect results
for tool_call in ai_msg.tool_calls:
    # Execute the tool with the generated arguments
    tool_result = get_weather.invoke(tool_call)
    messages.append(tool_result)

# Step 3: Pass results back to model for final response
final_response = model_with_tools.invoke(messages)
final_response
```
The json_schema method is recommended for better reliability as it constrains the model’s generation process directly rather than relying on post-processing tool calls.
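As a sketch, assuming your installed version of langchain-google-genai supports method="json_schema" in with_structured_output (the Translation schema here is purely illustrative):

```python
from pydantic import BaseModel, Field

from langchain_google_genai import ChatGoogleGenerativeAI


class Translation(BaseModel):
    """A translation of the user's sentence."""

    text: str = Field(description="The translated sentence")
    language: str = Field(description="The target language")


model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")

# Constrains generation to the JSON schema instead of post-processing tool calls
structured_model = model.with_structured_output(Translation, method="json_schema")

structured_model.invoke("Translate 'I love programming' into French.")
# -> Translation(text="J'adore la programmation.", language="French")
```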
Access token usage information from the response metadata.
```python
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")

result = model.invoke("Explain the concept of prompt engineering in one sentence.")
print(result.content)
print("\nUsage Metadata:")
print(result.usage_metadata)
```
```
Prompt engineering is the art and science of crafting effective text prompts to elicit desired and accurate responses from large language models.

Usage Metadata:
{'input_tokens': 10, 'output_tokens': 24, 'total_tokens': 34, 'input_token_details': {'cache_read': 0}}
```
To see a thinking model’s thoughts, set include_thoughts=True to have the model’s reasoning summaries included in the response.
```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    thinking_budget=1024,
    include_thoughts=True,
)

response = llm.invoke("How many O's are in Google? How did you verify your answer?")
reasoning_tokens = response.usage_metadata["output_token_details"]["reasoning"]
print("Response:", response.content)
print("Reasoning tokens used:", reasoning_tokens)
```
Thought signatures are encrypted representations of the model’s reasoning processes. They enable Gemini to maintain thought context across multi-turn conversations, since the API is stateless and treats each request independently.
Gemini 3 models may return 4xx errors if thought signatures are not passed back with tool call responses. Upgrade to langchain-google-genai >= 3.1.0 to ensure this is handled correctly.
Signatures appear in two places in AIMessage responses:
- Text blocks: Stored in extras.signature within the content block
- Tool calls: Stored in additional_kwargs["__gemini_function_call_thought_signatures__"]
```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    include_thoughts=True,
)

response = llm.invoke("How many O's are in Google? How did you verify your answer?")
response.content_blocks[-1]
# -> {"type": "text", "text": "...", "extras": {"signature": "EtgVCt..."}}
```
For multi-turn conversations with tool calls, you must pass the full AIMessage back to the model so signatures are preserved. This happens automatically when you append the AIMessage to your messages list:
```python
from langchain.tools import tool
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI


@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return f"Weather in {location}: sunny, 22°C"


model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview").bind_tools([get_weather])

messages = [HumanMessage("What's the weather in Tokyo?")]

# Step 1: Model returns tool call with thought signature attached
ai_msg = model.invoke(messages)
messages.append(ai_msg)  # Preserves thought signature

# Step 2: Execute tool and add result
for tool_call in ai_msg.tool_calls:
    result = get_weather.invoke(tool_call)
    messages.append(result)

# Step 3: Model receives signature back, continues reasoning coherently
final_response = model.invoke(messages)
```
Don’t reconstruct messages manually. If you create a new AIMessage instead of passing the original object, the signatures will be lost and the API may reject the request.
Gemini models can also use Google's built-in tools, such as Google Search grounding:

```python
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")
model_with_search = model.bind_tools([{"google_search": {}}])

response = model_with_search.invoke("When is the next total solar eclipse in US?")
response.content_blocks
```
```
[
    {
        'type': 'text',
        'text': 'The next total solar eclipse visible in the contiguous United States will occur on...',
        'annotations': [
            {
                'type': 'citation',
                'id': 'abc123',
                'url': '<url for source 1>',
                'title': '<source 1 title>',
                'start_index': 0,
                'end_index': 99,
                'cited_text': 'The next total solar eclipse...',
                'extras': {
                    'google_ai_metadata': {
                        'web_search_queries': ['next total solar eclipse in US'],
                        'grounding_chunk_index': 0,
                        'confidence_scores': [],
                    }
                },
            },
            {
                'type': 'citation',
                'id': 'abc234',
                'url': '<url for source 2>',
                'title': '<source 2 title>',
                'start_index': 0,
                'end_index': 99,
                'cited_text': 'The next total solar eclipse...',
                'extras': {
                    'google_ai_metadata': {
                        'web_search_queries': ['next total solar eclipse in US'],
                        'grounding_chunk_index': 1,
                        'confidence_scores': [],
                    }
                },
            },
        ],
    }
]
```
Gemini models have default safety settings that can be overridden. If you are receiving lots of 'Safety Warnings' from your models, you can try tweaking the safety_settings attribute of the model. For example, to turn off safety blocking for dangerous content, you can construct your LLM as follows:
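A minimal sketch, using the HarmCategory and HarmBlockThreshold enums exposed by langchain_google_genai (adjust the categories and thresholds to your own requirements):

```python
from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)

llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    safety_settings={
        # Disable blocking for the "dangerous content" category
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)
```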
Context caching allows you to store and reuse content (e.g., PDFs, images) for faster processing. The cached_content parameter accepts a cache name created via the Google Generative AI API.
Single file example
This caches a single file and queries it.
```python
import time

from google import genai
from google.genai import types
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

client = genai.Client()

# Upload file
file = client.files.upload(file="path/to/your/file")
while file.state.name == "PROCESSING":
    time.sleep(2)
    file = client.files.get(name=file.name)

# Create cache
model = "gemini-3-pro-preview"
cache = client.caches.create(
    model=model,
    config=types.CreateCachedContentConfig(
        display_name="Cached Content",
        system_instruction=(
            "You are an expert content analyzer, and your job is to answer "
            "the user's query based on the file you have access to."
        ),
        contents=[file],
        ttl="300s",
    ),
)

# Query with LangChain
llm = ChatGoogleGenerativeAI(
    model=model,
    cached_content=cache.name,
)
message = HumanMessage(content="Summarize the main points of the content.")
llm.invoke([message])
```
Multiple files example
This caches two files using Part and queries them together.
```python
import time

from google import genai
from google.genai.types import CreateCachedContentConfig, Content, Part
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

client = genai.Client()

# Upload files
file_1 = client.files.upload(file="./file1")
while file_1.state.name == "PROCESSING":
    time.sleep(2)
    file_1 = client.files.get(name=file_1.name)

file_2 = client.files.upload(file="./file2")
while file_2.state.name == "PROCESSING":
    time.sleep(2)
    file_2 = client.files.get(name=file_2.name)

# Create cache with multiple files
contents = [
    Content(
        role="user",
        parts=[
            Part.from_uri(file_uri=file_1.uri, mime_type=file_1.mime_type),
            Part.from_uri(file_uri=file_2.uri, mime_type=file_2.mime_type),
        ],
    )
]

model = "gemini-3-pro-preview"
cache = client.caches.create(
    model=model,
    config=CreateCachedContentConfig(
        display_name="Cached Contents",
        system_instruction=(
            "You are an expert content analyzer, and your job is to answer "
            "the user's query based on the files you have access to."
        ),
        contents=contents,
        ttl="300s",
    ),
)

# Query with LangChain
llm = ChatGoogleGenerativeAI(
    model=model,
    cached_content=cache.name,
)
message = HumanMessage(
    content="Provide a summary of the key information across both files."
)
llm.invoke([message])
```
See the Gemini API docs on context caching for more information.