Introduction
Large Language Models are rapidly transforming industries: today, they power everything from personalized customer service in banking to real-time language translation in global communication. They can answer questions in natural language, summarize information, write essays, generate code, and much more, making them invaluable tools in today's world. But despite their many advantages, they suffer from a critical flaw known as "hallucination": instances where the model generates information that appears correct and plausible but is partially or entirely false, made up by the model and lacking any grounding in real-world data. To address this, Google has developed DataGemma, an open model that connects LLMs with real-world data and fact-checks their responses against trusted sources using Google's Data Commons.
Learning Outcomes
- Understand the basics of Large Language Models (LLMs) and their applications.
- Explore the causes and types of hallucinations in LLMs.
- Learn how Google's DataGemma tackles LLM hallucinations using real-world data.
- Gain insights into advanced techniques like Retrieval-Interleaved Generation (RIG) and Retrieval-Augmented Generation (RAG).
- Discover how Google's Data Commons improves LLM factual accuracy.
This article was published as a part of the Data Science Blogathon.
Understanding Large Language Models
Large Language Models are foundation models, trained on massive amounts of textual data with parameters ranging from millions to billions, that can understand and generate natural language. They are built on a transformer architecture that allows them to process and produce natural language. An LLM can be fine-tuned for specific tasks in specific domains using customized datasets. For example, a model like BERT can be fine-tuned on cybersecurity corpora to automate threat intelligence. Some popular LLMs are GPT-4 by OpenAI, BERT and Gemini by Google, LLaMA by Meta, and Claude by Anthropic.
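For illustration, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries; the IMDB dataset stands in for a domain corpus (such as threat-intelligence reports) and the hyperparameters are arbitrary, so treat it as a pattern rather than a recipe.
# Hedged fine-tuning sketch: adapt a pre-trained BERT model to a text-classification task
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
dataset = load_dataset("imdb")  # stand-in corpus; swap in a domain-specific dataset
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
dataset = dataset.map(tokenize, batched=True)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-finetuned", num_train_epochs=1, per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small subset to keep the demo quick
)
trainer.train()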
Comparison of Gemma, Gemini and BERT
| GEMMA | GEMINI | BERT |
|---|---|---|
| Lightweight model for developers | Larger and more powerful conversational AI | Pre-trained model for NLP tasks |
| Ideal for applications with resource constraints, like mobile phones and edge computing | Ideal for complex tasks with no resource constraints, like large-scale data analysis and complex AI applications | Ideal for tasks like text classification, question answering, and sentiment analysis |
| Easy to deploy in resource-limited environments | Typically deployed in cloud environments or data centers with ample resources | Deployed both on-premise and in cloud environments, but larger variants (like BERT-Large) require significant computational resources |
| Requires fewer computational resources | Generally requires more computational resources | Smaller models like BERT-Base can run on moderate hardware, while larger models like BERT-Large may need more resources, though still less than Gemini |
Understanding the Architecture of Gemma
The architecture of Gemma is designed to seamlessly integrate advanced retrieval and generation techniques, allowing the system to intelligently access external data sources while producing accurate, coherent responses, making it highly effective for various AI-driven applications.
Gemma is based on the transformer decoder architecture:

Gemma and Gemma 2 (the latest version, released in 2024) belong to the Gemma family of Google's LLMs. They can be fine-tuned for customized tasks. For example, CodeGemma models are Gemma models fine-tuned for code completion.
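As a quick, hedged illustration of working with a model from this family, the snippet below loads a small Gemma 2 instruction-tuned checkpoint for text generation; the model ID is an assumption, and the Gemma weights require accepting the license on Hugging Face.
# Minimal sketch: run a Gemma 2 decoder-only model for text generation
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "google/gemma-2-2b-it"  # assumed checkpoint; any Gemma 2 variant works the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
inputs = tokenizer("Explain what a decoder-only transformer does.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))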
What are Hallucinations in the Context of LLMs?
Hallucinations in LLMs are instances where the model confidently generates output that is incorrect, inconsistent, or fabricated, yet appears plausible to us. The model hallucinates content that is simply not true. For example, in a court case, two lawyers cited sources provided by ChatGPT that turned out to be false.
AI hallucinations can be of three types:
- Input-conflicting hallucinations: The model generates an output that deviates from the information provided by the user in the input.
- Context-conflicting hallucinations: Here, the model generates an output contradicting its previously generated outputs.
- Fact-conflicting hallucinations: The model generates false or inaccurate output that contradicts real-world knowledge or facts.
What Causes Hallucinations?
- Limited training data: When the model hasn't been trained thoroughly or is trained on limited data, and it then encounters a prompt different from its training data, it may produce output based on its existing training data even though it didn't fully understand the new prompt, leading to inaccuracies.
- Overfitting: When too many features are provided, the model tries to capture all the data points without learning the underlying patterns; it may reach near-perfect accuracy on the training data but fails to generalize well to new data.
As you can see, hallucinated LLM content can be harmful if used without fact-checking. In applications where factual accuracy is critical and there is no room for misinformation, such as medical advice or legal guidance, hallucinations can lead to misinformation with potentially serious consequences. Hallucinations are delivered as confidently as correct answers, so it can be difficult for users to recognize them. Also, as reliance on AI for accurate information grows, hallucinations can reduce trust in AI systems, making it harder for LLMs to be accepted in high-stakes domains.
Thus, model developers need to address this problem and ensure that in cases involving accuracy and facts, the LLM generates correct, factual output to avoid the spread of misinformation. One such approach to tackling AI hallucinations has been developed by Google in the form of DataGemma.
What is DataGemma?
DataGemma is an open model developed by Google to connect LLMs with trustworthy, factual, real-world data sourced from Google's Data Commons.

Google Data Commons is an open repository that combines a vast number of public datasets into a unified format, making them easier to access and use. It brings together data from a variety of sources, including government publications, research organizations, and global databases. The primary purpose of Data Commons is to provide a common framework for various datasets, allowing users to query and analyze structured real-world data across numerous domains without requiring costly data cleaning or integration efforts.
Key Features of Data Commons
- It includes data on a variety of topics such as demographics, economics, environment, and healthcare, sourced from places like the U.S. Census Bureau, World Bank, NOAA, and more.
- The data is organized into a standardized schema, so users can easily query datasets without dealing with the complexities of different data formats and structures.
- Developers can access Data Commons through APIs (a minimal example follows this list).
- It is a public service that is free to use, designed to make high-quality, reliable data accessible to everyone.
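As a small illustration of the API access mentioned above, the snippet below uses the Data Commons Python client to look up a single statistic; the package and function names are assumptions based on the publicly documented client, so check the official Data Commons API docs before relying on them.
# Look up one statistic from Data Commons (an API key may be required; see the docs)
import datacommons as dc
# "Count_Person" is the statistical variable for total population; "geoId/06" is California
california_population = dc.get_stat_value("geoId/06", "Count_Person")
print("Population of California:", california_population)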
Importance of Data Commons
- Researchers can use Data Commons to quickly gather and analyze large, structured datasets without needing to source and clean the data manually.
- Large Language Models (LLMs), like Google's Gemma, can use Data Commons to reference real-world data, reducing hallucinations and improving the factual accuracy of their outputs.

Link: Build your own Data Commons – Data Commons
RIG: A Hybrid Approach for Minimizing LLM Hallucinations
Retrieval-Interleaved Generation (RIG) is an advanced technique in natural language processing (NLP) that combines retrieval-based and generation-based methods to improve the quality and relevance of responses.
Here's a brief explanation of how RIG works:
- Retrieval-based methods: These methods search a large database of pre-existing responses or documents to find the most relevant information. This approach ensures that responses are accurate and grounded in real data.
- Generation-based methods: These methods use models to generate responses from scratch based on the input. This allows for more flexible and creative responses but can sometimes lead to inaccuracies or hallucinations.
- Interleaving: By interleaving, or combining, retrieval and generation techniques, RIG leverages the strengths of both approaches. The system retrieves relevant information and then uses a generative model to refine and expand upon it, ensuring both accuracy and creativity.
This is useful in applications where high-quality, contextually relevant responses are crucial, such as conversational AI, customer support, and content creation.
In DataGemma, Gemma 2 is fine-tuned to recognize when to fetch accurate information while producing an output. Specifically, it replaces the numbers generated in its output with more precise figures from Data Commons. Essentially, the model double-checks its output against a more trusted source.
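Conceptually, the interleaving can be pictured with a short sketch. This is illustrative pseudocode under an assumed annotation format and hypothetical helper names, not DataGemma's actual implementation.
# Illustrative RIG sketch: the fine-tuned model annotates each statistic with a natural-language
# lookup, and every annotated number is swapped for the value Data Commons returns.
import re

def retrieval_interleaved_generation(query, generate, fetch_statistic):
    # e.g. draft contains "... population is [DC(what is the population of California?) -> 39 million] ..."
    draft = generate(query)

    def replace_with_fact(match):
        nl_lookup, model_guess = match.group(1), match.group(2)
        fact = fetch_statistic(nl_lookup)                  # ask Data Commons' natural-language interface
        return fact if fact is not None else model_guess  # keep the model's own number if nothing is found

    return re.sub(r"\[DC\((.*?)\) -> (.*?)\]", replace_with_fact, draft)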
How is RIG Used in DataGemma?
In DataGemma, Retrieval-Interleaved Generation (RIG) enhances the accuracy and relevance of outputs by combining the strengths of both retrieval and generative models, ensuring that generated content is grounded in reliable data from trusted sources like Data Commons.

- First, the user submits a query to the LLM. In our case, the LLM is DataGemma, which is based on the Gemma 2 model with 27B parameters, fine-tuned for RIG.
- The DataGemma model generates a response in the form of a natural language query. The purpose of this is to retrieve relevant data from Data Commons' natural language interface.
- Data Commons is queried, and the required data is retrieved.
- The final response is generated and shown to the user. The response includes the data, the source information along with its link, and some metadata. This replaces the potentially inaccurate numbers in the original response.
Step-by-Step Procedure on Google Colab
Let us now implement RIG for minimizing hallucination.
Pre-requisites:
- A100 GPU
- High-RAM runtime
- Hugging Face Token
- Data Commons API Key
Step 1: Log in to your Hugging Face account and create a new token
Click here to log in to your Hugging Face account.

Create New Token:


Step 2: Data Commons API Key

Step 3: Enable Data Commons NL API
Go to your Colab notebook's Secrets section. Create new secrets and enable notebook access.

- HF_TOKEN with the value set to your Hugging Face token
- DC_API_KEY with the value set to your Data Commons token

Step 4: Install Required Libraries
Let us install the required libraries.
# install the following required libraries
!pip install -q git+https://github.com/datacommonsorg/llm-tools
!pip install -q bitsandbytes accelerate

# load the fine-tuned Gemma 2 27B model
import torch
import data_gemma as dg
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Initialize Data Commons API client
DC_API_KEY = userdata.get('DC_API_KEY')
dc = dg.DataCommons(api_key=DC_API_KEY)

# Get the fine-tuned Gemma 2 model from Hugging Face, quantized to 4-bit to fit in GPU memory
HF_TOKEN = userdata.get('HF_TOKEN')
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_name = "google/datagemma-rig-27b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HF_TOKEN)
datagemma_model = AutoModelForCausalLM.from_pretrained(model_name,
                                                       device_map="auto",
                                                       quantization_config=nf4_config,
                                                       torch_dtype=torch.bfloat16,
                                                       token=HF_TOKEN)

# Build the LLM model stub to use in the RIG flow
datagemma_model_wrapper = dg.HFBasic(datagemma_model, tokenizer)
Step 5: Select or Enter a Query
In this step, users can either select a pre-defined query or enter a custom query, enabling the system to retrieve relevant information from the data sources for further processing.
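For example, QUERY can be set to any natural-language question with a statistical answer; the question below is only an illustration, not a required value.
# A sample query for the RIG flow (any statistics-flavored question will do)
QUERY = "What progress has Pakistan made against health-related UN SDGs?"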

Step 6: Run the RIG Technique and Generate Output
In this step, the RIG technique is executed, combining retrieval and generation methods to produce a precise and contextually relevant output based on the input query.
from IPython.display import Markdown
import textwrap

# Helpers to render the prompt and the model's answer as Markdown in the notebook
def display_chat(prompt, text):
    formatted_prompt = "\n" + prompt + "\n"
    text = text.replace('•', '  *')
    text = textwrap.indent(text, '> ', predicate=lambda _: True)
    formatted_text = "\n\n" + text + "\n"
    return Markdown(formatted_prompt + formatted_text)

def to_markdown(text):
    text = text.replace('•', '  *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

# Run the RIG flow on the selected query and render the answer
ans = dg.RIGFlow(llm=datagemma_model_wrapper, data_fetcher=dc, verbose=False).query(query=QUERY)
Markdown(textwrap.indent(ans.answer(), '> ', predicate=lambda _: True))

display_chat(QUERY, ans.answer())
Output: (for a different query)

Conclusion: Gemma 2 generates only a numerical value, while DataGemma generates the numerical value along with its source information, source links, some metadata, and a conclusion for the query.
Source: Google Colab notebook provided by Google
Retrieval-Augmented Generation for Minimizing LLM Hallucinations
Retrieval-Augmented Generation is an approach in natural language processing (NLP) and large language models (LLMs) that improves the factual accuracy and relevance of generated content by allowing the model to access external knowledge sources during the generation process. It retrieves relevant information from Data Commons before the LLM generates text, providing a factual foundation for its response.
Here's a brief explanation of how RAG works (a minimal sketch follows the list):
- Retrieval: When the user enters a query, the model receives it and then fetches the relevant data from its knowledge base or external sources.
- Augmentation: This external information is then used to "augment" (or enrich) the input context for the language model, helping it generate more contextually relevant responses.
- Generation: The LLM generates a response based on both the original query and the retrieved information.
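Here is a minimal, generic sketch of that retrieve-augment-generate loop; the helper names are hypothetical placeholders, not part of the data_gemma library.
# Generic RAG sketch: retrieve evidence, augment the prompt, then generate
def retrieval_augmented_generation(user_query, retrieve, generate):
    evidence = retrieve(user_query)       # 1. Retrieval: pull relevant records or tables
    augmented_prompt = (                  # 2. Augmentation: prepend the evidence to the query
        "Answer the question using only the data below.\n\n"
        "Data:\n" + str(evidence) + "\n\nQuestion: " + user_query
    )
    return generate(augmented_prompt)     # 3. Generation: answer grounded in the retrieved data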
How is RAG Used in DataGemma?
In DataGemma, Retrieval-Augmented Generation (RAG) enhances response accuracy by retrieving relevant data from external sources and then producing content that combines this retrieved knowledge with AI-generated insights, ensuring high-quality and contextually relevant outputs.

Here's how RAG works:
- First, the user submits a query to the LLM. In our case, the LLM is DataGemma, which is based on the Gemma 2 model with 27B parameters, fine-tuned for the RAG task.
- The DataGemma model, after analyzing the input query, generates a response in the form of a natural language query. The purpose of this is to retrieve relevant data from Data Commons' natural language interface.
- Data Commons is queried, and the required information is retrieved.
- This retrieved information is added to the original user query, creating an enhanced or augmented prompt.
- A larger LLM (in our case, Gemini 1.5 Pro) uses this augmented prompt, along with the retrieved data, to generate a better, more accurate, and factual response.
- The final response is shown to the user. It includes data tables, the source information along with its link, and some metadata, replacing the potentially inaccurate numbers the model might otherwise have produced.
Step-by-Step Procedure on Google Colab
We will now look into the step-by-step procedure of RAG for minimizing hallucinations.
Pre-requisites:
- A100 GPU
- High-RAM runtime
- Hugging Face Token
- Data Commons API Token
- Gemini 1.5 Pro API Key
Step 1: Create Gemini API Key
Go to Google AI Studio and create a Gemini API key.


Step 2: Enable Notebook Access
Go to your Google Colab notebook's Secrets section and enter the Hugging Face, Data Commons, and Gemini 1.5 Pro API keys. Enable notebook access.

Step 3: Install the Required Libraries
In this step, you will install the necessary libraries that enable the implementation of the RAG technique and ensure smooth operation of the DataGemma system.
# install libraries
!pip install -q git+https://github.com/datacommonsorg/llm-tools
!pip install -q bitsandbytes accelerate

# load the fine-tuned Gemma 2 27B model
import torch
import data_gemma as dg
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Initialize Data Commons API client
DC_API_KEY = userdata.get('DC_API_KEY')
dc = dg.DataCommons(api_key=DC_API_KEY)

# Get the Gemini 1.5 Pro model, which will write the final answer
GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')
gemini_model = dg.GoogleAIStudio(model="gemini-1.5-pro", api_keys=[GEMINI_API_KEY])

# Get the fine-tuned Gemma 2 model from Hugging Face, quantized to 4-bit to fit in GPU memory
HF_TOKEN = userdata.get('HF_TOKEN')
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_name = "google/datagemma-rag-27b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HF_TOKEN)
datagemma_model = AutoModelForCausalLM.from_pretrained(model_name,
                                                       device_map="auto",
                                                       quantization_config=nf4_config,
                                                       torch_dtype=torch.bfloat16,
                                                       token=HF_TOKEN)

# Build the LLM model stub to use in the RAG flow
datagemma_model_wrapper = dg.HFBasic(datagemma_model, tokenizer)
Step 4: Select or Create Your Own Query
You will select or create a custom query that will serve as the input for the RAG technique to retrieve and generate the desired output.
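As with RIG, set QUERY to any statistical question you want to ground in Data Commons; the example below is only illustrative.
# A sample query for the RAG flow
QUERY = "Do the US states with high coal-fired power also have high rates of high blood pressure?"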

Step 5: Run RAG and Generate the Output
Now you will execute the RAG pipeline to retrieve relevant data and generate the final output based on the query you provided.
from IPython.display import Markdown
import textwrap

# Helpers to render the prompt and the model's answer as Markdown in the notebook
def display_chat(prompt, text):
    formatted_prompt = "\n" + prompt + "\n"
    text = text.replace('•', '  *')
    text = textwrap.indent(text, '> ', predicate=lambda _: True)
    formatted_text = "\n\n" + text + "\n"
    return Markdown(formatted_prompt + formatted_text)

def to_markdown(text):
    text = text.replace('•', '  *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

# Run the RAG flow: DataGemma composes the Data Commons queries, Gemini 1.5 Pro writes the final answer
ans = dg.RAGFlow(llm_question=datagemma_model_wrapper, llm_answer=gemini_model, data_fetcher=dc).query(query=QUERY)
Markdown(textwrap.indent(ans.answer(), '> ', predicate=lambda _: True))

display_chat(QUERY, ans.answer())
Output:


Conclusion: When a query is asked, the relevant data tables related to the query are retrieved, and this data is then used to compose the final response with meaningful information and insights. The query response, along with source links, tables, and a conclusion, is generated as output.
Link: Data Gemma RAG
Why is DataGemma Important?
DataGemma grounds LLM outputs in real-world data, ensuring that the model generates fact-based responses. By fact-checking the model's responses against verified data from Google's Data Commons, DataGemma helps reduce the number of incorrect or fabricated answers. Using the RIG and RAG approaches, researchers at Google have observed significant improvement in the accuracy of output generated by the model, especially for queries that require numerical answers.
They have also observed that users prefer the output generated by RIG and RAG over the baseline output. This approach can reduce AI hallucinations and, in turn, the spread of misinformation. Also, since Google has released this Gemma variant as an open model, developers and researchers can explore the approach and enhance it further toward the common goal of making LLMs more reliable and trustworthy.
Conclusion
LLMs have become vital tools across industries, but their tendency to "hallucinate", producing convincing but incorrect information, poses a significant challenge. Google's DataGemma, combined with the vast real-world data in Google's Data Commons, offers a possible solution to this problem. The techniques in DataGemma improve accuracy, particularly for numerical information, by grounding LLM outputs in validated statistical data, and they reduce misinformation. Early results show that this method greatly increases the credibility of AI responses, with users preferring the more factual outputs the system produces. Because DataGemma is an open model, researchers and developers can adopt and improve it, bringing LLMs closer to becoming reliable tools for real-world applications. Such collaboration can help reduce hallucinations and improve trustworthiness.
Frequently Asked Questions
Q1. What is a foundation model?
A. A foundation model is a large machine learning model trained on massive amounts of diverse data, enabling it to generalize across a wide range of tasks. LLMs are a type of foundation model trained on vast amounts of textual data.
Q2. What is AI hallucination?
A. AI hallucination refers to the phenomenon where an AI model generates information that seems accurate but is incorrect or fabricated. The model produces responses that lack grounding in real-world data or facts.
Q3. Why do LLMs hallucinate?
A. LLMs hallucinate because they generate outputs based on patterns in the data they have been trained on. When they don't have enough context or relevant data to answer a query, they may fabricate plausible-sounding information based on similar data in their existing knowledge base instead of admitting uncertainty.
Q4. What is Google Gemma, and how is DataGemma related to it?
A. Google Gemma is a lightweight family of Google LLMs based on the research behind Google Gemini. DataGemma is a Gemma variant, an open model developed to connect LLMs with real-world statistical data from Google's Data Commons.
Q5. How is RIG different from RAG?
A. RIG integrates real-world statistical data directly into the model's output by checking generated responses against external data sources, such as Google Data Commons; the response is generated first and then fact-checked against external sources. RAG, in contrast, retrieves relevant information from external databases or knowledge sources first and then generates a response based on that information.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.