In today's digital landscape, content repurposing has become essential for maximizing reach and engagement. One effective strategy is transforming long-form content like blog posts into engaging Twitter threads. However, manually creating these threads can be time-consuming and challenging. In this article, we'll explore how to build an application that automates blog-to-Twitter-thread creation using Google's Gemini-2.0 LLM, ChromaDB, and Streamlit.

Learning Objectives
- Automate blog-to-Twitter-thread transformation using Google's Gemini-2.0, ChromaDB, and Streamlit for efficient content repurposing.
- Gain hands-on experience building an automated blog-to-Twitter-thread pipeline with embedding models and AI-driven prompt engineering.
- Understand the capabilities of Google's Gemini-2.0 LLM for automated content transformation.
- Explore the integration of ChromaDB for efficient semantic text retrieval.
- Build a Streamlit-based web application for seamless PDF-to-Twitter-thread conversion.
- Gain hands-on experience with embedding models and prompt engineering for content generation.
This article was published as a part of the Data Science Blogathon.
What’s Gemini-2.0?
Gemini-2.0 is Google’s newest multimodal Massive Language Mannequin (LLM), representing a big development in AI capabilities. It’s now accessible as Gemini-2.0-flash-exp API in Vertext AI Studio. It provides improved efficiency in areas like:
- Multimodal understanding , coding, complicated directions following and performance calling in pure language.
- Context-aware content material creation.
- Advanced reasoning and evaluation.
- It has native picture era, picture modifying, controllable text-to-speech era.
- Low-latency responses with the Flash variant.
For our undertaking, we’re particularly utilizing the gemini-2.0-flash-exp mannequin API, which is optimized for fast response whereas sustaining high-quality output.
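As a quick sanity check (a minimal sketch, not part of the project code, assuming GOOGLE_API_KEY is already in your .env), you can invoke the model through LangChain's ChatGoogleGenerativeAI wrapper, which we also use later in the project:

from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI

load_dotenv()  # reads GOOGLE_API_KEY from .env

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0.7)
response = llm.invoke("Summarize retrieval-augmented generation in one tweet.")
print(response.content)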
What’s the ChromaDB Vector Database?
ChromaDB is an open-source embedding database that excels at storing and retrieving vector embeddings. It’s a high-performance database designed for environment friendly storing, looking out, and managing embeddings generated by AI fashions. It allows similarity searches by indexing and evaluating vectors based mostly on their proximity to different comparable vectors in multidimensional house.
- Environment friendly comparable search capabilities
- Simple integration with fashionable embedding fashions
- Native storage and persistence
- Versatile querying choices
- Light-weight deployment
In our software, ChromaDB is the spine for storing and retrieving related chunks of textual content based mostly on semantic similarity, enabling extra contextual and correct thread era.
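To get a feel for the API (a minimal standalone sketch that uses ChromaDB's default embedding function rather than the Gemini embeddings we wire in later; the collection name and documents are made up):

import chromadb

# Persist the collection to disk, similar to what our app does via LangChain
client = chromadb.PersistentClient(path="./data/chroma_demo")
collection = client.get_or_create_collection("demo")
collection.add(
    documents=["ChromaDB stores and indexes embeddings.", "Streamlit builds data apps in Python."],
    ids=["doc1", "doc2"],
)
results = collection.query(query_texts=["vector database"], n_results=1)
print(results["documents"])  # the most similar stored document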
What’s Streamlit UI?
Streamlit is an open-source Python library designed to rapidly construct interactive and data-driven internet purposes for AI/ML initiatives. Its concentrate on simplicity allows builders to create visually interesting and purposeful apps with minimal effort.
Key Options:
- Ease of Use: Builders can flip Python scripts into internet apps with a number of traces of code.
- Widgets: It provides a variety of enter widgets (sliders, dropdowns, textual content inputs) to make purposes interactive.
- Knowledge Visualization: It Helps integration with fashionable Python libraries like Matplotlib, Plotly, and Altair for dynamic viz.
- Actual-time Updates: Robotically rerun apps when code or enter adjustments, offering a seamless consumer expertise.
- No Net Improvement Required: Take away the necessity to study HTML, CSS, or Javascript.
Software of StreamLit
Streamlit is widley used for constructing bashboards, exploratory knowledge evaluation instruments, AI/ML software prototypes. Its simplicity and interactivity makes it ideally suited for speedy prototying and sharing insights with non-technical stakeholders. We’re utilizing streamlit for desiging the interface for the our software.
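As a tiny illustration (a hypothetical hello_app.py, not part of the project), a complete Streamlit app can be just a few lines; run it with streamlit run hello_app.py:

import streamlit as st

st.title("Hello, Streamlit")
name = st.text_input("Your name")
if st.button("Greet"):
    st.write(f"Hello, {name}!")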
Motivation for Tweet Generation Automation
The primary motivations behind automating tweet thread generation include:
- Time efficiency: Reducing the manual effort required to create engaging Twitter threads.
- Consistency: Maintaining a consistent voice and format across all threads.
- Scalability: Processing multiple articles quickly and efficiently.
- Enhanced engagement: Leveraging AI to create more compelling and shareable content.
- Content optimization: Using data-driven approaches to structure threads effectively.
Project Environment Setup Using Conda
To set up the project environment, follow these steps:
# create a new conda env
conda create -n tweet-gen python=3.11
conda activate tweet-gen
Install the required packages:
pip install langchain langchain-community langchain-google-genai
pip install chromadb streamlit python-dotenv pypdf pydantic
Now create a project folder named BlogToTweet (or whatever you like).
Also, create a .env file in your project root. Get your GOOGLE_API_KEY from here and put it in the .env file.
GOOGLE_API_KEY=""
We’re all set as much as dive into the principle implementation half.
Project Implementation
In our project, there are four important files, each with its own responsibility, which keeps the development clean (a possible directory layout is sketched after this list):
- services.py: Holds all the important services.
- models.py: Holds all the necessary Pydantic data models.
- main.py: For testing the automation in the terminal.
- app.py: For the Streamlit UI implementation.
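Putting these together, the project root might look like this (a suggested layout; everything other than the four modules above, the .env file, and the data folder is an assumption):

BlogToTweet/
├── .env            # GOOGLE_API_KEY lives here
├── data/           # downloaded PDFs and the persisted Chroma database
├── models.py       # Pydantic data models
├── services.py     # ContentRepurposer and the core logic
├── main.py         # terminal test script
└── app.py          # Streamlit UI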
Implementing the Models
We will start by implementing the Pydantic data models in the models.py file. What is Pydantic? Read this.
from typing import Optional, List
from pydantic import BaseModel

class ArticleContent(BaseModel):
    title: str
    content: str
    author: Optional[str]
    url: str

class TwitterThread(BaseModel):
    tweets: List[str]
    hashtags: List[str]
These are simple yet important models that give the article content and the generated tweets a consistent structure.
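For example (a quick illustration with made-up values), the models can be instantiated like this:

# assuming the classes above are saved in models.py
from models import ArticleContent, TwitterThread

article = ArticleContent(
    title="Building LLM-Powered Apps",
    content="Full article text goes here...",
    author="Jane Doe",
    url="data/build_llm_powered_app.pdf",
)
thread = TwitterThread(
    tweets=["1/2 Why LLM-powered apps matter...", "2/2 Read the full article!"],
    hashtags=["#AI", "#LLM"],
)
print(article.title, len(thread.tweets))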
Implementing the Services
The ContentRepurposer class handles the core functionality of the application. Here is the skeletal structure of that class:
# services.py
import os
from dotenv import load_dotenv
from typing import List
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from models import ArticleContent, TwitterThread

class ContentRepurposer:
    def __init__(self):
        pass

    def process_pdf(self, pdf_path: str) -> ArticleContent:
        pass

    def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
        pass

    def generate_twitter_thread(self, article: ArticleContent):
        pass

    def process_article(self, pdf_path: str):
        pass
In the __init__ method, we set up all the important attributes of the class:
def __init__(self):
    from pydantic import SecretStr

    google_api_key = os.getenv("GOOGLE_API_KEY")
    if google_api_key is None:
        raise ValueError("GOOGLE_API_KEY environment variable is not set")
    _google_api_key = SecretStr(google_api_key)

    # Initialize Gemini model and embeddings
    self.embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
    )
    self.llm = ChatGoogleGenerativeAI(
        model="gemini-2.0-flash-exp",
        temperature=0.7,
    )

    # Initialize text splitter
    self.text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""],
    )
Here, we use Pydantic's SecretStr for secure handling of the API key. To embed our articles we use GoogleGenerativeAIEmbeddings with the embedding-001 model, and to create the tweets from the article we use ChatGoogleGenerativeAI with the latest gemini-2.0-flash-exp model. RecursiveCharacterTextSplitter splits a large document into parts; here we split the document into chunks of 1000 characters with a 200-character overlap.
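To see what the splitter does in isolation (a toy example with made-up text and deliberately small chunk sizes):

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,      # small values just for this demo
    chunk_overlap=20,
    separators=["\n\n", "\n", " ", ""],
)
sample = "First paragraph about Gemini-2.0.\n\n" + "Second paragraph about ChromaDB and retrieval. " * 5
chunks = splitter.split_text(sample)
print(len(chunks), "chunks")
print(chunks[0])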
Processing the PDF
The system processes PDFs using PyPDFLoader from LangChain and implements text chunking.
def process_pdf(self, pdf_path: str) -> ArticleContent:
    """Process a local PDF and create embeddings"""
    # Load PDF
    loader = PyPDFLoader(pdf_path)
    pages = loader.load()

    # Extract text
    text = " ".join(page.page_content for page in pages)

    # Split text into chunks
    chunks = self.text_splitter.split_text(text)

    # Create and store embeddings in Chroma
    self.vectordb = Chroma.from_texts(
        texts=chunks,
        embedding=self.embeddings,
        persist_directory="./data/chroma_db"
    )

    # Extract title and author
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    title = lines[0] if lines else "Untitled"
    author = lines[1] if len(lines) > 1 else None

    return ArticleContent(
        title=title,
        content=text,
        author=author,
        url=pdf_path
    )
In the above code, we implement the PDF processing functionality of the application.
- Load and Extract PDF Text: PyPDFLoader reads the PDF file and extracts the text content from all pages, concatenating it into a single string.
- Split Text into Chunks: The text is divided into smaller chunks using the text_splitter for better processing and embedding creation.
- Generate Embeddings: Chroma creates vector embeddings from the text chunks and stores them in a persistent database directory.
- Extract Title and Author: The first non-empty line is used as the title, and the second as the author.
- Return Article Content: An ArticleContent object is assembled containing the title, full text, author, and file path.
Getting the Relevant Chunks
def get_relevant_chunks(self, query: str, k: int = 3) -> List[str]:
    """Retrieve relevant chunks from the vector database"""
    results = self.vectordb.similarity_search(query, k=k)
    return [doc.page_content for doc in results]
This code retrieves the top k (default 3) most relevant text chunks from the vector database based on their similarity to the given query.
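For instance (a usage sketch that needs a valid GOOGLE_API_KEY; the retrieved chunks depend on your PDF), you could call it like this after processing a document:

from services import ContentRepurposer

repurposer = ContentRepurposer()
repurposer.process_pdf("data/build_llm_powered_app.pdf")  # builds self.vectordb
chunks = repurposer.get_relevant_chunks("introduction and details", k=3)
for chunk in chunks:
    print(chunk[:80], "...")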
Generating the Tweet Thread from the Article
This method is the most important one, because here we bring the generative AI, the embeddings, and the prompts together to generate the thread from the client's PDF file.
def generate_twitter_thread(self, article: ArticleContent) -> TwitterThread:
    """Generate a Twitter thread using Gemini"""
    # First, get the most relevant chunks for different aspects of the article
    intro_chunks = self.get_relevant_chunks("introduction and details")
    technical_chunks = self.get_relevant_chunks("technical details and implementation")
    conclusion_chunks = self.get_relevant_chunks("conclusion and key takeaways")

    thread_prompt = PromptTemplate(
        input_variables=["title", "intro", "technical", "conclusion"],
        template="""
        Write an engaging Twitter thread (8-10 tweets) summarizing this technical article in an approachable and human-like style.

        Title: {title}

        Introduction Context:
        {intro}

        Technical Details:
        {technical}

        Key Takeaways:
        {conclusion}

        Guidelines:
        1. Start with a hook that grabs attention (e.g., a surprising fact, bold statement, or thought-provoking question).
        2. Use a conversational tone and explain complex details simply, without jargon.
        3. Keep tweets concise and under 280 characters, following the 1/n numbering format.
        4. Break down the key insights logically, and make each tweet build curiosity for the next one.
        5. Include relevant examples, analogies, or comparisons to aid understanding.
        6. End the thread with a strong conclusion and a call to action (e.g., "Read the full article," "Follow for more insights").
        7. Make it relatable, educational, and engaging.

        Output format:
        - A numbered list of tweets, with each tweet on a new line.
        - After the tweets, suggest 3-5 hashtags that summarize the thread, starting with #.
        """
    )

    chain = LLMChain(llm=self.llm, prompt=thread_prompt)
    result = chain.run({
        "title": article.title,
        "intro": "\n".join(intro_chunks),
        "technical": "\n".join(technical_chunks),
        "conclusion": "\n".join(conclusion_chunks)
    })

    # Parse the result into tweets and hashtags
    lines = result.split("\n")
    tweets = [line.strip() for line in lines if line.strip() and not line.strip().startswith("#")]
    hashtags = [tag.strip() for tag in lines if tag.strip().startswith("#")]

    # Ensure we have at least one tweet and hashtag
    if not tweets:
        tweets = ["Thread about " + article.title]
    if not hashtags:
        hashtags = ["#AI", "#TechNews"]

    return TwitterThread(tweets=tweets, hashtags=hashtags)
Let's understand what is happening in the above code step by step:
- Retrieve Relevant Chunks: The method first extracts relevant chunks of text for the introduction, technical details, and conclusion using the get_relevant_chunks method.
- Prepare a Prompt: A PromptTemplate is created with instructions to write an engaging Twitter thread summarizing the article, including details on tone, structure, and formatting guidelines.
- Run the LLM Chain: An LLMChain combines the LLM with the prompt and generates a thread based on the article's title and the extracted chunks.
- Parse Results: The generated output is split into tweets and hashtags, ensuring proper formatting and extracting the necessary components (a toy example of this parsing step follows this list).
- Return the Twitter Thread: The method returns a TwitterThread object containing the formatted tweets and hashtags.
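To make the parsing step concrete, here is a standalone toy run on a made-up LLM output string (not an actual model response):

sample_output = """1/3 Ever wondered how to turn a blog post into a thread? Here's one way.
2/3 Embed the article, store the chunks in ChromaDB, and retrieve the most relevant ones.
3/3 Gemini-2.0 then drafts the tweets. Read the full article!
#AI #LLM #ContentCreation"""

lines = sample_output.split("\n")
tweets = [line.strip() for line in lines if line.strip() and not line.strip().startswith("#")]
hashtags = [tag.strip() for tag in lines if tag.strip().startswith("#")]
print(tweets)    # the three numbered tweet lines
print(hashtags)  # ['#AI #LLM #ContentCreation'] (the hashtag line stays as one entry)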
Processing the Article
This method processes a PDF file to extract its content, generates a Twitter thread summarizing it, and finally returns the TwitterThread.
def process_article(self, pdf_path: str) -> TwitterThread:
    """Main method to process an article and generate content"""
    try:
        article = self.process_pdf(pdf_path)
        thread = self.generate_twitter_thread(article)
        return thread
    except Exception as e:
        print(f"Error processing article: {str(e)}")
        raise
Up to this point we have implemented all the necessary code for this project; from here there are two ways we can proceed:
- Implementing the main file for testing, and
- Implementing the Streamlit application for the web interface.
If you don't want to test the application in terminal mode, you can skip the main file implementation and go directly to the Streamlit application implementation.
Implementing the Main File for Testing
Now, we put all the modules together to test the application.
import os
from dotenv import load_dotenv
from services import ContentRepurposer

def main():
    # Load environment variables
    load_dotenv()
    google_api_key = os.getenv("GOOGLE_API_KEY")
    if not google_api_key:
        raise ValueError("GOOGLE_API_KEY environment variable not found")

    # Initialize repurposer
    repurposer = ContentRepurposer()

    # Path to your local PDF
    # pdf_path = "data/guide_to_jax.pdf"
    pdf_path = "data/build_llm_powered_app.pdf"

    try:
        thread = repurposer.process_article(pdf_path)

        print("Generated Twitter Thread:")
        for i, tweet in enumerate(thread.tweets, 1):
            print(f"\nTweet {i}/{len(thread.tweets)}:")
            print(tweet)

        print("\nSuggested Hashtags:")
        print(" ".join(thread.hashtags))
    except Exception as e:
        print(f"Failed to process article: {str(e)}")

if __name__ == "__main__":
    main()
Here, you can see that it simply imports all the modules, checks that GOOGLE_API_KEY is available, initializes the ContentRepurposer class, and then, inside the try block, creates a thread by calling the process_article() method on the repurposer object. At the end there are some print statements to show the tweets in the terminal, plus the exception handling.
To test the application, create a folder named data in your project root and put your downloaded PDF there. To download an article from Analytics Vidhya, go to any article, click the download button, and download it.
Now, in your terminal:
python main.py
Example Blog 1 Output

Example Blog 2 Output

I think you get the idea of how nice the application is! Let's make it more aesthetically pleasing.
Implementing the Streamlit App
Now we will do pretty much the same as above, but in a more UI-centric way.
Importing Libraries and Env Configuration
import os
import streamlit as st
from dotenv import load_dotenv
from services import ContentRepurposer
import pyperclip
from pathlib import Path

# Load environment variables
load_dotenv()

# Set page configuration
st.set_page_config(page_title="Content Repurposer", page_icon="🐦", layout="wide")
Custom CSS
# Custom CSS
st.markdown(
    """
    """,
    unsafe_allow_html=True,
)
Here, we add some CSS styling for the web page (tweets, copy buttons, hashtags). Is CSS confusing to you? Visit W3Schools. An illustrative snippet is sketched below.
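The project's actual stylesheet is not reproduced above; purely as an illustration (the class names below are assumptions, not the ones used in the app), custom CSS can be injected into Streamlit like this:

import streamlit as st

st.markdown(
    """
    <style>
    /* illustrative only: a tweet card and a hashtag badge */
    .tweet-box { border: 1px solid #e1e8ed; border-radius: 12px; padding: 12px; margin-bottom: 8px; }
    .hashtag { background-color: #e8f5fd; color: #1da1f2; border-radius: 12px; padding: 2px 8px; }
    </style>
    """,
    unsafe_allow_html=True,
)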
Some Important Functions
def create_temp_pdf(uploaded_file):
    """Create a temporary PDF file from the uploaded content"""
    temp_dir = Path("temp")
    temp_dir.mkdir(exist_ok=True)
    temp_path = temp_dir / "uploaded_pdf.pdf"
    with open(temp_path, "wb") as f:
        f.write(uploaded_file.getvalue())
    return str(temp_path)

def initialize_session_state():
    """Initialize session state variables"""
    if "tweets" not in st.session_state:
        st.session_state.tweets = None
    if "hashtags" not in st.session_state:
        st.session_state.hashtags = None

def copy_text_and_show_success(text, success_key):
    """Copy text to the clipboard and show a success message"""
    try:
        pyperclip.copy(text)
        st.success("Copied to clipboard!", icon="✅")
    except Exception as e:
        st.error(f"Failed to copy: {str(e)}")
Here, the create_temp_pdf() function creates a temp directory in the project folder and stores the uploaded PDF there for the rest of the process.
The initialize_session_state() function checks whether the tweets and hashtags already exist in the Streamlit session state.
The copy_text_and_show_success() function uses the pyperclip library to copy the tweets and hashtags to the clipboard and shows a success message when the copy works.
Main Function
def main():
    initialize_session_state()

    # Header
    st.markdown(
        "",
        unsafe_allow_html=True,
    )

    # Create two columns for layout
    col1, col2 = st.columns([1, 1])

    with col1:
        st.markdown("### Upload PDF")
        uploaded_file = st.file_uploader("Drop your PDF here", type=["pdf"])

        if uploaded_file:
            st.success("PDF uploaded successfully!")

            if st.button("Generate Twitter Thread", key="generate"):
                with st.spinner("Generating Twitter thread..."):
                    try:
                        # Get Google API key
                        google_api_key = os.getenv("GOOGLE_API_KEY")
                        if not google_api_key:
                            st.error(
                                "Google API key not found. Please check your .env file."
                            )
                            return

                        # Save uploaded file
                        pdf_path = create_temp_pdf(uploaded_file)

                        # Process PDF and generate thread
                        repurposer = ContentRepurposer()
                        thread = repurposer.process_article(pdf_path)

                        # Store results in session state
                        st.session_state.tweets = thread.tweets
                        st.session_state.hashtags = thread.hashtags

                        # Clean up temporary file
                        os.remove(pdf_path)
                    except Exception as e:
                        st.error(f"Error generating thread: {str(e)}")

    with col2:
        if st.session_state.tweets:
            st.markdown("### Generated Twitter Thread")

            # Copy entire thread section
            st.markdown("#### Copy Full Thread")
            all_tweets = "\n\n".join(st.session_state.tweets)
            if st.button("📋 Copy Entire Thread"):
                copy_text_and_show_success(all_tweets, "thread")

            # Display individual tweets
            st.markdown("#### Individual Tweets")
            for i, tweet in enumerate(st.session_state.tweets, 1):
                tweet_col1, tweet_col2 = st.columns([4, 1])
                with tweet_col1:
                    st.markdown(
                        f"""
                        """,
                        unsafe_allow_html=True,
                    )
                with tweet_col2:
                    if st.button("📋", key=f"tweet_{i}"):
                        copy_text_and_show_success(tweet, f"tweet_{i}")

            # Display hashtags
            if st.session_state.hashtags:
                st.markdown("### Suggested Hashtags")

                # Display hashtags with a copy button
                hashtags_text = " ".join(st.session_state.hashtags)
                hashtags_col1, hashtags_col2 = st.columns([4, 1])
                with hashtags_col1:
                    hashtags_html = " ".join(
                        [
                            f"{hashtag}"
                            for hashtag in st.session_state.hashtags
                        ]
                    )
                    st.markdown(hashtags_html, unsafe_allow_html=True)
                with hashtags_col2:
                    if st.button("📋 Copy Tags"):
                        copy_text_and_show_success(hashtags_text, "hashtags")

if __name__ == "__main__":
    main()
If you read this code closely, you will see that Streamlit creates two columns: one for the PDF uploader and the other for displaying the generated tweets.
In the first column, we do pretty much the same as in the earlier main.py, with some extra markdown, plus buttons for uploading the PDF and generating the thread using the Streamlit objects.
In the second column, Streamlit iterates over the tweet list (the generated thread), puts each tweet in a tweet box with a copy button for that individual tweet, and at the end it shows all the hashtags along with their copy button.
Now the fun part!
Open your terminal and type:
streamlit run app.py
If everything is done right, it will start a Streamlit application in your default browser.

Now, drag and drop your downloaded PDF onto the box; it will automatically upload the PDF to the system. Then click the Generate Twitter Thread button to generate the tweets.

You can copy the full thread or individual tweets using the respective copy buttons.
I hope doing hands-on projects like this helps you learn many practical concepts of Generative AI, Python libraries, and programming. Happy coding, and stay healthy.
All the code used in this article is available here.
Conclusion
This project demonstrates the power of combining modern AI technologies to automate content repurposing. By leveraging Gemini-2.0 and ChromaDB, we have created a system that not only saves time but also maintains high-quality output. The modular architecture ensures easy maintenance and extensibility, while the Streamlit interface makes it accessible to non-technical users.
Key Takeaways
- The project demonstrates a successful integration of cutting-edge AI tools for practical content automation.
- The architecture's modularity allows for easy maintenance and future enhancements, making it a sustainable solution for content repurposing.
- The Streamlit interface makes the tool accessible to content creators without technical expertise, bridging the gap between complex AI technology and practical usage.
- The implementation can handle various content types and volumes, making it suitable for both individual content creators and large organizations.
Frequently Asked Questions
Q. How does the system handle long articles?
A. The system uses RecursiveCharacterTextSplitter to break long articles down into manageable chunks, which are then embedded and stored in ChromaDB. When generating threads, it retrieves the most relevant chunks using similarity search.
Q. What temperature setting is used for generation?
A. We used a temperature of 0.7, which provides a balance between creativity and coherence. You can adjust this setting based on your specific needs, with higher values (>0.7) producing more creative output and lower values (<0.7) producing more focused content.
Q. How is the 280-character tweet limit enforced?
A. The prompt template explicitly specifies the 280-character limit, and the LLM is instructed to respect this constraint. You can add further validation to ensure compliance programmatically, as sketched below.
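For example (a hypothetical helper, not part of the project code), a simple post-generation check could look like this:

def enforce_tweet_limit(tweets, limit=280):
    """Truncate any tweet that exceeds the character limit."""
    return [t if len(t) <= limit else t[: limit - 1] + "…" for t in tweets]

# e.g., apply it right after generation:
# thread.tweets = enforce_tweet_limit(thread.tweets)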
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.