Hey everyone!
This page will be your central resource hub throughout the Summer of Code program. I, Prince (Elec), along with Anushka (Civil) and Swaprabha (EP), will be your mentors for this journey. Together, we'll build an AI agent that reads research papers, extracts knowledge, and generates literature reviews or hypotheses using LLMs.
We'll update this page weekly with new goals, resources, and tasks. Let's learn, build, and ship something awesome!
Goal of the Week:
Parse research papers (PDFs)
We'll use tools like:
Goal: Convert a research paper into raw text (title, abstract, paragraphs, sections).
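Once a PDF library has given you raw text, the next small step is carving it into title, abstract, and sections. Here is a minimal sketch of that step only. It assumes the text is already extracted, that the first non-empty line is the title, and that section headings sit alone on their own line. The HEADINGS list and the function name are made up for illustration, so adjust them to the papers you actually see.

```python
import re

# Hypothetical heading list; real papers vary, so treat this as a starting point.
HEADINGS = ("abstract", "introduction", "related work", "methods",
            "results", "discussion", "conclusion", "references")

def split_into_sections(raw_text):
    """Split already-extracted paper text into {section_name: body}.

    Assumes the first non-empty line is the title and that section
    headings appear alone on their own line. Lines between the title
    and the first heading (authors, affiliations) are dropped.
    """
    lines = [ln.strip() for ln in raw_text.splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines:
        return {}
    sections = {"title": lines[0]}
    current = None
    for ln in lines[1:]:
        # Drop leading numbering such as "1." or "2.3" before matching.
        name = re.sub(r"^\d+(\.\d+)*\.?\s*", "", ln).lower()
        if name in HEADINGS:
            current = name
            sections[current] = ""
        elif current is not None:
            sections[current] = (sections[current] + " " + ln).strip()
    return sections
```

Real PDFs are messier than this (two-column layouts, headers and footers, broken hyphenation), so expect to add cleanup rules as you test on actual papers.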
Clean + chunk text
This is important because LLMs and vector databases work better with smaller, digestible pieces.
→ Example chunk:
“In this study, we explore how transformer models can be used for document understanding...”
Use tools like:
RecursiveCharacterTextSplitter
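To see the idea behind recursive splitting without installing anything, here is a simplified pure-Python sketch. It is not LangChain's implementation: the function name, the default separators, and the default chunk size are assumptions for illustration, and it leaves out chunk overlap, which the real RecursiveCharacterTextSplitter supports.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and only falls back to finer
    ones for pieces that are still too long.
    """
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard-cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the same chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]
```

The point of going paragraph → line → sentence → word is that chunks break at natural boundaries whenever possible, so each chunk stays readable on its own.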
Generate embeddings
You can use:
text-embedding-3-small (OpenAI)
all-MiniLM-L6-v2 or BGE models (open-source)
Why embeddings? So that similar text has similar vectors → which lets us later search by meaning, not just keywords.
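To make "similar text has similar vectors" concrete, here is a toy sketch that runs with only the standard library. The toy_embed function is a made-up stand-in, not a real embedding model: it just hashes words into buckets, so it only notices shared words, not meaning. The cosine similarity part, though, is the same math you would apply to real embeddings.

```python
import hashlib
import math
import re

def toy_embed(text, dim=256):
    """Toy stand-in for a real embedding model: hashed bag-of-words.

    Real models capture meaning; this only captures shared words,
    but the vector comparison afterwards works the same way.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 keeps the bucket stable across runs (built-in hash() is salted).
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

a = toy_embed("transformer models for document understanding")
b = toy_embed("document understanding with transformer models")
c = toy_embed("soil erosion rates in river deltas")
print(cosine_similarity(a, b), cosine_similarity(a, c))
```

The first score should come out much higher than the second. With a real model such as all-MiniLM-L6-v2, you would swap toy_embed for the model's encode call and keep the comparison as is.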
Store them in a vector DB (FAISS or Weaviate)
These databases let us search for relevant text chunks later, based on a user's question.
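Under the hood, a vector DB stores (vector, chunk) pairs and returns the chunks nearest to a query vector. The brute-force sketch below shows that core idea in plain Python. TinyVectorIndex is a hypothetical class written for this page, not the FAISS or Weaviate API; those libraries do the same job far more efficiently on large collections.

```python
import math

class TinyVectorIndex:
    """Brute-force in-memory index: stores (vector, chunk) pairs and
    returns the chunks whose vectors are closest to a query vector."""

    def __init__(self):
        self._vectors = []
        self._chunks = []

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else list(vec)

    def add(self, vector, chunk):
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._vectors.append(self._normalize(vector))
        self._chunks.append(chunk)

    def search(self, query_vector, k=3):
        """Return up to k (score, chunk) pairs, best match first."""
        q = self._normalize(query_vector)
        scored = [
            (sum(a * b for a, b in zip(q, v)), chunk)
            for v, chunk in zip(self._vectors, self._chunks)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

index = TinyVectorIndex()
index.add([1.0, 0.0, 0.0], "methods chunk")
index.add([0.0, 1.0, 0.0], "results chunk")
index.add([0.9, 0.1, 0.0], "setup chunk")
print(index.search([1.0, 0.0, 0.0], k=2))
```

Comparing the query against every stored vector is fine for one paper, but it gets slow with many papers, which is where a proper vector DB earns its place.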
Later on:
When the user asks: “What methods did this paper use?” →
We convert the question to an embedding → find similar chunks in the DB → send those chunks to GPT to answer.
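The question-answering flow described above can be sketched as two small functions. The names embed, search, and ask_llm are hypothetical callables you would plug in yourself (your embedding model, your vector DB lookup, and your GPT call), and the prompt wording is just one reasonable choice, not a fixed recipe.

```python
def build_prompt(question, chunks):
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer_question(question, embed, search, ask_llm, k=3):
    """Question -> embedding -> similar chunks -> LLM answer.

    embed, search, and ask_llm are hypothetical callables supplied by
    the caller: embed(text) returns a vector, search(vector, k) returns
    a list of chunk strings, and ask_llm(prompt) returns the answer text.
    """
    query_vector = embed(question)
    chunks = search(query_vector, k)
    return ask_llm(build_prompt(question, chunks))
```

Passing the three pieces in as functions keeps the pipeline easy to test with stubs first, then swap in the real model and database later.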