AI Data Science Notebook Generator
Automatically generate a complete Data Science Jupyter Notebook scaffold (data loading, cleaning, visualization, modeling, and evaluation) using the Google Gemini API. Stop writing boilerplate code and kickstart your analysis faster!
Example snippet of a generated notebook section (Visualization)
Table of Contents
- About The Project
- Built With
- Getting Started
- Usage
- Features
- Roadmap
- Contributing
- License
- Contact
- Acknowledgments
About The Project
Data science projects often involve repetitive setup and boilerplate code for common tasks like loading data, initial cleaning, basic visualizations, and setting up model training loops. This project aims to automate the generation of this initial Jupyter Notebook structure.
Given a dataset description or path and an analytical goal, the tool uses the Google Gemini API to generate Python code snippets organized within a Jupyter Notebook (.ipynb) file. This allows data scientists and analysts to:
- Accelerate Project Kick-off: Get a working notebook structure in seconds.
- Reduce Boilerplate: Focus on the unique aspects of the analysis, not repetitive code.
- Explore Quickly: Generate initial visualizations and model outlines rapidly.
- Learn: See how common data science tasks can be structured (though AI-generated code always requires review!).
The output is a fully functional Jupyter Notebook file (generated_notebook.ipynb) ready for execution and further customization.
Built With
- Python
- Jupyter Notebook
- Google Gemini API
- google-generativeai (Python Client Library)
- python-dotenv
Getting Started
Follow these steps to get a local copy up and running.
Prerequisites
- Python: Version 3.7+ recommended. Install from python.org.
- pip: Python package installer (usually comes with Python).
python -m ensurepip --upgrade
- Git: To clone the repository. Install Git.
- Google Gemini API Key: You need an API key from Google AI Studio.
  - Visit Google AI Studio.
  - Click "Get API key" and follow the instructions.
  - Important: Keep your API key secure and do not commit it directly into your code or repository.
Installation
- Clone the repo:
git clone https://github.com/mouncefik/ai-data-science-notebook-generator.git
- Navigate to the project directory:
cd ai-data-science-notebook-generator
- Install Python packages:
pip install -r requirements.txt
- Set up environment variables:
  - Create a .env file in the root directory (you can copy .env.example).
  - Add your Google Gemini API Key to the .env file:
    # .env
    GOOGLE_API_KEY="YOUR_API_KEY_HERE"
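For reference, reading the key back in Python might look like the sketch below. The project lists python-dotenv, whose load_dotenv() loads a .env file into the process environment; to keep this sketch dependency-free, a small hand-written parser stands in for it. The helper names read_env_file and get_api_key are hypothetical, and the parser only handles simple KEY=value lines.

```python
import os


def read_env_file(path=".env"):
    """Parse simple KEY=value lines (simplified stand-in for python-dotenv)."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and anything without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(path=".env"):
    """Prefer an already-set environment variable, then fall back to the file."""
    key = os.environ.get("GOOGLE_API_KEY")
    if not key and os.path.exists(path):
        key = read_env_file(path).get("GOOGLE_API_KEY")
    # Treat the template placeholder as "not configured".
    if not key or key == "YOUR_API_KEY_HERE":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return key
```

Failing early when the placeholder value is still present gives a clearer error than a rejected API request later on.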
Usage
- Open the main.ipynb Notebook:
  - You can use Jupyter Lab, Jupyter Notebook, VS Code with the Python/Jupyter extension, or Google Colab (after uploading the files and installing requirements).
  - Start Jupyter Lab:
    jupyter lab
  - Then navigate to and open main.ipynb.
- Configure the Generation:
  - Inside main.ipynb, locate the cells where you define the dataset_description and the analysis_goal.
  - Modify these variables according to your specific dataset and what you want the notebook to achieve (e.g., "Load data from 'data/titanic.csv', perform exploratory data analysis, and build a classification model to predict survival.").
- Run the Notebook Cells:
  - Execute the cells in main.ipynb sequentially.
  - The notebook will:
    - Load your API key from the .env file.
    - Initialize the Google Gemini client.
    - Send prompts to the API based on your dataset description and goal to generate code for different sections (loading, cleaning, EDA, modeling, etc.).
    - Assemble the generated code into a new notebook structure.
- Check the Output:
  - A new file named generated_notebook.ipynb will be created in the project directory.
  - Open generated_notebook.ipynb to review, run, and customize the AI-generated code.
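The flow described in the steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in main.ipynb: the function names (build_prompt, generate_sections, assemble_notebook, write_notebook), the section list, and the prompt wording are all hypothetical, and the Gemini call is replaced by a `generate` callable so the sketch runs without an API key. Only dataset_description, analysis_goal, and generated_notebook.ipynb are names taken from this README.

```python
import json

# Variables named in this README; the values here are illustrative.
dataset_description = "Titanic passenger data stored in 'data/titanic.csv'."
analysis_goal = (
    "Perform exploratory data analysis and build a classification "
    "model to predict survival."
)

# Hypothetical section list, based on the tasks listed under Features.
SECTIONS = [
    "Data Loading",
    "Data Cleaning",
    "Exploratory Data Analysis",
    "Model Building",
    "Model Evaluation",
]


def build_prompt(section, description, goal):
    """Compose one prompt per notebook section (wording is an assumption)."""
    return (
        f"Write Python code for the '{section}' step of a data science notebook.\n"
        f"Dataset: {description}\n"
        f"Goal: {goal}\n"
        "Return only code."
    )


def generate_sections(generate, description, goal, sections=SECTIONS):
    """Call generate(prompt) -> str once per section and collect the code."""
    return {s: generate(build_prompt(s, description, goal)) for s in sections}


def assemble_notebook(section_code):
    """Build a notebook dict in the nbformat 4 JSON layout."""
    cells = []
    for title, code in section_code.items():
        # One markdown heading cell followed by one code cell per section.
        cells.append({"cell_type": "markdown", "metadata": {}, "source": f"## {title}"})
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": code,
        })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(notebook, path="generated_notebook.ipynb"):
    """Serialize the notebook dict to an .ipynb file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(notebook, fh, indent=1)


def fake_generate(prompt):
    """Stand-in for the Gemini call; returns a placeholder code cell."""
    return "# TODO: generated code for: " + prompt.splitlines()[0]
```

With the google-generativeai library, `generate` could plausibly wrap `genai.GenerativeModel(<model name>).generate_content(prompt).text` after calling `genai.configure(api_key=...)`; which model the project actually uses is not stated here. Passing the generator in as a callable also makes it easy to test the assembly step offline.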
Important Note: AI-generated code requires careful review and validation. Ensure the generated code correctly implements the desired logic, handles edge cases, and aligns with best practices before relying on its output.
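As a first automated pass before manual review, the code cells of a generated notebook can at least be checked for syntax errors. The sketch below uses ast.parse from the standard library; the function name find_syntax_errors is hypothetical. A clean result only means the cells parse as Python. It says nothing about whether the logic is correct, and cells that use IPython magics (such as lines starting with % or !) will be reported as errors.

```python
import ast
import json


def find_syntax_errors(notebook_path):
    """Return (cell_index, message) pairs for code cells that fail to parse."""
    with open(notebook_path, encoding="utf-8") as fh:
        notebook = json.load(fh)
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        try:
            ast.parse(source)
        except SyntaxError as exc:
            problems.append((index, f"line {exc.lineno}: {exc.msg}"))
    return problems
```

For example, `find_syntax_errors("generated_notebook.ipynb")` returns an empty list when every code cell parses.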
Features
- AI-Powered: Uses Google Gemini for intelligent code generation.
- End-to-End Structure: Generates sections for common DS tasks:
  - Data Loading
  - Data Cleaning / Preprocessing
  - Exploratory Data Analysis (EDA) & Visualization
  - Feature Engineering (basic)
  - Model Building
  - Model Evaluation
- Time Saving: Reduces manual effort in setting up notebooks.
- Jupyter Format: Outputs a standard .ipynb file.
- Configurable: Define dataset and goals via variables in the main notebook.
Roadmap
- Improve prompt engineering for more robust and accurate code generation.
- Add options to select specific types of models or visualizations.
- Allow direct input of dataset file path for automatic schema detection (requires careful implementation).
- Implement basic error handling for API calls.
- Add support for different generative models (e.g., other model families within Gemini).
- Include more advanced preprocessing/feature engineering options.
- Add unit/integration tests.
See the open issues for a full list of proposed features (and known issues).
Contributing
Contributions are welcome! If you have suggestions or improvements, please feel free to:
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See the LICENSE file for more information.
Contact
Mouncef Ik - [Your Preferred Contact Method - e.g., LinkedIn Profile URL, Email]
Project Link: https://github.com/mouncefik/ai-data-science-notebook-generator
Acknowledgments
- Google AI for the Gemini API.
- The open-source Python data science ecosystem (Pandas, Scikit-learn, Matplotlib, etc.) whose usage patterns inform the generation.
- README Template inspiration