AI Data Science Notebook Generator | Mouncef Ikhoubi

March 20, 2025

AI Data Science Notebook Generator 🤖📊

License: MIT Python Version

Automatically generate a complete Data Science Jupyter Notebook scaffold – including data loading, cleaning, visualization, modeling, and evaluation – using the power of the Google Gemini API. Stop writing boilerplate code and kickstart your analysis faster!

[Screenshot: example snippet of a generated notebook section (Visualization)]



About The Project

Data science projects often involve repetitive setup and boilerplate code for common tasks like loading data, initial cleaning, basic visualizations, and setting up model training loops. This project aims to automate the generation of this initial Jupyter Notebook structure.

Given a dataset description or file path and an analytical goal, the tool uses the Google Gemini API to generate Python code snippets and organizes them into a Jupyter Notebook (.ipynb) file. This lets data scientists and analysts skip the boilerplate and start their analysis sooner.

The output is a fully functional Jupyter Notebook file (generated_notebook.ipynb) ready for execution and further customization.
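As a rough illustration of the flow described above, the sketch below turns a dataset description and an analysis goal into one prompt per notebook section. The section names come from the project description; the helper name `build_section_prompts` and the prompt wording are assumptions made for illustration, not the project's actual prompts.

```python
from typing import Dict

# Section names follow the project description; everything else here is hypothetical.
SECTIONS = ["data loading", "data cleaning", "visualization", "modeling", "evaluation"]


def build_section_prompts(dataset_description: str, analysis_goal: str) -> Dict[str, str]:
    """Return one prompt string per notebook section (illustrative helper)."""
    prompts = {}
    for section in SECTIONS:
        prompts[section] = (
            f"You are writing the '{section}' section of a Jupyter Notebook.\n"
            f"Dataset: {dataset_description}\n"
            f"Goal: {analysis_goal}\n"
            "Return only Python code for this section."
        )
    return prompts


if __name__ == "__main__":
    example = build_section_prompts(
        "Titanic passenger data in data/titanic.csv",
        "build a classification model to predict survival",
    )
    for name, prompt in example.items():
        print(f"--- {name} ---\n{prompt}\n")
```

Each prompt would then be sent to the Gemini API, and the returned code collected per section.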


Built With


Getting Started

Follow these steps to get a local copy up and running.

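Prerequisites

  • Git, to clone the repository.
  • Python with pip, to install the packages in requirements.txt.
  • A Google Gemini API key.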

Installation

  1. Clone the repo:
    git clone https://github.com/mouncefik/ai-data-science-notebook-generator.git
  2. Navigate to the project directory:
    cd ai-data-science-notebook-generator
  3. Install Python packages:
    pip install -r requirements.txt
  4. Set up environment variables:
    • Create a .env file in the root directory (you can copy .env.example).
    • Add your Google Gemini API Key to the .env file:
      # .env
      GOOGLE_API_KEY="YOUR_API_KEY_HERE"
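For readers curious how the key in .env might be picked up, here is a minimal standard-library sketch. The project may well rely on a package such as python-dotenv instead; the helpers `read_env_file` and `get_api_key` are hypothetical and only illustrate the idea.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer an already exported variable, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or read_env_file(path).get("GOOGLE_API_KEY")
    if not key or key == "YOUR_API_KEY_HERE":
        raise RuntimeError("GOOGLE_API_KEY is missing or still set to the placeholder.")
    return key
```

Failing early on the placeholder value avoids a less obvious authentication error later, when the first request is sent.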

Usage

  1. Open the main.ipynb Notebook:

    • You can use Jupyter Lab, Jupyter Notebook, VS Code with the Python/Jupyter extension, or Google Colab (after uploading the files and installing requirements).
    • Start Jupyter Lab:
      jupyter lab
    • Then navigate to and open main.ipynb.
  2. Configure the Generation:

    • Inside main.ipynb, locate the cells where you define the dataset_description and the analysis_goal.
    • Modify these variables according to your specific dataset and what you want the notebook to achieve (e.g., “Load data from ‘data/titanic.csv’, perform exploratory data analysis, and build a classification model to predict survival.”).
  3. Run the Notebook Cells:

    • Execute the cells in main.ipynb sequentially.
    • The notebook will:
      • Load your API key from the .env file.
      • Initialize the Google Gemini client.
      • Send prompts to the API based on your dataset description and goal to generate code for different sections (loading, cleaning, EDA, modeling, etc.).
      • Assemble the generated code into a new notebook structure.
  4. Check the Output:

    • A new file named generated_notebook.ipynb will be created in the project directory.
    • Open generated_notebook.ipynb to review, run, and customize the AI-generated code.
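The assembly step in the list above can be sketched with the standard library alone, since an .ipynb file is JSON in the nbformat version 4 layout. This is an assumption-laden illustration: `assemble_notebook` and `write_notebook` are hypothetical names, and the project itself may use the nbformat package or a different cell layout.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def assemble_notebook(sections: List[Tuple[str, str]]) -> Dict:
    """Build an nbformat 4 notebook dict from (title, code) pairs."""
    cells = []
    for title, code in sections:
        # A markdown heading introduces each generated code cell.
        cells.append({"cell_type": "markdown", "metadata": {}, "source": f"## {title}"})
        cells.append(
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code,
            }
        )
    # nbformat_minor 4 is used because minor version 5 additionally requires cell ids.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(notebook: Dict, path: str = "generated_notebook.ipynb") -> Path:
    """Serialize the notebook dict to disk as JSON and return the path."""
    target = Path(path)
    target.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return target


if __name__ == "__main__":
    demo = assemble_notebook(
        [
            ("Data Loading", "import pandas as pd\ndf = pd.read_csv('data/titanic.csv')"),
            ("Exploration", "df.describe()"),
        ]
    )
    print(write_notebook(demo))
```

In the real tool, the code strings would come from the Gemini responses rather than being written by hand.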

Important Note: AI-generated code requires careful review and validation. Ensure the generated code correctly implements the desired logic, handles edge cases, and aligns with best practices before relying on its output.
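One inexpensive first check, before any manual review, is to confirm that every generated code cell at least parses. The sketch below is a suggestion rather than part of the project: `find_syntax_errors` is a hypothetical helper, and it only catches syntax errors, not logic errors or wrong column names.

```python
import ast
from typing import Dict, List, Tuple


def find_syntax_errors(notebook: Dict) -> List[Tuple[int, str]]:
    """Return (cell_index, message) for each code cell that fails to parse."""
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        cleaned = []
        for line in source.splitlines():
            stripped = line.lstrip()
            if stripped.startswith(("%", "!")):
                # IPython magics and shell escapes are not plain Python;
                # swap in `pass` at the same indentation to keep line numbers.
                indent = line[: len(line) - len(stripped)]
                cleaned.append(indent + "pass")
            else:
                cleaned.append(line)
        try:
            ast.parse("\n".join(cleaned))
        except SyntaxError as exc:
            problems.append((index, f"line {exc.lineno}: {exc.msg}"))
    return problems
```

To run it against the output file, load generated_notebook.ipynb with `json.load` and pass the resulting dict to the function; an empty list means every code cell parsed.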


Features


Roadmap

See the open issues for a full list of proposed features (and known issues).


Contributing

Contributions are welcome! If you have suggestions or improvements, please feel free to:

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See the LICENSE file for more information.


Contact

Mouncef Ik - [Your Preferred Contact Method - e.g., LinkedIn Profile URL, Email]

Project Link: https://github.com/mouncefik/ai-data-science-notebook-generator


Acknowledgments