Document GitHub Sentiment Dataset Pipeline #2

@splimon

Purpose

The goal is clear, reproducible documentation that lets future contributors understand and rerun the GitHub sentiment pipeline without reverse-engineering prior work. The notebooks should explain the commit and PR comment flow and produce outputs compatible with Kaiaulu.

Process

Create a documentation PR that:

  • Adds the .yml files for the GitHub projects used in the dataset
  • Adds the Python script used to load sentiment labels into MySQL
  • Converts existing MySQL Workbench queries into Jupyter notebooks
  • Organizes the notebooks in pipeline order:
    • Load sentiment CSV into MySQL
    • Explore DB + identify relevant tables
    • Scale/automate across all projects
  • Export each notebook to a .py file so reviewers can read the code directly in GitHub (in addition to the .ipynb)
  • Does not commit any data files (e.g., CSV exports, database dumps, or generated data); commits only code and documentation (scripts, notebooks, configs)
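
For the notebook export step, the usual route is `jupyter nbconvert --to script <notebook>.ipynb`. If a dependency-free alternative is useful, the sketch below reads the notebook JSON directly and writes only the code cells to a sibling `.py` file. The function name `export_notebook_to_py` is hypothetical, and the sketch assumes nbformat 4 notebooks (a top-level `cells` list); it drops markdown cells and cell outputs, which also keeps generated data out of the review.

```python
import json
from pathlib import Path


def export_notebook_to_py(ipynb_path: str) -> Path:
    """Write the code cells of a notebook to a sibling .py file.

    Hypothetical helper: assumes an nbformat 4 notebook with a
    top-level "cells" list. Markdown cells and outputs are skipped.
    """
    src = Path(ipynb_path)
    notebook = json.loads(src.read_text(encoding="utf-8"))

    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Cell source is stored either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")

    out = src.with_suffix(".py")
    # Separate cells with blank lines so the script stays readable.
    out.write_text("\n\n".join(chunks), encoding="utf-8")
    return out
```

Calling `export_notebook_to_py("some_notebook.ipynb")` would write `some_notebook.py` next to the notebook; the file name here is only a placeholder.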

The final notebook must output a file that can be inner-joined with the Kaiaulu output for the same project.
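
As a rough illustration of what "joinable" means here, the sketch below inner-joins two sets of rows on a shared key using only the standard library. The key column `comment_id` and the other column names are assumptions made for the example; the actual join key depends on which identifier Kaiaulu emits for the same project and is not specified in this issue.

```python
def inner_join(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key column.

    Only rows whose key value appears on both sides are kept.
    """
    # Index the right side by key so each lookup is O(1).
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)

    joined = []
    for left in left_rows:
        for right in right_index.get(left[key], []):
            # On a column-name clash, the left (sentiment) value wins.
            joined.append({**right, **left})
    return joined


# Illustrative rows only; "comment_id" is a hypothetical join key.
sentiment_rows = [
    {"comment_id": "1", "sentiment": "positive"},
    {"comment_id": "2", "sentiment": "negative"},
]
kaiaulu_rows = [
    {"comment_id": "2", "author": "a"},
    {"comment_id": "3", "author": "b"},
]
matched = inner_join(sentiment_rows, kaiaulu_rows, "comment_id")
```

A practical consequence is that the key must have the same name and type on both sides: an ID exported as an integer in one file and as a string in the other would silently match nothing.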

Task List

  • Add .yml config project files to the repo
  • Add Python loader script for loading sentiment labels into MySQL
  • Create Notebook 1: load sentiment CSV into MySQL
  • Create Notebook 2: explore DB + locate relevant tables/joins
  • Create Notebook 3: scale/automate pipeline across all projects
  • Export notebooks to .py for code review
  • Ensure final notebook outputs a file that is joinable with Kaiaulu
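
The loader task could look roughly like the sketch below, which inserts CSV rows through any DB-API 2.0 connection using a parameterized `INSERT`. Everything named here is an assumption rather than the project's actual schema: the table `sentiment_labels`, the columns `comment_id` and `sentiment`, and both helper functions. An in-memory SQLite database stands in for MySQL only so the sketch runs without a server; MySQL drivers such as PyMySQL use `%s` placeholders, while `sqlite3` uses `?`, so the placeholder is a parameter.

```python
import csv
import sqlite3


def load_sentiment_csv(conn, csv_path, table="sentiment_labels", placeholder="%s"):
    """Insert (comment_id, sentiment) rows from a CSV file; return the row count.

    Hypothetical schema. `table` is interpolated into the SQL text, so it
    must come from trusted code, never from user input.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [(r["comment_id"], r["sentiment"]) for r in csv.DictReader(f)]

    sql = (
        f"INSERT INTO {table} (comment_id, sentiment) "
        f"VALUES ({placeholder}, {placeholder})"
    )
    cur = conn.cursor()
    # Values are passed separately from the SQL text (parameterized query).
    cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


def make_demo_connection():
    """Create an in-memory SQLite stand-in with the assumed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentiment_labels (comment_id TEXT, sentiment TEXT)")
    return conn
```

With the SQLite stand-in, the call would be `load_sentiment_csv(make_demo_connection(), "labels.csv", placeholder="?")`; against MySQL, the default `%s` placeholder should apply and the connection would come from the chosen driver.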
