Production-ready Python + Playwright test automation framework for the Parakh AI Evaluation Platform by CivicDataLab.
| Layer | Tool | Coverage |
|---|---|---|
| E2E UI | Playwright + pytest | Auth, homepage, navigation, feature tabs, AI models, evaluations list + detail, New Evaluation wizard (draft, automated & manual modes, cancel paths), evaluators management, evaluator role, auditor flows, prompt libraries, mobile viewport |
| Accessibility | axe-playwright-python (WCAG 2.1 AA) | Axe scans, alt text, ARIA, keyboard, skip links, social-icon labels |
| Visual Regression | Pillow pixel-diff | Desktop / tablet / mobile viewports |
| API / HTTP | requests | Status codes, headers, response time, GraphQL contracts |
| Performance | CDP + Navigation Timing API | Load time, TTFB, LCP, mobile 3G, authenticated-route budgets |
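The visual-regression layer compares screenshots pixel by pixel with Pillow. The sketch below illustrates what such a comparison computes: the percentage of differing pixels between two same-sized images, checked against a threshold in percent (like `VISUAL_THRESHOLD`). It works on plain lists of RGB tuples so it runs without Pillow. The function names and the per-channel `tolerance` parameter are hypothetical, and the framework's actual diff logic may differ.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def diff_percent(
    baseline: Sequence[Pixel], current: Sequence[Pixel], tolerance: int = 0
) -> float:
    """Return the percentage of pixels whose channels differ by more than `tolerance`."""
    if len(baseline) != len(current):
        # A size mismatch usually means the viewport or layout changed.
        raise ValueError("images must have the same number of pixels")
    if not baseline:
        return 0.0
    changed = sum(
        1
        for a, b in zip(baseline, current)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(a, b))
    )
    return 100.0 * changed / len(baseline)


def passes_visual_check(
    baseline: Sequence[Pixel], current: Sequence[Pixel], threshold_pct: float = 0.2
) -> bool:
    """Pass when the differing-pixel percentage does not exceed the threshold."""
    return diff_percent(baseline, current) <= threshold_pct
```

In a 1,000-pixel image, 2 changed pixels is a 0.2% diff, which passes at the default threshold. 3 changed pixels is 0.3%, which fails.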
- Python 3.11+
- pip (or pipx)
- Node.js 18+ (required by the Playwright browser installer)
- Internet access to dev.parakh.civicdataspace.in
```bash
# 1. Clone the repo
git clone <repo-url>
cd ParakhAI_test

# 2. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Install Playwright browser binaries
playwright install --with-deps chromium

# 5. Configure environment
cp .env.example .env
# Fill in TEST_EMAIL_1 and TEST_PASSWORD_1 at minimum
```

Run the full suite:

```bash
pytest
```

Run a single suite:

```bash
pytest tests/e2e/ -m e2e                      # E2E browser tests
pytest tests/accessibility/ -m accessibility  # WCAG / axe-core
pytest tests/visual/ -m visual                # screenshot regression
pytest tests/api/ -m api                      # HTTP-layer (no browser)
pytest tests/performance/ -m performance      # load & timing metrics
```

Run by marker:

```bash
pytest -m smoke -v           # fast CI sanity subset
pytest -m auth -v            # authenticated tests only (needs TEST_EMAIL_1)
pytest -m mobile -v          # mobile-viewport tests (390px)
pytest -m regression_write   # write-side tests (needs SANDBOX_ORG_SLUG)
```

Run a single file or a single test:

```bash
pytest tests/e2e/test_homepage.py -v
pytest tests/e2e/test_homepage.py::TestClass::test_name -v
```

Run in parallel with pytest-xdist:

```bash
pytest -n 2 tests/e2e/      # 2 workers (recommended for browser tests)
pytest -n auto tests/api/   # auto-detect workers (good for API tests)
```

Shard the E2E suite with pytest-split, as CI does:

```bash
pytest tests/e2e/ --splits 3 --group 1  # shard 1 of 3
pytest tests/e2e/ --splits 3 --group 2  # shard 2 of 3
pytest tests/e2e/ --splits 3 --group 3  # shard 3 of 3
```

Watch the browser in slow motion:

```bash
HEADLESS=false SLOW_MO=500 pytest tests/e2e/ -v
```

Target a different environment:

```bash
BASE_URL=https://staging.parakh.civicdataspace.in pytest -m api -v
```

Visual regression baselines. The first run saves the baselines:

```bash
pytest tests/visual/ -v
# Tests skip with "Baseline saved — re-run to compare"
```

Later runs compare against them:

```bash
pytest tests/visual/ -v
# Tests fail if pixel diff > VISUAL_THRESHOLD (default 0.2%)
```

To refresh a baseline after an intentional UI change, delete it and re-run the test:

```bash
rm snapshots/homepage_desktop_1440x900.png
pytest tests/visual/test_visual_regression.py::TestHomepageVisual::test_homepage_desktop_screenshot -v
```

Environment variables (set in `.env` or the shell):

| Variable | Default | Description |
|---|---|---|
| `BASE_URL` | `https://dev.parakh.civicdataspace.in` | Target platform URL |
| `GRAPHQL_URL` | `https://dev.api.parakh.civicdataspace.in/graphql/` | GraphQL endpoint |
| `ENVIRONMENT` | `development` | `local` / `development` / `staging` / `production` |
| `BROWSER` | `chromium` | `chromium` / `firefox` / `webkit` |
| `HEADLESS` | `true` | Set to `false` to watch the browser |
| `SLOW_MO` | `0` | Delay in milliseconds between Playwright actions |
| `TIMEOUT` | `30000` | Default Playwright timeout (ms) |
| `VIEWPORT_WIDTH` | `1440` | Browser viewport width (px) |
| `VIEWPORT_HEIGHT` | `900` | Browser viewport height (px) |
| `SCREENSHOT_ON_FAILURE` | `true` | Save a screenshot automatically when a test fails |
| `VISUAL_THRESHOLD` | `0.2` | Maximum pixel difference (%) allowed in visual tests |
| `TEST_EMAIL_1` | (none) | Primary test account email |
| `TEST_PASSWORD_1` | (none) | Primary test account password |
| `TEST_EMAIL_2` | (none) | Secondary account for multi-user tests (`authenticated_page_u2`) |
| `TEST_PASSWORD_2` | (none) | Secondary account password |
| `TEST_USER_INDEX` | `1` | Active user slot: `1` or `2` |
| `SANDBOX_ORG_SLUG` | (none) | Org slug for write-side tests; if unset, all `regression_write` tests skip |
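The variables above can be read into a small typed settings object. The sketch below is a hypothetical stand-in for `utils/config.py`. The `Config` class, its subset of fields, and the `url()` helper (modelled on the `self.config.url("/my-feature")` call in the page-object example) are assumptions, not the module's actual code. It reads from an environment mapping, applies the documented defaults, and parses booleans and numbers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Accept the common spellings used in .env files.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Config:
    base_url: str
    graphql_url: str
    browser: str
    headless: bool
    slow_mo: int
    timeout_ms: int
    visual_threshold: float
    test_email_1: Optional[str]
    sandbox_org_slug: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Config":
        # Defaults mirror the table above; unset credentials stay None.
        return cls(
            base_url=env.get("BASE_URL", "https://dev.parakh.civicdataspace.in").rstrip("/"),
            graphql_url=env.get("GRAPHQL_URL", "https://dev.api.parakh.civicdataspace.in/graphql/"),
            browser=env.get("BROWSER", "chromium"),
            headless=_as_bool(env.get("HEADLESS", "true")),
            slow_mo=int(env.get("SLOW_MO", "0")),
            timeout_ms=int(env.get("TIMEOUT", "30000")),
            visual_threshold=float(env.get("VISUAL_THRESHOLD", "0.2")),
            test_email_1=env.get("TEST_EMAIL_1") or None,
            sandbox_org_slug=env.get("SANDBOX_ORG_SLUG") or None,
        )

    def url(self, path: str = "/") -> str:
        """Join a route such as "/my-feature" onto BASE_URL."""
        return f"{self.base_url}/{path.lstrip('/')}"
```

Passing an explicit mapping, for example `Config.from_env({"HEADLESS": "false", "SLOW_MO": "500"})`, keeps the loader testable without touching the real process environment.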
| Marker | When to use |
|---|---|
| `@pytest.mark.e2e` | Browser UI end-to-end tests |
| `@pytest.mark.accessibility` | WCAG / axe-core tests |
| `@pytest.mark.visual` | Screenshot regression tests |
| `@pytest.mark.api` | HTTP-layer tests (no browser) |
| `@pytest.mark.performance` | Load/timing metric tests |
| `@pytest.mark.mobile` | Mobile-viewport tests (390×844) |
| `@pytest.mark.smoke` | Fast sanity subset for PR checks |
| `@pytest.mark.regression` | Full regression suite |
| `@pytest.mark.regression_write` | Mutating tests; auto-skipped unless `SANDBOX_ORG_SLUG` is set |
| `@pytest.mark.auth` | Tests requiring an authenticated session (`TEST_EMAIL_1` must be set) |
| `@pytest.mark.skip_on_ci` | Tests excluded from CI environments |
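The `auth` and `skip_on_ci` markers imply skip decisions made before a test runs. One way to express those rules is a pure function that maps a test's marker names and the environment to a skip reason. This is a hypothetical sketch. The `skip_reason` name is invented, and so is the use of the `CI` variable (which GitHub Actions sets to `true`) to detect CI. Neither is confirmed as the project's code.

```python
from typing import Iterable, Mapping, Optional


def skip_reason(markers: Iterable[str], env: Mapping[str, str]) -> Optional[str]:
    """Return why a test should be skipped, or None if it should run."""
    names = set(markers)
    if "skip_on_ci" in names and env.get("CI", "").lower() == "true":
        return "excluded from CI (skip_on_ci)"
    if "auth" in names and not env.get("TEST_EMAIL_1"):
        return "TEST_EMAIL_1 not set; authenticated tests need a test account"
    return None
```

A `pytest_collection_modifyitems` hook in `conftest.py` could call this for each item. It would pass `{m.name for m in item.iter_markers()}` and call `item.add_marker(pytest.mark.skip(reason=...))` whenever a reason comes back.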
```
Push / PR to main or dev
          │
          ▼
     ┌─────────┐
     │  lint   │  ruff check
     └────┬────┘
          │
     ┌────┴──────────────────────────────────────────┐
     │                (parallel jobs)                │
     ├───────────┬──────────────────────┬────────────┤
     │ api-tests │ e2e-tests (3 shards) │ a11y-tests │
     │           │ shard 1 | 2 | 3      │            │
     │           │ -n 2 per shard       │            │
     └───────────┴──────────────────────┴────────────┘
     visual-tests (separate job)
               │
     ┌─────────▼──────────┐
     │ test-summary       │
     │ downloads shards   │
     │ merge_test_reports │
     │ → GitHub Step      │
     │   Summary          │
     └────────────────────┘

Artifacts retained 30 days:
  e2e-shard-{1,2,3}          per-shard HTML + JSON reports + screenshots
  e2e-combined-report        merged JSON + Markdown summary
  api-test-report            API HTML report
  accessibility-report       axe JSON + HTML
  visual-regression-report   diff images + HTML
```
Triggers: push to main / dev / develop, pull_request to main, workflow_dispatch.
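The three E2E shards come from pytest-split's `--splits`/`--group` options. pytest-split can use recorded test durations to balance the groups. The sketch below illustrates the general idea with a greedy assignment that places the longest test first into whichever shard currently has the least total time. It is an illustration only. The `assign_shards` function and the sample durations are hypothetical, and pytest-split's own algorithms differ in detail.

```python
import heapq
from typing import Dict, List


def assign_shards(durations: Dict[str, float], splits: int) -> List[List[str]]:
    """Greedily place the longest tests first onto the shard with the least total time."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    # Heap of (total_seconds, shard_index) so the lightest shard pops first.
    heap = [(0.0, i) for i in range(splits)]
    shards: List[List[str]] = [[] for _ in range(splits)]
    # Longest first; ties broken by test id for a deterministic split.
    for test_id, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        shards[idx].append(test_id)
        heapq.heappush(heap, (total + seconds, idx))
    return shards
```

Shard index 0 here corresponds to `--group 1`. In this project's CI, each shard then runs its own list with `-n 2` workers.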
To add a new test:

- Choose the right folder: `tests/e2e/`, `tests/api/`, etc.
- Name the file `test_<feature>.py`.
- Add the marker at module level: `pytestmark = [pytest.mark.e2e]`.
- Use Page Objects from `pages/`. If none fits, create one extending `BasePage`, with its selectors in a matching `locators/<feature>_locators.py`.
- Run locally before pushing:

```bash
pytest tests/e2e/test_my_feature.py -v --reruns 0
```
A minimal example, showing a page object and a test that uses it:

```python
# pages/my_feature_page.py
from pages.base_page import BasePage
from locators.my_feature_locators import MyFeatureLocators


class MyFeaturePage(BasePage):
    # Selectors live in locators/; the page object only references them
    HEADING = MyFeatureLocators.HEADING

    def go_to_my_feature(self) -> "MyFeaturePage":
        self.navigate(self.config.url("/my-feature"))
        self.wait_for_load("domcontentloaded")
        self.wait_for_app_ready()
        self.skip_if_redirected_to_home("/my-feature")
        return self
```

```python
# tests/e2e/test_my_feature.py
import pytest
from playwright.sync_api import Page

from pages.my_feature_page import MyFeaturePage

pytestmark = [pytest.mark.e2e, pytest.mark.regression, pytest.mark.auth]


def test_heading_visible(authenticated_page_fast: Page):
    page = MyFeaturePage(authenticated_page_fast)
    page.go_to_my_feature()
    assert page.is_visible(page.HEADING)
```

After each run, reports are written to `reports/`:
| File | Contents |
|---|---|
| `reports/report.html` | Default HTML report (full run) |
| `reports/report.json` | pytest-json-report output |
| `reports/TEST_REPORT.md` | Markdown summary |
| `reports/e2e_report_shard_N.html` | Per-shard HTML (CI only) |
| `reports/e2e_shard_N.json` | Per-shard JSON (CI only) |
| `reports/e2e_combined.json` | Merged shard JSON (CI `test-summary` job) |
| `reports/e2e_summary.md` | Merged Markdown posted to the GitHub Step Summary |
| `reports/a11y_report.html` | Accessibility HTML report |
| `reports/accessibility_report.json` | Structured axe violations JSON |
| `reports/performance_metrics.json` | Public-page timing metrics |
| `reports/performance_metrics_auth.json` | Authenticated-route timing metrics |
| `screenshots/FAIL_*.png` | Failure screenshots |
| `screenshots/DIFF_*.png` | Visual regression diff images |
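The `performance_metrics*.json` files hold page-timing figures gathered through the Navigation Timing API. The sketch below illustrates how such figures can be derived and checked against budgets. It turns a `PerformanceNavigationTiming` entry, passed in as a plain dict, into millisecond timings. Several details are assumptions: the metric names, the definition of TTFB as `responseStart` measured from the start of navigation, and the budget values in the test. LCP needs a `PerformanceObserver` and is not covered here.

```python
from typing import Dict, Mapping


def navigation_metrics(entry: Mapping[str, float]) -> Dict[str, float]:
    """Derive headline timings (ms) from a PerformanceNavigationTiming entry."""
    start = entry.get("startTime", 0.0)  # 0 for the navigation entry itself
    return {
        "ttfb_ms": entry["responseStart"] - start,
        "dom_content_loaded_ms": entry["domContentLoadedEventEnd"] - start,
        "load_ms": entry["loadEventEnd"] - start,
    }


def over_budget(metrics: Mapping[str, float], budgets: Mapping[str, float]) -> Dict[str, float]:
    """Return only the metrics that exceed their budget, keyed by metric name."""
    return {
        name: metrics[name]
        for name, limit in budgets.items()
        if metrics.get(name, 0.0) > limit
    }
```

With Playwright for Python, the entry can be fetched with `page.evaluate("() => performance.getEntriesByType('navigation')[0].toJSON()")`.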
All scripts share auth and GraphQL plumbing in `scripts/_api_client.py`. The helpers log in headlessly with Playwright through Keycloak using `TEST_EMAIL_1` / `TEST_PASSWORD_1`, fetch an access token from `/api/auth/session`, then send GraphQL calls via `requests`. Defaults target the dev backend; override them with `BASE_URL` / `GRAPHQL_URL`.
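The flow above ends with authenticated GraphQL calls. The sketch below shows that last step using the standard library's `urllib.request` instead of `requests`, so it runs without extra dependencies. It pulls a bearer token out of a session payload and builds a JSON POST for the GraphQL endpoint. The `accessToken` key, the helper names, and the sample query are assumptions; check `scripts/_api_client.py` for the real field names and headers.

```python
import json
import urllib.request
from typing import Any, Dict, Mapping, Optional


def token_from_session(session: Mapping[str, Any], key: str = "accessToken") -> str:
    """Extract the bearer token from a /api/auth/session payload (key name assumed)."""
    token = session.get(key)
    if not token:
        raise RuntimeError(f"no {key!r} in session payload; is the login still valid?")
    return str(token)


def build_graphql_request(
    url: str, query: str, token: str, variables: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) an authenticated GraphQL POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

To send the request, open it with `urllib.request.urlopen(req, timeout=30)` and parse the body with `json.load`. Keeping request construction separate from sending makes the auth header easy to test offline.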
`cleanup_drafts.py`:

```bash
python scripts/cleanup_drafts.py --dry-run
python scripts/cleanup_drafts.py
python scripts/cleanup_drafts.py --org-id 5
```

`cleanup_all.py`:

```bash
python scripts/cleanup_all.py --dry-run
python scripts/cleanup_all.py --include-cancelled-older-than 7
python scripts/cleanup_all.py --status DRAFT,RUNNING
```

`seed_test_data.py`:

```bash
python scripts/seed_test_data.py  # 5 drafts in org 1
python scripts/seed_test_data.py --count 10
python scripts/seed_test_data.py --model-id 129
python scripts/seed_test_data.py --dry-run
```

`sandbox_reset.py`:

```bash
python scripts/sandbox_reset.py --dry-run
python scripts/sandbox_reset.py --yes  # non-interactive (CI)
```

`sandbox_reset.py` requires `SANDBOX_ORG_SLUG` in `.env` and refuses to run without it.
`merge_test_reports.py`:

```bash
python scripts/merge_test_reports.py \
  reports/e2e_shard_1.json \
  reports/e2e_shard_2.json \
  reports/e2e_shard_3.json \
  --output reports/e2e_combined.json \
  --markdown reports/e2e_summary.md
```

The `test-summary` CI job uses this script to merge the per-shard results into a single Markdown summary posted to the GitHub Step Summary.
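Merging shard reports amounts to summing each shard's outcome counts and concatenating its test entries. The sketch below does that for pytest-json-report files. Each file carries a top-level `summary` of outcome counts and a `tests` list with a `nodeid` and `outcome` per test. It is a hypothetical simplification, not the actual `merge_test_reports.py`, and its Markdown layout is invented.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def merge_reports(paths: Iterable[Path]) -> Tuple[Dict[str, int], List[dict]]:
    """Sum integer summary counts and concatenate per-test entries across shards."""
    totals: Counter = Counter()
    tests: List[dict] = []
    for path in paths:
        report = json.loads(Path(path).read_text(encoding="utf-8"))
        for key, value in report.get("summary", {}).items():
            if isinstance(value, int):
                totals[key] += value
        tests.extend(report.get("tests", []))
    return dict(totals), tests


def to_markdown(totals: Dict[str, int], tests: List[dict]) -> str:
    """Render a count table followed by the node IDs of failed or errored tests."""
    lines = ["| Outcome | Count |", "|---|---|"]
    lines += [f"| {key} | {totals[key]} |" for key in sorted(totals)]
    failed = [t["nodeid"] for t in tests if t.get("outcome") in ("failed", "error")]
    if failed:
        lines += ["", "Failed tests:"] + [f"- `{nodeid}`" for nodeid in failed]
    return "\n".join(lines) + "\n"
```

Summing counts rather than re-counting the `tests` entries keeps any outcome categories that pytest-json-report reports only in `summary`.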
Tests marked `@pytest.mark.regression_write` mutate platform state (creating evaluations, adding evaluators, etc.) and run only against a dedicated sandbox organisation. To enable them, set `SANDBOX_ORG_SLUG` in `.env` (or as a GitHub Actions secret). When it is unset, the autouse `forbid_outside_sandbox` fixture in `tests/conftest.py` skips every `regression_write` test.
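The guard above reduces to a skip decision for `regression_write` tests. A stricter variant would also refuse any write aimed at an org other than the sandbox. The sketch below shows both as plain functions. All names here are hypothetical, including `SandboxViolation`, `ensure_sandbox_target`, and the `qa-sandbox` slug used in the usage. The second check is an illustrative extra, not confirmed behaviour of the real `forbid_outside_sandbox` fixture.

```python
import os
from typing import Iterable, Mapping, Optional


class SandboxViolation(RuntimeError):
    """Raised when a write-side action targets an org other than the sandbox."""


def sandbox_slug(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured sandbox slug, treating blank values as unset."""
    slug = env.get("SANDBOX_ORG_SLUG", "").strip()
    return slug or None


def write_skip_reason(markers: Iterable[str], env: Mapping[str, str]) -> Optional[str]:
    """Skip reason for write-side tests when no sandbox is configured."""
    if "regression_write" in set(markers) and sandbox_slug(env) is None:
        return "SANDBOX_ORG_SLUG not set; skipping write-side test"
    return None


def ensure_sandbox_target(target_org_slug: str, env: Mapping[str, str]) -> None:
    """Refuse to mutate any org other than the configured sandbox."""
    slug = sandbox_slug(env)
    if slug is None or target_org_slug != slug:
        raise SandboxViolation(
            f"refusing to write to org {target_org_slug!r}; sandbox is {slug!r}"
        )
```

For example, with `SANDBOX_ORG_SLUG=qa-sandbox`, `ensure_sandbox_target("qa-sandbox", env)` passes silently, while any other slug raises `SandboxViolation` before a mutation is sent.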
```
ParakhAI_test/
├── .github/workflows/
│   ├── ci.yml                    # PR + push pipeline (lint → parallel suites → summary)
│   └── scheduled.yml             # Nightly regression (02:00 UTC)
├── tests/
│   ├── e2e/                      # Browser UI tests (~310 tests)
│   │   ├── test_auth.py
│   │   ├── test_homepage.py
│   │   ├── test_navigation.py
│   │   ├── test_feature_tabs.py
│   │   ├── test_models.py
│   │   ├── test_evaluations.py
│   │   ├── test_evaluation_detail.py
│   │   ├── test_new_evaluation_smoke.py
│   │   ├── test_new_evaluation_regression.py
│   │   ├── test_new_evaluation_full_flow.py
│   │   ├── test_new_evaluation_cancel.py
│   │   ├── test_evaluation_workspace.py
│   │   ├── test_evaluation_workspace_manual.py
│   │   ├── test_evaluators_management.py
│   │   ├── test_evaluators_management_write.py
│   │   ├── test_evaluator_role.py
│   │   ├── test_assignment_workflow.py
│   │   ├── test_multi_user_assignment.py
│   │   ├── test_auditor_evaluations.py
│   │   ├── test_auditor_model_detail.py
│   │   ├── test_ai_maker_dashboard.py
│   │   ├── test_org_selection.py
│   │   ├── test_prompt_libraries.py
│   │   ├── test_user_flows.py
│   │   └── test_mobile.py
│   ├── accessibility/            # WCAG / axe-core tests
│   ├── visual/                   # Screenshot regression
│   ├── api/                      # HTTP-layer tests
│   │   ├── test_graphql.py
│   │   └── test_graphql_authenticated.py
│   ├── performance/              # Load & timing tests
│   ├── data/
│   │   └── test_data.py          # GraphQL queries/mutations + sandbox constants
│   └── conftest.py               # All shared fixtures
├── pages/                        # Page Object Models
│   ├── base_page.py              # Base class: all pages inherit from here
│   ├── home_page.py
│   ├── login_page.py
│   ├── dashboard_page.py
│   ├── ai_maker_dashboard_page.py
│   ├── models_page.py
│   ├── evaluations_page.py
│   ├── evaluation_detail_page.py
│   ├── new_evaluation_page.py
│   ├── evaluation_workspace_page.py
│   ├── evaluators_page.py
│   ├── evaluator_role_page.py
│   ├── auditor_model_detail_page.py
│   ├── org_selection_page.py
│   └── prompt_libraries_page.py
├── locators/                     # Raw CSS/text selectors (one file per page)
├── scripts/
│   ├── _api_client.py            # Shared auth + GraphQL helpers
│   ├── cleanup_drafts.py
│   ├── cleanup_all.py
│   ├── seed_test_data.py
│   ├── sandbox_reset.py
│   └── merge_test_reports.py     # Merges CI shard JSONs → combined report
├── utils/
│   ├── config.py
│   ├── helpers.py
│   ├── reporters.py
│   └── test_data_factory.py
├── docs/
│   ├── app_bugs.md               # Known platform bugs; all xfails link here
│   ├── a11y_findings.md
│   └── visual_diffs.md
├── snapshots/                    # Visual regression baselines
├── reports/                      # Generated reports (gitignored)
├── screenshots/                  # Failure + diff screenshots (gitignored)
├── pytest.ini
├── requirements.txt
├── .env.example
├── CLAUDE.md
├── CONTRIBUTING.md
├── SECURITY.md
└── README.md
```
MIT — see LICENSE for details.