317 changes: 317 additions & 0 deletions openmp/01_openmp-demo.ipynb
@@ -0,0 +1,317 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2bf44eba-b903-40f1-9ef6-67fb8e2d9cc8",
"metadata": {},
"source": [
"# Introduction to OpenMP\n",
"\n",
"OpenMP (Open Multi-Processing) lets a program use multiple CPU threads. It is an industry-standard API for shared-memory parallel programming, designed to help C++ developers take full advantage of multi-core processors without managing low-level thread creation by hand.\n",
"\n",
"## How it works\n",
"\n",
"1. The Fork-Join Model\n",
"OpenMP operates on a simple execution pattern:\n",
"\n",
"- The Master Thread: The program begins as a single serial thread.\n",
"- The Fork: When a parallel directive is reached, the master thread creates a team of worker threads.\n",
"- Parallel Execution: The work is distributed across these threads.\n",
"- The Join: Once the work is finished, the threads synchronize and terminate, leaving only the master thread to continue.\n",
"\n",
"2. Practical Implementation\n",
"- The most common use case is parallelising a for loop. By adding a single \"pragma\" line, you can distribute millions of iterations across your CPU cores.\n",
"\n",
"Here is an illustration of the Fork-Join model:\n",
"\n",
"![fork-join](images/fork_join.png)"
]
},
{
"cell_type": "markdown",
"id": "0fb794c1-2999-43ae-9c9d-58060e4eb66f",
"metadata": {},
"source": [
"In this first example we print Hello World the ordinary, serial way, using only standard C++ syntax."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b0c15570-ee24-42ed-b61f-11a3fc858b2d",
"metadata": {},
"outputs": [],
"source": [
"#include <iostream>\n",
"#include <omp.h>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5001e441-1fa5-4bdc-9fa5-2ca103ae484f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World!\n"
]
}
],
"source": [
"void example1() {\n",
" std::cout << \"Hello World!\" << std::endl;\n",
"}\n",
"example1();"
]
},
{
"cell_type": "markdown",
"id": "f37a7cfd-f158-4ef9-a4cf-977296ccf5d4",
"metadata": {},
"source": [
"---\n",
"\n",
"Now, let's use OpenMP to run the same code in parallel. By adding a simple compiler directive, we tell the system to create a \"team\" of threads.\n",
"\n",
"```#pragma omp parallel``` tells the compiler to execute the following block of code in parallel using multiple threads.\n",
"\n",
"The result is that instead of printing once, you will see \"Hello World!\" printed multiple times: typically once for every available logical core on your CPU."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "53fb7656-b72e-42bc-ade7-2ae2077142da",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World!\n",
"Hello World!\n",
"Hello World!\n",
"Hello World!\n",
"Hello World!\n",
"Hello World!\n",
"Hello World!Hello World!\n",
"\n"
]
}
],
"source": [
"void example2() {\n",
" #pragma omp parallel\n",
" {\n",
" std::cout << \"Hello World!\" << std::endl;\n",
" }\n",
"}\n",
"example2();"
]
},
{
"cell_type": "markdown",
"id": "f1568cd4-d407-49c7-98e4-26f971fdb63d",
"metadata": {},
"source": [
"---\n",
"\n",
"There is one thing we need to address before we continue.\n",
"\n",
"When we use ```#pragma omp parallel```, OpenMP spawns multiple threads that all execute the code inside the curly braces simultaneously, so every thread tries to write to standard output at the same time.\n",
"\n",
"Thread A might finish sending \"Hello World!\" but, before it can send the newline, Thread B jumps in and prints its own \"Hello World!\". The result is two greetings on one line, followed by two newlines later.\n",
"\n",
"We fix this with ```#pragma omp critical```, which lets only one thread at a time execute the enclosed block. Serialising work like this costs performance and, if overused, defeats the point of using multiple threads, but we use it in some examples here to keep the output readable. Below is a function that applies this fix."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "efcdfdb6-a60b-46af-8194-75ef9cc0e27f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World! (0)\n",
"Hello World! (5)\n",
"Hello World! (6)\n",
"Hello World! (4)\n",
"Hello World! (1)\n",
"Hello World! (3)\n",
"Hello World! (7)\n",
"Hello World! (2)\n"
]
}
],
"source": [
"void example3() {\n",
" #pragma omp parallel\n",
" {\n",
" #pragma omp critical\n",
" {\n",
" std::cout << \"Hello World! (\" << omp_get_thread_num() << \")\" << std::endl;\n",
" }\n",
" }\n",
"}\n",
"example3();"
]
},
{
"cell_type": "markdown",
"id": "a6123ee3-92b9-42fe-b1a0-57f9764569aa",
"metadata": {},
"source": [
"---\n",
"\n",
"OpenMP also lets us request a fixed number of threads for a region: ```#pragma omp parallel num_threads(2)```. In the example below, note that the \"Goodbye World!\" lines interleave, because that region has no critical section."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d86a9efa-ba28-4cb6-bbfc-abc00ee63506",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World! (0)\n",
"Hello World! (6)\n",
"Hello World! (5)\n",
"Hello World! (3)\n",
"Hello World! (4)\n",
"Hello World! (1)\n",
"Hello World! (7)\n",
"Hello World! (2)\n",
"This is another message! (0)\n",
"Goodbye World! (Goodbye World! (01)\n",
")\n"
]
}
],
"source": [
"void example4() {\n",
" #pragma omp parallel\n",
" {\n",
" #pragma omp critical\n",
" {\n",
" std::cout << \"Hello World! (\" << omp_get_thread_num() << \")\" << std::endl;\n",
" }\n",
" }\n",
"\n",
" std::cout << \"This is another message! (\" << omp_get_thread_num() << \")\" << std::endl;\n",
"\n",
" #pragma omp parallel num_threads(2)\n",
" {\n",
" std::cout << \"Goodbye World! (\" << omp_get_thread_num() << \")\" << std::endl;\n",
" }\n",
"}\n",
"example4();"
]
},
{
"cell_type": "markdown",
"id": "25421d19-67e5-4bc9-9420-d96439cffba5",
"metadata": {},
"source": [
"---\n",
"\n",
"Now that you know the basic syntax we can run a speed benchmark. We are going to time filling two arrays of one billion elements each, adding them, and calculating the average. The average calculation hides a pitfall: in parallel programming, if multiple threads try to add to the same average variable at the same time, they will overwrite each other's updates. When we use ```reduction(+:average)```, OpenMP does the following:\n",
"\n",
"- Each thread gets its own private copy of average, initialized to 0.\n",
"- Each thread sums its assigned chunk of the array (e.g., thread 1 sums one contiguous range of indices, thread 2 the next).\n",
"- Once all threads finish, OpenMP safely adds those private partial sums into the final shared average variable."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5557e01a-7c7d-4b54-8545-962ad11027df",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Initialize a[] time: 0.657814\n",
"Initialize b[] time: 0.856986\n",
"Add arrays time: 0.581521\n",
"Average result time: 0.449791\n",
"Average: 5e+08\n",
"Total time: 2.54675\n"
]
}
],
"source": [
"void example5() {\n",
" double start_time = omp_get_wtime();\n",
" double start_loop;\n",
" \n",
" const int N = 1000000000;\n",
" int* a = new int[N];\n",
" int* b = new int[N];\n",
" \n",
" start_loop = omp_get_wtime();\n",
" #pragma omp parallel for\n",
" for (int i=0; i<N; i++) {\n",
" a[i] = 1;\n",
" }\n",
" std::cout << \"Initialize a[] time: \" << omp_get_wtime()-start_loop << std::endl;\n",
"\n",
" start_loop = omp_get_wtime();\n",
" #pragma omp parallel for\n",
" for (int i=0; i<N; i++) {\n",
" b[i] = 1 + i;\n",
" }\n",
" std::cout << \"Initialize b[] time: \" << omp_get_wtime()-start_loop << std::endl;\n",
"\n",
" start_loop = omp_get_wtime();\n",
" #pragma omp parallel for\n",
" for (int i=0; i<N; i++) {\n",
" a[i] = a[i] + b[i];\n",
" }\n",
" std::cout << \"Add arrays time: \" << omp_get_wtime()-start_loop << std::endl;\n",
" \n",
" start_loop = omp_get_wtime();\n",
" double average = 0.0;\n",
" #pragma omp parallel for reduction(+:average)\n",
" for (int i=0; i<N; i++) {\n",
" average += a[i];\n",
" }\n",
" average = average/double(N);\n",
" std::cout << \"Average result time: \" << omp_get_wtime()-start_loop << std::endl;\n",
" \n",
" std::cout << \"Average: \" << average << std::endl;\n",
"\n",
" delete[] a;\n",
" delete[] b;\n",
"\n",
" std::cout << \"Total time: \" << omp_get_wtime()-start_time << std::endl;\n",
"}\n",
"example5();"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "C++23 + OpenMP",
"language": "cpp",
"name": "xcpp23-omp"
},
"language_info": {
"codemirror_mode": "text/x-c++src",
"file_extension": ".cpp",
"mimetype": "text/x-c++src",
"name": "C++",
"nbconvert_exporter": "",
"pygments_lexer": "",
"version": "cxx23"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
20 changes: 18 additions & 2 deletions openmp/hello_world.ipynb → openmp/02_hello_world.ipynb
@@ -1,5 +1,19 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "aa52f895-3a3f-4c39-adaa-27a52c704fdb",
"metadata": {},
"source": [
"# OpenMP Hello World\n",
"\n",
"This is a simple OpenMP \"Hello World\" example that demonstrates the basics of parallel programming. We query the maximum number of threads available on the machine and tell OpenMP to use all of them (8 in this case). Then we open a parallel region with `#pragma omp parallel`, which spawns all 8 threads simultaneously. Each thread independently reads its own ID via `omp_get_thread_num()` and prints a Hello World message along with the total thread count.\n",
"\n",
"One important thing to notice in the output is that the threads do not print in order; they print in a seemingly random sequence. This is completely normal and expected. The OS scheduler decides which thread gets CPU time at any given moment, and since all threads are racing to reach the `printf` at the same time, whichever is scheduled first prints first. This unpredictable ordering of output is a fundamental characteristic of parallel execution, and the order will likely differ every time you run the program.\n",
"\n",
"It is important to understand that while the print order is unpredictable, the computation itself is correct: each thread does its own independent work without interfering with the others. This distinction between chaotic ordering and correct parallel computation is one of the core concepts in parallel programming and OpenMP."
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -64,7 +78,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "C++23 (xcpp+OpenMP)",
"display_name": "C++23 + OpenMP",
"language": "cpp",
"name": "xcpp23-omp"
},
@@ -73,7 +87,9 @@
"file_extension": ".cpp",
"mimetype": "text/x-c++src",
"name": "C++",
"version": "23"
"nbconvert_exporter": "",
"pygments_lexer": "",
"version": "cxx23"
}
},
"nbformat": 4,