|
| 1 | +# Parallel Processing Documentation |
| 2 | + |
| 3 | +I will lay out how to use the `thread.ParallelProcessing` class! |
| 4 | + |
| 5 | +<br /> |
| 6 | +<details> |
| 7 | + <summary>Jump to</summary> |
| 8 | + <ul> |
| 9 | + <li><a href='#how-does-it-work'> How it works </a></li> |
| 10 | + <li><a href='#initializing-a-parallel-process'> Initialize a Parallel Process </a></li> |
| 11 | + <li><a href='#parameters'> Parameters </a></li> |
| 12 | + <li><a href='#attributes'> Attributes </a></li> |
| 13 | + <li><a href='#methods'> Class Methods </a></li> |
| 14 | + </ul> |
| 15 | +</details> |
| 16 | + |
| 17 | + |
| 18 | +Don't have the thread library? [See here](./getting-started.md) for installation instructions. |
| 19 | + |
| 20 | +--- |
| 21 | + |
| 22 | +## Importing the class |
| 23 | + |
| 24 | +```py |
| 25 | +from thread import ParallelProcessing |
| 26 | +``` |
| 27 | + |
| 28 | +<br /> |
| 29 | + |
| 30 | + |
| 31 | +## How does it work? |
| 32 | + |
| 33 | +Parallel processing is most useful for speeding up data processing over large datasets. |
| 34 | + |
| 35 | +What it does: |
| 36 | +```py |
| 37 | +dataset = [1, 2, 3, ..., 2e10] |
| 38 | + |
| 39 | +# Splits into chunks as evenly as possible |
| 40 | +# thread_count = min(max_threads, len(dataset)) |
| 41 | +# n == len(chunks) == thread_count |
| 42 | +chunks = [[1, 2, 3, ...], [50, 51, 52, ...], ...] |
| 43 | + |
| 44 | +# Initialize and run n threads |
| 45 | +# each thread handles 1 chunk of data and passes it into the function |
| 46 | + |
| 47 | +# processed data is arranged back in order |
| 48 | + |
| 49 | +# processed data is returned as a list[Data_Out] |
| 50 | +``` |
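
The steps above can be sketched with only the standard library. This is a minimal illustration, not the library's actual implementation: the helper names `split_evenly` and `parallel_map` are hypothetical, the default of 8 threads is an assumption, and it assumes the function is applied to each item of a chunk.

```python
import threading
from typing import Any, Callable, List, Sequence


def split_evenly(dataset: Sequence[Any], n: int) -> List[Sequence[Any]]:
    """Split dataset into n contiguous chunks whose sizes differ by at most 1."""
    size, extra = divmod(len(dataset), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks each take one additional item
        end = start + size + (1 if i < extra else 0)
        chunks.append(dataset[start:end])
        start = end
    return chunks


def parallel_map(
    function: Callable[[Any], Any],
    dataset: Sequence[Any],
    max_threads: int = 8,  # assumed default, for illustration only
) -> List[Any]:
    """Hypothetical stand-in: process dataset in chunks, one thread per chunk."""
    if not dataset:
        return []

    thread_count = min(max_threads, len(dataset))
    chunks = split_evenly(dataset, thread_count)
    results: List[List[Any]] = [[] for _ in range(thread_count)]

    def worker(index: int) -> None:
        # Each thread writes only to its own slot, so chunk order is preserved
        results[index] = [function(item) for item in chunks[index]]

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Flatten the per-chunk results back into one ordered list
    return [value for chunk in results for value in chunk]
```

Because each thread writes into a slot indexed by its chunk number, the output order matches the input order regardless of which thread finishes first.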
| 51 | + |
| 52 | +<br /> |
| 53 | + |
| 54 | + |
| 55 | +## Initializing a parallel process |
| 56 | + |
| 57 | +A simple example |
| 58 | +```py |
| 59 | +def my_data_processor(Data_In) -> Data_Out: ... |
| 60 | + |
| 61 | +# Recommended way |
| 62 | +my_processor = ParallelProcessing( |
| 63 | +  function = my_data_processor, |
| 64 | +  dataset = [i for i in range(0, n)] |
| 65 | +) |
| 66 | + |
| 67 | +# OR |
| 68 | +# Not the recommended way |
| 69 | +my_processor = ParallelProcessing(my_data_processor, [i for i in range(0, n)]) |
| 70 | +``` |
| 71 | + |
| 72 | +It can be run by invoking the `start()` method: |
| 73 | +```py |
| 74 | +my_processor.start() |
| 75 | +``` |
| 76 | + |
| 77 | +> [!NOTE] |
| 78 | +> The underlying **threading.Thread()** instances from Python are only initialized when **start()** is invoked
| 79 | +
|
| 80 | +<br /> |
| 81 | + |
| 82 | + |
| 83 | +### Parameters |
| 84 | + |
| 85 | +* function : (DataProcessor, dataset, *args, **kwargs) -> Any | Data_Out |
| 86 | + > This should be a function that takes in a dataset and/or anything and returns Data_Out and/or anything |
| 87 | +
|
| 88 | +* dataset : Sequence[Data_In] = () |
| 89 | + > This should be an iterable sequence of arguments passed to the `DataProcessor` function<br /> |
| 90 | + > (e.g. `('foo', 'bar')`) |
| 91 | + |
| 92 | +* *overflow_args : Overflow_In |
| 93 | + > These are arguments passed to [**thread.Thread**](./threading.md#parameters)
| 94 | +
|
| 95 | +* **overflow_kwargs : Overflow_In |
| 96 | + > These are keyword arguments passed to [**thread.Thread**](./threading.md#parameters)<br /> |
| 97 | + > [!NOTE] |
| 98 | + > If `args` is present, then it will automatically be removed from kwargs and joined with `overflow_args` |
| 99 | +
|
| 100 | +* **Raises** AssertionError: max_threads is invalid |
| 101 | + |
| 102 | +<br /> |
| 103 | + |
| 104 | + |
| 105 | +### Attributes |
| 106 | + |
| 107 | +These are attributes of the [`ParallelProcessing`](#importing-the-class) class |
| 108 | + |
| 109 | +* results : List[Data_Out] |
| 110 | + > The processed data, arranged in the same order as the dataset<br /> |
| 111 | + > **Raises** [`ThreadNotInitializedError`](./exceptions.md#threadnotinitializederror)<br /> |
| 112 | + > **Raises** [`ThreadNotRunningError`](./exceptions.md#threadnotrunningerror)<br /> |
| 113 | + > **Raises** [`ThreadStillRunningError`](./exceptions.md#threadstillrunningerror)
| 114 | +
|
| 115 | +<br /> |
| 116 | + |
| 117 | + |
| 118 | +### Methods |
| 119 | + |
| 120 | +These are methods of the [`ParallelProcessing`](#importing-the-class) class |
| 121 | + |
| 122 | +* start : () -> None |
| 123 | + > Initializes the threads and starts them<br /> |
| 124 | + > **Raises** [`ThreadStillRunningError`](./exceptions.md#threadstillrunningerror)
| 125 | +
|
| 126 | +* is_alive : () -> bool |
| 127 | + > Indicates whether the thread is still alive<br /> |
| 128 | + > **Raises** [`ThreadNotInitializedError`](./exceptions.md#threadnotinitializederror)
| 129 | +
|
| 130 | +* get_return_values : () -> List[Data_Out] |
| 131 | + > Halts the current thread execution until the threads complete, then returns the processed data
| 132 | +
|
| 133 | +* join : () -> JoinTerminatedStatus |
| 134 | + > Halts the current thread execution until a thread completes or exceeds the timeout<br /> |
| 135 | + > **Raises** [`ThreadNotInitializedError`](./exceptions.md#threadnotinitializederror)<br /> |
| 136 | + > **Raises** [`ThreadNotRunningError`](./exceptions.md#threadnotrunningerror)
| 137 | +
|
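
For intuition about `start`, `is_alive` and `join`, here is a small sketch using Python's standard `threading.Thread`, which has methods of the same names. It is an analogy only: it does not use `ParallelProcessing`, and the standard class does not raise the library-specific exceptions listed above. The names `release` and `blocked_task` are illustrative.

```python
import threading

# Event used to keep the worker alive until we choose to release it
release = threading.Event()


def blocked_task() -> None:
    release.wait()  # blocks until release.set() is called


worker = threading.Thread(target=blocked_task)
alive_before_start = worker.is_alive()  # thread has not been started yet

worker.start()
worker.join(timeout=0.05)  # gives up waiting after the timeout
alive_after_timeout = worker.is_alive()  # worker is still blocked

release.set()  # let the worker finish
worker.join()  # waits until the worker completes
alive_after_join = worker.is_alive()
```

A `join` with a timeout returns control to the caller even when the work is unfinished, which is why checking `is_alive()` afterwards is the usual way to tell the two outcomes apart.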
| 138 | +<br /> |
| 139 | + |
| 140 | + |
| 141 | +Now you know how to use the [`ParallelProcessing`](#importing-the-class) class! |
| 142 | + |
| 143 | +[See here](./parallel-processing.md) for how to use the `thread.ParallelProcessing` class! |