* Every thread has its own stack, stack pointer (SP), and program counter (PC)
### Example
Creates N structs; each thread processes one and returns it. No locks or anything fancy like that, just an example of how
somebody would start with pthreads in C.
```c
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    int x;
    int y;
} thread_data;

void *thread_func(void *arg) {
    thread_data *data = (thread_data *)arg; // cast argument to the correct type
    // pthread_self() returns the unique thread id, type: pthread_t
    printf("thread id = %lu\n", (unsigned long)pthread_self());
    // Perform complex operations here
    printf("x = %d, y = %d\n", data->x, data->y);
    data->x += 10;
    data->y += 10;
    // return NULL; // if you do not need to return anything
    pthread_exit((void *)data);
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("Usage: %s <nthreads>\n", argv[0]);
        exit(1);
    }
    int nthreads = atoi(argv[1]);
    pthread_t threads[nthreads];
    thread_data data[nthreads];
    thread_data *ret_data[nthreads];
    for (int i = 0; i < nthreads; i++) {
        data[i].x = i;
        data[i].y = i + 1;
        pthread_create(&threads[i], NULL, thread_func, (void *)&data[i]); // create thread with arguments
    }
    // If the main thread exits, all the other threads are killed with it
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], (void **)&ret_data[i]); // wait for every thread to finish
        printf("x = %d, y = %d\n", ret_data[i]->x, ret_data[i]->y);
    }
    return 0;
}
```
### A few properties
* Memory is shared between the threads (basically, the heap is shared)
* Global variables are seen by every thread
* Communication between threads therefore happens through global variables or through pointers to the same structures
### Synchronization of access to shared variables
* We want to limit access to a variable that all the threads are accessing, so that only a single thread touches it at a time (avoiding race conditions)
* We solve this problem by using pthread mutexes (`pthread_mutex_t`), in other words **locks**, and introduce a "critical section"
* One thread reserves execution of that part of the code for itself and prevents other threads from entering it
* When the lock is released, the next thread can continue with the execution of that part of the code
* **Locks** are built from atomic asm instructions (e.g. test-and-set, fetch-and-add, compare-and-swap); that way a thread can acquire a lock with a single instruction, preventing other threads from interrupting this behaviour.
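As a minimal sketch (the counter and the thread count are invented for the example), a critical section guarded by a `pthread_mutex_t`:
```c
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4

long counter = 0;                                  // shared between all threads
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  // protects counter

void *increment(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    // enter the critical section
        counter++;                    // only one thread at a time executes this
        pthread_mutex_unlock(&lock);  // leave the critical section
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, increment, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    printf("counter = %ld\n", counter); // always NTHREADS * 100000 with the lock
    return 0;
}
```
Without the lock the final value would vary from run to run, because `counter++` is not atomic.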
Condition variables let a thread sleep until another thread signals it:
-`pthread_cond_wait`: This function blocks the calling thread until the condition variable is signaled. The thread must hold the mutex lock when calling this function; the lock is released while the thread is blocked and reacquired before the function returns.
-`pthread_cond_signal`: This function unblocks one thread that is blocked on the specified condition variable. If no threads are blocked on the condition variable, the function has no effect.
-`pthread_cond_broadcast`: This function unblocks all threads that are blocked on the specified condition variable. If no threads are blocked on the condition variable, the function has no effect.
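A minimal sketch of the wait/signal pattern, assuming a simple `ready` flag as the shared condition (names invented for the example):
```c
#include <stdio.h>
#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0; // the "condition" the waiter is interested in

void *waiter(void *arg) {
    pthread_mutex_lock(&mutex);
    while (!ready)                        // re-check: wakeups can be spurious
        pthread_cond_wait(&cond, &mutex); // releases mutex while blocked
    printf("waiter: ready!\n");           // mutex is held again here
    pthread_mutex_unlock(&mutex);
    return NULL;
}

void *signaler(void *arg) {
    pthread_mutex_lock(&mutex);
    ready = 1;
    pthread_cond_signal(&cond); // wake one blocked waiter
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, waiter, NULL);
    pthread_create(&t2, NULL, signaler, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```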
## OpenMP
The for loop after an OpenMP `for` directive must be in [canonical](https://www.openmp.org/spec-html/5.0/openmpsu40.html) form.
Basically, the loop control should not include any function calls.
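For illustration, a loop in canonical form that OpenMP can split across threads (array and bounds invented for the example); compile with e.g. `gcc -fopenmp`:
```c
#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double a[N];
    // canonical form: simple init, comparison against a loop-invariant bound,
    // and a constant increment -- no function calls in the loop control
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;
    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```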
##### `schedule` clause
-`schedule(static[,N])`
  - every thread gets N iterations in a row, handed out round-robin
  - default (without N, the iterations are split into one roughly equal block per thread)
-`schedule(dynamic[,N])`
  - every thread does N iterations in a row
  - N is typically much smaller than the total number of iterations
  - when a thread finishes its N iterations, it grabs the next chunk if there is still work to process
-`schedule(guided[,N])`
  - same as dynamic, but the chunks keep getting smaller and smaller (down to N)
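A sketch of the clause on a deliberately unbalanced loop, where `dynamic` pays off (the `work` function is invented for the example):
```c
#include <stdio.h>
#include <omp.h>

// invented workload: cost grows with i, so a static split would be unbalanced
long work(int i) {
    long s = 0;
    for (int j = 0; j < i * 10000; j++)
        s += j;
    return s;
}

int main(void) {
    long results[100];
    // each thread grabs 4 iterations at a time and comes back for more
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < 100; i++)
        results[i] = work(i);
    printf("results[99] = %ld\n", results[99]);
    return 0;
}
```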
### Variables
Default:
- Most of the variables are shared
- Threads share global variables
- Variables declared inside parallel blocks are not shared
- Local variables of functions called from parallel sections are private to each thread
#### OpenMP variable clauses
-`shared(vars)`
  - Explicitly marks variables as shared inside a block
  - The default, so usually not needed
-`private(vars)`
  - Not initialized, so you must assign it a value before use
  - It is private per thread and does not affect the variable defined in the program or in any other thread
-`firstprivate(vars)`
  - Like `private`, but each thread's copy is initialized with the value the variable had before the block
  - It is private per thread and does not affect the variable defined in the program or in any other thread
-`lastprivate(vars)`
  - On finish of the block, copies the value from the (logically) last iteration back to the original variable (see the sketch after the reduction table below)
-`threadprivate(vars)`
  - Makes vars local per thread
  - They are not freed after the block finishes
  - That means if we enter that block again, the variable keeps its value from the previous call
-`reduction(op:var)`
  - Every thread gets a local copy of the variable var
  - The local copies are initialized to the identity value of op (see the table below)
  - When the threads are done, the copies are combined into the single original var using op
```c
#pragma omp parallel for reduction(+:counter)
for (int i = 0; i < N; i++)
    counter++;
```
| op   | initial value      |
|------|--------------------|
| +    | 0                  |
| *    | 1                  |
| &    | ~0 (all bits set)  |
| \|   | 0                  |
| ^    | 0                  |
| &&   | 1                  |
| \|\| | 0                  |
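A hedged sketch showing `firstprivate` and `lastprivate` together (the variable names are invented for the example):
```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int offset = 100; // each thread's copy starts at 100 (firstprivate)
    int last = 0;     // receives the value from the last iteration (lastprivate)

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < 8; i++) {
        last = offset + i; // works on the thread-local copies
    }

    // last now holds the value from the logically last iteration, i == 7
    printf("last = %d\n", last); // prints 107
    return 0;
}
```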
### Synchronization between threads
-`critical(name)`
  - basically a locked section, like in pthreads
  - If a critical section is not enough for you, you can also use `omp_lock_t` for more flexibility (see the sketch after this list). But watch out, because the thread that locks the lock must be the one that also unlocks it!
-`collapse(n)`
  - `n` is the number of nested for loops that are merged and executed concurrently
-`flush(vars)`
- When a thread executes a flush directive, it forces any updates that the thread has made to its own cache to be written back to the main memory, so that other threads can see the updated data.
- This is useful when multiple threads are accessing and updating shared data, as it ensures that the changes made by one thread are visible to other threads in a timely manner.
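A minimal sketch of a named `critical` section next to the lower-level `omp_lock_t` API (the counters are invented for the example):
```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int c1 = 0, c2 = 0;
    omp_lock_t lock;
    omp_init_lock(&lock);

    #pragma omp parallel
    {
        // named critical section: one thread at a time
        #pragma omp critical(update_c1)
        c1++;

        // same effect with an explicit lock; the locking thread must unlock
        omp_set_lock(&lock);
        c2++;
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    printf("c1 = %d, c2 = %d\n", c1, c2);
    return 0;
}
```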
### Tasks
The task construct in OpenMP is a mechanism for expressing parallelism that lets you specify units of work (called tasks) that can be executed concurrently.
It provides a way to create parallelism that is more dynamic and flexible than the traditional OpenMP constructs such as `parallel` and `for`, which helps with recursive or irregular problems, as sketched below.
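A minimal sketch of tasks on a recursive computation; Fibonacci is a standard illustration here, not from the source:
```c
#include <stdio.h>
#include <omp.h>

long fib(int n) {
    if (n < 2)
        return n;
    long a, b;
    // each recursive call becomes a task that any idle thread may pick up
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    #pragma omp taskwait // wait for both child tasks before combining
    return a + b;
}

int main(void) {
    long result;
    #pragma omp parallel
    {
        #pragma omp single // one thread creates the root of the task tree
        result = fib(20);
    }
    printf("fib(20) = %ld\n", result);
    return 0;
}
```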
## OpenMPI (message passing interface)
It is designed to allow multiple computers to work together as a single system
and to enable the efficient exchange of data between them.
### Functions
-`int MPI_Init(int *argc, char ***argv);`
  - initializes MPI and sets up connections between processes
  - CLI arguments are passed only to process 0!
-`int MPI_Comm_size(MPI_Comm comm, int *size)`
  - returns the number of nodes/processes in the communicator
-`int MPI_Comm_rank(MPI_Comm comm, int *rank)`
  - returns the id (rank) of the calling process
-`int MPI_Finalize(void)`
  - closes connections
  - It is the last MPI call in our program
  - cleans up
-`MPI_Send(void *message, int count, MPI_Datatype datatype, int destination, int tag, MPI_Comm comm)`
  - sends a message to a process, addressed by its id (`destination`)
  - for small messages (< 16 kB) it behaves like `MPI_Bsend()`
  - for large messages (> 16 kB) it behaves like `MPI_Ssend()`
-`MPI_Ssend()`
  - synchronous send: its successful return confirms the message transmission
  - waits for the receiver's confirmation of the data
  - blocking function
-`MPI_Bsend()`
  - The message is buffered.
  - The function returns as soon as the message is written into the buffer.
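A minimal sketch tying these calls together; `MPI_Recv` is the matching (blocking) receive call, and the payload is invented for the example:
```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv); // set up connections between processes

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size); // number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this process's id

    if (rank == 0) {
        int value = 42; // invented payload
        // send one int to process 1, tag 0
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("process 0 sent %d\n", value);
    } else if (rank == 1) {
        int value;
        // matching receive: same datatype, source 0, tag 0
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }

    MPI_Finalize(); // last MPI call: clean up
    return 0;
}
```
Run with at least two processes, e.g. `mpicc example.c && mpirun -np 2 ./a.out`.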