What is thread safety in Python

Threads in Python

General definition of a thread


A thread is often referred to as a lightweight process. In general, a thread in computer science denotes an execution thread or an execution sequence in the processing of a program.

There are two types of threads. Kernel threads run as part of the operating system, while user threads are not implemented in the kernel of the operating system.

In a certain sense, user threads can also be understood as an extension of the function concept of a programming language. A thread can be viewed like a function or procedure call: in this view, a user thread corresponds to a procedure that is called from another point (through the explicit scheduling of exactly this user thread). User threads differ significantly from normal functions or procedures, however, particularly in their return behavior.

By default, every process has at least one thread, in a sense the process itself. A process can start several threads. As with processes, the operating system appears to execute them simultaneously.

The advantage of threads over processes is that the threads of a process share the same memory area for global variables. If a thread changes a global variable, the new value is immediately visible to all other threads in the process. Each thread nevertheless has its own local variables. A further advantage is that on a system with several CPU cores the program can run much faster, because the threads can actually be executed simultaneously by the processors. In addition, a multithreaded program can remain responsive on both single-core and multi-core systems.
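The difference between shared global data and per-thread local variables can be sketched in a few lines. This is a minimal illustration written for Python 3 with the threading module; the names shared_log, worker and run_workers are made up for this example and do not come from the text above.

```python
import threading

shared_log = []  # global object, visible to every thread of the process


def worker(name):
    # local_value exists once per call, so each thread has its own copy
    local_value = name.upper()
    # the append modifies the shared list and is visible to all threads
    shared_log.append(local_value)


def run_workers(names):
    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    # sort, because the order in which threads run is not guaranteed
    return sorted(shared_log)
```

Calling run_workers(["a", "b", "c"]) returns the entries that the three threads wrote into the one shared list.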

Threads are easier for the operating system to manage than processes, which is why they are also known as lightweight processes.

Threads in Python

Two modules support the use of threads in Python: thread and threading.

Warning: The "thread" module no longer exists under that name in Python 3 because it is considered obsolete. If you really want or have to, you can still use it in Python 3, but note that it has been renamed to _thread.

The thread module regards a thread as a function, while threading is implemented in an object-oriented manner, and each thread corresponds to its own object.
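The contrast between the two styles might look as follows in Python 3, where the low-level module is called _thread. This is only a sketch: greet, function_style and object_style are hypothetical names, and the lock used to wait for the function-style thread is one possible way of signalling completion.

```python
import _thread
import threading

results = []


def greet(name):
    results.append("hello " + name)


def function_style(name):
    # _thread: a thread is just a function call, there is no thread object
    done = _thread.allocate_lock()
    done.acquire()

    def wrapper():
        greet(name)
        done.release()  # signal that the work is finished

    _thread.start_new_thread(wrapper, ())
    done.acquire()  # blocks until wrapper has released the lock


def object_style(name):
    # threading: every thread is represented by its own Thread object
    t = threading.Thread(target=greet, args=(name,))
    t.start()
    t.join()
    return t
```

With _thread the caller has to build its own way of waiting for the thread, while the Thread object offers join() and is_alive() directly.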

The thread module

With the thread module, individual functions can be executed in a separate thread. The function thread.start_new_thread is used for this:

    start_new_thread(function, args[, kwargs])

function is a reference to the function to be executed. args is a tuple with the arguments for function. The optional parameter kwargs can contain a dictionary with additional keyword arguments. The return value of start_new_thread() is a number that uniquely identifies the thread. After function returns, the thread ends automatically.
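The three parameters and the return value can be tried out with a short sketch. It assumes Python 3, where the module is named _thread; describe, runner and start_and_wait are hypothetical helpers, and the lock only serves to wait for the thread before the result is read.

```python
import _thread

output = {}


def describe(value, unit="m"):
    output["text"] = "%s %s" % (value, unit)


def start_and_wait():
    done = _thread.allocate_lock()
    done.acquire()

    def runner(*args, **kwargs):
        try:
            describe(*args, **kwargs)
        finally:
            done.release()  # always signal completion

    # args is a tuple, kwargs an optional dictionary of keyword arguments
    ident = _thread.start_new_thread(runner, (5,), {"unit": "km"})
    done.acquire()  # wait until runner has finished
    return ident
```

start_and_wait() returns the integer identifier of the new thread, and output["text"] then holds the string built from the positional and the keyword argument.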

Example of a thread in Python:
    from thread import start_new_thread

    def heron(a):
        """Calculates the root of a"""
        eps = 0.0000001
        old = 1
        new = 1
        while True:
            old, new = new, (new + a / new) / 2.0
            print old, new
            if abs(new - old) < eps:
                break
        return new

    start_new_thread(heron, (99,))
    start_new_thread(heron, (999,))
    start_new_thread(heron, (1733,))

    c = raw_input("Input: ")

The raw_input() in the previous example is necessary because all threads are terminated immediately when the main program ends. raw_input() makes the main program wait.

We extend the previous example with a counter for the threads.

    from thread import start_new_thread

    num_threads = 0

    def heron(a):
        global num_threads
        num_threads += 1

        # code omitted

        num_threads -= 1
        return new

    start_new_thread(heron, (99,))
    start_new_thread(heron, (999,))
    start_new_thread(heron, (1733,))
    start_new_thread(heron, (17334,))

    while num_threads > 0:
        pass

But the script doesn't work as we might expect. What is wrong?
The final while loop is reached before any of the threads has had a chance to increase num_threads. The counter is therefore still 0, the loop ends immediately, and the main program exits, which terminates the threads as well.

But there is another, more serious problem:
The problem lies in the assignments

    num_threads += 1

and

    num_threads -= 1

which are not atomic. Each of them basically consists of three actions: the value of num_threads is read, a new integer object is created with the value increased or decreased by one, and the new value is assigned to the global variable num_threads again.
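That an augmented assignment such as num_threads += 1 is made up of several steps can be checked with the dis module from the standard library. The sketch below assumes CPython 3; the exact instruction names differ between versions, so it only looks for the load and the store of the global variable. increment and opnames are hypothetical helper names.

```python
import dis

num_threads = 0


def increment():
    global num_threads
    num_threads += 1


def opnames(func):
    # names of the bytecode instructions the function is compiled to
    return [ins.opname for ins in dis.get_instructions(func)]


names = opnames(increment)
# the global is loaded first and stored back later, as separate instructions
load_position = names.index("LOAD_GLOBAL")
store_position = names.index("STORE_GLOBAL")
```

A thread switch can occur between the load and the store, and that gap is where another thread can slip in.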

Errors can happen as follows:
The first thread reads the variable num_threads, which still has the value 0, and then goes "to sleep". Then the second thread also reads num_threads, which still has the value 0 because the first thread has not yet increased it. Now the third thread reads num_threads as well, and it still has the value 0 because neither the first nor the second thread has increased it yet. The three threads then each store a 1, unless one of the other threads has already decreased the variable with the instruction num_threads -= 1 in the meantime.
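The interleaving described here can be replayed deterministically in a single thread by writing out the read and write steps by hand. This is an illustration of the scenario only, not real multithreaded code; replay_lost_update and the seen_by_... names are made up.

```python
def replay_lost_update():
    num_threads = 0

    # Step 1: all three "threads" read the counter before any of them writes
    seen_by_t1 = num_threads
    seen_by_t2 = num_threads
    seen_by_t3 = num_threads

    # Step 2: each one writes back its own stale value plus one
    num_threads = seen_by_t1 + 1
    num_threads = seen_by_t2 + 1
    num_threads = seen_by_t3 + 1

    return num_threads
```

replay_lost_update() returns 1 although three increments were performed: two of the updates are lost.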

Solution

Problems of this kind can be solved by marking "critical sections" with lock objects. Only one thread at a time can hold a lock, so a critical section is executed as a whole before another thread that needs the same lock is allowed to continue. A new lock object can be created with the thread.allocate_lock function:

    lock_object = thread.allocate_lock()

The beginning of a critical section is marked with lock_object.acquire() and the end with lock_object.release().
The solution with locks now looks like this:
    from thread import start_new_thread, allocate_lock

    num_threads = 0
    thread_started = False
    lock = allocate_lock()

    def heron(a):
        global num_threads, thread_started
        lock.acquire()
        num_threads += 1
        thread_started = True
        lock.release()

        ...

        lock.acquire()
        num_threads -= 1
        lock.release()
        return new

    start_new_thread(heron, (99,))
    start_new_thread(heron, (999,))
    start_new_thread(heron, (1733,))

    while not thread_started:
        pass
    while num_threads > 0:
        pass
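For comparison, the same idea might be written for Python 3 with threading.Lock, which can be used in a with statement, and with join() in place of the busy-waiting loops. This is a sketch under those assumptions, not the original script; run_all is a hypothetical helper and the print call has been left out.

```python
import threading

num_threads = 0
lock = threading.Lock()


def heron(a, eps=1e-7):
    """Approximate the square root of a with Heron's method."""
    global num_threads
    with lock:  # acquire() on entry, release() on exit, even on errors
        num_threads += 1

    old, new = 1.0, 1.0
    while True:
        old, new = new, (new + a / new) / 2.0
        if abs(new - old) < eps:
            break

    with lock:
        num_threads -= 1
    return new


def run_all(values):
    threads = [threading.Thread(target=heron, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for each thread instead of polling a counter
    return num_threads
```

After run_all([99, 999, 1733]) has returned, every increment has been matched by a decrement, so the counter is back at 0.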

The threading module

We want to introduce the threading module with an example. The thread implemented there does very little, i.e. it sleeps for 5 seconds and outputs corresponding messages:
    import time
    from threading import Thread

    def sleeper(i):
        print "thread %d sleeps for 5 seconds" % i
        time.sleep(5)
        print "thread %d woke up" % i

    for i in range(10):
        t = Thread(target=sleeper, args=(i,))
        t.start()

How the threading.Thread class works: it has a start() method that starts the thread. start() in turn triggers the run() method in the new thread. By default run() calls the function passed as target; it must be overridden when a thread is defined by subclassing Thread. The join() method makes the caller wait until the thread it is called on has terminated, so calling join() on every thread makes the main program wait for all of them.
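The interplay of start(), run() and join() can be sketched with a small subclass. The example assumes Python 3 and uses a much shorter sleep than the script above; Sleeper and run_sleepers are hypothetical names.

```python
import threading
import time


class Sleeper(threading.Thread):
    def __init__(self, number, seconds):
        super().__init__()
        self.number = number
        self.seconds = seconds
        self.woke_up = False

    def run(self):
        # start() arranges for run() to be executed in the new thread
        time.sleep(self.seconds)
        self.woke_up = True


def run_sleepers(count, seconds=0.05):
    sleepers = [Sleeper(i, seconds) for i in range(count)]
    for s in sleepers:
        s.start()
    for s in sleepers:
        s.join()  # blocks until this particular thread has terminated
    return sleepers
```

Because all threads are started before the first join(), they sleep concurrently, and the total waiting time stays close to that of a single sleep.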

The output of the previous script looks like this:
    thread 0 sleeps for 5 seconds
    thread 1 sleeps for 5 seconds
    thread 2 sleeps for 5 seconds
    thread 3 sleeps for 5 seconds
    thread 4 sleeps for 5 seconds
    thread 5 sleeps for 5 seconds
    thread 6 sleeps for 5 seconds
    thread 7 sleeps for 5 seconds
    thread 8 sleeps for 5 seconds
    thread 9 sleeps for 5 seconds
    thread 1 woke up
    thread 0 woke up
    thread 3 woke up
    thread 2 woke up
    thread 5 woke up
    thread 9 woke up
    thread 8 woke up
    thread 7 woke up
    thread 6 woke up
    thread 4 woke up

The next example shows a thread that determines whether a number is a prime number. The thread is defined via the threading module.

    import threading

    class PrimeNumber(threading.Thread):
        def __init__(self, number):
            threading.Thread.__init__(self)
            self.number = number

        def run(self):
            counter = 2
            # <= so that perfect squares such as 9 are recognized as non-prime
            while counter * counter <= self.number:
                if self.number % counter == 0:
                    print "%d is not a prime number, since %d = %d * %d" % (
                        self.number, self.number, counter, self.number / counter)
                    return
                counter += 1
            print "%d is a prime number" % self.number

    threads = []
    while True:
        input = long(raw_input("number: "))
        if input < 1:
            break

        thread = PrimeNumber(input)
        threads += [thread]
        thread.start()

    for x in threads:
        x.join()

With locks it should look like this:

    import threading

    class PrimeNumber(threading.Thread):
        prime_numbers = {}
        lock = threading.Lock()

        def __init__(self, number):
            threading.Thread.__init__(self)
            self.number = number
            PrimeNumber.lock.acquire()
            PrimeNumber.prime_numbers[number] = None
            PrimeNumber.lock.release()

        def run(self):
            counter = 2
            res = True
            # <= so that perfect squares such as 9 are recognized as non-prime
            while counter * counter <= self.number and res:
                if self.number % counter == 0:
                    res = False
                counter += 1
            PrimeNumber.lock.acquire()
            PrimeNumber.prime_numbers[self.number] = res
            PrimeNumber.lock.release()

    threads = []
    while True:
        input = long(raw_input("number: "))
        if input < 1:
            break

        thread = PrimeNumber(input)
        threads += [thread]
        thread.start()

    for x in threads:
        x.join()
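A Python 3 variant of the lock-protected prime check might look like the following sketch. It takes a fixed list of numbers rather than interactive input so that it can be run unattended; check_numbers is a hypothetical helper, and treating numbers below 2 as non-prime is an addition that is not part of the script above.

```python
import threading


class PrimeNumber(threading.Thread):
    prime_numbers = {}        # shared by all instances
    lock = threading.Lock()   # guards prime_numbers

    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        is_prime = self.number > 1
        counter = 2
        while counter * counter <= self.number and is_prime:
            if self.number % counter == 0:
                is_prime = False
            counter += 1
        with PrimeNumber.lock:  # only one thread writes at a time
            PrimeNumber.prime_numbers[self.number] = is_prime


def check_numbers(numbers):
    threads = [PrimeNumber(n) for n in numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {n: PrimeNumber.prime_numbers[n] for n in numbers}
```

check_numbers([2, 9, 17]) returns a dictionary that maps each number to True or False once all threads have finished.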

Thread, real-life example

The previous examples were mainly of didactic interest and had little practical relevance. The following example shows an application that is useful in practice. Suppose you want to find out which IP addresses in an existing local network are assigned, or which computers are currently active. Manually we would proceed as follows for a network 192.168.178.x: we would ping the addresses 192.168.178.0, 192.168.178.1, 192.168.178.2 etc. up to 192.168.178.255 and wait for each result. This can be implemented in Python using a for loop over the address range and a call to os.popen("ping -q -c2 " + ip, "r").

A threadless solution like the following is not very efficient, since each ping has to be waited for separately, one after the other.

Solution without threads:
    import os, re

    received_packages = re.compile(r"(\d) received")
    status = ("no response", "alive but losses", "alive")

    for suffix in range(20, 30):
        ip = "192.168.178." + str(suffix)
        ping_out = os.popen("ping -q -c2 " + ip, "r")
        print "... pinging ", ip
        while True:
            line = ping_out.readline()
            if not line:
                break
            n_received = received_packages.findall(line)
            if n_received:
                print ip + ": " + status[int(n_received[0])]

To understand the script, you should look at the results of a ping call in the shell:

    $ ping -q -c2 192.168.178.26
    PING 192.168.178.26 (192.168.178.26) 56(84) bytes of data.

    --- 192.168.178.26 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 999ms
    rtt min/avg/max/mdev = 0.022/0.032/0.042/0.010 ms

If a ping does not lead to success, there is the following output:

    $ ping -q -c2 192.168.178.23
    PING 192.168.178.23 (192.168.178.23) 56(84) bytes of data.

    --- 192.168.178.23 ping statistics ---
    2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1006ms
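How the regular expression picks the number of received packets out of the summary line can be tried without a network. The sketch below reuses the pattern and the status tuple from the script and applies them to the two summary lines shown above; classify is a hypothetical helper, and the code is written for Python 3.

```python
import re

received_packages = re.compile(r"(\d) received")
status = ("no response", "alive but losses", "alive")


def classify(ping_output):
    """Return the status text for the output of 'ping -q -c2'."""
    match = received_packages.search(ping_output)
    if match is None:
        return None  # no summary line found
    # the captured digit (0, 1 or 2) is used as index into status
    return status[int(match.group(1))]


ok_output = "2 packets transmitted, 2 received, 0% packet loss, time 999ms"
failed_output = ("2 packets transmitted, 0 received, +2 errors, "
                 "100% packet loss, time 1006ms")
```

The index trick only works because exactly two packets are sent with -c2, so the captured digit can only be 0, 1 or 2.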
And now a much faster solution with threads:

    import os, re, threading

    class ip_check(threading.Thread):
        def __init__(self, ip):
            threading.Thread.__init__(self)
            self.ip = ip
            self.__successful_pings = -1

        def run(self):
            ping_out = os.popen("ping -q -c2 " + self.ip, "r")
            while True:
                line = ping_out.readline()
                if not line:
                    break
                n_received = re.findall(received_packages, line)
                if n_received:
                    self.__successful_pings = int(n_received[0])

        def status(self):
            if self.__successful_pings == 0:
                return "no response"
            elif self.__successful_pings == 1:
                return "alive, but 50 % package loss"
            elif self.__successful_pings == 2:
                return "alive"
            else:
                return "shouldn't occur"

    received_packages = re.compile(r"(\d) received")

    check_results = []
    for suffix in range(20, 70):
        ip = "192.168.178." + str(suffix)
        current = ip_check(ip)
        check_results.append(current)
        current.start()

    for el in check_results:
        el.join()
        print "Status from", el.ip, "is", el.status()
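In Python 3 the same start-all-then-collect pattern could also be expressed with a thread pool from concurrent.futures, which is not used in the scripts above. The sketch below is a swapped-in variant under stated assumptions: fake_ping is a hypothetical stand-in that sleeps briefly and does not send any real pings, so the results are invented purely for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ping(ip):
    # Hypothetical stand-in for a real ping: waits briefly, then reports
    # addresses with an even last octet as alive.
    time.sleep(0.05)
    return int(ip.rsplit(".", 1)[1]) % 2 == 0


def scan(prefix, suffixes, max_workers=20):
    ips = [prefix + str(s) for s in suffixes]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs fake_ping concurrently and yields results in input order
        alive = list(pool.map(fake_ping, ips))
    return dict(zip(ips, alive))
```

scan("192.168.178.", range(20, 24)) returns a dictionary from address to the simulated result. With a real ping in place of fake_ping, the waiting times of the individual calls would overlap in the same way as with the explicit Thread objects.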