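Pool.map syntax
Pool.map blocks the main process until the worker processes have finished every task, and it returns the results in the same order as the input data, regardless of which worker finishes first. The syntax is: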
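```python
from multiprocessing import Pool            # import the process pool class

# num_processes, func and data_list are placeholders
with Pool(num_processes) as pool:           # use with for context management
    results = pool.map(func, data_list)     # apply func to every item of data_list
```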
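To see both properties in action, here is a small sketch (`slow_square` is a hypothetical helper, not part of the original example). Earlier items sleep longer, so they tend to finish last, yet map still returns the results in input order, and the print only runs after map has returned.

```python
import time
from multiprocessing import Pool

def slow_square(x):
    # Earlier items sleep longer, so they tend to finish last
    time.sleep((4 - x) * 0.1)
    return x * x

if __name__ == "__main__":
    data = [0, 1, 2, 3]
    with Pool(4) as pool:
        results = pool.map(slow_square, data)  # blocks until every item is done
    # This line runs only after map has returned all results
    print(results)  # prints [0, 1, 4, 9]
```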
Here is a simple loop program (example_no_multi.py):
```python
import time

data = [i for i in range(0, 16)]

def f(x):
    time.sleep(2)  # simulate a 2-second task
    return x**2

data_out = [f(x) for x in data]  # process the items one at a time
print("final result:")
print(data_out)
```
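Each item of data is processed by the function f in turn to produce data_out, which is then printed. To process the items in parallel with a process pool and map, the program can be rewritten as follows (example.py):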
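```python
from multiprocessing import Pool
import time

data = [i for i in range(0, 16)]

def f(x):
    time.sleep(2)  # simulate a 2-second task
    return x**2

# The __main__ guard is required where workers are started with "spawn"
# (Windows, macOS): each worker re-imports this module, and without the guard
# it would try to start its own pool and fail.
if __name__ == "__main__":
    with Pool(16) as pool:            # create a pool of 16 worker processes
        data_out = pool.map(f, data)  # blocks until all 16 results are ready
    print("final result:")
    print(data_out)
```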
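The main change is opening a Pool and passing the function f and the data to map as arguments. To show the actual running time and the processes involved, a simple monitoring shell script (exe.sh) was written: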
```bash
#!/bin/bash
########################
# A Sample run monitor #
########################
delay=1
start=$(date +%s.%N)
cmd=$1
echo "execute command = $cmd"
eval "$cmd" &
echo 'running process:'
sleep $delay
# list processes whose command line matches $cmd, excluding grep and this bash script
proc=$(ps aux | grep "$cmd" | grep -v grep | grep -v bash)
echo "$proc"
pids=$(echo "$proc" | awk '{print $2}')
flag=0
########## wait until all pids exit ##############
## edit from https://unix.stackexchange.com/questions/305039/pausing-a-bash-script-until-previous-commands-are-finished
while [ $flag -eq 0 ]; do
    flag=1                      # assume all processes have exited ...
    for PID in $pids; do
        if ps -p "$PID" > /dev/null; then
            flag=0              # ... until one is found still running
        fi
    done
    sleep 0.1                   # poll interval; avoids a busy loop, adds at most ~0.1 s
done
########################
end=$(date +%s.%N)
runtime=$(echo "$end - $start" | bc -l)
echo "time cost = $runtime sec. "
```
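The shell monitor relies on ps and bc. As an alternative, here is a minimal sketch that measures the same two things from inside Python: the elapsed wall time with `time.perf_counter`, and the PID of the worker that handled each item with `os.getpid`. The task time is shortened to 0.5 s, and the helper name `run` is hypothetical. On a machine like the one above you would expect close to 16 distinct worker PIDs and a little over 0.5 s, though the exact numbers depend on process start-up time.

```python
import os
import time
from multiprocessing import Pool

def f(x):
    time.sleep(0.5)           # shortened from 2 s to keep the demo quick
    return x**2, os.getpid()  # also report which worker process handled x

def run(num_processes, data):
    start = time.perf_counter()
    with Pool(num_processes) as pool:
        pairs = pool.map(f, data)
    elapsed = time.perf_counter() - start
    results = [r for r, _ in pairs]
    worker_pids = {pid for _, pid in pairs}  # distinct worker processes used
    return results, worker_pids, elapsed

if __name__ == "__main__":
    data = list(range(16))
    results, worker_pids, elapsed = run(16, data)
    print("final result:", results)
    print("main PID:", os.getpid(), "distinct worker PIDs:", len(worker_pids))
    print(f"time cost = {elapsed:.3f} sec.")
```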
The actual results are as follows. Running the non-parallel version in the terminal:
./exe.sh "python example_no_multi.py"
Output:
execute command = python example_no_multi.py
running process:
whuang0+ 10141 2.0 0.0 27676 9652 pts/1 S+ 11:27 0:00 python example_no_multi.py
final result:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
time cost = 32.056646352 sec.
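In other words, each item takes 2 seconds, for a total of 2 × 16 = 32 seconds, and a single main process does all the work.
Running the parallel version in the terminal: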
./exe.sh "python example.py"
Output:
execute command = python example.py
running process:
whuang0+ 20767 4.0 0.0 251352 12312 pts/3 Sl+ 11:28 0:00 python example.py
whuang0+ 20768 0.0 0.0 30156 9844 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20769 0.0 0.0 30156 9848 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20770 0.0 0.0 30156 9852 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20771 0.0 0.0 30156 9852 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20772 0.0 0.0 30156 9856 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20773 0.0 0.0 30156 9860 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20774 0.0 0.0 30156 9864 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20775 0.0 0.0 30156 9864 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20776 0.0 0.0 30156 9864 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20777 0.0 0.0 30156 9864 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20778 0.0 0.0 30156 9868 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20779 0.0 0.0 30156 9864 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20780 0.0 0.0 30156 9864 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20781 0.0 0.0 30156 9868 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20782 0.0 0.0 30156 9868 pts/3 S+ 11:28 0:00 python example.py
whuang0+ 20783 0.0 0.0 30156 10004 pts/3 S+ 11:28 0:00 python example.py
final result:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
time cost = 2.211762556 sec.
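There is one main process plus 16 child processes, and the total time drops to about 2 seconds: each of the 16 workers handles one item, so all 16 two-second tasks run at the same time. This confirms that Pool.map speeds up the data processing.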
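The two measurements fit a simple model for equal-length tasks: with N items, P workers and a task time t, the busiest worker must run ⌈N/P⌉ tasks back to back, so T ≈ ⌈N/P⌉ × t. For N = 16 and t = 2 s this gives 32 s for one process and 2 s for 16 processes, matching the runs above. The sketch below (the names `estimate`, `measure` and `TASK_SECONDS` are hypothetical, and the task time is shortened to 0.2 s) compares the model with measured times for several pool sizes; the model ignores process start-up overhead, so the measured times should come out slightly higher.

```python
import math
import time
from multiprocessing import Pool

TASK_SECONDS = 0.2  # shortened stand-in for the 2-second task above

def f(x):
    time.sleep(TASK_SECONDS)
    return x**2

def estimate(n_items, n_procs, t):
    # Equal-length tasks: the busiest worker runs ceil(N/P) of them back to back,
    # so this is a lower bound that ignores process start-up overhead.
    return math.ceil(n_items / n_procs) * t

def measure(n_procs, data):
    start = time.perf_counter()
    with Pool(n_procs) as pool:
        out = pool.map(f, data)
    return out, time.perf_counter() - start

if __name__ == "__main__":
    data = list(range(16))
    for p in (1, 4, 16):
        _, elapsed = measure(p, data)
        est = estimate(len(data), p, TASK_SECONDS)
        print(f"Pool({p}): estimate {est:.1f} s, measured {elapsed:.2f} s")
```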
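This article may be reproduced, distributed, transmitted and modified, but not used for commercial purposes.
Any use must credit the source: 楊明翰, 台灣人工智慧與資料科學研究室 https://aistudio.tw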