[Intermediate Python Tutorial] Using multiprocessing Pool.map

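Pool.map syntax

Pool.map blocks the main process until all child processes have finished their tasks, and the results are returned in the same order as the input data. The syntax is: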

Python

from multiprocessing import Pool  # import the process pool class
with Pool(num_processes) as pool:  # use `with` as a context manager
    results = pool.map(func, data_list)  # apply func to every item of data_list
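The ordering guarantee can be checked with a small experiment. The following is a minimal sketch, not part of the original example: the names `slow_square` and `run_ordered_demo` are hypothetical, and the sleep times are chosen so that later inputs finish first.

```python
from multiprocessing import Pool
import time


def slow_square(x):
    # Larger inputs sleep less, so tasks complete in reverse input order.
    time.sleep(0.05 * (4 - x))
    return x * x


def run_ordered_demo():
    with Pool(4) as pool:
        # map() blocks here until every task has finished.
        return pool.map(slow_square, [0, 1, 2, 3])


if __name__ == "__main__":
    # Results follow the input order, not the completion order.
    print(run_ordered_demo())  # [0, 1, 4, 9]
```

Even though the task for 3 finishes first, its result still appears last, matching its position in the input list.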

Here is a simple program that processes the data in a loop (example_no_multi.py):

Python

import time

data = [i for i in range(0, 16)]


def f(x):
    time.sleep(2)
    return x**2


data_out = [f(x) for x in data]
print("final result:")
print(data_out)

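Each item of data is processed in turn by the function f to produce data_out, which is then printed. To process the items in parallel with a process pool and map, the program can be rewritten as follows (example.py):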

Python

from multiprocessing import Pool
import time

data = [i for i in range(0, 16)]


def f(x):
    time.sleep(2)
    return x**2


if __name__ == "__main__":  # guard needed when workers are started with "spawn" (Windows, macOS)
    with Pool(16) as pool:
        data_out = pool.map(f, data)

    print("final result:")
    print(data_out)

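The essential change is to create a Pool and pass the function f and the data to its map method. To show the actual running time and the processes involved, I wrote a simple monitoring script in shell (exe.sh):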

ShellScript

#!/bin/bash
########################
# A Sample run monitor #
########################
delay=1
start=$(date +%s.%N)
cmd=$1
echo "execute command = $cmd"
eval $cmd &
echo 'running process:'
sleep $delay
proc=$(ps aux | grep "$cmd" | grep -v grep | grep -v bash)
echo "$proc"
pids=$(echo "$proc" | awk '{print $2}')
flag=0
########## wait until all pids exit ##############
## edit from https://unix.stackexchange.com/questions/305039/pausing-a-bash-script-until-previous-commands-are-finished
while [ $flag -eq 0 ]; do
    flag=1 # assume every process has exited until one is found alive
    for PID in $pids; do
        # ps -p matches the exact PID, unlike grepping the full process list
        if ps -p "$PID" >/dev/null 2>&1; then
            flag=0
        fi
    done
done
########################
end=$(date +%s.%N)
runtime=$(echo "$end - $start " | bc -l)
echo "time cost = $runtime sec. "
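If only the elapsed time matters, the same comparison can also be made from inside Python. The following is a minimal sketch under stated assumptions: `time_serial`, `time_parallel`, and `SLEEP_SECONDS` are hypothetical names, and the sleep is shortened from 2 seconds to 0.1 seconds so the sketch runs quickly. Unlike exe.sh, it does not list the worker processes.

```python
from multiprocessing import Pool
import time

SLEEP_SECONDS = 0.1  # assumption: shortened from the article's 2 seconds


def f(x):
    time.sleep(SLEEP_SECONDS)
    return x**2


def time_serial(data):
    # Process the items one by one and measure the wall-clock time.
    start = time.perf_counter()
    result = [f(x) for x in data]
    return result, time.perf_counter() - start


def time_parallel(data, processes):
    # The measurement includes the cost of starting the pool.
    start = time.perf_counter()
    with Pool(processes) as pool:
        result = pool.map(f, data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    data = list(range(16))
    serial_result, serial_time = time_serial(data)
    parallel_result, parallel_time = time_parallel(data, 16)
    print(f"serial:   {serial_time:.2f} sec")
    print(f"parallel: {parallel_time:.2f} sec")
```

`time.perf_counter()` is used because it is a monotonic clock intended for measuring short durations.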

The actual results are as follows. For the non-parallel version, run in a terminal:

./exe.sh "python example_no_multi.py"

Output:

execute command = python example_no_multi.py
running process:
whuang0+   10141  2.0  0.0  27676  9652 pts/1    S+   11:27   0:00 python example_no_multi.py
final result:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
time cost = 32.056646352 sec.

That is, each item takes 2 seconds, for a total of 2 × 16 = 32 seconds, and a single main process does all the work.

For the parallel version, run in a terminal:

./exe.sh "python example.py"

Output:

execute command = python example.py
running process:
whuang0+   20767  4.0  0.0 251352 12312 pts/3    Sl+  11:28   0:00 python example.py
whuang0+   20768  0.0  0.0  30156  9844 pts/3    S+   11:28   0:00 python example.py
whuang0+   20769  0.0  0.0  30156  9848 pts/3    S+   11:28   0:00 python example.py
whuang0+   20770  0.0  0.0  30156  9852 pts/3    S+   11:28   0:00 python example.py
whuang0+   20771  0.0  0.0  30156  9852 pts/3    S+   11:28   0:00 python example.py
whuang0+   20772  0.0  0.0  30156  9856 pts/3    S+   11:28   0:00 python example.py
whuang0+   20773  0.0  0.0  30156  9860 pts/3    S+   11:28   0:00 python example.py
whuang0+   20774  0.0  0.0  30156  9864 pts/3    S+   11:28   0:00 python example.py
whuang0+   20775  0.0  0.0  30156  9864 pts/3    S+   11:28   0:00 python example.py
whuang0+   20776  0.0  0.0  30156  9864 pts/3    S+   11:28   0:00 python example.py
whuang0+   20777  0.0  0.0  30156  9864 pts/3    S+   11:28   0:00 python example.py
whuang0+   20778  0.0  0.0  30156  9868 pts/3    S+   11:28   0:00 python example.py
whuang0+   20779  0.0  0.0  30156  9864 pts/3    S+   11:28   0:00 python example.py
whuang0+   20780  0.0  0.0  30156  9864 pts/3    S+   11:28   0:00 python example.py
whuang0+   20781  0.0  0.0  30156  9868 pts/3    S+   11:28   0:00 python example.py
whuang0+   20782  0.0  0.0  30156  9868 pts/3    S+   11:28   0:00 python example.py
whuang0+   20783  0.0  0.0  30156 10004 pts/3    S+   11:28   0:00 python example.py
final result:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
time cost = 2.211762556 sec.

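The listing shows 1 main process plus 16 child processes, and the total time is only about 2.2 seconds, because all 16 two-second tasks run at the same time. This confirms that Pool.map speeds up processing of the data.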
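The two measurements (about 32 seconds with one process, about 2 seconds with 16) fit a simple rule of thumb. The sketch below is an idealized estimate, not a measurement: the function `estimated_seconds` is a hypothetical helper, and it assumes every task takes the same time, tasks are handed out one at a time, and process startup overhead is ignored.

```python
import math


def estimated_seconds(n_tasks, n_workers, seconds_per_task):
    """Idealized wall-clock estimate: tasks run in rounds of n_workers at a time."""
    rounds = math.ceil(n_tasks / n_workers)
    return rounds * seconds_per_task


if __name__ == "__main__":
    # 16 tasks of 2 seconds each, as in the example above.
    for workers in (1, 4, 16):
        print(workers, estimated_seconds(16, workers, 2))
    # prints:
    # 1 32
    # 4 8
    # 16 2
```

Under these assumptions, a pool of 4 processes would need about 8 seconds for the same data, so the pool size directly controls how many rounds of 2-second tasks are required.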

楊明翰

An entrepreneur and data scientist born in the mid-1990s

“With belief and action, we change the world.”


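This article may be reproduced, distributed, transmitted, and modified, but may not be used for commercial purposes.

Any use must credit the source: 楊明翰, 台灣人工智慧與資料科學研究室 https://aistudio.tw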
