How can I optimize a parameter-space search using Dask? (no cross-validation)
Here is the code (no Dask in this part yet):
import numpy as np
from operator import itemgetter

def build(ntries,param,niter,func,score,train,test):
    # One call per try; each try draws niter parameters with its own seed
    res=[]
    for i in range(ntries):
        cparam=param.rvs(size=niter,random_state=i)
        res.append( func(cparam, train, test, score) )
    return res

def score(test,correct):
    # Euclidean distance between the result and the reference
    return np.linalg.norm(test-correct)

def compute_optimal(res):
    # Sort the (parameter, score) pairs by score, lowest score first
    return sorted(res,key=itemgetter(1))

def func(c,train,test,score):
    dt=1.0/len(c)
    for cc in c:
        train=train - cc*dt
    return (c,score(train,test))
This is how I use it:
from dask import delayed
from distributed import LocalCluster, Client
cluster=LocalCluster(n_workers=4, threads_per_worker=1)
cli=Client(cluster)
from scipy.stats import uniform
import numpy as np
niter=500
loc=1.0e-09
scale=1.0
ntries=1000
sched=uniform(loc=loc,scale=scale)
train=np.arange(1000)+0.5
test=np.arange(1000)
# HERE IS THE DASK
graph=build(ntries,sched,niter,delayed(func),score,train,test)
# THE QUESTION SECTION
# I do these steps to bring back all the values so that I could search for the score-wise optimal pair: (parameter, score)
res=[cli.compute(g) for g in graph]
results=[r.result() for r in res]
# Actual search for the optimal pair
optimal=compute_optimal(results)
best,worst=optimal[0],optimal[-1]
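For comparison with the retrieval step above, here is a minimal sketch, assuming `dask.distributed` and `scipy` are installed. It hands the whole list of delayed objects to `Client.compute` in one call and collects the futures with `Client.gather`, instead of calling `.result()` on each future. The helper name `run_search` and the reduced problem sizes are hypothetical, chosen only for illustration.

```python
import numpy as np
from operator import itemgetter


def score(test, correct):
    # Euclidean distance between the result and the reference
    return np.linalg.norm(test - correct)


def func(c, train, test, score):
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))


def run_search(ntries=8, niter=20):
    # Hypothetical helper: imports are local so the pure functions above
    # stay usable without a Dask installation.
    from dask import delayed
    from dask.distributed import Client, LocalCluster
    from scipy.stats import uniform

    sched = uniform(loc=1.0e-09, scale=1.0)
    train = np.arange(100) + 0.5
    test = np.arange(100)

    with LocalCluster(n_workers=2, threads_per_worker=1) as cluster, \
            Client(cluster) as cli:
        graph = [
            delayed(func)(sched.rvs(size=niter, random_state=i), train, test, score)
            for i in range(ntries)
        ]
        futures = cli.compute(graph)   # one call for the whole list
        results = cli.gather(futures)  # one round trip back to the client

    # Same ordering as compute_optimal: lowest score first
    return sorted(results, key=itemgetter(1))
```

Called from a script's `if __name__ == "__main__":` block, `run_search()` would return the ranked list, so `ranked[0]` and `ranked[-1]` play the role of `best` and `worst`. All results still travel back to the client in this variant.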
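My questions are:
- Am I using Dask correctly?
- Am I fetching the data back to the client correctly? Is there a more efficient way to do this?
- Is there a way to do the search for the optimal pair on the workers?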
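On the last question, one possible direction is to express the reduction as one more delayed task, so that only the best and worst pairs come back to the client. The sketch below assumes `dask.distributed` and `scipy` are installed. The names `pick_extremes` and `search_on_workers` are hypothetical, and the problem sizes are reduced for illustration.

```python
import numpy as np
from operator import itemgetter


def score(test, correct):
    return np.linalg.norm(test - correct)


def func(c, train, test, score):
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))


def pick_extremes(results):
    # Hypothetical reducer: returns the (parameter, score) pairs with the
    # lowest and the highest score.
    best = min(results, key=itemgetter(1))
    worst = max(results, key=itemgetter(1))
    return best, worst


def search_on_workers(ntries=8, niter=20):
    # Hypothetical helper; imports are local so pick_extremes can be used
    # and tested without a Dask installation.
    from dask import delayed
    from dask.distributed import Client, LocalCluster
    from scipy.stats import uniform

    sched = uniform(loc=1.0e-09, scale=1.0)
    train = np.arange(100) + 0.5
    test = np.arange(100)

    with LocalCluster(n_workers=2, threads_per_worker=1) as cluster, \
            Client(cluster) as cli:
        tasks = [
            delayed(func)(sched.rvs(size=niter, random_state=i), train, test, score)
            for i in range(ntries)
        ]
        # Passing the list of delayed objects makes the reducer depend on
        # all of them, so it runs on a worker once they have finished.
        reduced = delayed(pick_extremes)(tasks)
        best, worst = cli.compute(reduced).result()

    return best, worst
```

Here a single task receives every result, which is simple but concentrates the data on one worker. With many tries, a tree-style reduction over chunks of `tasks` might spread the work better; that variant is not shown.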
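P.S. I recently posted a related but different question (thread.lock in a custom parameter search class using Dask distributed). I have already solved it; I will post an answer shortly and then close that question.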