Status: Open
Labels: dask, enhancement, question
Description
In order to use a scikit regression module, I need to:
- Extract the non-missing elements of a 3D field `fin(z,y,x)` into a numpy array of shape `(z,p)`, where `p` is a 1D index over all the non-missing `(y,x)` locations, which are the same for every `z`.
- Put a 1D array of results `r(p)` into the non-missing elements of a 2D field `fout(y,x)`.
I've done it as follows:
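```python
# (z, p) array of the non-missing input values
findata = fin.data.compressed().to_dask_array()
findata.compute_chunk_sizes()  # the compressed size is unknown until computed
findata = findata.reshape(fin.shape[0], fin[0, :].data.compressed().size)
# compute r from findata
# put r into the non-missing points of fout
foutdata = fout.get_data().flatten()
foutdata[~foutdata.mask] = r
fout.set_data(foutdata.reshape(fout.shape))
```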
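For comparison, here is a minimal sketch of the same two steps on plain numpy masked arrays rather than cf fields. The names `fin_arr` and `fout_arr` and the toy data are hypothetical, and it assumes, as above, that the missing points are the same on every level. Indexing the `(y,x)` axes with the validity mask of one level gives the `(z,p)` array directly, so no reshape is needed:

```python
import numpy as np

# Hypothetical toy data: nz=3 levels on a 2x3 grid, with the same two
# (y, x) points missing on every level.
nz, ny, nx = 3, 2, 3
raw = np.arange(nz * ny * nx, dtype=float).reshape(nz, ny, nx)
missing = np.zeros((ny, nx), dtype=bool)
missing[0, 1] = True
missing[1, 2] = True
fin_arr = np.ma.masked_array(raw, mask=np.broadcast_to(missing, raw.shape).copy())

# Step 1: the mask of one level is shared by all levels, so index the
# (y, x) axes with it to get a (z, p) array directly.
valid = ~np.ma.getmaskarray(fin_arr[0])     # (ny, nx), True where data exist
findata = np.ma.getdata(fin_arr)[:, valid]  # (nz, p) with p = valid.sum()

# Placeholder for the regression: any length-p result will do here.
r = findata.mean(axis=0)

# Step 2: write r into the non-missing points of a (y, x) output array.
fout_arr = np.ma.masked_array(np.zeros((ny, nx)), mask=missing.copy())
fout_arr[~np.ma.getmaskarray(fout_arr)] = r
```

Boolean indexing visits the selected points in C order both times, so `r[i]` lands back on the `(y,x)` point it came from, provided the output has the same missing points as the input.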
Is there a better way, or could you provide one? A few things occur to me:
- It would be handy if `compressed()` were a method of a CF field, flattening it into a 1D array that contains only the non-missing values.
- I needed three lines to produce `findata` because I had to call `compute_chunk_sizes()` on the dask array. Without that step, I got the error shown below.
- Is there a way to avoid having to get the `Data` out of the field in order to assign to some of its elements using the mask? This was a bit clumsy.
In the Met Office IDL library, both input and output are easier:
```idl
findata=fin.data(where(fin(0).data ne fin(0).bmdi))
; compute r from findata
fout.data(where(fout.data ne fout.bmdi))=r
```
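In numpy, a similar one-line-each-way style is available through boolean indexing. A minimal sketch, assuming the data are held in plain arrays that store a fill value rather than a mask (the value of `BMDI` and the toy arrays are hypothetical); the comparison against the fill value plays the role of `where(... ne bmdi)`:

```python
import numpy as np

# Hypothetical missing-data indicator, playing the role of IDL's bmdi.
BMDI = -1.0e30

# Toy (z, y, x) input and (y, x) output that store missing points as BMDI.
fin_raw = np.array([[[1.0, BMDI], [3.0, 4.0]],
                    [[5.0, BMDI], [7.0, 8.0]]])
fout_raw = np.array([[0.0, BMDI], [0.0, 0.0]])

# Input: select the points where level 0 is not missing, on every level.
findata = fin_raw[:, fin_raw[0] != BMDI]  # shape (z, p)

# Placeholder for the regression result, one value per point.
r = findata.sum(axis=0)

# Output: assign r to the points of fout_raw that are not missing.
fout_raw[fout_raw != BMDI] = r
```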
It would be good to have something equally convenient in cf-python. Perhaps there is?
Here's the dask problem:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/sws02jmg/.local/lib/python3.11/site-packages/cfdm/decorators.py", line 44, in inplace_wrapper
    processed_copy = operation_method(self, *args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/sws02jmg/.local/lib/python3.11/site-packages/cf/data/data.py", line 10489, in reshape
    dx = dx.reshape(*shape, merge_chunks=merge_chunks, limit=limit)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/applications/anaconda/2023.09-0/lib/python3.11/site-packages/dask/array/core.py", line 2220, in reshape
    return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/applications/anaconda/2023.09-0/lib/python3.11/site-packages/dask/array/reshape.py", line 222, in reshape
    raise ValueError(
ValueError: Array chunk size or shape is unknown. shape: (nan,)
Possible solution with x.compute_chunk_sizes()
```
Thanks. Jonathan