
manipulating non-missing elements in a field #911

@JonathanGregory


In order to use a scikit regression module, I need to

  • Extract the non-missing elements from a 3D field fin(z,y,x) into a numpy array (z,p), where p is a 1D index over all the non-missing (y,x) locations, which are the same for every z.

  • Put a 1D array of results r(p) into the non-missing data elements of a 2D field fout(y,x).
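If the field data are available as plain NumPy masked arrays (in cf-python, `fin.array` should give one, though that route is an assumption here rather than a tested recipe), both steps can be written with a single 2D boolean mask over the trailing (y, x) axes. This is a minimal sketch assuming the missing-data mask is identical at every z level; `fin_arr`, `fout_arr`, `to_samples` and `fill_valid` are hypothetical names, not cf-python API.

```python
import numpy as np


def to_samples(fin_arr):
    """Return a (z, p) array of the non-missing values and the (y, x) validity mask.

    Assumes the missing-data mask is identical at every z level.
    """
    valid = ~np.ma.getmaskarray(fin_arr[0])  # (y, x): True where data are present
    # A 2D boolean index over the last two axes keeps z and flattens (y, x) to p
    samples = np.ma.getdata(fin_arr)[:, valid]
    return samples, valid


def fill_valid(fout_arr, r):
    """Write the 1D results r(p) into the non-missing elements of fout_arr, in place."""
    valid = ~np.ma.getmaskarray(fout_arr)  # (y, x): True where data are present
    # Boolean assignment visits points in C (row-major) order, matching to_samples
    fout_arr[valid] = r
    return fout_arr
```

Deriving the point ordering from a mask in the same way on both sides is what ensures each r(p) lands back on the (y, x) point its inputs came from, provided fin and fout share the same missing points.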

I've done it as follows:

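# Input: all non-missing values of fin(z,y,x) as a 1D dask array (chunk sizes unknown)
findata = fin.data.compressed().to_dask_array()
# Work out the chunk sizes (in place); without this, reshape raises "Array chunk size or shape is unknown"
findata.compute_chunk_sizes()
# Reshape to (z,p), where p is the number of non-missing (y,x) points in one level
findata = findata.reshape(fin.shape[0], fin[0, :].data.compressed().size)
# compute r from findata
# Output: flatten fout, assign r to its non-missing elements, then restore the 2D shape
foutdata = fout.get_data().flatten()
foutdata[~foutdata.mask] = r
fout.set_data(foutdata.reshape(fout.shape))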
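The reshape to (z,p) is only valid under two conditions: the flattening goes in C (row-major) order, so all of one level's values come before the next level's, and every level has the same mask, so each contributes exactly p values. NumPy's `compressed()` flattens in C order, and the code assumes cf-python's does likewise. The NumPy sketch below checks the second condition explicitly before reshaping; `compressed_levels` is a hypothetical helper, not an existing function.

```python
import numpy as np


def compressed_levels(arr):
    """Hypothetical helper: return the non-missing values of a (z, y, x) masked
    array as a (z, p) array, after checking every level has the same mask."""
    mask = np.ma.getmaskarray(arr)
    # Each level must have the same missing points, otherwise p differs by level
    if not (mask == mask[0]).all():
        raise ValueError("missing-data mask differs between z levels")
    p = int((~mask[0]).sum())
    # compressed() flattens in C order (z slowest), so a plain reshape gives (z, p)
    return arr.compressed().reshape(arr.shape[0], p)
```

Without the check, levels with differing masks would either make the reshape fail (total not divisible by z) or silently shift values between levels.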

Are there better ways to do this, or could you provide some? The following occur to me:

  • It would be handy if compressed() were a method of a CF field, flattening it into a 1D array containing only the non-missing values.
  • I needed three lines to produce findata because I had to call compute_chunk_sizes() on the dask array. Without that step, I got the error below.
  • Is there a way to avoid having to get the Data out of the field in order to assign to some elements of it using the mask? This was a bit clumsy.

In the Met O IDL lib, both input and output are easier:

findata=fin.data(where(fin(0).data ne fin(0).bmdi))
; compute r from findata
fout.data(where(fout.data ne fout.bmdi))=r

It would be good to have something equally convenient in cf-python. Perhaps there is?
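For comparison, IDL's where() returns subscripts rather than a boolean mask. A rough NumPy analogue, assuming the field data are held in `np.ma.MaskedArray` objects, uses `np.nonzero` on one level's mask and reuses the same subscripts for every level. Computing them once from a single 2D level fixes p up front. This is a sketch, not a cf-python API; `valid_index`, `extract_valid_points` and `fill_valid_points` are hypothetical names.

```python
import numpy as np


def valid_index(level):
    """Like IDL's where(data ne bmdi): (row, col) subscripts of non-missing points."""
    return np.nonzero(~np.ma.getmaskarray(level))


def extract_valid_points(fin_arr):
    """Return a (z, p) array of the non-missing (y, x) points of every z level."""
    iy, ix = valid_index(fin_arr[0])  # assumed to be the same points at every level
    # Slice over z, then pick the same (iy, ix) points from each level
    return np.ma.getdata(fin_arr)[:, iy, ix]


def fill_valid_points(fout_arr, r):
    """Put r(p) into the non-missing elements of a 2D (y, x) masked array, in place."""
    fout_arr[valid_index(fout_arr)] = r
    return fout_arr
```

This mirrors the two IDL lines: one call to gather the input, one assignment to scatter the results.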

Here's the dask problem:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/sws02jmg/.local/lib/python3.11/site-packages/cfdm/decorators.py", line 44, in inplace_wrapper
    processed_copy = operation_method(self, *args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/sws02jmg/.local/lib/python3.11/site-packages/cf/data/data.py", line 10489, in reshape
    dx = dx.reshape(*shape, merge_chunks=merge_chunks, limit=limit)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/applications/anaconda/2023.09-0/lib/python3.11/site-packages/dask/array/core.py", line 2220, in reshape
    return reshape(self, shape, merge_chunks=merge_chunks, limit=limit)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/software/applications/anaconda/2023.09-0/lib/python3.11/site-packages/dask/array/reshape.py", line 222, in reshape
    raise ValueError(
ValueError: Array chunk size or shape is unknown. shape: (nan,)

Possible solution with x.compute_chunk_sizes()

Thanks. Jonathan


Labels: dask (Relating to the use of Dask), enhancement (New feature or request), question (General question)
