Unexpected type problems with has_dtype decorator #55

@kot-behemoth

First of all, I'd like to say this lib looks really good: it's simple, easy to integrate, and has a very usable API. I've been meaning to integrate it into my workflow for a while, and I finally got a chance to today. However, it's been an unsuccessful experience.

I'm on Windows, using Python 3.6.

I tried following the docs example for the has_dtypes decorator. In short, it seems you really can't use the usual Python types in the dtype schema. Instead of int, you have to use np.int64, and instead of str, you have to use np.object_.

I'd imagine this would be very confusing for newcomers, who get assertion errors just by following the examples.

Here's a simple reprex:

import engarde.decorators as ed
import pandas as pd
import numpy as np


sample_df = pd.DataFrame([
    dict(a=1, b='test 1'),
    dict(a=2, b='test 2'),
])

expected_schema = dict(
    a=int,
    b=str
)

# I expect this to work, following the doc example. However, it fails
@ed.has_dtypes(items=expected_schema)
def expected_process(df):
    return df

# Fails with AssertionError: a has the wrong dtype (<class 'int'>)
# comment it out to see the working example below
expected_process(sample_df)


working_schema = dict(
    a=np.int64,
    b=np.object_
)

# fixed the schema
@ed.has_dtypes(items=working_schema)
def working_process(df):
    return df

# this works
working_process(sample_df)
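
A possible explanation, offered as a sketch rather than a confirmed diagnosis: pandas infers int64 for Python integers, while NumPy versions before 2.0 map the builtin int to the platform's C long, which is 32-bit on Windows. The builtin str maps to a fixed-width unicode dtype, which is never what pandas uses to store text columns. The snippet below prints both sides of that comparison. The `KIND_CHECKS` table and `mismatched_columns` helper are hypothetical names used for illustration and are not part of engarde; they show one way to compare by kind instead of by exact dtype.

```python
import numpy as np
import pandas as pd
from pandas.api import types as pdt

sample_df = pd.DataFrame([
    dict(a=1, b='test 1'),
    dict(a=2, b='test 2'),
])

# What NumPy makes of the builtin types used in the failing schema.
int_dtype = np.dtype(int)  # C long on NumPy < 2, which is 32-bit on Windows
str_dtype = np.dtype(str)  # fixed-width unicode (kind 'U'), never object

print("column a:", sample_df['a'].dtype, "| np.dtype(int):", int_dtype)
print("column b:", sample_df['b'].dtype, "| np.dtype(str):", str_dtype)

# Hypothetical helper, NOT part of engarde: map builtin types to
# pandas' kind-level predicates instead of comparing exact dtypes.
KIND_CHECKS = {
    int: pdt.is_integer_dtype,
    float: pdt.is_float_dtype,
    str: pdt.is_string_dtype,
}


def mismatched_columns(df, schema):
    """Return the columns whose dtype does not match the builtin type in schema."""
    return [col for col, typ in schema.items() if not KIND_CHECKS[typ](df[col])]


print(mismatched_columns(sample_df, dict(a=int, b=str)))
```

With a kind-level check like this, the same schema would behave identically on Windows and Linux, at the cost of no longer distinguishing int32 from int64.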
