First of all, I'd like to say this lib looks really good: it's simple, easy to integrate and has a very usable API. I've been meaning to integrate it into my workflow for a while, and I finally got a chance to today. However, it's been an unsuccessful experience.
I'm on Windows, using Python 3.6.
I tried following the docs example for the has_dtypes decorator. In short, it seems you can't use the usual Python types in the dtype schema. Instead of int you have to use np.int64, and instead of str you have to use np.object_.
I'd imagine this would be very confusing for newcomers, when they get assertion errors by just following the examples.
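My guess at the cause, which is an assumption since I haven't checked the engarde source, is that the decorator compares each column's dtype against the schema value with a plain equality check. A minimal sketch without engarde shows what NumPy makes of the built-in types. The width that int maps to depends on the platform and NumPy version, and str maps to a fixed-width unicode dtype rather than object:

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame([
    dict(a=1, b='test 1'),
    dict(a=2, b='test 2'),
])

# What pandas actually stored for each column.
print(sample_df.dtypes)

# What NumPy turns the plain Python types into.
# The integer width is platform- and version-dependent.
print(np.dtype(int))
print(np.dtype(str))

# The same kind of equality comparison I assume the decorator performs.
int_matches = sample_df.dtypes['a'] == int
str_matches = sample_df.dtypes['b'] == str
print(int_matches, str_matches)
```

If the last line prints False for either column on your machine, that would be consistent with the assertion error in the reprex.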
Here's a simple reprex:
import engarde.decorators as ed
import pandas as pd
import numpy as np
sample_df = pd.DataFrame([
    dict(a=1, b='test 1'),
    dict(a=2, b='test 2'),
])
expected_schema = dict(
    a=int,
    b=str,
)
# I expected this to work, following the doc example. However, it fails.
@ed.has_dtypes(items=expected_schema)
def expected_process(df):
    return df
# Fails with AssertionError: a has the wrong dtype (<class 'int'>)
# Comment out this call to reach the working example below.
expected_process(sample_df)
working_schema = dict(
    a=np.int64,
    b=np.object_,
)
# Fixed schema, using NumPy dtypes instead of Python types.
@ed.has_dtypes(items=working_schema)
def working_process(df):
    return df
# This works.
working_process(sample_df)
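For what it's worth, here is a rough sketch of the behaviour I was expecting, written without engarde. The names has_python_dtypes and KIND_CHECKS are hypothetical and not part of the library. It checks each column by kind using the predicates in pandas.api.types, so int matches any integer width and str matches object or string columns:

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical mapping from plain Python types to pandas dtype predicates.
KIND_CHECKS = {
    int: ptypes.is_integer_dtype,
    float: ptypes.is_float_dtype,
    bool: ptypes.is_bool_dtype,
    str: lambda dtype: ptypes.is_object_dtype(dtype) or ptypes.is_string_dtype(dtype),
}


def has_python_dtypes(df, items):
    """Assert that each named column has a dtype of the expected kind."""
    for column, py_type in items.items():
        check = KIND_CHECKS[py_type]
        if not check(df[column].dtype):
            raise AssertionError(
                "{} has the wrong dtype ({})".format(column, df[column].dtype)
            )
    return df


sample_df = pd.DataFrame([
    dict(a=1, b='test 1'),
    dict(a=2, b='test 2'),
])

# Passes with the plain Python types from the docs example.
has_python_dtypes(sample_df, dict(a=int, b=str))
```

This is looser than an exact dtype match, so it may not be what the library intends. It's only meant to illustrate the kind of check the docs example led me to expect.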