In my code, I have several variables which can either contain a pandas DataFrame or nothing at all. Let's say I want to test and see if a certain DataFrame has been created yet or not. My first thought would be to test for it like this:
if df1:
    pass  # do something
However, that code fails in this way:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Fair enough. Ideally, I would like to have a presence test that works for either a DataFrame or Python None.
Here is one way this can work:
if not isinstance(df1, type(None)):
    pass  # do something
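To make that concrete, here is a small sketch of the same type check applied to both cases. The helper name `is_present` and the sample data are made up for illustration; they are not part of my real code.

```python
import pandas as pd


def is_present(obj):
    """Return True unless obj is None, using the isinstance-based type test."""
    return not isinstance(obj, type(None))


df1 = None                             # not created yet
df2 = pd.DataFrame({"a": [1, 2, 3]})   # hypothetical sample data

print(is_present(df1))
print(is_present(df2))
```

The check never asks the DataFrame for its truth value, so the ValueError does not come up.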
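However, testing for type is really slow, about ten times slower than a plain truth test in my timings:
import timeit

# timeit() returns total seconds for 1,000,000 runs (the default)
t = timeit.Timer('if None: pass')
t.timeit()
# approximately 0.04
t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
t.timeit()
# approximately 0.4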
Ouch. Along with being slow, testing for NoneType isn't very flexible, either.
A different solution would be to initialize df1 as an empty DataFrame, so that the type would be the same in both the null and non-null cases. I could then just test using len(), or any(), or something like that. Making an empty DataFrame seems kind of silly and wasteful, though.
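A rough sketch of what I mean by the empty-DataFrame approach, assuming the placeholder is created with a bare `pd.DataFrame()`; the sample data and the `state` and `has_rows` names are only for illustration:

```python
import pandas as pd

# Placeholder meaning "not created yet": same type as the real thing.
df1 = pd.DataFrame()

# len() counts rows, so an empty placeholder gives 0.
if len(df1) > 0:
    state = "populated"
else:
    state = "still empty"

# Later, once the real data exists (hypothetical sample data):
df1 = pd.DataFrame({"a": [1, 2, 3]})
has_rows = not df1.empty   # .empty is True when the frame has no elements
```

One wrinkle I can see with this: it cannot tell "not created yet" apart from a DataFrame that legitimately ended up with zero rows.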
Another solution would be to have an indicator variable: df1_exists, which is set to False until df1 is created. Then, instead of testing df1, I would be testing df1_exists. But this doesn't seem all that elegant, either.
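Roughly, the indicator-variable version would look like the sketch below. `build_df1` is a hypothetical stand-in for whatever actually creates the DataFrame, and the sample data is invented.

```python
import pandas as pd

df1 = None
df1_exists = False   # flipped to True once df1 has been created


def build_df1():
    # Hypothetical stand-in for the code that really builds the DataFrame.
    return pd.DataFrame({"a": [1, 2, 3]})


if not df1_exists:
    df1 = build_df1()
    df1_exists = True

# From here on, test the flag instead of the DataFrame itself.
if df1_exists:
    row_count = len(df1)
```

The flag and the variable have to be kept in sync by hand, which is part of why this feels clumsy to me.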
Is there a better, more Pythonic way of handling this issue? Am I missing something, or is this just an awkward side effect of all the awesome things about pandas?



