arcgis-python-api: Enhanced Logging when Row cannot be added to Spatially Enabled Dataframe

Is your feature request related to a problem? Please describe.
When ingesting records into a spatially enabled dataframe from another source, a warning occurs stating:

Could not insert the row because of the error message: value #XX - unsupported type: NAType. Recheck your data.

This warning/error would be significantly more helpful if it stated the record row number and the column/field name or index. Is the value number supposed to be the column? I don’t know and the end user probably doesn’t either.


sedf = arcgis.features.GeoAccessor.from_featureclass(featureclass)
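
As a stopgap, a plain pandas check can surface exactly which columns and rows contain NA values before the dataframe is used anywhere else. This is only a sketch that assumes sedf was created as above; it is not part of the API:

# Report which columns contain pandas NA values and the row labels where they
# occur, since the warning itself only reports "value #XX".
na_mask = sedf.isna()
for column in sedf.columns:
    if na_mask[column].any():
        rows = sedf.index[na_mask[column]].tolist()
        print(f"Column '{column}' has NA values in rows: {rows}")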

Describe the solution you’d like
Enrich the error message to include (see the sketch after this list):

  • the row number of the record that could not be added to the spatially enabled dataframe
    • especially its ObjectID if the data comes from an enterprise geodatabase or another Esri-specific format
  • the field or column name and/or index number to alert the user to the location of the error.
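
For illustration only, the enriched message could look something like the sketch below. This is not the API’s internal code; insert_row() is just a stand-in for whatever the converter actually calls per row, and an OBJECTID field is assumed to be present:

import warnings
import pandas as pd

for row_index, row in sedf.iterrows():
    try:
        insert_row(row)  # stand-in for the API's per-row insert
    except TypeError:
        object_id = row.get("OBJECTID", "<unknown>")
        bad_columns = [col for col, val in row.items() if pd.isna(val)]
        warnings.warn(
            f"Could not insert row {row_index} (OBJECTID {object_id}); "
            f"columns with unsupported NA values: {bad_columns}. Recheck your data."
        )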

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 23 (13 by maintainers)

Most upvoted comments

I really think we’ve come full circle on this; what does the user expect when they push a spatially enabled dataframe to a file-based format: to_featureclass(), to_arrow(), to_parquet(), etc.?

  • I would contend that the user does not expect these functions to drop rows containing data. I certainly did not expect that.
    • if this is the API’s preferred default method of handling data that it does not expect, then the error reporting should be better
    • in your example above, I think an analysis would be equally affected if certain values were dropped, which would also skew the results and have unintended downstream consequences
    • when I first developed this data hygiene process, the data didn’t contain errors that would trigger that warning, so the issue went unnoticed because the process had already been put into production and automated
  • there is already precedent in these functions for proactively managing the user’s data to work around limitations of the destination file type.
    • sanitizing NA types would be congruent with that practice of adapting user data to the destination file type’s limitations (see the sketch below)
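
For example, a couple of lines along these lines would cover it (only a sketch of the idea, not the API’s actual code; "SHAPE" is the default geometry column name and out_featureclass is a placeholder path):

# Replace pandas NA/NaT in the attribute columns with plain None before export,
# leaving the geometry column alone.
attribute_cols = [c for c in sedf.columns if c != "SHAPE"]
sedf[attribute_cols] = (
    sedf[attribute_cols].astype(object).where(sedf[attribute_cols].notna(), None)
)
sedf.spatial.to_featureclass(location=out_featureclass)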

Now, ideally, should the user know better? Sure. But without additional documentation or error handling, how would the user know:

  • that the term value actually means field/column
  • which row to check for their bad data
  • how to resolve NA types, or even that NA types are a limitation at all
  • that the preference is to drop data that doesn’t fit

I think it’s easy to say, well, be a better programmer; that is fairly analogous to Recheck your data. But the reality is that your user base here is, by and large, not made up of computer science types who appreciate the difference between different kinds of nulls or who have an expert understanding of dataframes generally. They’re going to be self-taught, or otherwise lightly trained, following the path from button-clicker, to ArcPy user, to API for Python user.

In my opinion, this is a fairly compelling user story for adding a few lines of error handling, and it has shown up independently in the Esri Ideas forums twice. This is a change that lowers the barrier to entry, creates appropriate guardrails, and allows users to “stand on the shoulders of giants”, which is arguably the ethos of programming: we attempt to spare others the labor of re-solving problems that are already well understood by experienced programmers. People play in the Esri-verse because they just need things to work. Insisting that they become intimately familiar with a wide variety of pandas operations and fully understand the limitations of a variety of destination file types isn’t congruent with that ease-of-use pattern.