@shashbha14
Contributor

Fixes #49222

Creating a Table from a dict that holds pandas Timedelta objects made with Timestamp.replace() lost the values: they showed up as 0:00:00 instead of the actual duration.

The problem was that from_pydict was treating lists of pandas Timedelta/Timestamp as plain Python lists, so it wasn't using the pandas-aware conversion path that knows how to handle these types correctly.

I fixed it by detecting when a list contains pandas temporals and wrapping it in a pandas Series before conversion. That way pa.array() uses the pandas conversion logic and preserves the values.
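
For readers who want to picture the wrapping step, here is a minimal sketch of the idea. It is not the actual diff: the helper name `maybe_box_pandas_temporals` is hypothetical, and it assumes pandas is optional, so the list is passed through unchanged when pandas cannot be imported.

```python
def maybe_box_pandas_temporals(values):
    """Return a pandas Series if `values` is a list holding pandas temporals.

    Hypothetical helper for illustration only; not the code in this PR.
    """
    if not isinstance(values, list):
        return values
    try:
        import pandas as pd  # optional dependency
    except ImportError:
        # Without pandas there can be no pandas temporals to detect.
        return values
    if any(isinstance(v, (pd.Timedelta, pd.Timestamp)) for v in values):
        # A Series is intended to steer pa.array() onto its pandas-aware path.
        return pd.Series(values)
    return values
```

In this sketch the return value would be handed to `pa.array()` in place of the original list; anything that is not a list of pandas temporals comes back untouched.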

Added a test that reproduces the exact issue from the bug report. CI should verify everything works.

Note: I couldn't run the tests locally because my C++ build is currently broken from unrelated work, but this is a small Python-only change so CI should cover it.

Shashwati added 9 commits January 19, 2026 17:32
…able function

- Add errors parameter to cast() function with 'raise' (default) and 'coerce' options
- errors='coerce' converts invalid values to null instead of raising errors
- Add errors parameter to Array.cast(), Scalar.cast(), and ChunkedArray.cast() instance methods
- Verify is_castable() function is properly exposed and working
- Add comprehensive tests including the exact example from issue apache#48972
- Update documentation with examples showing errors='coerce' usage

This addresses issue apache#48972 by providing pandas.to_numeric(errors='coerce')
equivalent functionality in PyArrow.
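
The difference between the two `errors` modes described above can be illustrated in plain Python. This is only a sketch of the semantics, not PyArrow's `cast()`: the function `cast_values` and its signature are hypothetical, and it works on Python lists rather than Arrow arrays.

```python
def cast_values(values, target, errors="raise"):
    """Convert each value with `target` (e.g. int or float).

    Hypothetical stand-in used to illustrate 'raise' vs 'coerce';
    it is not the PyArrow API.
    """
    if errors not in ("raise", "coerce"):
        raise ValueError(f"errors must be 'raise' or 'coerce', got {errors!r}")
    out = []
    for v in values:
        if v is None:
            # Existing nulls stay null in both modes.
            out.append(None)
            continue
        try:
            out.append(target(v))
        except (TypeError, ValueError):
            if errors == "raise":
                raise
            # 'coerce': an invalid value becomes null instead of failing.
            out.append(None)
    return out
```

With `errors="coerce"`, `cast_values(["1", "abc", "3"], int, errors="coerce")` yields `[1, None, 3]`, while the default mode raises on `"abc"`.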
…ma is provided

When reading JSON with explicit schema, the parser now attempts to convert
values to match the schema type before erroring. This allows JSON files
with inconsistent types (e.g., number and string for the same field) to
be read successfully when an explicit schema is provided.

Changes:
- Store explicit_schema in HandlerBase for access during parsing
- Modified AppendScalar to check for conversion before erroring
- Added TryConvertAndAppend helper function to handle conversions
- Updated Bool handler to also support conversion
- Added tests for number->string and string->number conversions

Supported conversions:
- Number <-> String (when numeric)
- Boolean <-> String
- Boolean <-> Number
- Number -> Boolean (0=false, non-zero=true)

Fixes apache#49158
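
The conversions listed above can be sketched in Python to make the rules concrete. The real change lives in the C++ JSON parser, so this is an illustration under assumptions: the function name `try_convert`, the target names, and the `"true"`/`"false"` spellings for booleans are all hypothetical choices, not taken from the commit.

```python
def try_convert(value, target):
    """Convert a parsed JSON scalar to `target` ('string', 'number' or 'bool').

    Returns (True, converted) on success and (False, None) otherwise.
    Hypothetical sketch; not the TryConvertAndAppend helper from the commit.
    """
    # bool is a subclass of int in Python, so test for it first.
    is_bool = isinstance(value, bool)
    is_number = isinstance(value, (int, float)) and not is_bool
    is_string = isinstance(value, str)

    if target == "string":
        if is_string:
            return True, value
        if is_bool:
            return True, "true" if value else "false"  # assumed spelling
        if is_number:
            return True, str(value)
    elif target == "number":
        if is_number:
            return True, value
        if is_bool:
            return True, int(value)
        if is_string:
            try:
                return True, float(value)  # only succeeds when numeric
            except ValueError:
                return False, None
    elif target == "bool":
        if is_bool:
            return True, value
        if is_number:
            return True, value != 0  # 0 = false, non-zero = true
        if is_string and value in ("true", "false"):  # assumed spellings
            return True, value == "true"
    return False, None
```

A caller would append the converted value when the first element is `True` and fall back to the existing type error otherwise.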
…v for Apple Clang 14.0.0 compatibility

This fixes the CRAN build failure on macOS 13.3 with Apple Clang 14.0.0,
which doesn't fully support the C++20 std::floating_point concept.

The change replaces std::floating_point<T> with std::is_floating_point_v<T>
in the CFloatingPointConcept definition, maintaining the same functionality
while ensuring compatibility with older compilers.

Fixes apache#49176
… from replace

Adds a regression test for issue apache#49222 and adjusts _from_pydict to
box lists of pandas Timedelta/Timestamp into a pandas Series so that
pa.array uses the pandas-aware conversion path.
@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}


Development

Successfully merging this pull request may close these issues.

[Python] data loss converting to Table from pandas Timedelta built using replace