Unconditional dependency on pandas/numpy increases package size by ~24x (<6MB -> 135MB) #29

Thank you for publishing a client library!

### Issue

The `here-location-services` package currently unconditionally depends on `pandas`, which depends on `numpy`, `pytz` and `python-dateutil`. On x86-64 Linux (for Python 3.9), these end up being very large (~130MB), with all of the rest of the dependencies being ~5MB. However, `pandas` is only used for converting the result for two functions associated with the matrix routing API:

https://github.com/heremaps/here-location-services-python/blob/325b4c08bddcce49c08b6eaf8660e9d1a30022fb/here_location_services/responses.py#L151-L182

It seems unfortunate to require these huge dependencies to be installed for only these two functions when many people are likely not calling them anyway, and when the dependencies seemingly aren't required for any other functionality within this client library.

#### Potential alternatives

1. Have pandas be an optional dependency (for example, via `extras_require={"pandas": ["pandas"]}` in setup.py), and import it on-demand in the individual functions that need it. For example:
    ```python
    def to_distnaces_matrix(self):
        """Return distnaces matrix in a dataframe."""
        try:
            from pandas import DataFrame
        except ImportError as e:
            raise ImportError(
                "pandas is not installed, run `pip install here-location-services[pandas]`"
            ) from e
    
        # ... existing implementation as before ...
    ```
   For an example of prior art, this option is what the popular Pydantic library does:
   - 'extra' dependency on `python-dotenv`: https://github.com/samuelcolvin/pydantic/blob/8846ec4685e749b93907081450f592060eeb99b1/setup.py#L134-L137
   - importing from `dotenv` within a function (not at the top level) and catching the `ImportError` to provide additional help to the user: https://github.com/samuelcolvin/pydantic/blob/8846ec4685e749b93907081450f592060eeb99b1/pydantic/env_settings.py#L297-L300
2. Remove the pandas dependency totally, and have the functions return the nested lists (`nested_distances`) without converting to a `DataFrame`. A user who wants to use pandas can still convert to a DataFrame themselves: `DataFrame(result.to_distnaces_matrix())` (the `columns=` argument seems to be unnecessary, as doing that call gives the same result AFAICT).
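
To make option 2 concrete, here is a minimal sketch of the same chunking logic returning plain nested lists. `MatrixResult` and `to_nested_matrix` are hypothetical names used only for illustration (they are not part of the library), and the response shape is assumed from the existing implementation linked above:

```python
from typing import List, Optional, Sequence


def to_nested_matrix(flat: Sequence[float], dest_count: int) -> List[List[float]]:
    """Split a flat, row-major list into rows of ``dest_count`` entries."""
    if dest_count <= 0:
        raise ValueError("dest_count must be positive")
    return [list(flat[i : i + dest_count]) for i in range(0, len(flat), dest_count)]


class MatrixResult:
    """Hypothetical stand-in for MatrixRoutingResponse, for illustration only."""

    def __init__(self, matrix: Optional[dict] = None):
        self.matrix = matrix

    def to_distnaces_matrix(self) -> Optional[List[List[float]]]:
        """Return the distances matrix as nested lists (no pandas needed)."""
        if self.matrix and self.matrix.get("distances"):
            return to_nested_matrix(
                self.matrix["distances"], self.matrix["numDestinations"]
            )
        return None


# Two origins, three destinations, flattened row by row.
result = MatrixResult({"distances": [0, 10, 20, 30, 0, 40], "numDestinations": 3})
print(result.to_distnaces_matrix())  # [[0, 10, 20], [30, 0, 40]]
```

A caller who still wants a `DataFrame` could wrap the returned value themselves, so pandas would only need to be installed by the users who actually use it.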

Both of these are probably best considered as breaking changes.

### Context

We were attempting to use this package in an AWS Lambda function. Lambda has strict limits on the size of the code asset, and exceeding them results in deployment errors like 'Unzipped size must be smaller than 262144000 bytes' (relevant docs: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html#function-configuration-deployment-and-execution "Deployment package (.zip file archive)"). Additionally, larger packages result in slower cold starts: https://mikhail.io/serverless/coldstarts/aws/.

There are various ways to ship more code than the size limits allow (layers or Docker images), but this provides some context for why someone might care about the size of a package and its dependencies. (Those methods are fiddly enough, and the cold start impact large enough, that we've actually switched away from using this client library for now.)

### Package size details

Here are some commands I used to investigate the size impact, leveraging `pip install --target` to install a set of packages to a specific directory:

```shell
uname -a # Linux 322c9a327f85 5.10.104-linuxkit #1 SMP PREEMPT Wed Mar 9 19:01:25 UTC 2022 x86_64 GNU/Linux
python --version # Python 3.9.10

pip install --target=everything here-location-services
pip install --target=deps-no-pandas requests geojson flexpolyline pyhocon requests_oauthlib
pip install --target=deps-pandas requests geojson flexpolyline pyhocon requests_oauthlib pandas

du -sh everything # 135M
du -sh deps-pandas # 134M
du -sh deps-no-pandas # 5.1M
du -sh everything/here_location_services # 484K
```

That is, without pandas, the total installed package size would be 5.1M (`deps-no-pandas`) + 484K (`everything/here_location_services`) = ~5.6MB, down from 135MB (`everything`).
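
The arithmetic behind these figures (and the ~24x in the title) can be checked with a few lines of Python. This is only a sketch using the `du -sh` numbers reported above, assuming `du`'s `M` and `K` suffixes are the usual powers of 1024:

```python
KIB_PER_MIB = 1024

deps_without_pandas_mib = 5.1            # du -sh deps-no-pandas
client_library_mib = 484 / KIB_PER_MIB   # du -sh everything/here_location_services
everything_mib = 135                     # du -sh everything

slim_total_mib = deps_without_pandas_mib + client_library_mib
ratio = everything_mib / slim_total_mib

print(f"without pandas: {slim_total_mib:.1f} MiB")  # without pandas: 5.6 MiB
print(f"size ratio: {ratio:.0f}x")                  # size ratio: 24x
```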

Summary of individual packages (reported by `du -sh everything/*`, ignoring the `$package.dist-info` directories that are mostly less than 50k anyway):

| package | size | only required for pandas? |
|---|---|---|
| pandas | 58M | yes |
| numpy.libs | 35M | yes |
| numpy | 33M | yes |
| pytz | 2.8M | yes |
| oauthlib | 1.4M | |
| urllib3 | 872K | |
| dateutil | 748K | yes |
| idna | 496K | |
| here_location_services | 484K | |
| 8 others | 1.5M | |
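
For anyone who wants to reproduce a breakdown like this without `du`, the following is a rough Python sketch. The helper names `directory_size` and `sizes_by_package` are made up for this example, and it sums apparent file sizes rather than disk blocks, so the numbers will likely differ slightly from what `du -sh` reports:

```python
import os
from pathlib import Path
from typing import Dict


def directory_size(path: Path) -> int:
    """Sum the apparent sizes (in bytes) of all regular files under ``path``."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():  # skip symlinks to avoid double counting
                total += file_path.stat().st_size
    return total


def sizes_by_package(target: Path) -> Dict[str, int]:
    """Map each top-level entry of a ``pip install --target`` directory to its size.

    ``*.dist-info`` directories are ignored, as in the table above.
    Entries are returned largest first.
    """
    sizes = {}
    for entry in target.iterdir():
        if entry.name.endswith(".dist-info"):
            continue
        sizes[entry.name] = directory_size(entry) if entry.is_dir() else entry.stat().st_size
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))
```

Calling `sizes_by_package(Path("everything"))` after the `pip install --target=everything ...` command above should then give a largest-first mapping of package directory to size in bytes.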




For reference, the two `pandas`-dependent functions linked under "Issue" above:

    class MatrixRoutingResponse(ApiResponse):
        """A class representing Matrix routing response data."""

        def __init__(self, **kwargs):
            super().__init__()
            self._filters = {"matrix": None}
            for param, default in self._filters.items():
                setattr(self, param, kwargs.get(param, default))

        def to_geojson(self):
            """Return API response as GeoJSON."""
            raise NotImplementedError("This method is not valid for MatrixRoutingResponse.")

        def to_distnaces_matrix(self):
            """Return distnaces matrix in a dataframe."""
            if self.matrix and self.matrix.get("distances"):
                distances = self.matrix.get("distances")
                dest_count = self.matrix.get("numDestinations")
                nested_distances = [
                    distances[i : i + dest_count] for i in range(0, len(distances), dest_count)
                ]
                return DataFrame(nested_distances, columns=range(dest_count))

        def to_travel_times_matrix(self):
            """Return travel times matrix in a dataframe."""
            if self.matrix and self.matrix.get("travelTimes"):
                distances = self.matrix.get("travelTimes")
                dest_count = self.matrix.get("numDestinations")
                nested_distances = [
                    distances[i : i + dest_count] for i in range(0, len(distances), dest_count)
                ]
                return DataFrame(nested_distances, columns=range(dest_count))
