.. _manual:

=======================
Meridian User Manual
=======================


Introduction
============

Geospatial data, and the processes that deal with it, come with a good deal of jargon, much of it shaped by the
relational database environment in which many people manipulate spatial data. When you describe to someone how you
relate one dataset to another, matching records from each where geometries intersect, the term that naturally comes
to mind is *spatial join*. This terminology even appears in most desktop GIS applications, making it a very common
mental model for processing these kinds of data.

Often, I find that tools and libraries try to make Python fit into the world of spatial joins and relational
databases, and while there have been several *very* successful projects to do so, it has never felt right to me.
Something has always been lost in translation and I don't feel like I'm writing idiomatic Python code. Besides, "JOIN"
is basically just syntactic sugar for a nested for loop:

.. code-block:: sql

    select
      *
    from table1 left join table2
    on st_intersects(table1.geom, table2.geom)

is approximately equivalent to

.. code-block:: python

    for r1 in table1:
        for r2 in table2:
            if intersects(r1, r2):
                ...  # do something with the matched pair


Iteration is already a core part of Python which is highly optimized, can be done lazily (with generators), has many
libraries built around manipulating it (like `itertools`) and is well known to users of Python, so let's talk about these
things in the language of Python, instead of the language of relational databases.
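
The nested loop above can also be written lazily. The following is a minimal sketch using only the standard library,
not Meridian's API: `Box`, `intersects` and `join` are hypothetical names introduced for illustration, and an
axis-aligned bounding box stands in for a real geometry. It yields only matching pairs, so it behaves like an inner
join rather than the left join in the SQL above.

.. code-block:: python

    from itertools import islice, product
    from typing import Iterable, Iterator, NamedTuple, Tuple


    class Box(NamedTuple):
        """Hypothetical stand-in for a geometry: an axis-aligned bounding box."""

        xmin: float
        ymin: float
        xmax: float
        ymax: float


    def intersects(a: Box, b: Box) -> bool:
        """Return True when the two boxes overlap or touch."""
        return (
            a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax
        )


    def join(left: Iterable[Box], right: Iterable[Box]) -> Iterator[Tuple[Box, Box]]:
        """Lazily yield every (left, right) pair whose boxes intersect."""
        # product() reads both inputs up front, but no intersection test
        # runs until a pair is requested from the generator.
        return ((a, b) for a, b in product(left, right) if intersects(a, b))


    table1 = [Box(0, 0, 2, 2), Box(10, 10, 11, 11)]
    table2 = [Box(1, 1, 3, 3), Box(5, 5, 6, 6), Box(2, 2, 4, 4)]

    pairs = join(table1, table2)    # a generator; no pairs tested yet
    first = list(islice(pairs, 1))  # pull only the first matching pair

Because `join` returns a generator, matches can be consumed one at a time, stopped early, or composed with other
`itertools` helpers.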

Meridian, at its core, is trying to bring these concepts for geospatial data processing into common Python idioms,
so you can use familiar terminologies, tools, and mental models when you're dealing with this domain.

Meridian is not trying to displace a tool like GeoPandas, which is excellent for exploring and understanding datasets.
Instead, Meridian wants to be an efficient option for intensive and repeated geospatial processing applications
once data exploration is concluded.

.. _core:

Core Models
============

Meridian implements two main data types which users will leverage: the `Record` and the `Dataset`. Naturally,
a `Record` represents an individual row or record in your dataset, while a `Dataset` is a collection of `Record`
objects with a spatial index for efficient queries.

Record
^^^^^^^
Create a custom `Record` to model your data by subclassing `meridian.Record` and adding
annotations to the class. The simplest example has no annotations and therefore no attributes other than its geometry:

.. code-block:: python

    import meridian

    class GeomOnly(meridian.Record):
        pass

`Record` objects have attributes similar to those of `NamedTuple` and `pydantic` models, but "under the hood" they are
essentially `NamedTuple` objects which always have at least one property called `geom` and which implement protocols
such as `__geo_interface__` for compatibility. You can create a `Record` through direct instantiation or the
`Record.from_geojson` classmethod:

.. code-block:: python

    from shapely import wkt

    my_geom = wkt.loads("POINT(0 0)")
    empty1 = GeomOnly(my_geom)
    empty2 = GeomOnly.from_geojson({'geometry': {'type': 'Point', 'coordinates': [0, 0]}, 'properties': {}})

    print(empty1.geom.wkt)  # prints: POINT (0 0)
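
To make the idea of a `NamedTuple` that carries a `geom` more concrete, here is a rough stand-alone sketch. It is not
Meridian's implementation: `SketchRecord` is a hypothetical name, and a GeoJSON-like mapping stands in for the geometry
object so that the example needs only the standard library.

.. code-block:: python

    from typing import Any, Dict, NamedTuple


    class SketchRecord(NamedTuple):
        """Hypothetical stand-in for a record: a geometry plus named attributes."""

        geom: Dict[str, Any]  # GeoJSON-like geometry mapping (an assumption)
        name: str = ""

        @property
        def __geo_interface__(self) -> Dict[str, Any]:
            """Expose the record as a GeoJSON-like Feature mapping."""
            properties = {k: v for k, v in self._asdict().items() if k != "geom"}
            return {"type": "Feature", "geometry": self.geom, "properties": properties}


    origin = SketchRecord({"type": "Point", "coordinates": [0, 0]}, name="origin")
    feature = origin.__geo_interface__

Any library that understands the `__geo_interface__` protocol can read `feature` without knowing about the class that
produced it, which is the kind of compatibility the protocol is meant to provide.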

However, in most cases, we will want to define some attributes to go along with our data. We do this by
adding annotations to our class definition:

.. code-block:: python

    import meridian

    class PowerPlant(meridian.Record):
        plant_code: int
        plant_name: str
        sector_name: str
        primsource: str
        install_mw: float
        total_mw: float
        year_built: int = -1

Now, when we create `PowerPlant` objects, each of the annotated attributes will be available as a named property
on the instantiated `Record`. When a `Record` is created, the types of incoming data *are not validated*; the values
are simply passed through to the instance. The hints are primarily for your use as the developer. You can specify a
default for any field; fields without one default to `None`.

When creating annotated `Record` objects from GeoJSON, the fields in the GeoJSON's `properties` must match
the names in the annotations. Only the fields which are annotated on the class will be used, so this is a useful way
to filter out fields which are not needed. If you are instantiating a `Record` directly, then the geometry must be the
first argument, and all attributes must be passed in as keyword arguments so they are named explicitly.
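
The field rules described above can be illustrated without Meridian itself. The sketch below uses only the standard
library; `PlantFields` and `select_properties` are hypothetical names, and it assumes that a property missing from the
GeoJSON falls back to the class default and then to `None`.

.. code-block:: python

    from typing import Any, Dict, get_type_hints


    class PlantFields:
        """Hypothetical stand-in that only carries field annotations."""

        plant_name: str
        sector_name: str
        total_mw: float
        year_built: int = -1


    def select_properties(model: type, feature: Dict[str, Any]) -> Dict[str, Any]:
        """Pick the annotated fields out of a GeoJSON-like feature's properties."""
        properties = feature.get("properties", {})
        selected = {}
        for name in get_type_hints(model):
            default = getattr(model, name, None)  # class default, else None
            selected[name] = properties.get(name, default)  # no type validation
        return selected


    feature = {
        "geometry": {"type": "Point", "coordinates": [0, 0]},
        "properties": {"plant_name": "Example", "total_mw": "12.5", "operator": "ACME"},
    }
    fields = select_properties(PlantFields, feature)

Here `operator` is dropped because it is not annotated, `sector_name` falls back to `None`, `year_built` takes its
class default, and `total_mw` stays a string because nothing checks it against the `float` hint.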

Modelling our data using classes has the advantage of allowing us to easily add custom behavior or derived attributes
to our data:

.. code-block:: python

    import meridian

    class PowerPlant(meridian.Record):
        install_mw: float
        total_mw: float

        @property
        def capacity_factor(self) -> float:
            """https://en.wikipedia.org/wiki/Capacity_factor"""
            return self.total_mw / self.install_mw * 100


.. _design:

Design Goals
=============

Some items which are important to me, in no particular order:

- Pythonicity. Meridian should be interoperable with standard library tools and be intuitive to use.
- Efficiency. Memory use is kept as low as possible and operations are optimized when appropriate.
- Type hinting wherever possible.
- Strong support for dataset attribution.


Meridian's `Record` models draw strong inspiration from `pydantic`'s `BaseModel`, choosing to re-invent a small
part of that wheel for the purpose of efficiency and narrowing of focus.


.. _benchmarks:

Benchmarks
===========

A dataset opened with Meridian can use as little as half the memory of the same dataset in GeoPandas,
depending on the characteristics of the geometry. Oh yeah? Prove it!