Masks

Donut Masking

donut(gdf, low, high, container=None, distribution='uniform', seed=None, snap_to_streets=False)

Apply donut masking to a GeoDataFrame, randomly displacing points between a minimum and maximum distance. This mask is fast and simple, though it does not handle highly varied population densities well.

Example
from maskmypy import donut

masked = donut(
    gdf=sensitive_points,
    low=100,
    high=1000
)

Parameters:

Name Type Description Default
gdf GeoDataFrame

GeoDataFrame containing sensitive points.

required
low float

Minimum distance to displace points. Unit must match that of the gdf CRS.

required
high float

Maximum distance to displace points. Unit must match that of the gdf CRS.

required
container GeoDataFrame

A GeoDataFrame containing polygons within which intersecting sensitive points should remain after masking. This works by masking a point, checking whether it still intersects the same polygon as it did prior to masking, and retrying until it does. Useful for preserving statistical relationships, such as census tract membership, or to ensure that points are not displaced into impossible locations, such as the ocean. CRS must match that of gdf.

None
distribution str

The distribution used to determine masking distances. uniform provides a flat distribution where any value between the minimum and maximum distance is equally likely to be selected. areal is more likely to select distances that are further away. The gaussian distribution uses a normal distribution, where values towards the middle of the range are most likely to be selected. Note that gaussian distribution has a small chance of selecting values beyond the defined minimum and maximum.

'uniform'
seed int

Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined.

None
snap_to_streets bool

If True, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false-attribution.

False

Returns:

Type Description
GeoDataFrame

A GeoDataFrame containing masked points.
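
The three `distribution` options can be illustrated with a small standard-library sketch. This is not MaskMyPy's implementation: `sample_distance` and `displace` are hypothetical helpers, the `areal` branch assumes sampling uniformly over the area of the ring (one common way to favour larger distances), and the standard deviation in the `gaussian` branch is an arbitrary choice made for illustration.

```python
import math
import random


def sample_distance(low, high, distribution="uniform", rng=random):
    """Draw one displacement distance (illustrative sketch only)."""
    if distribution == "uniform":
        # Every distance in [low, high] is equally likely.
        return rng.uniform(low, high)
    if distribution == "areal":
        # Assumption: sample uniformly over the ring's area, which favours
        # larger distances because outer bands cover more area.
        return math.sqrt(rng.uniform(low**2, high**2))
    if distribution == "gaussian":
        # Centre on the middle of the range. The sigma here is an assumed
        # value; draws can occasionally fall outside [low, high].
        mean = (low + high) / 2
        sigma = (high - low) / 5
        return rng.gauss(mean, sigma)
    raise ValueError(f"Unknown distribution: {distribution}")


def displace(x, y, low, high, distribution="uniform", seed=None):
    """Move (x, y) by a random distance in a random direction."""
    rng = random.Random(seed)  # the same seed gives the same displacement
    distance = sample_distance(low, high, distribution, rng)
    angle = rng.uniform(0, 2 * math.pi)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)
```

Calling `displace(0, 0, 100, 1000, seed=42)` twice returns the same coordinates, which mirrors the role of the `seed` parameter.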

Source code in maskmypy/masks/donut.py
def donut(
    gdf: GeoDataFrame,
    low: float,
    high: float,
    container: GeoDataFrame = None,
    distribution: str = "uniform",
    seed: int = None,
    snap_to_streets: bool = False,
) -> GeoDataFrame:
    """
    Apply donut masking to a GeoDataFrame, randomly displacing points between a minimum and
    maximum distance. This mask is fast and simple, though it does not handle highly varied
    population densities well.

    Example
    -------
    ```python
    from maskmypy import donut

    masked = donut(
        gdf=sensitive_points,
        low=100,
        high=1000
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    low : float
        Minimum distance to displace points. Unit must match that of the `gdf` CRS.
    high : float
        Maximum distance to displace points. Unit must match that of the `gdf` CRS.
    container : GeoDataFrame
        A GeoDataFrame containing polygons within which intersecting sensitive points should
        remain after masking. This works by masking a point, checking whether it still
        intersects the same polygon as it did prior to masking, and retrying until it does.
        Useful for preserving statistical relationships, such as census tract membership, or to
        ensure that points are not displaced into impossible locations, such as the ocean. CRS
        must match that of `gdf`.
    distribution : str
        The distribution used to determine masking distances. `uniform` provides
        a flat distribution where any value between the minimum and maximum distance is
        equally likely to be selected. `areal` is more likely to select distances that are
        further away. The `gaussian` distribution uses a normal distribution, where values
        towards the middle of the range are most likely to be selected. Note that gaussian
        distribution has a small chance of selecting values beyond the defined minimum and
        maximum.
    seed : int
        Used to seed the random number generator so that masked datasets are reproducible.
        Randomly generated if left undefined.
    snap_to_streets : bool
        If True, points are snapped to the nearest node on the OSM street network after masking.
        This can reduce the chance of false-attribution.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """
    _gdf = gdf.copy()
    _validate_donut(_gdf, low, high, container)

    # Compare against None so that an explicit seed of 0 is respected.
    seed = tools.gen_seed() if seed is None else seed

    args = locals()
    del args["snap_to_streets"]
    del args["gdf"]

    masked_gdf = _Donut(**args).run()

    if snap_to_streets:
        masked_gdf = tools.snap_to_streets(masked_gdf)

    return masked_gdf
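
The retry behaviour described for `container` can be sketched without any geospatial dependencies. The example below is a simplified illustration, not the library's code: a rectangle stands in for a container polygon, `inside` and `mask_within` are hypothetical names, and the `max_tries` cap is an addition so the loop in this sketch cannot run forever.

```python
import math
import random


def inside(point, box):
    """True if point (x, y) lies within box (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def mask_within(point, box, low, high, rng, max_tries=1000):
    """Displace point by a distance in [low, high], retrying until it stays in box."""
    for _ in range(max_tries):
        distance = rng.uniform(low, high)
        angle = rng.uniform(0, 2 * math.pi)
        candidate = (
            point[0] + distance * math.cos(angle),
            point[1] + distance * math.sin(angle),
        )
        if inside(candidate, box):
            return candidate
    # Assumed safeguard for this sketch only.
    raise RuntimeError("No valid location found within the container.")


masked = mask_within((5, 5), (0, 0, 100, 100), low=10, high=50, rng=random.Random(0))
```

Points close to a container boundary need more retries on average, since a larger share of candidate locations falls outside the polygon.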

Street Masking

street(gdf, low, high, max_length=1000, seed=None, padding=0.2)

Apply street masking to a GeoDataFrame, displacing points along the OpenStreetMap street network. This helps account for variations in population density, and reduces the likelihood of false attribution as points are always displaced to the street network. Each point is snapped to the nearest node on the network, then displaced along the surrounding network between low and high nodes away.

Example
from maskmypy import street

masked = street(
    gdf=sensitive_points,
    low=20,
    high=30
)

Parameters:

Name Type Description Default
gdf GeoDataFrame

GeoDataFrame containing sensitive points.

required
low int

Minimum number of nodes along the OSM street network to traverse.

required
high int

Maximum number of nodes along the OSM street network to traverse.

required
max_length float

When locating the closest node to each point on the street network, MaskMyPy verifies that its immediate neighbours are no more than max_length away, in meters. This prevents extremely large masking distances, such as those caused by long highways.

1000
seed int

Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined.

None
padding float

OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame. Padding is used to expand this bounding box slightly to reduce unwanted edge-effects. A value of 0.2 would add 20% of the x and y extent to each side of the bounding box.

0.2

Returns:

Type Description
GeoDataFrame

A GeoDataFrame containing masked points.
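
The idea of displacing a point "between low and high nodes away" can be pictured as a hop-limited search over a graph. The sketch below is an assumption about the general technique, using a breadth-first search over a plain adjacency dictionary. MaskMyPy's actual traversal of the OSM network may differ, and `nodes_within_hops`, `street_displace`, and the toy network are all hypothetical.

```python
import random
from collections import deque


def nodes_within_hops(graph, origin, low, high):
    """Return nodes whose hop distance from origin is between low and high."""
    depth = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if depth[node] == high:
            continue  # no need to search deeper than `high`
        for neighbour in graph[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return sorted(n for n, d in depth.items() if low <= d <= high)


def street_displace(graph, origin, low, high, seed=None):
    """Pick one eligible node at random; fall back to origin if none exist."""
    rng = random.Random(seed)
    candidates = nodes_within_hops(graph, origin, low, high)
    return rng.choice(candidates) if candidates else origin


# A toy network: a straight road A-B-C-D-E with a side street C-F.
toy = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "F"],
    "D": ["C", "E"],
    "E": ["D"],
    "F": ["C"],
}
eligible = nodes_within_hops(toy, "A", low=2, high=3)
```

Because displacement is measured in nodes rather than distance, dense urban networks produce shorter physical displacements than sparse rural ones, which is how this mask adapts to population density.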

Source code in maskmypy/masks/street.py
def street(
    gdf: GeoDataFrame,
    low: int,
    high: int,
    max_length: float = 1000,
    seed: int = None,
    padding: float = 0.2,
) -> GeoDataFrame:
    """
    Apply street masking to a GeoDataFrame, displacing points along the OpenStreetMap street
    network. This helps account for variations in population density, and reduces the likelihood
    of false attribution as points are always displaced to the street network. Each point is
    snapped to the nearest node on the network, then displaced along the surrounding network
    between `low` and `high` nodes away.

    Example
    -------
    ```python
    from maskmypy import street

    masked = street(
        gdf=sensitive_points,
        low=20,
        high=30
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    low : int
        Minimum number of nodes along the OSM street network to traverse.
    high : int
        Maximum number of nodes along the OSM street network to traverse.
    max_length : float
        When locating the closest node to each point on the street network, MaskMyPy verifies
        that its immediate neighbours are no more than `max_length` away, in meters. This prevents
        extremely large masking distances, such as those caused by long highways.
    seed : int
        Used to seed the random number generator so that masked datasets are reproducible.
        Randomly generated if left undefined.
    padding : float
        OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame.
        Padding is used to expand this bounding box slightly to reduce unwanted edge-effects.
        A value of `0.2` would add 20% of the x and y extent to *each side* of the bounding box.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """
    _gdf = gdf.copy()
    _validate_street(_gdf, low, high)

    # Compare against None so that an explicit seed of 0 is respected.
    seed = tools.gen_seed() if seed is None else seed

    args = locals()
    del args["gdf"]

    masked_gdf = _Street(**args).run()

    return masked_gdf
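
The `padding` arithmetic described in the parameters can be written out directly. This is a minimal sketch of the stated rule (a fraction of the x and y extent added to each side), with `pad_bbox` being a hypothetical helper rather than a MaskMyPy function.

```python
def pad_bbox(minx, miny, maxx, maxy, padding=0.2):
    """Expand a bounding box by `padding` times its extent on each side."""
    pad_x = (maxx - minx) * padding
    pad_y = (maxy - miny) * padding
    return (minx - pad_x, miny - pad_y, maxx + pad_x, maxy + pad_y)


# A 100 x 40 box padded by 25% of its extent on every side.
padded = pad_bbox(0, 0, 100, 40, padding=0.25)
```

Since the padding is applied to both sides of each axis, the default of 0.2 makes the padded box 1.4 times as wide and 1.4 times as tall as the original.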

street_k(gdf, population_gdf, population_column='pop', min_k=30, start=10, stop=60, spread=2, increment=2, suppression=0.99, max_length=1000, seed=None, padding=0.2)

Iteratively applies street masking to a GeoDataFrame, incrementally increasing the low/high node values until a given k-satisfaction threshold is reached. This provides a much more robust privacy promise, but requires population data.

For instance, if min_k=30 and suppression=0.99, then street masking will be repeated with progressively higher values until 99% of points have a k-anonymity of at least 30. Suppressed points are displaced to the center of the point distribution and labeled as such in a SUPPRESSED column.

Example
from maskmypy import street_k

masked = street_k(
    gdf=sensitive_points,
    population_gdf=addresses,
    start=20,
    spread=5,
    min_k=30,
    suppression=0.95
)

This will perform street masking starting with street(gdf, low=20, high=25) and slowly increment values until 95% of points achieve a k-anonymity of at least 30, with the rest being suppressed.
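
To make the progression concrete, the sketch below lists the `(low, high)` pairs that would be attempted for a given configuration. It follows the loop in the `street_k` source listing on this page (start at `start`, add `increment` each round, stop once `start` exceeds `stop`), but `street_k_schedule` itself is a hypothetical helper written for illustration.

```python
def street_k_schedule(start=10, stop=60, spread=2, increment=2):
    """List the (low, high) pairs tried in order until `stop` is exceeded."""
    pairs = []
    low = start
    while low <= stop:
        pairs.append((low, low + spread))
        low += increment
    return pairs


# Settings from the example above, with the default stop and increment.
schedule = street_k_schedule(start=20, stop=60, spread=5, increment=2)
```

In practice the search ends as soon as the suppression threshold is met, so only a prefix of this schedule is normally run.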

Parameters:

Name Type Description Default
gdf GeoDataFrame

GeoDataFrame containing sensitive points.

required
population_gdf GeoDataFrame

A GeoDataFrame containing either address points or polygons with a population column (see population_column). Used to calculate k-anonymity metrics. Note that address points tend to provide more accurate results.

required
population_column str

If a polygon-based population_gdf is provided, the name of the column containing population counts.

'pop'
min_k int

Points that do not reach this k-anonymity value will be suppressed.

30
start int

Initial value of low in street().

10
stop int

Maximum value of low in street() before exiting. Used to prevent endless searches.

60
spread int

Used to calculate the high value in street(). High = start + spread.

2
increment int

Amount by which start is incremented in each iteration until min_k is met.

2
suppression float

Proportion of points (between 0 and 1) that must satisfy min_k before the search stops. Remaining points are suppressed.

0.99
max_length float

When locating the closest node to each point on the street network, MaskMyPy verifies that its immediate neighbours are no more than max_length away, in meters. This prevents extremely large masking distances, such as those caused by long highways.

1000
seed int

Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined.

None
padding float

OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame. Padding is used to expand this bounding box slightly to reduce unwanted edge-effects. A value of 0.2 would add 20% of the x and y extent to each side of the bounding box.

0.2

Returns:

Type Description
GeoDataFrame

A GeoDataFrame containing masked points.
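
The relationship between `min_k` and `suppression` can be shown with plain lists of k-anonymity values. This is an illustrative sketch only: `fraction_satisfied` and `flag_suppressed` are hypothetical stand-ins and do not reproduce the library's own analysis or suppression functions.

```python
def fraction_satisfied(k_values, min_k):
    """Fraction of points whose k-anonymity is at least min_k."""
    if not k_values:
        return 0.0
    return sum(k >= min_k for k in k_values) / len(k_values)


def flag_suppressed(k_values, min_k):
    """Mark each point True if it falls short of min_k and would be suppressed."""
    return [k < min_k for k in k_values]


k_values = [40, 35, 10, 30]
satisfied = fraction_satisfied(k_values, min_k=30)
flags = flag_suppressed(k_values, min_k=30)
```

With these values three of four points reach `min_k=30`, so a `suppression` setting of 0.75 or lower would stop the search here and suppress the remaining point, while a higher setting would trigger another iteration.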

Source code in maskmypy/masks/street.py
def street_k(
    gdf: GeoDataFrame,
    population_gdf: GeoDataFrame,
    population_column: str = "pop",
    min_k: int = 30,
    start: int = 10,
    stop: int = 60,
    spread: int = 2,
    increment: int = 2,
    suppression: float = 0.99,
    max_length: float = 1000,
    seed: int = None,
    padding: float = 0.2,
) -> GeoDataFrame:
    """
    Iteratively applies street masking to a GeoDataFrame, incrementally increasing the low/high node values
    until a given k-satisfaction threshold is reached. This provides a much more robust privacy
    promise, but requires population data.

    For instance, if `min_k=30` and `suppression=0.99`, then street masking will be repeated with 
    progressively higher values until 99% of points have a k-anonymity of at least 30. 
    Suppressed points are displaced to the center of the point distribution and labeled as such 
    in a `SUPPRESSED` column.

    Example
    -------
    ```python
    from maskmypy import street_k

    masked = street_k(
        gdf=sensitive_points,
        population_gdf=addresses,
        start=20,
        spread=5,
        min_k=30,
        suppression=0.95
    )
    ```

    This will perform street masking starting with `street(gdf, low=20, high=25)` and slowly increment
    values until 95% of points achieve a k-anonymity of at least 30, with the rest being suppressed.

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    population_gdf : GeoDataFrame
        A GeoDataFrame containing either address points or polygons with a population column
        (see `population_column`). Used to calculate k-anonymity metrics. Note that
        address points tend to provide more accurate results.
    population_column : str
        If a polygon-based `population_gdf` is provided, the name of the column containing
        population counts.
    min_k: int
        Points that do not reach this k-anonymity value will be suppressed.
    start: int
        Initial value of `low` in `street()`.
    stop: int
        Maximum value of `low` in `street()` before exiting. Used to prevent endless searches.
    spread: int
        Used to calculate the `high` value in `street()`. High = `start + spread`.
    increment: int
        Amount by which `start` is incremented in each iteration until `min_k` is met.
    suppression: float
        Proportion of points (between 0 and 1) that must satisfy `min_k` before the search
        stops. Remaining points are suppressed.
    max_length : float
        When locating the closest node to each point on the street network, MaskMyPy verifies
        that its immediate neighbours are no more than `max_length` away, in meters. This prevents
        extremely large masking distances, such as those caused by long highways.
    seed : int
        Used to seed the random number generator so that masked datasets are reproducible.
        Randomly generated if left undefined.
    padding : float
        OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame.
        Padding is used to expand this bounding box slightly to reduce unwanted edge-effects.
        A value of `0.2` would add 20% of the x and y extent to *each side* of the bounding box.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """

    k_sat = 0

    while k_sat < suppression:
        if start > stop:
            raise RuntimeError(
                "Reached maximum network depth (stop value). Unable to achieve min_k."
            )

        masked = street(
            gdf=gdf,
            low=start,
            high=start + spread,
            max_length=max_length,
            seed=seed,
            padding=padding,
        )
        masked_k = analysis.k_anonymity(
            gdf, masked, population_gdf=population_gdf, population_column=population_column
        )

        k_sat = analysis.k_satisfaction(masked_k, min_k=min_k)

        if k_sat >= suppression:
            masked_k = tools.suppress(masked_k, min_k=min_k)

        start += increment

    return masked_k

Location Swapping

locationswap(gdf, low, high, address, seed=None, snap_to_streets=False)

Applies location swapping to a GeoDataFrame, displacing points to a randomly selected address that is between a minimum and maximum distance away from the original point. While address data is the most common data type used to provide eligible swap locations, other point-based datasets may be used.

Note: If a sensitive point has no address points within range, the point is displaced to (0,0).

Example
from maskmypy import locationswap

masked = locationswap(
    gdf=sensitive_points,
    low=50,
    high=500,
    address=address_points
)

Parameters:

Name Type Description Default
gdf GeoDataFrame

GeoDataFrame containing sensitive points.

required
low float

Minimum distance to displace points. Unit must match that of the gdf CRS.

required
high float

Maximum distance to displace points. Unit must match that of the gdf CRS.

required
address GeoDataFrame

GeoDataFrame containing points that sensitive locations may be swapped to. While addresses are most common, other point-based data may be used as well.

required
seed int

Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined.

None
snap_to_streets bool

If True, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false-attribution.

False
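
The selection rule behind location swapping can be sketched with coordinate tuples. This is a simplified illustration rather than the library's implementation: `location_swap` is a hypothetical helper, distances are plain Euclidean, and the `(0.0, 0.0)` fallback mirrors the note above about points with no eligible address.

```python
import math
import random


def location_swap(point, addresses, low, high, rng):
    """Swap point to a random address between low and high away from it."""
    candidates = [a for a in addresses if low <= math.dist(point, a) <= high]
    if not candidates:
        # Mirrors the documented fallback for points with no address in range.
        return (0.0, 0.0)
    return rng.choice(candidates)


addresses = [(10, 0), (100, 0), (300, 400), (1000, 0)]
swapped = location_swap((0, 0), addresses, low=50, high=500, rng=random.Random(7))
```

Because a fallback coordinate of (0,0) is a valid location in most coordinate systems, it is worth checking masked output for such points before further analysis.
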
Source code in maskmypy/masks/locationswap.py
def locationswap(
    gdf: GeoDataFrame,
    low: float,
    high: float,
    address: GeoDataFrame,
    seed: int = None,
    snap_to_streets: bool = False,
) -> GeoDataFrame:
    """
    Applies location swapping to a GeoDataFrame, displacing points to a randomly selected address
    that is between a minimum and maximum distance away from the original point. While address
    data is the most common data type used to provide eligible swap locations, other point-based
    datasets may be used.

    Note: If a sensitive point has no address points within range, the point is displaced to (0,0).

    Example
    -------
    ```python
    from maskmypy import locationswap

    masked = locationswap(
        gdf=sensitive_points,
        low=50,
        high=500,
        address=address_points
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    low : float
        Minimum distance to displace points. Unit must match that of the `gdf` CRS.
    high : float
        Maximum distance to displace points. Unit must match that of the `gdf` CRS.
    address : GeoDataFrame
        GeoDataFrame containing points that sensitive locations may be swapped to.
        While addresses are most common, other point-based data may be used as well.
    seed : int
        Used to seed the random number generator so that masked datasets are reproducible.
        Randomly generated if left undefined.
    snap_to_streets : bool
        If True, points are snapped to the nearest node on the OSM street network after masking.
        This can reduce the chance of false-attribution.
    """

    _gdf = gdf.copy()
    _validate_locationswap(_gdf, low, high, address)

    # Compare against None so that an explicit seed of 0 is respected.
    seed = tools.gen_seed() if seed is None else seed

    args = locals()
    del args["snap_to_streets"]
    del args["gdf"]

    mask = _LocationSwap(**args)
    masked_gdf = mask.run()

    if mask._unmasked_points:
        masked_gdf = tools._mark_unmasked_points(gdf, masked_gdf)

    if snap_to_streets:
        masked_gdf = tools.snap_to_streets(masked_gdf)

    return masked_gdf

Voronoi Masking

voronoi(gdf, snap_to_streets=False)

Apply Voronoi masking to a GeoDataFrame, displacing points to the nearest edges of a Voronoi diagram. Note: because Voronoi masking lacks any level of randomization, snapping to streets is recommended for this mask to provide another level of obfuscation.

Example
from maskmypy import voronoi

masked = voronoi(
    gdf=sensitive_points,
    snap_to_streets=True
)

Parameters:

Name Type Description Default
gdf GeoDataFrame

GeoDataFrame containing sensitive points.

required
snap_to_streets bool

If True, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false-attribution.

False

Returns:

Type Description
GeoDataFrame

A GeoDataFrame containing masked points.
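
The deterministic nature of this mask can be seen in a small geometric sketch. It relies on a known property of Voronoi diagrams: the closest point on a site's cell boundary is the midpoint between that site and its nearest neighbour. The code is an illustration under that assumption, not MaskMyPy's implementation, which may build the diagram explicitly. `nearest_voronoi_edge_point` and `voronoi_sketch` are hypothetical names, and at least two points are required.

```python
import math


def nearest_voronoi_edge_point(index, points):
    """Midpoint between points[index] and its nearest neighbour."""
    px, py = points[index]
    others = [p for i, p in enumerate(points) if i != index]
    qx, qy = min(others, key=lambda q: math.dist((px, py), q))
    return ((px + qx) / 2, (py + qy) / 2)


def voronoi_sketch(points):
    """Apply the midpoint rule to every point; there is no randomness."""
    return [nearest_voronoi_edge_point(i, points) for i in range(len(points))]


masked = voronoi_sketch([(0, 0), (2, 0), (10, 0)])
```

In this example the first two points are each other's nearest neighbour, so both land on the same location. Running the function again always gives the same result, which is why an extra step such as snapping to streets is recommended.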

Source code in maskmypy/masks/voronoi.py
def voronoi(gdf: GeoDataFrame, snap_to_streets: bool = False) -> GeoDataFrame:
    """
    Apply Voronoi masking to a GeoDataFrame, displacing points to the nearest edges of a Voronoi
    diagram. Note: because Voronoi masking lacks any level of randomization, snapping to streets
    is recommended for this mask to provide another level of obfuscation.

    Example
    -------
    ```python
    from maskmypy import voronoi

    masked = voronoi(
        gdf=sensitive_points,
        snap_to_streets=True
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    snap_to_streets : bool
        If True, points are snapped to the nearest node on the OSM street network after masking.
        This can reduce the chance of false-attribution.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """
    _gdf = gdf.copy()
    _validate_voronoi(_gdf)

    args = locals()
    del args["snap_to_streets"]
    del args["gdf"]

    masked_gdf = _Voronoi(**args).run()

    if snap_to_streets:
        masked_gdf = tools.snap_to_streets(masked_gdf)

    return masked_gdf
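
Several masks on this page accept `snap_to_streets`. Conceptually, snapping moves each masked point onto the closest node of the street network. The sketch below shows that idea with coordinate tuples and a brute-force search. `snap_to_nearest_node` is a hypothetical helper and does not reflect how MaskMyPy retrieves or queries OSM data.

```python
import math


def snap_to_nearest_node(point, nodes):
    """Return the network node closest to point (brute-force search)."""
    return min(nodes, key=lambda node: math.dist(point, node))


street_nodes = [(0, 0), (5, 5), (2, 1)]
snapped = snap_to_nearest_node((1, 1), street_nodes)
```

A brute-force scan is fine for a handful of nodes. For real street networks a spatial index would usually be a better fit.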