
Masks

Donut Masking

donut(gdf, low, high, container=None, distribution='uniform', seed=None, snap_to_streets=False)

Apply donut masking to a GeoDataFrame, randomly displacing points between a minimum and maximum distance. The advantages of this mask are speed and simplicity, though it does not handle highly varied population densities well.
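The displacement itself can be illustrated with a minimal sketch. It assumes plain `(x, y)` tuples in a projected CRS rather than a GeoDataFrame, and `donut_displace` is a hypothetical helper, not part of MaskMyPy: it draws a distance between `low` and `high`, picks a random bearing, and offsets the point.

```python
import math
import random


def donut_displace(x, y, low, high, rng):
    """Offset (x, y) by a random distance in [low, high] along a random bearing.

    Hypothetical helper for illustration; not part of MaskMyPy's API.
    Coordinates and distances are assumed to share the CRS unit (e.g. metres).
    """
    distance = rng.uniform(low, high)       # flat distance draw within the donut
    angle = rng.uniform(0, 2 * math.pi)     # random bearing in radians
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


rng = random.Random(42)  # a fixed seed makes the output reproducible
points = [(0.0, 0.0), (500.0, 250.0)]
masked = [donut_displace(x, y, 100, 1000, rng) for x, y in points]
print(masked)
```

Every masked point ends up between 100 and 1000 units from its original position; seeding the generator plays the same role as the `seed` parameter.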

Example
```python
from maskmypy import donut

masked = donut(
    gdf=sensitive_points,
    low=100,
    high=1000
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `gdf` | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| `low` | float | Minimum distance to displace points. Unit must match that of the `gdf` CRS. | required |
| `high` | float | Maximum distance to displace points. Unit must match that of the `gdf` CRS. | required |
| `container` | GeoDataFrame | A GeoDataFrame containing polygons within which intersecting sensitive points should remain after masking. This works by masking a point, checking whether it still intersects the polygon it intersected before masking, and retrying until it does. Useful for preserving statistical relationships, such as census tract membership, or for ensuring that points are not displaced into impossible locations, such as the ocean. CRS must match that of `gdf`. | `None` |
| `distribution` | str | The distribution used to determine masking distances. `uniform` provides a flat distribution where any value between the minimum and maximum distance is equally likely to be selected. `areal` is more likely to select distances that are farther away. `gaussian` uses a normal distribution, where values towards the middle of the range are most likely to be selected. Note that the `gaussian` distribution has a small chance of selecting values beyond the defined minimum and maximum. | `'uniform'` |
| `seed` | int | Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined. | `None` |
| `snap_to_streets` | bool | If `True`, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false attribution. | `False` |
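One way the three `distribution` options could translate into distance draws is sketched below. The `areal` formula (sampling uniformly over the area of the annulus, so density grows with radius) and the gaussian's midpoint mean with `sigma = (high - low) / 6` are assumptions chosen to match the descriptions above, not MaskMyPy's confirmed internals; `sample_distance` is a hypothetical helper.

```python
import math
import random
import statistics


def sample_distance(low, high, distribution, rng):
    """Draw one masking distance. A sketch; parameter choices are assumptions."""
    if distribution == "uniform":
        # Every distance in [low, high] is equally likely.
        return rng.uniform(low, high)
    if distribution == "areal":
        # Uniform over the *area* of the annulus: density grows with radius,
        # so larger distances are drawn more often.
        return math.sqrt(rng.uniform(low**2, high**2))
    if distribution == "gaussian":
        # Centred on the midpoint; sigma = range / 6 is an assumed choice that
        # leaves a small chance of falling outside [low, high].
        return rng.gauss((low + high) / 2, (high - low) / 6)
    raise ValueError(f"unknown distribution: {distribution}")


rng = random.Random(0)
for dist in ("uniform", "areal", "gaussian"):
    draws = [sample_distance(100, 1000, dist, rng) for _ in range(10_000)]
    print(dist, round(statistics.mean(draws)))
```

The printed means should show `areal` sitting noticeably above the other two, which centre near the middle of the range.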

Returns:

| Type | Description |
| --- | --- |
| GeoDataFrame | A GeoDataFrame containing masked points. |
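The `seed` behaviour can be sketched with the standard library's `random.Random`. `resolve_seed` and `draw_offsets` are hypothetical helpers; the sketch tests `seed is None` rather than truthiness, because a check such as `if not seed` would treat an explicit seed of `0` as undefined and replace it with a random one.

```python
import random


def resolve_seed(seed=None):
    """Return `seed`, generating a random one only when none was supplied.

    Hypothetical helper. `seed is None` keeps an explicit seed of 0 valid,
    whereas `if not seed` would discard it.
    """
    if seed is None:
        seed = random.SystemRandom().randint(0, 2**32 - 1)
    return seed


def draw_offsets(seed=None, n=3, low=100, high=1000):
    """Draw `n` hypothetical masking distances from a seeded generator."""
    rng = random.Random(resolve_seed(seed))
    return [rng.uniform(low, high) for _ in range(n)]


# Same seed, same distances: the masked output can be regenerated exactly.
print(draw_offsets(1234) == draw_offsets(1234))  # True
# Seed 0 is honoured rather than silently replaced.
print(draw_offsets(0) == draw_offsets(0))  # True
```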

Source code in maskmypy/masks/donut.py
def donut(
    gdf: GeoDataFrame,
    low: float,
    high: float,
    container: GeoDataFrame = None,
    distribution: str = "uniform",
    seed: int = None,
    snap_to_streets: bool = False,
) -> GeoDataFrame:
    """
    Apply donut masking to a GeoDataFrame, randomly displacing points between a minimum and
    maximum distance. The advantages of this mask are speed and simplicity, though it does not
    handle highly varied population densities well.

    Example
    -------
    ```python
    from maskmypy import donut

    masked = donut(
        gdf=sensitive_points,
        low=100,
        high=1000
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    low : float
        Minimum distance to displace points. Unit must match that of the `gdf` CRS.
    high : float
        Maximum distance to displace points. Unit must match that of the `gdf` CRS.
    container : GeoDataFrame
        A GeoDataFrame containing polygons within which intersecting sensitive points should
        remain after masking. This works by masking a point, checking whether it still intersects
        the polygon it intersected before masking, and retrying until it does. Useful for
        preserving statistical relationships, such as census tract membership, or for ensuring
        that points are not displaced into impossible locations, such as the ocean. CRS must
        match that of `gdf`.
    distribution : str
        The distribution used to determine masking distances. `uniform` provides
        a flat distribution where any value between the minimum and maximum distance is
        equally likely to be selected. `areal` is more likely to select distances that are
        farther away. `gaussian` uses a normal distribution, where values towards the middle
        of the range are most likely to be selected. Note that the `gaussian` distribution has
        a small chance of selecting values beyond the defined minimum and maximum.
    seed : int
        Used to seed the random number generator so that masked datasets are reproducible.
        Randomly generated if left undefined.
    snap_to_streets : bool
        If True, points are snapped to the nearest node on the OSM street network after masking.
        This can reduce the chance of false attribution.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """
    _gdf = gdf.copy()
    _validate_donut(_gdf, low, high, container)

    seed = tools.gen_seed() if seed is None else seed

    args = locals()
    del args["snap_to_streets"]
    del args["gdf"]

    masked_gdf = _Donut(**args).run()

    if snap_to_streets:
        masked_gdf = tools.snap_to_streets(masked_gdf)

    return masked_gdf
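The `container` retry loop described in the parameter table can be sketched without GeoPandas. This assumes polygons given as vertex lists and a ray-casting point-in-polygon test; the `max_tries` cap is a hypothetical addition so the loop cannot run forever, and whether MaskMyPy bounds its retries is not shown here. All function names are illustrative.

```python
import math
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containing_polygon(x, y, polygons):
    """Index of the first polygon containing the point, or None."""
    for i, poly in enumerate(polygons):
        if point_in_polygon(x, y, poly):
            return i
    return None


def donut_in_container(x, y, low, high, polygons, rng, max_tries=1000):
    """Re-draw a donut displacement until the point lands in its original polygon."""
    home = containing_polygon(x, y, polygons)
    for _ in range(max_tries):
        d = rng.uniform(low, high)
        a = rng.uniform(0, 2 * math.pi)
        nx, ny = x + d * math.cos(a), y + d * math.sin(a)
        if containing_polygon(nx, ny, polygons) == home:
            return nx, ny
    raise RuntimeError("no valid displacement found; check low/high against polygon size")


# Two adjacent square "tracts"; the point starts near the shared border.
tracts = [
    [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],
    [(1000, 0), (2000, 0), (2000, 1000), (1000, 1000)],
]
print(donut_in_container(900, 500, 100, 300, tracts, random.Random(7)))
```

Draws that would push the point across the border into the second tract are rejected and re-drawn.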

Street Masking

street(gdf, low, high, max_length=1000, seed=None, padding=0.2)

Apply street masking to a GeoDataFrame, displacing points along the OpenStreetMap street network. This helps account for variations in population density, and reduces the likelihood of false attribution because points are always displaced to the street network. Each point is snapped to the nearest node on the network, then displaced along the surrounding network to a node between low and high nodes away.
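A toy version of the snap-then-traverse idea is sketched below, assuming the street network is a plain adjacency dictionary and that "nodes away" means breadth-first hop count from the snapped node. All function names are hypothetical; MaskMyPy works on OSM data and its exact traversal may differ.

```python
import math
import random
from collections import deque


def nearest_node(x, y, nodes):
    """Id of the node closest to (x, y); `nodes` maps id -> (x, y)."""
    return min(nodes, key=lambda n: math.dist((x, y), nodes[n]))


def hops_from(graph, start):
    """Breadth-first search giving the hop count from `start` to each reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return hops


def street_displace(x, y, nodes, graph, low, high, rng):
    """Snap (x, y) to its nearest node, then jump to a random node low..high hops away."""
    start = nearest_node(x, y, nodes)
    candidates = [n for n, h in hops_from(graph, start).items() if low <= h <= high]
    if not candidates:
        raise ValueError("no nodes within the requested hop range")
    return nodes[rng.choice(candidates)]


# Toy 5 x 5 street grid with 100 m blocks; node ids are (column, row).
nodes = {(c, r): (c * 100.0, r * 100.0) for c in range(5) for r in range(5)}
graph = {
    (c, r): [(c + dc, r + dr) for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if (c + dc, r + dr) in nodes]
    for (c, r) in nodes
}
print(street_displace(210.0, 190.0, nodes, graph, low=2, high=3, rng=random.Random(3)))
```

Because displacement is counted in network steps rather than metres, the masking distance naturally shrinks where streets are dense and grows where they are sparse.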

Example
```python
from maskmypy import street

masked = street(
    gdf=sensitive_points,
    low=20,
    high=30
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `gdf` | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| `low` | int | Minimum number of nodes along the OSM street network to traverse. | required |
| `high` | int | Maximum number of nodes along the OSM street network to traverse. | required |
| `max_length` | float | When locating the closest node to each point on the street network, MaskMyPy verifies that its immediate neighbours are no more than `max_length` away, in meters. This prevents extremely large masking distances, such as those caused by long highways. | `1000` |
| `seed` | int | Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined. | `None` |
| `padding` | float | OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame. Padding is used to expand this bounding box slightly to reduce unwanted edge effects. A value of `0.2` would add 20% of the x and y extent to *each side* of the bounding box. | `0.2` |
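The `padding` arithmetic is simple enough to show directly. `padded_bbox` is a hypothetical helper operating on `(x, y)` tuples; it expands the box by `padding` times the extent on each side, as the description above states.

```python
def padded_bbox(points, padding=0.2):
    """Bounding box (minx, miny, maxx, maxy) of `points`, padded on every side.

    Hypothetical helper: each side moves outward by `padding` times the
    extent along that axis, so 0.2 adds 20% of the width to the left and
    another 20% to the right.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    pad_x = (maxx - minx) * padding
    pad_y = (maxy - miny) * padding
    return minx - pad_x, miny - pad_y, maxx + pad_x, maxy + pad_y


print(padded_bbox([(0, 0), (1000, 500)]))  # (-200.0, -100.0, 1200.0, 600.0)
```

With `padding=0.2` the box grows from 1000 to 1400 units wide, since 20% is added on both the left and the right.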

Returns:

| Type | Description |
| --- | --- |
| GeoDataFrame | A GeoDataFrame containing masked points. |

Source code in maskmypy/masks/street.py
def street(
    gdf: GeoDataFrame,
    low: int,
    high: int,
    max_length: float = 1000,
    seed: int = None,
    padding: float = 0.2,
) -> GeoDataFrame:
    """
    Apply street masking to a GeoDataFrame, displacing points along the OpenStreetMap street
    network. This helps account for variations in population density, and reduces the likelihood
    of false attribution as points are always displaced to the street network. Each point is
    snapped to the nearest node on the network, then displaced along the surrounding network to
    a node between `low` and `high` nodes away.

    Example
    -------
    ```python
    from maskmypy import street

    masked = street(
        gdf=sensitive_points,
        low=20,
        high=30
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    low : int
        Minimum number of nodes along the OSM street network to traverse.
    high : int
        Maximum number of nodes along the OSM street network to traverse.
    max_length : float
        When locating the closest node to each point on the street network, MaskMyPy verifies
        that its immediate neighbours are no more than `max_length` away, in meters. This prevents
        extremely large masking distances, such as those caused by long highways.
    seed : int
        Used to seed the random number generator so that masked datasets are reproducible.
        Randomly generated if left undefined.
    padding : float
        OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame.
        Padding is used to expand this bounding box slightly to reduce unwanted edge-effects.
        A value of `0.2` would add 20% of the x and y extent to *each side* of the bounding box.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """
    _gdf = gdf.copy()
    _validate_street(_gdf, low, high)

    seed = tools.gen_seed() if seed is None else seed

    args = locals()
    del args["gdf"]

    masked_gdf = _Street(**args).run()

    return masked_gdf
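The `max_length` check could look like the following sketch, which walks candidate nodes from nearest to farthest and accepts the first one whose incident edges are all no longer than `max_length`. Straight-line edge lengths and the `nearest_short_node` name are assumptions; a real street graph would more likely use the network's stored edge lengths.

```python
import math


def nearest_short_node(x, y, nodes, graph, max_length=1000):
    """Closest node whose every incident edge is at most `max_length` long.

    Hypothetical helper. Nodes attached to very long edges (e.g. highways)
    are skipped so a later network walk cannot jump far in a single hop.
    Isolated nodes (no edges) are skipped as well.
    """
    for node in sorted(nodes, key=lambda n: math.dist((x, y), nodes[n])):
        edge_lengths = [math.dist(nodes[node], nodes[nbr]) for nbr in graph[node]]
        if edge_lengths and max(edge_lengths) <= max_length:
            return node
    raise ValueError("no node satisfies max_length")


# Node "a" is closest but sits on a 5 km highway edge to "c".
nodes = {"a": (0.0, 0.0), "b": (100.0, 0.0), "c": (5000.0, 0.0)}
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(nearest_short_node(10.0, 0.0, nodes, graph))  # b
```

Raising `max_length` to 10000 would let node `a` qualify again.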

Location Swapping

locationswap(gdf, low, high, address, seed=None, snap_to_streets=False)

Applies location swapping to a GeoDataFrame, displacing points to a randomly selected address that is between a minimum and maximum distance away from the original point. While address data is the most common data type used to provide eligible swap locations, other point-based datasets may be used.

Note: If a sensitive point has no address points within range, the point is displaced to (0,0).
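A standard-library sketch of the swap follows, assuming tuples for both sensitive and address points and a hypothetical `location_swap` helper. Points with no eligible address fall back to `(0, 0)`, mirroring the note above.

```python
import math
import random


def location_swap(points, addresses, low, high, rng):
    """Swap each point to a random address between `low` and `high` away.

    Hypothetical helper. Points with no eligible address go to (0, 0).
    """
    swapped = []
    for p in points:
        # Candidate addresses inside the low..high distance band.
        eligible = [a for a in addresses if low <= math.dist(p, a) <= high]
        swapped.append(rng.choice(eligible) if eligible else (0.0, 0.0))
    return swapped


addresses = [(120.0, 0.0), (0.0, 300.0), (2000.0, 2000.0)]
points = [(0.0, 0.0), (10000.0, 10000.0)]
print(location_swap(points, addresses, 50, 500, random.Random(1)))
```

The second point has no address within 500 units, so it lands on `(0.0, 0.0)`, which is why unmasked points are worth checking after running this mask.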

Example
```python
from maskmypy import locationswap

masked = locationswap(
    gdf=sensitive_points,
    low=50,
    high=500,
    address=address_points
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `gdf` | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| `low` | float | Minimum distance to displace points. Unit must match that of the `gdf` CRS. | required |
| `high` | float | Maximum distance to displace points. Unit must match that of the `gdf` CRS. | required |
| `address` | GeoDataFrame | GeoDataFrame containing points that sensitive locations may be swapped to. While addresses are most common, other point-based data may be used as well. | required |
| `seed` | int | Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined. | `None` |
| `snap_to_streets` | bool | If `True`, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false attribution. | `False` |

Returns:

| Type | Description |
| --- | --- |
| GeoDataFrame | A GeoDataFrame containing masked points. |

Source code in maskmypy/masks/locationswap.py
def locationswap(
    gdf: GeoDataFrame,
    low: float,
    high: float,
    address: GeoDataFrame,
    seed: int = None,
    snap_to_streets: bool = False,
) -> GeoDataFrame:
    """
    Applies location swapping to a GeoDataFrame, displacing points to a randomly selected address
    that is between a minimum and maximum distance away from the original point. While address
    data is the most common data type used to provide eligible swap locations, other point-based
    datasets may be used.

    Note: If a sensitive point has no address points within range, the point is displaced to (0,0).

    Example
    -------
    ```python
    from maskmypy import locationswap

    masked = locationswap(
        gdf=sensitive_points,
        low=50,
        high=500,
        address=address_points
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    low : float
        Minimum distance to displace points. Unit must match that of the `gdf` CRS.
    high : float
        Maximum distance to displace points. Unit must match that of the `gdf` CRS.
    address : GeoDataFrame
        GeoDataFrame containing points that sensitive locations may be swapped to.
        While addresses are most common, other point-based data may be used as well.
    seed : int
        Used to seed the random number generator so that masked datasets are reproducible.
        Randomly generated if left undefined.
    snap_to_streets : bool
        If True, points are snapped to the nearest node on the OSM street network after masking.
        This can reduce the chance of false attribution.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """

    _gdf = gdf.copy()
    _validate_locationswap(_gdf, low, high, address)

    seed = tools.gen_seed() if seed is None else seed

    args = locals()
    del args["snap_to_streets"]
    del args["gdf"]

    mask = _LocationSwap(**args)
    masked_gdf = mask.run()

    if mask._unmasked_points:
        masked_gdf = tools._mark_unmasked_points(gdf, masked_gdf)

    if snap_to_streets:
        masked_gdf = tools.snap_to_streets(masked_gdf)

    return masked_gdf

Voronoi Masking

voronoi(gdf, snap_to_streets=False)

Apply Voronoi masking to a GeoDataFrame, displacing points to the nearest edges of a Voronoi diagram. Note: because Voronoi masking involves no randomization, snapping to streets is recommended for this mask to provide an additional level of obfuscation.
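Moving a point to the nearest edge of its own Voronoi cell has a compact geometric form: if `q` is the nearest neighbour of site `p`, the midpoint of `p` and `q` lies on their bisector at distance `|pq|/2`, and no other site can be closer to it, so it is the closest point on the cell boundary. The sketch below (hypothetical `voronoi_mask`) uses that shortcut instead of building the full diagram, which MaskMyPy may well do; it also makes the lack of randomization visible.

```python
import math


def voronoi_mask(points):
    """Move each point to the nearest point on the boundary of its Voronoi cell.

    Hypothetical helper. For a site p whose nearest neighbour is q, that
    boundary point is the midpoint of p and q. Assumes at least two points.
    """
    masked = []
    for i, p in enumerate(points):
        # Nearest other site; its bisector with p is the closest cell edge.
        q = min((pt for j, pt in enumerate(points) if j != i),
                key=lambda pt: math.dist(p, pt))
        masked.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2))
    return masked


print(voronoi_mask([(0.0, 0.0), (4.0, 0.0), (10.0, 0.0)]))
# [(2.0, 0.0), (2.0, 0.0), (7.0, 0.0)]
```

Running it twice on the same input always gives the same output, and mutual nearest neighbours (the first two points here) collapse onto one location.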

Example
```python
from maskmypy import voronoi

masked = voronoi(
    gdf=sensitive_points,
    snap_to_streets=True
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `gdf` | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| `snap_to_streets` | bool | If `True`, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false attribution. | `False` |

Returns:

| Type | Description |
| --- | --- |
| GeoDataFrame | A GeoDataFrame containing masked points. |
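Snapping, as used by `snap_to_streets`, amounts to a nearest-node lookup. This brute-force sketch (hypothetical `snap_to_nodes`, with nodes as a list of coordinate tuples) shows the idea; a real implementation over an OSM network would use a spatial index rather than scanning every node.

```python
import math


def snap_to_nodes(points, nodes):
    """Replace each point with the coordinates of its nearest network node."""
    return [min(nodes, key=lambda n: math.dist(p, n)) for p in points]


street_nodes = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(snap_to_nodes([(10.0, 5.0), (90.0, 80.0)], street_nodes))
# [(0.0, 0.0), (100.0, 100.0)]
```

Since many nearby points can share one node, snapping adds obfuscation on top of a deterministic mask such as Voronoi.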

Source code in maskmypy/masks/voronoi.py
def voronoi(gdf: GeoDataFrame, snap_to_streets: bool = False) -> GeoDataFrame:
    """
    Apply Voronoi masking to a GeoDataFrame, displacing points to the nearest edges of a Voronoi
    diagram. Note: because Voronoi masking involves no randomization, snapping to streets is
    recommended for this mask to provide an additional level of obfuscation.

    Example
    -------
    ```python
    from maskmypy import voronoi

    masked = voronoi(
        gdf=sensitive_points,
        snap_to_streets=True
    )
    ```

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame containing sensitive points.
    snap_to_streets : bool
        If True, points are snapped to the nearest node on the OSM street network after masking.
        This can reduce the chance of false attribution.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing masked points.
    """
    _gdf = gdf.copy()
    _validate_voronoi(gdf)

    args = locals()
    del args["snap_to_streets"]
    del args["gdf"]

    masked_gdf = _Voronoi(**args).run()

    if snap_to_streets:
        masked_gdf = tools.snap_to_streets(masked_gdf)

    return masked_gdf