
Tools

maskmypy.tools

checksum(gdf)

Calculate the SHA256 checksum of a GeoDataFrame and return the first 8 characters. Two identical GeoDataFrames always return the same value, whereas two similar but not identical GeoDataFrames will almost certainly return entirely different values. Because only the first 8 hexadecimal characters (32 bits) of the digest are kept, collisions are possible, though unlikely.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| gdf | GeoDataFrame | Any valid GeoDataFrame. | required |

Returns:

| Type | Description |
| --- | --- |
| str | The first 8 characters of the SHA256 checksum of the input GeoDataFrame. |

Source code in maskmypy/tools.py
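from hashlib import sha256

from geopandas import GeoDataFrame
from pandas.util import hash_pandas_object


def checksum(gdf: GeoDataFrame) -> str: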
    """
    Calculate SHA256 checksum of a GeoDataFrame and return the first 8 characters.
    Two completely identical GeoDataFrames will always return the exact same value,
    whereas two similar, but not completely identical GeoDataFrames will return
    entirely different values.

    Parameters
    ----------
    gdf : GeoDataFrame
        Any valid GeoDataFrame.

    Returns
    -------
    str
        The first 8 characters of the SHA256 checksum of the input GeoDataFrame.
    """
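    # Hash each row (values plus index) to a uint64, SHA256 the resulting bytes,
    # and keep the first 8 hex characters of the digest.
    return sha256(bytearray(hash_pandas_object(gdf).values)).hexdigest()[0:8]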
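The snippet below is a minimal sketch of the same hashing expression. It is applied to a plain pandas DataFrame rather than a GeoDataFrame, an assumption made so it runs without geopandas. The local `checksum` function and the sample columns are illustrative only. An exact copy reproduces the checksum, while changing a single value does not.

```python
from hashlib import sha256

import pandas as pd
from pandas.util import hash_pandas_object


def checksum(df: pd.DataFrame) -> str:
    # Same expression as in tools.py: per-row uint64 hashes (values plus index),
    # SHA256 over their raw bytes, truncated to the first 8 hex characters.
    return sha256(bytearray(hash_pandas_object(df).values)).hexdigest()[0:8]


original = pd.DataFrame({"x": [1.0, 2.0, 3.0], "k_anonymity": [4, 7, 9]})
identical = original.copy()
modified = original.copy()
modified.loc[2, "x"] = 3.5  # change a single value

print(checksum(original) == checksum(identical))  # True
print(checksum(original) == checksum(modified))
```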

gen_rng(seed=None)

Create a seeded NumPy random number generator (numpy.random.Generator) using numpy.random.default_rng().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| seed | int | An integer used to seed the random number generator. A seed is randomly generated using gen_seed() if one is not provided. | None |

Returns:

| Type | Description |
| --- | --- |
| numpy.random.Generator | A random number generator created by numpy.random.default_rng(). |

Source code in maskmypy/tools.py
from typing import Optional

from numpy import random


def gen_rng(seed: Optional[int] = None) -> random.Generator:
    """
    Create a seeded numpy default_rng() object.

    Parameters
    ----------
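    seed : int, optional
        An integer used to seed the random number generator. A seed is randomly
        generated using gen_seed() if one is not provided.

    Returns
    -------
    numpy.random.Generator
        A random number generator created by numpy.random.default_rng().
    """
    # Compare against None so that seed=0 is used rather than silently replaced.
    if seed is None:
        seed = gen_seed()
    return random.default_rng(seed=seed)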
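Here is a minimal sketch of the seeding behaviour, assuming NumPy is installed. `make_rng` is a hypothetical stand-in for gen_rng. `secrets.randbelow` stands in for gen_seed() so the example is self-contained. Two generators built from the same seed produce the same stream, and a seed of 0 is used as given rather than replaced.

```python
from secrets import randbelow
from typing import Optional

import numpy as np


def make_rng(seed: Optional[int] = None) -> np.random.Generator:
    # Hypothetical stand-in for gen_rng(): only draw a random seed when none
    # was supplied, so that seed=0 is still honoured.
    if seed is None:
        seed = randbelow(10**16)
    return np.random.default_rng(seed=seed)


a = make_rng(seed=1234).random(3)
b = make_rng(seed=1234).random(3)
print(np.array_equal(a, b))  # True: same seed, same stream

zero = make_rng(seed=0).random()
print(zero == np.random.default_rng(0).random())  # True
```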

gen_seed()

Generate a 16-digit random integer to seed random number generators.

Returns:

| Type | Description |
| --- | --- |
| int | A 16-digit random integer. |

Source code in maskmypy/tools.py
from random import SystemRandom


def gen_seed() -> int:
    """
    Generate a 16-digit random integer to seed random number generators.

    Returns
    -------
    int
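        A 16-digit random integer.
    """
    # randrange(10**15, 10**16) always yields exactly 16 digits; scaling
    # SystemRandom().random() by 10**16 could produce fewer.
    return SystemRandom().randrange(10**15, 10**16)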
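One way to guarantee exactly 16 digits is to draw directly from the integer range [10^15, 10^16). The sketch below uses the hypothetical name `gen_seed_16` and only the standard library. Its last line shows why multiplying a float in [0, 1) by 10**16 does not guarantee 16 digits: any draw below 0.1 loses a leading digit.

```python
from random import SystemRandom


def gen_seed_16() -> int:
    # Hypothetical variant: draw uniformly from [10**15, 10**16), i.e. from
    # every integer with exactly 16 decimal digits, using OS entropy.
    return SystemRandom().randrange(10**15, 10**16)


seed = gen_seed_16()
print(len(str(seed)))  # 16

# Scaling a float in [0, 1) can drop leading digits:
print(len(str(int(0.05 * 10**16))))  # 15
```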

snap_to_streets(gdf)

Relocates each point of a GeoDataFrame to the nearest node on the OpenStreetMap driving network, which is downloaded for the bounding box of the input points. Performing this on masked datasets may reduce the chance of false attribution and may provide an additional layer of obfuscation.

This is not an alternative to masking.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| gdf | GeoDataFrame | A GeoDataFrame containing point data. | required |

Returns:

| Type | Description |
| --- | --- |
| GeoDataFrame | A GeoDataFrame containing points that have been snapped to street nodes. |

Source code in maskmypy/tools.py
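from geopandas import GeoDataFrame
from osmnx import graph_from_bbox, graph_to_gdfs, nearest_nodes, project_graph
from osmnx.utils_graph import remove_isolated_nodes


def snap_to_streets(gdf: GeoDataFrame) -> GeoDataFrame: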
    """
    Relocates each point of a GeoDataFrame to the nearest node on the OpenStreetMap driving
    network. Performing this on masked datasets may reduce the chances of false attribution,
    and may provide an additional layer of obfuscation.

    This is *not* an alternative to masking.

    Parameters
    ----------
    gdf : GeoDataFrame
        A GeoDataFrame containing point data.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing points that have been snapped to street nodes.
    """
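    snapped_gdf = gdf.copy()
    geom_col = snapped_gdf.geometry.name

    # Download the drivable street network covering the points' bounding box.
    # total_bounds is (west, south, east, north); the tuple below reorders it to
    # the (north, south, east, west) order used by osmnx 1.9. osmnx 2.0 instead
    # expects (west, south, east, north).
    bbox = gdf.to_crs(epsg=4326).total_bounds
    graph = remove_isolated_nodes(
        graph_from_bbox(
            bbox=(bbox[3], bbox[1], bbox[2], bbox[0]),
            network_type="drive",
            truncate_by_edge=True,
        ),
        warn=False,
    )
    # Project the graph into the input CRS so node and point coordinates match.
    graph = project_graph(graph, to_crs=gdf.crs)
    node_gdf = graph_to_gdfs(graph, edges=False)

    # Find the nearest node for every point in one vectorized call, then replace
    # each point geometry with its node's geometry (order is preserved).
    node_ids = nearest_nodes(
        graph, snapped_gdf.geometry.x.to_numpy(), snapped_gdf.geometry.y.to_numpy()
    )
    snapped_gdf[geom_col] = node_gdf.geometry.loc[node_ids].to_numpy()

    return snapped_gdf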
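The core operation can be sketched without downloading OpenStreetMap data: each point is replaced with its closest network node. This sketch swaps osmnx's nearest_nodes() for a brute-force Euclidean search over a few hypothetical node coordinates. It assumes a projected CRS in metres. `snap_points` and all coordinates are illustrative only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def snap_points(points: List[Point], nodes: List[Point]) -> List[Point]:
    # Brute-force stand-in for osmnx.nearest_nodes(): for each point, return
    # the node with the smallest Euclidean distance (projected CRS assumed).
    return [min(nodes, key=lambda n: math.dist(p, n)) for p in points]


# Hypothetical street-node and masked-point coordinates, in metres.
street_nodes = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
masked_points = [(10.0, 5.0), (90.0, 40.0), (70.0, 95.0)]

print(snap_points(masked_points, street_nodes))
# [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
```

Nearby points may snap to the same node, so snapped points are not necessarily unique.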

suppress(gdf, min_k, col='k_anonymity', label=True)

Suppresses points that do not meet a minimum k-anonymity value by displacing them to the mean center of the overall masked point pattern and (optionally) labelling them.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| gdf | GeoDataFrame | A GeoDataFrame containing point data and a column with k-anonymity values. | required |
| min_k | int | Minimum k-anonymity. Points with a k-anonymity below this value will be suppressed. | required |
| col | str | Name of the column containing k-anonymity values. | 'k_anonymity' |
| label | bool | If True, adds a "SUPPRESSED" column set to "TRUE" for suppressed points and "FALSE" for all others. | True |

Returns:

| Type | Description |
| --- | --- |
| GeoDataFrame | A GeoDataFrame containing the result of the suppression. |

Source code in maskmypy/tools.py
from geopandas import GeoDataFrame


def suppress(
    gdf: GeoDataFrame, min_k: int, col: str = "k_anonymity", label: bool = True
) -> GeoDataFrame:
    """
    Suppresses points that do not meet a minimum k-anonymity value by displacing them
    to the mean center of the overall masked point pattern and (optionally) labelling them.

    Parameters
    ----------
    gdf : GeoDataFrame
        A GeoDataFrame containing point data and a column with k-anonymity values.
    min_k : int
        Minimum k-anonymity. Points with a k-anonymity below this value will be suppressed.
    col : str
        Name of the column containing k-anonymity values.
    label : bool
        If True, adds a "SUPPRESSED" column and labels suppressed points.

    Returns
    -------
    GeoDataFrame
        A GeoDataFrame containing the result of the suppression.
    """
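    sgdf = gdf.copy()
    # Centroid of all points dissolved into a single geometry (the overall center).
    centroid = sgdf.dissolve().centroid.iloc[0]
    # Points whose k-anonymity falls below the threshold are displaced to it.
    below_min_k = sgdf[col] < min_k
    sgdf.loc[below_min_k, sgdf.geometry.name] = centroid
    if label:
        sgdf.loc[below_min_k, "SUPPRESSED"] = "TRUE"
        sgdf.loc[sgdf[col] >= min_k, "SUPPRESSED"] = "FALSE"
    return sgdf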
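Here is a minimal sketch of the same suppression logic on plain Python dictionaries. It uses hypothetical `x` and `y` keys in place of point geometries, and an arithmetic mean of coordinates in place of dissolve().centroid. As in suppress, the mean center is computed from all points, including those about to be suppressed, and the input is left unmodified.

```python
from statistics import fmean
from typing import Dict, List


def suppress_points(
    rows: List[Dict], min_k: int, col: str = "k_anonymity", label: bool = True
) -> List[Dict]:
    # Mean center of the whole point pattern, including points to be suppressed.
    cx = fmean(r["x"] for r in rows)
    cy = fmean(r["y"] for r in rows)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input list is not modified
        below = new_row[col] < min_k
        if below:
            # Displace the point to the mean center.
            new_row["x"], new_row["y"] = cx, cy
        if label:
            new_row["SUPPRESSED"] = "TRUE" if below else "FALSE"
        result.append(new_row)
    return result


rows = [
    {"x": 0.0, "y": 0.0, "k_anonymity": 10},
    {"x": 4.0, "y": 0.0, "k_anonymity": 2},
    {"x": 2.0, "y": 6.0, "k_anonymity": 8},
]
for r in suppress_points(rows, min_k=5):
    print(r["x"], r["y"], r["SUPPRESSED"])
# 0.0 0.0 FALSE
# 2.0 2.0 TRUE
# 2.0 6.0 FALSE
```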