Analysis

`maskmypy.analysis` ¶

`central_drift(sensitive_gdf, candidate_gdf)` ¶

Calculates how far the centroid of the point pattern has been displaced due to masking. Higher central drift indicates more information loss.

Parameters:

Name	Type	Description	Default
`sensitive_gdf`	`GeoDataFrame`	A GeoDataFrame containing sensitive points prior to masking.	required
`candidate_gdf`	`GeoDataFrame`	A GeoDataFrame containing masked points.	required

Returns:

Type	Description
`float`	The central drift, with units equal to the CRS of the `sensitive_gdf`.

Source code in maskmypy/analysis.py

def central_drift(sensitive_gdf: GeoDataFrame, candidate_gdf: GeoDataFrame) -> float:
    """
    Calculates how far the centroid of the point pattern has been displaced due to masking.
    Higher central drift indicates more information loss.

    Parameters
    ----------
    sensitive_gdf : GeoDataFrame
        A GeoDataFrame containing sensitive points prior to masking.
    candidate_gdf : GeoDataFrame
        A GeoDataFrame containing masked points.

    Returns
    -------
    float
        The central drift, with units equal to the CRS of the `sensitive_gdf`.
    """
    centroid_a = sensitive_gdf.dissolve().centroid
    centroid_b = candidate_gdf.dissolve().centroid
    return round(float(centroid_a.distance(centroid_b).iloc[0]), 6)

`displacement(sensitive_gdf, candidate_gdf, col='_distance')` ¶

Adds a column to the candidate_gdf containing the distance between each masked point and its original, unmasked location (sensitive_gdf).

Parameters:

Name	Type	Description	Default
`sensitive_gdf`	`GeoDataFrame`	A GeoDataFrame containing sensitive points prior to masking.	required
`candidate_gdf`	`GeoDataFrame`	A GeoDataFrame containing masked points.	required
`col`	`str`	Name of the displacement distance column to add to `candidate_gdf`.	`'_distance'`

Returns:

Type	Description
`GeoDataFrame`	The `candidate_gdf` with an additional column describing displacement distance.

Source code in maskmypy/analysis.py

def displacement(
    sensitive_gdf: GeoDataFrame, candidate_gdf: GeoDataFrame, col: str = "_distance"
) -> GeoDataFrame:
    """
    Adds a column to the `candidate_gdf` containing the distance between each masked point
    and its original, unmasked location (`sensitive_gdf`).

    Parameters
    ----------
    sensitive_gdf : GeoDataFrame
        A GeoDataFrame containing sensitive points prior to masking.
    candidate_gdf : GeoDataFrame
        A GeoDataFrame containing masked points.
    col : str
        Name of the displacement distance column to add to `candidate_gdf`.

    Returns
    -------
    GeoDataFrame
        The `candidate_gdf` with an additional column describing displacement distance.
    """
    candidate_gdf = candidate_gdf.copy()
    candidate_gdf[col] = candidate_gdf.geometry.distance(sensitive_gdf.geometry)
    return candidate_gdf

`evaluate(sensitive_gdf, candidate_gdf, population_gdf=None, population_column='pop', skip_slow=True)` ¶

Evaluate the privacy protection and information loss of a masked dataset (candidate_gdf) compared to the unmasked sensitive dataset (sensitive_gdf). This is a convenience function that automatically runs many of the analysis tools that MaskMyPy offers, returning a simple dictionary of results. Note that privacy metrics require a population_gdf to be provided.

Parameters:

Name	Type	Description	Default
`sensitive_gdf`	`GeoDataFrame`	A GeoDataFrame containing sensitive points prior to masking.	required
`candidate_gdf`	`GeoDataFrame`	A GeoDataFrame containing masked points to be evaluated.	required
`population_gdf`	`GeoDataFrame`	A GeoDataFrame containing either address points or polygons with a population column (see `population_column`). Used to calculate k-anonymity metrics.	`None`
`population_column`	`str`	If a polygon-based `population_gdf` is provided, the name of the column containing population counts.	`'pop'`
`skip_slow`	`bool`	If True, skips analyses that are known to be slow. Currently, this only includes the root-mean-square error of Ripley's K results between the masked and unmasked data.	`True`

Returns:

Type	Description
`dict`	A dictionary containing evaluation results.

Source code in maskmypy/analysis.py

def evaluate(
    sensitive_gdf: GeoDataFrame,
    candidate_gdf: GeoDataFrame,
    population_gdf: GeoDataFrame = None,
    population_column: str = "pop",
    skip_slow: bool = True,
) -> dict:
    """
    Evaluate the privacy protection and information loss of a masked dataset (`candidate_gdf`)
    compared to the unmasked sensitive dataset (`sensitive_gdf`). This is a convenience function
    that automatically runs many of the analysis tools that MaskMyPy offers, returning a simple
    dictionary of results. Note that privacy metrics require a `population_gdf` to be provided.

    Parameters
    ----------
    sensitive_gdf : GeoDataFrame
        A GeoDataFrame containing sensitive points prior to masking.
    candidate_gdf : GeoDataFrame
        A GeoDataFrame containing masked points to be evaluated.
    population_gdf : GeoDataFrame
        A GeoDataFrame containing either address points or polygons with a population column
        (see `population_column`). Used to calculate k-anonymity metrics.
    population_column : str
        If a polygon-based `population_gdf` is provided, the name of the column containing
        population counts.
    skip_slow : bool
        If True, skips analyses that are known to be slow. Currently, this only includes the
        root-mean-square error of Ripley's K results between the masked and unmasked data.

    Returns
    -------
    dict
        A dictionary containing evaluation results.
    """
    stats = {}

    # Information Loss
    stats["central_drift"] = central_drift(
        sensitive_gdf=sensitive_gdf, candidate_gdf=candidate_gdf
    )
    stats.update(
        summarize_displacement(
            displacement(
                sensitive_gdf=sensitive_gdf,
                candidate_gdf=candidate_gdf,
            )
        )
    )
    stats.update(nnd_delta(sensitive_gdf=sensitive_gdf, candidate_gdf=candidate_gdf))
    if not skip_slow:
        stats["ripley_rmse"] = ripley_rmse(ripleys_k(sensitive_gdf), ripleys_k(candidate_gdf))

    # Privacy
    if isinstance(population_gdf, GeoDataFrame):
        k_gdf = k_anonymity(
            sensitive_gdf=sensitive_gdf,
            candidate_gdf=candidate_gdf,
            population_gdf=population_gdf,
            population_column=population_column,
        )
        stats.update(summarize_k(k_gdf))
        stats["k_satisfaction_5"] = k_satisfaction(k_gdf, 5)
        stats["k_satisfaction_25"] = k_satisfaction(k_gdf, 25)
        stats["k_satisfaction_50"] = k_satisfaction(k_gdf, 50)
    return stats

`graph_ripleyresult(result, subtitle=None)` ¶

Generate a graph depicting a given KtestResult, such as would be generated from using maskmypy.analysis.ripleys_k().

Parameters:

Name	Type	Description	Default
`result`	`KtestResult`	The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a given layer.	required
`subtitle`	`str`	A subtitle to add to the graph.	`None`

Returns:

Type	Description
`Figure`	A matplotlib.figure.Figure object.

Source code in maskmypy/analysis.py

def graph_ripleyresult(result: KtestResult, subtitle: str = None) -> Figure:
    """
    Generate a graph depicting a given KtestResult, such as would be generated from using
    `maskmypy.analysis.ripleys_k()`.

    Parameters
    ----------
    result : KtestResult
        The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a given layer.
    subtitle : str
        A subtitle to add to the graph.

    Returns
    -------
    Figure
        A matplotlib.figure.Figure object.
    """
    bounds = _bounds_from_ripleyresult(result)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(result.support, bounds, color="#303030", label="Upper/Lower Bounds", alpha=0.25)
    ax.plot(result.support, result.statistic, color="#1f77b4", label="Observed K")
    ax.scatter(result.support, result.statistic, c="#1f77b4")
    ax.set_xlabel("Distance")
    ax.set_ylabel("K Function")
    ax.set_title(subtitle)
    _legend_deduped_labels(ax)
    fig.suptitle("K Function Plot")
    return fig

`graph_ripleyresults(sensitive_result, candidate_result, subtitle=None)` ¶

Generate a graph depicting two KtestResults, such as would be generated from using maskmypy.analysis.ripleys_k().

Similar to maskmypy.analysis.graph_ripleyresult() except this function graphs both the sensitive and candidate results, allowing for visual comparison of clustering and dispersion between the two.

Parameters:

Name	Type	Description	Default
`sensitive_result`	`KtestResult`	The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on the sensitive layer.	required
`candidate_result`	`KtestResult`	The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a masked layer.	required
`subtitle`	`str`	A subtitle to add to the graph.	`None`

Returns:

Type	Description
`Figure`	A matplotlib.figure.Figure object.

Source code in maskmypy/analysis.py

def graph_ripleyresults(
    sensitive_result: KtestResult,
    candidate_result: KtestResult,
    subtitle: str = None,
) -> Figure:
    """
    Generate a graph depicting two KtestResults, such as would be generated from using
    `maskmypy.analysis.ripleys_k()`.

    Similar to `maskmypy.analysis.graph_ripleyresult()` except this function graphs both
    the sensitive and candidate results, allowing for visual comparison of clustering and dispersion
    between the two.

    Parameters
    ----------
    sensitive_result : KtestResult
        The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on the sensitive layer.
    candidate_result : KtestResult
        The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a masked layer.
    subtitle : str
        A subtitle to add to the graph.

    Returns
    -------
    Figure
        A matplotlib.figure.Figure object.
    """
    bounds = _bounds_from_ripleyresult(sensitive_result)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(
        sensitive_result.support,
        bounds,
        color="#ff7f0e",
        label="Sensitive Upper/Lower Bounds",
        alpha=0.35,
    )
    ax.plot(
        candidate_result.support,
        bounds,
        color="#1f77b4",
        label="Candidate Upper/Lower Bounds",
        alpha=0.35,
    )
    ax.plot(
        sensitive_result.support,
        sensitive_result.statistic,
        color="#ff7f0e",
        label="Sensitive Statistic",
    )
    ax.plot(
        candidate_result.support,
        candidate_result.statistic,
        color="#1f77b4",
        label="Candidate Statistic",
    )
    ax.scatter(sensitive_result.support, sensitive_result.statistic, zorder=6, c="#ff7f0e")
    ax.scatter(candidate_result.support, candidate_result.statistic, zorder=5, c="#1f77b4")
    ax.set_title(subtitle)
    ax.set_xlabel("Distance")
    ax.set_ylabel("K Function")
    _legend_deduped_labels(ax)
    fig.suptitle("K Function Result Comparison")
    return fig

`k_anonymity(sensitive_gdf, candidate_gdf, population_gdf, population_column='pop')` ¶

Adds a column to the candidate_gdf containing the spatial k-anonymity value of each masked point.

Parameters:

Name	Type	Description	Default
`sensitive_gdf`	`GeoDataFrame`	A GeoDataFrame containing sensitive points prior to masking.	required
`candidate_gdf`	`GeoDataFrame`	A GeoDataFrame containing masked points.	required
`population_gdf`	`GeoDataFrame`	A GeoDataFrame containing either address points or polygons with a population column (see `population_column`). Used to calculate k-anonymity metrics. Note that address points tend to provide more accurate results.	required
`population_column`	`str`	If a polygon-based `population_gdf` is provided, the name of the column containing population counts.	`'pop'`

Returns:

Type	Description
`GeoDataFrame`	The `candidate_gdf` with an additional column describing k-anonymity.

Source code in maskmypy/analysis.py

def k_anonymity(
    sensitive_gdf: GeoDataFrame,
    candidate_gdf: GeoDataFrame,
    population_gdf: GeoDataFrame,
    population_column: str = "pop",
) -> GeoDataFrame:
    """
    Adds a column to the `candidate_gdf` containing the spatial k-anonymity value of each
    masked point.

    Parameters
    ----------
    sensitive_gdf : GeoDataFrame
        A GeoDataFrame containing sensitive points prior to masking.
    candidate_gdf : GeoDataFrame
        A GeoDataFrame containing masked points.
    population_gdf : GeoDataFrame
        A GeoDataFrame containing either address points or polygons with a population column
        (see `population_column`). Used to calculate k-anonymity metrics. Note that
        address points tend to provide more accurate results.
    population_column : str
        If a polygon-based `population_gdf` is provided, the name of the column containing
        population counts.

    Returns
    -------
    GeoDataFrame
        The `candidate_gdf` with an additional column describing k-anonymity.
    """
    if tools._validate_geom_type(population_gdf, "Point"):
        k_gdf = _calculate_k(sensitive_gdf, candidate_gdf, population_gdf)
    elif tools._validate_geom_type(population_gdf, "Polygon", "MultiPolygon"):
        if population_column not in population_gdf:
            raise ValueError(
                f"Cannot find population column {population_column} in population_gdf"
            )
        k_gdf = _estimate_k(sensitive_gdf, candidate_gdf, population_gdf, population_column)
    else:
        raise ValueError("population_gdf must include either Points or Polygons/MultiPolygons.")
    return k_gdf

`k_satisfaction(gdf, min_k, col='k_anonymity')` ¶

For a masked GeoDataFrame containing k-anonymity values, calculate the percentage of points that are equal to or greater than (i.e. satisfy) a given k-anonymity threshold (min_k).

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame`	A GeoDataFrame containing k-anonymity values.	required
`min_k`	`int`	The minimum k-anonymity that must be satisfied.	required
`col`	`str`	Name of the column containing k-anonymity values.	`'k_anonymity'`

Returns:

Type	Description
`float`	A percentage of points in the GeoDataFrame that satisfy `min_k`.

Source code in maskmypy/analysis.py

def k_satisfaction(gdf: GeoDataFrame, min_k: int, col: str = "k_anonymity") -> float:
    """
    For a masked GeoDataFrame containing k-anonymity values, calculate the percentage of
    points that are equal to or greater than (i.e. satisfy) a given k-anonymity threshold (`min_k`).

    Parameters
    ----------
    gdf : GeoDataFrame
        A GeoDataFrame containing k-anonymity values.
    min_k : int
        The minimum k-anonymity that must be satisfied.
    col : str
        Name of the column containing k-anonymity values.

    Returns
    -------
    float
        A percentage of points in the GeoDataFrame that satisfy `min_k`.
    """
    return round(gdf.loc[gdf[col] >= min_k, col].count() / gdf[col].count(), 3)

`map_displacement(sensitive_gdf, candidate_gdf, filename=None, context_gdf=None)` ¶

Generate a map showing the displacement of each masked point from its original location. Requires the contextily package.

Parameters:

Name	Type	Description	Default
`sensitive_gdf`	`GeoDataFrame`	A GeoDataFrame containing sensitive points prior to masking.	required
`candidate_gdf`	`GeoDataFrame`	A GeoDataFrame containing masked points.	required
`filename`	`str`	If specified, saves the map to the filesystem.	`None`
`context_gdf`	`GeoDataFrame`	A GeoDataFrame containing contextual data to be added to the map, such as address points, administrative boundaries, etc.	`None`

Returns:

Type	Description
`pyplot`	A pyplot object containing the mapped data.

Source code in maskmypy/analysis.py

def map_displacement(
    sensitive_gdf: GeoDataFrame,
    candidate_gdf: GeoDataFrame,
    filename: str = None,
    context_gdf: GeoDataFrame = None,
) -> plt:
    """
    Generate a map showing the displacement of each masked point from its original location.
    Requires the `contextily` package.

    Parameters
    ----------
    sensitive_gdf : GeoDataFrame
        A GeoDataFrame containing sensitive points prior to masking.
    candidate_gdf : GeoDataFrame
        A GeoDataFrame containing masked points.
    filename : str
        If specified, saves the map to the filesystem.
    context_gdf : GeoDataFrame
        A GeoDataFrame containing contextual data to be added to the map, such as address points,
        administrative boundaries, etc.

    Returns
    -------
    matplotlib.pyplot
        A pyplot object containing the mapped data.
    """
    import contextily as ctx

    lines = sensitive_gdf.copy()
    lines = lines.join(candidate_gdf, how="left", rsuffix="_masked")
    lines.geometry = lines.apply(
        lambda x: LineString([x["geometry"], x["geometry_masked"]]), axis=1
    )
    ax = lines.plot(color="black", zorder=2, linewidth=1, figsize=[8, 8])
    ax = sensitive_gdf.plot(ax=ax, color="red", zorder=3, markersize=6)
    ax = candidate_gdf.plot(ax=ax, color="blue", zorder=4, markersize=6)
    if isinstance(context_gdf, GeoDataFrame):
        ax = context_gdf.plot(ax=ax, color="grey", zorder=1, markersize=3)

    ctx.add_basemap(ax, crs=sensitive_gdf.crs, source=ctx.providers.OpenStreetMap.Mapnik)
    plt.title("Displacement Distances", fontsize=16)
    plt.figtext(
        0.5,
        0.025,
        "Sensitive points (red), Masked points (blue). \n KEEP CONFIDENTIAL",
        wrap=True,
        horizontalalignment="center",
        fontsize=12,
    )
    if filename:
        plt.savefig(filename)

    return plt

`nnd(gdf)` ¶

Calculate the minimum, maximum, and mean nearest neighbor distance for a given GeoDataFrame.

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame`	A GeoDataFrame containing points.	required

Returns:

Type	Description
`dict`	A dictionary containing the minimum, maximum, and mean nearest neighbor distance.

Source code in maskmypy/analysis.py

def nnd(gdf: GeoDataFrame) -> dict:
    """
    Calculate the minimum, maximum, and mean nearest neighbor distance for a given GeoDataFrame.

    Parameters
    ----------
    gdf : GeoDataFrame
        A GeoDataFrame containing points.

    Returns
    -------
    dict
        A dictionary containing the minimum, maximum, and mean nearest neighbor distance.
    """
    pp = _gdf_to_pointpattern(gdf)
    return {"nnd_min": pp.min_nnd, "nnd_max": pp.max_nnd, "nnd_mean": pp.mean_nnd}

`nnd_delta(sensitive_gdf, candidate_gdf)` ¶

Calculate the difference between minimum, maximum, and mean nearest neighbor distances before (sensitive_gdf) and after (candidate_gdf) masking. Higher values indicate greater information loss due to masking.

Parameters:

Name	Type	Description	Default
`sensitive_gdf`	`GeoDataFrame`	A GeoDataFrame containing sensitive points prior to masking.	required
`candidate_gdf`	`GeoDataFrame`	A GeoDataFrame containing masked points.	required

Returns:

Type	Description
`dict`	A dictionary describing deltas in nearest neighbor distance before and after masking.

Source code in maskmypy/analysis.py

def nnd_delta(sensitive_gdf: GeoDataFrame, candidate_gdf: GeoDataFrame) -> dict:
    """
    Calculate the *difference* between minimum, maximum, and mean nearest neighbor distances
    before (`sensitive_gdf`) and after (`candidate_gdf`) masking. Higher values indicate
    greater information loss due to masking.

    Parameters
    ----------
    sensitive_gdf : GeoDataFrame
        A GeoDataFrame containing sensitive points prior to masking.
    candidate_gdf : GeoDataFrame
        A GeoDataFrame containing masked points.

    Returns
    -------
    dict
        A dictionary describing deltas in nearest neighbor distance before and after masking.
    """
    before = nnd(sensitive_gdf)
    after = nnd(candidate_gdf)
    delta = {}
    for key, value in before.items():
        delta.update({f"{key}_delta": round(after[key] - before[key], 6)})
    return delta

`ripley_rmse(sensitive_result, candidate_result)` ¶

Calculates the root-mean-square error between the Ripley's K-test results of unmasked and masked data. As the goal of geographic masking is to reduce information loss, the actual amount of clustering in masked data is unimportant; what matters is that the clustering or dispersion of the masked data resembles that of the original, sensitive data. By comparing the RMSE of k-test results, we can reduce this deviation to a single figure, which is useful for quickly comparing how multiple masks perform.

Lower RMSE values indicate less information loss due to masking, whereas higher values indicate greater information loss due to masking.

Parameters:

Name	Type	Description	Default
`sensitive_result`	`KtestResult`	The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a sensitive layer.	required
`candidate_result`	`KtestResult`	The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a masked layer.	required

Returns:

Type	Description
`float`	The root-mean-square error between the two k-test results.

Source code in maskmypy/analysis.py

def ripley_rmse(sensitive_result: KtestResult, candidate_result: KtestResult) -> float:
    """
    Calculates the root-mean-square error between the Ripley's K-test results of unmasked and
    masked data. As the goal of geographic masking is to reduce information loss, the actual
    amount of clustering in masked data is unimportant; what matters is that the clustering
    or dispersion of the masked data resembles that of the original, sensitive data. By comparing
    the RMSE of k-test results, we can reduce this deviation to a single figure, which is useful
    for quickly comparing how multiple masks perform.

    Lower RMSE values indicate less information loss due to masking, whereas higher values
    indicate greater information loss due to masking.

    Parameters
    ----------
    sensitive_result : KtestResult
        The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a sensitive layer.
    candidate_result : KtestResult
        The KtestResult tuple from applying `maskmypy.analysis.ripleys_k()` on a masked layer.

    Returns
    -------
    float
        The root-mean-square error between the two k-test results.
    """
    step_count = len(candidate_result.statistic)
    residuals = []
    for i in range(step_count):
        residual = candidate_result.statistic[i] - sensitive_result.statistic[i]
        residuals.append(residual)
    return round(sqrt(square(residuals).mean()), 3)

`ripleys_k(gdf, max_dist=None, min_dist=None, steps=10, simulations=99)` ¶

Performs Ripley's K clustering analysis on a GeoDataFrame. This evaluates clustering across a range of spatial scales.

See maskmypy.analysis.ripley_rmse(), maskmypy.analysis.graph_ripleyresult(), and maskmypy.analysis.graph_ripleyresults() for functions that process/visualize the results of this function.

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame`	GeoDataFrame to analyse.	required
`max_dist`	`float`	The largest distance band used for cluster analysis. If `None`, this defaults to one quarter of the smallest side of the bounding box (i.e. Ripleys Rule of Thumb).	`None`
`min_dist`	`float`	The smallest distance band used for cluster analysis. If `None`, this is automatically set to `max_dist / steps`.	`None`
`steps`	`int`	The number of equally spaced intervals between the minimum and maximum distance bands to analyze clustering on.	`10`
`simulations`	`int`	The number of simulations to perform.	`99`

Returns:

Type	Description
`KtestResult`	A named tuple that contains `("support", "statistic", "pvalue", "simulations")`.

Source code in maskmypy/analysis.py

def ripleys_k(
    gdf: GeoDataFrame,
    max_dist: float = None,
    min_dist: float = None,
    steps: int = 10,
    simulations: int = 99,
) -> KtestResult:
    """
    Performs Ripley's K clustering analysis on a GeoDataFrame. This evaluates clustering across a
    range of spatial scales.

    See `maskmypy.analysis.ripley_rmse()`, `maskmypy.analysis.graph_ripleyresult()`, and
    `maskmypy.analysis.graph_ripleyresults()` for functions that process/visualize the results
    of this function.

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrame to analyse.
    max_dist : float
        The largest distance band used for cluster analysis. If `None`, this defaults to one
        quarter of the smallest side of the bounding box (i.e. Ripleys Rule of Thumb).
    min_dist : float
        The smallest distance band used for cluster analysis. If `None`, this is automatically set
        to  `max_dist / steps`.
    steps : int
        The number of equally spaced intervals between the minimum and maximum distance bands
        to analyze clustering on.
    simulations : int
        The number of simulations to perform.

    Returns
    -------
    KtestResult
        A named tuple that contains `("support", "statistic", "pvalue", "simulations")`.
    """
    if not max_dist:
        max_dist = _gdf_to_pointpattern(gdf).rot

    if not min_dist:
        min_dist = max_dist / steps

    k_results = k_test(
        array(list(zip(gdf.geometry.x, gdf.geometry.y))),
        keep_simulations=True,
        support=(min_dist, max_dist, steps),
        n_simulations=simulations,
    )
    return k_results

`summarize_displacement(gdf, col='_distance')` ¶

For a masked GeoDataFrame containing displacement distances, calculate the minimum, maximum, median, and mean displacement distance.

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame`	A GeoDataFrame containing displacement distance values.	required
`col`	`str`	Name of the column containing displacement distance values.	`'_distance'`

Returns:

Type	Description
`dict`	A dictionary containing summary displacement distance statistics.

Source code in maskmypy/analysis.py

def summarize_displacement(gdf: GeoDataFrame, col: str = "_distance") -> dict:
    """
    For a masked GeoDataFrame containing displacement distances, calculate the minimum, maximum,
    median, and mean displacement distance.

    Parameters
    ----------
    gdf : GeoDataFrame
        A GeoDataFrame containing displacement distance values.
    col : str
        Name of the column containing displacement distance values.

    Returns
    -------
    dict
        A dictionary containing summary displacement distance statistics.
    """
    return {
        "displacement_min": round(float(gdf.loc[:, col].min()), 6),
        "displacement_max": round(float(gdf.loc[:, col].max()), 6),
        "displacement_med": round(float(gdf.loc[:, col].median()), 6),
        "displacement_mean": round(float(gdf.loc[:, col].mean()), 6),
    }

`summarize_k(gdf, col='k_anonymity')` ¶

For a masked GeoDataFrame containing k-anonymity values, calculate the minimum, maximum, median, and mean k-anonymity.

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame`	A GeoDataFrame containing k-anonymity values.	required
`col`	`str`	Name of the column containing k-anonymity values.	`'k_anonymity'`

Returns:

Type	Description
`dict`	A dictionary containing summary k-anonymity statistics.

Source code in maskmypy/analysis.py

def summarize_k(gdf: GeoDataFrame, col: str = "k_anonymity") -> dict:
    """
    For a masked GeoDataFrame containing k-anonymity values, calculate the minimum, maximum,
    median, and mean k-anonymity.

    Parameters
    ----------
    gdf : GeoDataFrame
        A GeoDataFrame containing k-anonymity values.
    col : str
        Name of the column containing k-anonymity values.

    Returns
    -------
    dict
        A dictionary containing summary k-anonymity statistics.
    """
    return {
        "k_min": int(gdf.loc[:, col].min()),
        "k_max": int(gdf.loc[:, col].max()),
        "k_med": round(float(gdf.loc[:, col].median()), 2),
        "k_mean": round(float(gdf.loc[:, col].mean()), 2),
    }

Analysis

maskmypy.analysis ¶

central_drift(sensitive_gdf, candidate_gdf) ¶

displacement(sensitive_gdf, candidate_gdf, col='_distance') ¶

evaluate(sensitive_gdf, candidate_gdf, population_gdf=None, population_column='pop', skip_slow=True) ¶

graph_ripleyresult(result, subtitle=None) ¶

graph_ripleyresults(sensitive_result, candidate_result, subtitle=None) ¶

k_anonymity(sensitive_gdf, candidate_gdf, population_gdf, population_column='pop') ¶

k_satisfaction(gdf, min_k, col='k_anonymity') ¶

map_displacement(sensitive_gdf, candidate_gdf, filename=None, context_gdf=None) ¶

nnd(gdf) ¶

nnd_delta(sensitive_gdf, candidate_gdf) ¶

ripley_rmse(sensitive_result, candidate_result) ¶

ripleys_k(gdf, max_dist=None, min_dist=None, steps=10, simulations=99) ¶

summarize_displacement(gdf, col='_distance') ¶

summarize_k(gdf, col='k_anonymity') ¶

`maskmypy.analysis` ¶

`central_drift(sensitive_gdf, candidate_gdf)` ¶

`displacement(sensitive_gdf, candidate_gdf, col='_distance')` ¶

`evaluate(sensitive_gdf, candidate_gdf, population_gdf=None, population_column='pop', skip_slow=True)` ¶

`graph_ripleyresult(result, subtitle=None)` ¶

`graph_ripleyresults(sensitive_result, candidate_result, subtitle=None)` ¶

`k_anonymity(sensitive_gdf, candidate_gdf, population_gdf, population_column='pop')` ¶

`k_satisfaction(gdf, min_k, col='k_anonymity')` ¶

`map_displacement(sensitive_gdf, candidate_gdf, filename=None, context_gdf=None)` ¶

`nnd(gdf)` ¶

`nnd_delta(sensitive_gdf, candidate_gdf)` ¶

`ripley_rmse(sensitive_result, candidate_result)` ¶

`ripleys_k(gdf, max_dist=None, min_dist=None, steps=10, simulations=99)` ¶

`summarize_displacement(gdf, col='_distance')` ¶

`summarize_k(gdf, col='k_anonymity')` ¶