Analysis
maskmypy.analysis
central_drift(sensitive_gdf, candidate_gdf)
Calculates how far the centroid of the point pattern has been displaced due to masking. Higher central drift indicates more information loss.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_gdf | GeoDataFrame | A GeoDataFrame containing sensitive points prior to masking. | required |
candidate_gdf | GeoDataFrame | A GeoDataFrame containing masked points. | required |
Returns:
Type | Description |
---|---|
float | The central drift, with units equal to the CRS of the |
Source code in maskmypy/analysis.py
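The idea behind central drift can be illustrated without GeoDataFrames. The following is a minimal sketch, not MaskMyPy's implementation: it assumes the centroid is the mean of the point coordinates, uses plain (x, y) tuples, and the name central_drift_sketch is hypothetical.

```python
import math


def centroid(points):
    """Return the mean x and mean y of a list of (x, y) tuples."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def central_drift_sketch(sensitive_points, candidate_points):
    """Distance between the centroids of the unmasked and masked patterns.

    Assumption: the centroid is the mean center of the coordinates.
    """
    return math.dist(centroid(sensitive_points), centroid(candidate_points))


# Every masked point is shifted by (3, 4), so the centroid moves by 5 units.
sensitive = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
candidate = [(3.0, 4.0), (5.0, 4.0), (5.0, 6.0), (3.0, 6.0)]
print(central_drift_sketch(sensitive, candidate))
```

Masks that displace points in random directions tend to cancel out and produce a small drift, while a systematic shift like the one above produces a large one.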
displacement(sensitive_gdf, candidate_gdf, col='_distance')
Adds a column to the candidate_gdf containing the distance between each masked point and its original, unmasked location (sensitive_gdf).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_gdf | GeoDataFrame | A GeoDataFrame containing sensitive points prior to masking. | required |
candidate_gdf | GeoDataFrame | A GeoDataFrame containing masked points. | required |
col | str | Name of the displacement distance column to add to | '_distance' |
Returns:
Type | Description |
---|---|
GeoDataFrame | The |
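As a rough illustration of what a displacement column holds, the sketch below computes one distance per point pair. It is not MaskMyPy's implementation: it assumes points are plain (x, y) tuples paired by list position, and displacement_sketch is a hypothetical name.

```python
import math


def displacement_sketch(sensitive_points, candidate_points):
    """Return the distance between each masked point and its original.

    Assumption: the i-th candidate point is the masked version of the
    i-th sensitive point.
    """
    if len(sensitive_points) != len(candidate_points):
        raise ValueError("both inputs must contain the same number of points")
    return [math.dist(s, c) for s, c in zip(sensitive_points, candidate_points)]


sensitive = [(0.0, 0.0), (10.0, 10.0)]
candidate = [(3.0, 4.0), (10.0, 16.0)]
print(displacement_sketch(sensitive, candidate))
```

The distances are in the same units as the coordinates, so a projected CRS is needed for them to be meaningful as metres or feet.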
evaluate(sensitive_gdf, candidate_gdf, population_gdf=None, population_column='pop', skip_slow=True)
Evaluate the privacy protection and information loss of a masked dataset (candidate_gdf) compared to the unmasked sensitive dataset (sensitive_gdf). This is a convenience function that automatically runs many of the analysis tools that MaskMyPy offers, returning a simple dictionary of results. Note that privacy metrics require a population_gdf to be provided.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_gdf | GeoDataFrame | A GeoDataFrame containing sensitive points prior to masking. | required |
candidate_gdf | GeoDataFrame | A GeoDataFrame containing masked points to be evaluated. | required |
population_gdf | GeoDataFrame | A GeoDataFrame containing either address points or polygons with a population column (see | None |
population_column | str | If a polygon-based | 'pop' |
skip_slow | bool | If True, skips analyses that are known to be slow. Currently, this only includes the root-mean-square error of Ripley's K results between the masked and unmasked data. | True |
Returns:
Type | Description |
---|---|
dict | A dictionary containing evaluation results. |
graph_ripleyresult(result, subtitle=None)
Generate a graph depicting a given KtestResult, such as one returned by maskmypy.analysis.ripleys_k().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
result | KtestResult | The KtestResult tuple from applying | required |
subtitle | str | A subtitle to add to the graph. | None |
Returns:
Type | Description |
---|---|
Figure | A matplotlib.figure.Figure object. |
graph_ripleyresults(sensitive_result, candidate_result, subtitle=None)
Generate a graph depicting two KtestResults, such as those returned by maskmypy.analysis.ripleys_k().
Similar to maskmypy.analysis.graph_ripleyresult(), except that this function graphs both the sensitive and candidate results, allowing for visual comparison of clustering and dispersion between the two.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_result | KtestResult | The KtestResult tuple from applying | required |
candidate_result | KtestResult | The KtestResult tuple from applying | required |
subtitle | str | A subtitle to add to the graph. | None |
Returns:
Type | Description |
---|---|
Figure | A matplotlib.figure.Figure object. |
k_anonymity(sensitive_gdf, candidate_gdf, population_gdf, population_column='pop')
Adds a column to the candidate_gdf containing the spatial k-anonymity value of each masked point.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_gdf | GeoDataFrame | A GeoDataFrame containing sensitive points prior to masking. | required |
candidate_gdf | GeoDataFrame | A GeoDataFrame containing masked points. | required |
population_gdf | GeoDataFrame | A GeoDataFrame containing either address points or polygons with a population column (see | required |
population_column | str | If a polygon-based | 'pop' |
Returns:
Type | Description |
---|---|
GeoDataFrame | The |
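This page does not state how spatial k-anonymity is computed, so the sketch below should be read as an assumption rather than MaskMyPy's method. It uses one common definition for address-point data: the number of addresses that lie no farther from the masked point than the original location does. The function name and the tuple-based inputs are hypothetical.

```python
import math


def k_anonymity_sketch(sensitive_point, candidate_point, address_points):
    """Count addresses inside the circle centred on the masked point.

    Assumption: the circle's radius is the displacement distance, so every
    address inside it is at least as plausible an origin as the true one.
    """
    radius = math.dist(sensitive_point, candidate_point)
    return sum(
        1 for address in address_points
        if math.dist(candidate_point, address) <= radius
    )


addresses = [(1.0, 1.0), (3.0, 5.0), (8.0, 8.0), (20.0, 20.0)]
# The masked point sits 5 units from the original, so the radius is 5.
print(k_anonymity_sketch((0.0, 0.0), (3.0, 4.0), addresses))
```

Under this definition, a larger displacement or a denser neighbourhood both raise k, which is why the same mask can protect urban points far better than rural ones.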
k_satisfaction(gdf, min_k, col='k_anonymity')
For a masked GeoDataFrame containing k-anonymity values, calculate the percentage of points whose k-anonymity is equal to or greater than (i.e. satisfies) a given threshold (min_k).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdf | GeoDataFrame | A GeoDataFrame containing k-anonymity values. | required |
min_k | int | The minimum k-anonymity that must be satisfied. | required |
col | str | Name of the column containing k-anonymity values. | 'k_anonymity' |
Returns:
Type | Description |
---|---|
float | A percentage of points in the GeoDataFrame that satisfy |
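The calculation described above amounts to a simple threshold count. The following sketch works on a plain list of k values instead of a GeoDataFrame column; the function name is hypothetical, and the 0 to 100 scale of the result is an assumption, since this page does not say how the percentage is scaled.

```python
def k_satisfaction_sketch(k_values, min_k):
    """Return the share of k values that are >= min_k, on a 0-100 scale.

    Assumption: the percentage is expressed out of 100, not as a fraction.
    """
    if not k_values:
        raise ValueError("k_values must not be empty")
    satisfied = sum(1 for k in k_values if k >= min_k)
    return 100 * satisfied / len(k_values)


# Three of the four points reach the threshold of 5.
print(k_satisfaction_sketch([1, 5, 10, 20], min_k=5))
```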
map_displacement(sensitive_gdf, candidate_gdf, filename=None, context_gdf=None)
Generate a map showing the displacement of each masked point from its original location. Requires the contextily package.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_gdf | GeoDataFrame | A GeoDataFrame containing sensitive points prior to masking. | required |
candidate_gdf | GeoDataFrame | A GeoDataFrame containing masked points. | required |
filename | str | If specified, saves the map to the filesystem. | None |
context_gdf | GeoDataFrame | A GeoDataFrame containing contextual data to be added to the map, such as address points, administrative boundaries, etc. | None |
Returns:
Type | Description |
---|---|
pyplot | A pyplot object containing the mapped data. |
nnd(gdf)
Calculate the minimum, maximum, and mean nearest neighbor distance for a given GeoDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdf | GeoDataFrame | A GeoDataFrame containing points. | required |
Returns:
Type | Description |
---|---|
dict | A dictionary containing the minimum, maximum, and mean nearest neighbor distance. |
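Nearest neighbor distance statistics can be sketched with a brute-force search over plain (x, y) tuples. This is an illustration of the concept only, not MaskMyPy's implementation; the function name and the dictionary keys ("min", "max", "mean") are hypothetical.

```python
import math


def nnd_sketch(points):
    """Return min, max, and mean nearest neighbor distance for the points.

    Uses an O(n^2) comparison of every point against every other point,
    which is fine for small examples but slow for large datasets.
    """
    if len(points) < 2:
        raise ValueError("at least two points are required")
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return {
        "min": min(nearest),
        "max": max(nearest),
        "mean": sum(nearest) / len(nearest),
    }


# Nearest neighbor distances are 1, 1, and 4 respectively.
print(nnd_sketch([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))
```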
nnd_delta(sensitive_gdf, candidate_gdf)
Calculate the difference between minimum, maximum, and mean nearest neighbor distances before (sensitive_gdf) and after (candidate_gdf) masking. Higher values indicate greater information loss due to masking.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_gdf | GeoDataFrame | A GeoDataFrame containing sensitive points prior to masking. | required |
candidate_gdf | GeoDataFrame | A GeoDataFrame containing masked points. | required |
Returns:
Type | Description |
---|---|
dict | A dictionary describing deltas in nearest neighbor distance before and after masking. |
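Once nearest neighbor statistics exist for both datasets, the delta is a per-statistic subtraction. The sketch below assumes the statistics are already available as dictionaries with matching keys, and that the delta is taken as masked minus unmasked; the sign convention, key names, and function name are all assumptions, not taken from MaskMyPy.

```python
def nnd_delta_sketch(sensitive_nnd, candidate_nnd):
    """Subtract each unmasked statistic from its masked counterpart.

    Assumption: delta = candidate - sensitive, so a positive value means
    points ended up farther from their nearest neighbor after masking.
    """
    return {
        f"{key}_delta": candidate_nnd[key] - sensitive_nnd[key]
        for key in sensitive_nnd
    }


before = {"min": 1.0, "max": 4.0, "mean": 2.0}
after = {"min": 1.5, "max": 6.0, "mean": 3.0}
print(nnd_delta_sketch(before, after))
```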
ripley_rmse(sensitive_result, candidate_result)
Calculates the root-mean-square error (RMSE) between the Ripley's K-test results of unmasked and masked data. Because geographic masking aims to minimize information loss, the absolute amount of clustering in the masked data is unimportant; what matters is that the clustering or dispersion of the masked data resembles that of the original, sensitive data. Computing the RMSE between the two k-test results reduces this deviation to a single figure, which is useful for quickly comparing how multiple masks perform.
Lower RMSE values indicate less information loss due to masking, whereas higher values indicate greater information loss.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sensitive_result | KtestResult | The KtestResult tuple from applying | required |
candidate_result | KtestResult | The KtestResult tuple from applying | required |
Returns:
Type | Description |
---|---|
float | The root-mean-square error between the two k-test results. |
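The RMSE itself is straightforward to sketch. The example below compares two plain lists of K statistics, one value per distance band, and does not use KtestResult objects; which field of the KtestResult MaskMyPy actually compares is not stated here, so the inputs and the function name are assumptions.

```python
import math


def rmse_sketch(sensitive_k, candidate_k):
    """Root-mean-square error between two equally long lists of K values.

    Assumption: both lists were computed over the same distance bands,
    so the i-th values are directly comparable.
    """
    if len(sensitive_k) != len(candidate_k):
        raise ValueError("both results must cover the same distance bands")
    squared_errors = [(s - c) ** 2 for s, c in zip(sensitive_k, candidate_k)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


sensitive_k = [0.0, 2.0, 4.0, 6.0]
candidate_k = [1.0, 2.0, 4.0, 5.0]
print(rmse_sketch(sensitive_k, candidate_k))
```

Because the errors are squared before averaging, one band with a large deviation raises the RMSE more than several bands with small deviations.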
ripleys_k(gdf, max_dist=None, min_dist=None, steps=10, simulations=99)
Performs Ripley's K clustering analysis on a GeoDataFrame. This evaluates clustering across a range of spatial scales.
See maskmypy.analysis.ripley_rmse(), maskmypy.analysis.graph_ripleyresult(), and maskmypy.analysis.graph_ripleyresults() for functions that process/visualize the results of this function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdf | GeoDataFrame | GeoDataFrame to analyse. | required |
max_dist | float | The largest distance band used for cluster analysis. If | None |
min_dist | float | The smallest distance band used for cluster analysis. If | None |
steps | int | The number of equally spaced intervals between the minimum and maximum distance bands to analyze clustering on. | 10 |
simulations | int | The number of simulations to perform. | 99 |
Returns:
Type | Description |
---|---|
KtestResult | A named tuple that contains |
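To give a sense of what "clustering across a range of spatial scales" means, the sketch below computes a naive Ripley's K estimate for a list of distance bands. It is not MaskMyPy's implementation: it applies no edge correction, runs no simulations, takes the study area as an explicit argument, and uses one common form of the estimator, K(d) = A / (n(n-1)) multiplied by the number of ordered point pairs within distance d. All names are hypothetical.

```python
import math


def naive_ripleys_k(points, distances, area):
    """Return one uncorrected K estimate per distance band.

    Assumption: no edge correction, so values near the study area
    boundary are biased downward.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    results = []
    for d in distances:
        # Count ordered pairs (i, j), i != j, that lie within distance d.
        pairs = sum(
            1
            for i, p in enumerate(points)
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= d
        )
        results.append(area * pairs / (n * (n - 1)))
    return results


corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(naive_ripleys_k(corners, distances=[0.5, 1.1, 1.5], area=1.0))
```

Values of K that grow faster with distance than expected under complete spatial randomness suggest clustering at that scale; slower growth suggests dispersion.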
summarize_displacement(gdf, col='_distance')
For a masked GeoDataFrame containing displacement distances, calculate the minimum, maximum, median, and mean displacement distance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdf | GeoDataFrame | A GeoDataFrame containing displacement distance values. | required |
col | str | Name of the column containing displacement distance values. | '_distance' |
Returns:
Type | Description |
---|---|
dict | A dictionary containing summary displacement distance statistics. |
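The four summary statistics can be sketched with the standard library alone. The example below operates on a plain list of distances, not on a GeoDataFrame column; the function name and dictionary keys are hypothetical and may not match what MaskMyPy returns.

```python
import statistics


def summarize_sketch(values):
    """Return the minimum, maximum, median, and mean of a list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
    }


distances = [5.0, 6.0, 10.0, 19.0]
print(summarize_sketch(distances))
```

Reporting the median alongside the mean is useful because a handful of very large displacements can pull the mean well above what most points experienced.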
summarize_k(gdf, col='k_anonymity')
For a masked GeoDataFrame containing k-anonymity values, calculate the minimum, maximum, median, and mean k-anonymity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdf | GeoDataFrame | A GeoDataFrame containing k-anonymity values. | required |
col | str | Name of the column containing k-anonymity values. | 'k_anonymity' |
Returns:
Type | Description |
---|---|
dict | A dictionary containing summary k-anonymity statistics. |