Masks
Donut Masking
donut(gdf, low, high, container=None, distribution='uniform', seed=None, snap_to_streets=False)
Apply donut masking to a GeoDataFrame, randomly displacing points between a minimum and maximum distance. The advantages of this mask are speed and simplicity, though it does not handle highly varied population densities well.
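The displacement rule can be illustrated with a minimal pure-Python sketch. It is not MaskMyPy's implementation: the helper name `donut_displace` is hypothetical, and it assumes planar coordinates and a distance drawn uniformly between `low` and `high`.

```python
import math
import random


def donut_displace(x, y, low, high, rng):
    """Move (x, y) a random distance in [low, high] along a random bearing."""
    # Assumption: the distance is sampled uniformly between low and high.
    distance = rng.uniform(low, high)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


# Seeding the generator makes the displaced location reproducible.
rng = random.Random(42)
masked = donut_displace(500000.0, 5400000.0, low=50.0, high=500.0, rng=rng)

# Straight-line distance between the original and the masked point.
offset = math.hypot(masked[0] - 500000.0, masked[1] - 5400000.0)
```

Because the distance is never smaller than `low`, the original location sits inside an empty "hole", which is where the name donut comes from.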
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gdf | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| low | float | Minimum distance to displace points. Unit must match that of the | required |
| high | float | Maximum distance to displace points. Unit must match that of the | required |
| container | GeoDataFrame | A GeoDataFrame containing polygons within which intersecting sensitive points should remain after masking. This works by masking a point, checking whether it still intersects the same polygon it did before masking, and retrying until it does. Useful for preserving statistical relationships, such as census tract, or to ensure that points are not displaced into impossible locations, such as the ocean. CRS must match that of | None |
| distribution | str | The distribution used to determine masking distances. | 'uniform' |
| seed | int | Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined. | None |
| snap_to_streets | bool | If True, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false attribution. | False |

Returns:

| Type | Description |
|---|---|
| GeoDataFrame | A GeoDataFrame containing masked points. |

Source code in maskmypy/masks/donut.py
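The retry behaviour described for `container` can be sketched as a rejection loop. This is an illustration only, not the library's code: the helpers `displace`, `in_box`, and `displace_within` are hypothetical, a bounding box stands in for a real polygon, and the `max_tries` cap is an added assumption so the loop always terminates.

```python
import math
import random


def displace(x, y, low, high, rng):
    """Random displacement between low and high along a random bearing."""
    distance = rng.uniform(low, high)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


def in_box(point, box):
    """Hypothetical stand-in for a point-in-polygon test."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def displace_within(x, y, low, high, box, rng, max_tries=1000):
    """Retry the displacement until the result lands inside `box`."""
    for _ in range(max_tries):
        candidate = displace(x, y, low, high, rng)
        if in_box(candidate, box):
            return candidate
    raise RuntimeError("no valid location found; widen the range or the container")


rng = random.Random(7)
tract = (0.0, 0.0, 1000.0, 1000.0)  # xmin, ymin, xmax, ymax
masked = displace_within(900.0, 900.0, low=50.0, high=300.0, box=tract, rng=rng)
```

Points near the edge of their polygon need more retries on average, because a larger share of candidate locations falls outside it.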
Street Masking
street(gdf, low, high, max_length=1000, seed=None, padding=0.2)
Apply street masking to a GeoDataFrame, displacing points along the OpenStreetMap street
network. This helps account for variations in population density and reduces the likelihood
of false attribution, as points are always displaced to the street network. Each point is
snapped to the nearest node on the network, then displaced along the surrounding network to
a node between low and high nodes away.
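The snap-then-traverse idea can be sketched on a toy network. This is not MaskMyPy's implementation: the graph, the helper names, and the use of breadth-first hop counts to measure "nodes away" are all assumptions made for illustration.

```python
import math
import random
from collections import deque

# Toy street network: node id -> (x, y), and node id -> neighbouring node ids.
coords = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (4, 0), 5: (2, 1)}
edges = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}


def nearest_node(point):
    """Snap a point to the closest network node by straight-line distance."""
    return min(coords, key=lambda n: math.dist(point, coords[n]))


def hops_from(source):
    """Breadth-first search returning the hop count to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in edges[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


def street_displace(point, low, high, rng):
    """Move a point to a random node between low and high hops away."""
    start = nearest_node(point)
    candidates = sorted(n for n, h in hops_from(start).items() if low <= h <= high)
    if not candidates:
        raise ValueError("no node lies between low and high hops away")
    return coords[rng.choice(candidates)]


rng = random.Random(1)
masked = street_displace((0.2, 0.1), low=2, high=3, rng=rng)
```

Because distance is counted in nodes rather than metres, points in dense street grids move a shorter physical distance than points in sparse rural networks.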
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gdf | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| low | int | Minimum number of nodes along the OSM street network to traverse. | required |
| high | int | Maximum number of nodes along the OSM street network to traverse. | required |
| max_length | float | When locating the closest node to each point on the street network, MaskMyPy verifies that its immediate neighbours are no more than | 1000 |
| seed | int | Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined. | None |
| padding | float | OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame. Padding is used to expand this bounding box slightly to reduce unwanted edge-effects. A value of | 0.2 |

Returns:

| Type | Description |
|---|---|
| GeoDataFrame | A GeoDataFrame containing masked points. |

Source code in maskmypy/masks/street.py
street_k(gdf, population_gdf, population_column='pop', min_k=30, start=10, stop=60, spread=2, increment=2, suppression=0.99, max_length=1000, seed=None, padding=0.2)
Iteratively applies street masking to a GeoDataFrame, incrementally increasing the low/high node values until a given k-satisfaction threshold is reached. This provides a much more robust privacy promise, but requires population data.
For instance, if min_k=30 and suppression=0.99, then street masking will be repeated with
progressively higher values until 99% of points have a k-anonymity of at least 30.
Suppressed points are displaced to the center of the point distribution and labeled as such
in a SUPPRESSED column.
Example
from maskmypy import street_k
masked = street_k(
    gdf=sensitive_points,
    population_gdf=addresses,
    start=20,
    spread=5,
    min_k=30,
    suppression=0.95
)
This will perform street masking starting with street(gdf, low=20, high=25) and slowly increment
values until 95% of points achieve a k-anonymity of at least 30, with the rest being suppressed.
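The iteration can be sketched without any spatial data. This is an illustration under stated assumptions, not the library's code: it assumes `high = low + spread`, that both values rise by `increment` each round, and that `stop` bounds `low`. The helper names and `fake_share`, which stands in for masking and measuring k-anonymity, are hypothetical.

```python
def street_k_schedule(start, stop, spread, increment):
    """Yield successive (low, high) pairs, assuming `stop` bounds `low`."""
    low = start
    while low <= stop:
        yield low, low + spread
        low += increment


def first_satisfying(schedule, share_meeting_k, suppression):
    """Return the first (low, high) whose share of points meeting min_k
    reaches `suppression`, or the last pair tried if none does."""
    chosen = None
    for low, high in schedule:
        chosen = (low, high)
        if share_meeting_k(low, high) >= suppression:
            break
    return chosen


def fake_share(low, high):
    """Hypothetical stand-in: pretend the share of points with
    k >= min_k grows with the number of nodes traversed."""
    return min(1.0, low / 30)


pairs = list(street_k_schedule(start=20, stop=60, spread=5, increment=2))
chosen = first_satisfying(pairs, fake_share, suppression=0.95)
```

With `start=20` and `spread=5`, the first pair in `pairs` is `(20, 25)`, matching the first call described above.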
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gdf | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| population_gdf | GeoDataFrame | A GeoDataFrame containing either address points or polygons with a population column (see | required |
| population_column | str | If a polygon-based | 'pop' |
| min_k | int | Points that do not reach this k-anonymity value will be suppressed. | 30 |
| start | int | Initial value of | 10 |
| stop | int | Maximum value of | 60 |
| spread | int | Used to calculate the | 2 |
| increment | int | Amount incremented in each iteration until | 2 |
| suppression | float | Percent of points that must satisfy | 0.99 |
| max_length | float | When locating the closest node to each point on the street network, MaskMyPy verifies that its immediate neighbours are no more than | 1000 |
| seed | int | Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined. | None |
| padding | float | OSM network data is retrieved based on the bounding box of the sensitive GeoDataFrame. Padding is used to expand this bounding box slightly to reduce unwanted edge-effects. A value of | 0.2 |

Returns:

| Type | Description |
|---|---|
| GeoDataFrame | A GeoDataFrame containing masked points. |

Source code in maskmypy/masks/street.py
Location Swapping
locationswap(gdf, low, high, address, seed=None, snap_to_streets=False)
Applies location swapping to a GeoDataFrame, displacing points to a randomly selected address that is between a minimum and maximum distance away from the original point. While address data is the most common data type used to provide eligible swap locations, other point-based datasets may be used.
Note: If a sensitive point has no address points within range, the point is displaced to (0,0).
Example
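A minimal pure-Python sketch of the swap rule described above. It is not MaskMyPy's implementation: the helper `swap_location` is hypothetical, and points are plain coordinate tuples rather than GeoDataFrame geometries.

```python
import math
import random


def swap_location(point, addresses, low, high, rng):
    """Pick a random address between `low` and `high` away from `point`."""
    eligible = [a for a in addresses if low <= math.dist(point, a) <= high]
    if not eligible:
        # Mirrors the documented fallback when no address is within range.
        return (0.0, 0.0)
    return rng.choice(eligible)


origin = (1000.0, 1000.0)
# Candidate addresses at 10, 60, 120 and 400 units from the origin.
addresses = [(1010.0, 1000.0), (1060.0, 1000.0), (1000.0, 1120.0), (1400.0, 1000.0)]

rng = random.Random(3)
masked = swap_location(origin, addresses, low=50.0, high=150.0, rng=rng)
unmatched = swap_location(origin, addresses, low=500.0, high=600.0, rng=rng)
```

Since unmatched points end up at (0,0), it is worth checking the output for that coordinate before publishing or analysing the masked data.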
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gdf | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| low | float | Minimum distance to displace points. Unit must match that of the | required |
| high | float | Maximum distance to displace points. Unit must match that of the | required |
| address | GeoDataFrame | GeoDataFrame containing points that sensitive locations may be swapped to. While addresses are most common, other point-based data may be used as well. | required |
| seed | int | Used to seed the random number generator so that masked datasets are reproducible. Randomly generated if left undefined. | None |
| snap_to_streets | bool | If True, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false attribution. | False |

Source code in maskmypy/masks/locationswap.py
Voronoi Masking
voronoi(gdf, snap_to_streets=False)
Apply Voronoi masking to a GeoDataFrame, displacing each point to the nearest edge of a Voronoi diagram. Note: because Voronoi masking lacks any randomization, snapping to streets is recommended for this mask to provide another level of obfuscation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gdf | GeoDataFrame | GeoDataFrame containing sensitive points. | required |
| snap_to_streets | bool | If True, points are snapped to the nearest node on the OSM street network after masking. This can reduce the chance of false attribution. | False |

Returns:

| Type | Description |
|---|---|
| GeoDataFrame | A GeoDataFrame containing masked points. |