Atlas

Introduction

The Atlas class makes it easy to both mask datasets and evaluate new masks. It acts as a manager that lets you quickly test any number of masks and parameter combinations, automatically evaluating each result and keeping track of the outcomes.

Candidates

When the Atlas executes a given mask, the result is referred to as a 'candidate'. Each candidate is a simple Python dictionary stored in an ordinary list at Atlas.candidates. You can also access candidates by indexing or slicing an Atlas instance directly, e.g. atlas[2].

The structure of a candidate is as follows:

{
  mask: str, # Name of the mask callable used to create the candidate
  kwargs: dict, # Dictionary containing the keyword arguments used to create the candidate
  checksum: str, # Checksum of the candidate GeoDataFrame
  stats: { # Dictionary containing statistics describing information loss and privacy protection
    "central_drift": float,
    "displacement_min": float,
    "displacement_max": float,
    "displacement_med": float,
    "displacement_mean": float,
    "nnd_min_delta": float,
    "nnd_max_delta": float,
    "nnd_mean_delta": float,
    "ripley_rmse": float,
    "k_min": int,
    "k_max": int,
    "k_med": float,
    "k_mean": float,
    "k_satisfaction_5": float,
    "k_satisfaction_25": float,
    "k_satisfaction_50": float,
  },
}
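Because candidates are plain dictionaries, they can be filtered and ranked with ordinary Python. The sketch below uses hand-written candidates with made-up values (not real MaskMyPy output) and a hypothetical helper, best_candidate, to pick the candidate with the smallest mean displacement among those that meet a minimum k-anonymity threshold.

```python
# Hand-written stand-ins for entries in Atlas.candidates; all values are illustrative.
candidates = [
    {"mask": "donut", "kwargs": {"low": 50, "high": 500, "seed": 1},
     "checksum": "aaa111", "stats": {"displacement_mean": 270.0, "k_min": 4}},
    {"mask": "donut", "kwargs": {"low": 100, "high": 1000, "seed": 2},
     "checksum": "bbb222", "stats": {"displacement_mean": 540.0, "k_min": 12}},
    {"mask": "locationswap", "kwargs": {"low": 50, "high": 500, "seed": 3},
     "checksum": "ccc333", "stats": {"displacement_mean": 310.0, "k_min": 9}},
]


def best_candidate(candidates, min_k):
    """Hypothetical helper: least displacement among candidates with k_min >= min_k."""
    eligible = [c for c in candidates if c["stats"]["k_min"] >= min_k]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["stats"]["displacement_mean"])


choice = best_candidate(candidates, min_k=5)
print(choice["mask"], choice["checksum"])  # locationswap ccc333
```

The same pattern works on a real Atlas, since Atlas.candidates is an ordinary list.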

Using Custom Masks

The Atlas can utilize custom masking functions passed to Atlas.mask() so long as they meet the following requirements:

  • The first argument is a GeoDataFrame of sensitive points,
  • They return a masked GeoDataFrame in the same CRS as the input,
  • All other arguments are specified as keyword arguments (kwargs),
  • When a seed argument is provided, outputs are reproducible.
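A rough, geopandas-free illustration of the calling convention and the seed requirement follows. The function jitter_mask and its max_distance parameter are hypothetical, and a list of (x, y) tuples stands in for the sensitive GeoDataFrame, so this is not a mask that Atlas.mask() could run as written. It only shows the shape of the contract: data first, everything else as keywords, and a seed that makes the output reproducible. The final check mirrors the inspect.getfullargspec() test that appears in the Atlas.mask() source listing further down this page.

```python
import inspect
import math
import random


def jitter_mask(points, max_distance=100.0, seed=None):
    """Hypothetical mask: move each point a random distance in a random direction."""
    rng = random.Random(seed)  # private generator, so equal seeds give equal output
    masked = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        distance = rng.uniform(0.0, max_distance)
        masked.append((x + distance * math.cos(angle), y + distance * math.sin(angle)))
    return masked


points = [(0.0, 0.0), (250.0, 400.0)]  # stand-in for the sensitive GeoDataFrame
first = jitter_mask(points, max_distance=50.0, seed=42)
second = jitter_mask(points, max_distance=50.0, seed=42)
print(first == second)  # same seed, same masked coordinates

# True when the function declares a `seed` argument.
print("seed" in inspect.getfullargspec(jitter_mask).args)
```

A real custom mask would accept and return a GeoDataFrame and keep the input CRS, but the seeding pattern would be the same.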

Reference

maskmypy.Atlas dataclass

A class for quickly performing and evaluating geographic masks.

Example
from maskmypy import Atlas, donut, locationswap

atlas = Atlas(sensitive=some_points, population=some_addresses)
atlas.mask(donut, low=50, high=500)
atlas.mask(locationswap, low=50, high=500, address=some_addresses)
atlas.as_df()

Attributes:

  • sensitive (GeoDataFrame): A GeoDataFrame containing sensitive points.
  • population (GeoDataFrame): A GeoDataFrame containing population information, such as address points or polygons with population counts.
  • population_column (str): If the population layer is based on polygons, the name of the column containing population counts.
  • candidates (list): A list of existing masked candidates, if any.

Source code in maskmypy/atlas.py
@dataclass
class Atlas:
    """
    A class for quickly performing and evaluating geographic masks.

    Example
    -------
    ```python
    from maskmypy import Atlas, donut, locationswap

    atlas = Atlas(sensitive=some_points, population=some_addresses)
    atlas.mask(donut, low=50, high=500)
    atlas.mask(locationswap, low=50, high=500, address=some_addresses)
    atlas.as_df()
    ```

    Attributes
    ----------
    sensitive : GeoDataFrame
        A GeoDataFrame containing sensitive points.
    population : GeoDataFrame
        A GeoDataFrame containing population information, such as address points or polygon
        with population counts.
    population_column : str
        If the population layer is based on polygons, the name of the column containing population
        counts.
    candidates : list[]
        A list of existing masked candidates, if any.
    """

    sensitive: GeoDataFrame
    population: GeoDataFrame = None
    population_column: str = "pop"
    candidates: list = field(default_factory=list)

    def __post_init__(self):
        self.layers = {}
        if isinstance(self.population, GeoDataFrame):
            tools._validate_crs(self.sensitive.crs, self.population.crs)

    def __getitem__(self, idx):
        return self.candidates[idx]

    def __setitem__(self, idx, val):
        self.candidates[idx] = val

    def __len__(self):
        return len(self.candidates)

    def add_layers(self, *gdf: GeoDataFrame):
        """
        Add GeoDataFrames to the layer store (`Atlas.layers`).

        When regenerating masked GeoDataFrames using `Atlas.gen_gdf()`, any context layers
        that were used in creating the associated candidate must be present in the layer store.
        If they are, they will be automatically found and used as needed.

        Note that layers are stored according to their checksum value (see
        `maskmypy.tools.checksum()`) to provide both deduplication and integrity
        checking.

        Parameters
        ----------
        gdf : GeoDataFrame
            GeoDataFrames to be added to the layer store.
        """
        for x in gdf:
            tools._validate_crs(self.sensitive.crs, x.crs)
            self.layers[tools.checksum(x)] = x

    def mask(
        self,
        mask_func: Callable,
        keep_gdf: bool = False,
        keep_candidate: bool = True,
        skip_slow_evaluators: bool = True,
        measure_execution_time: bool = True,
        measure_peak_memory: bool = False,
        **kwargs,
    ):
        """
        Execute a given mask, analyze the result, and add it to the Atlas.

        Parameters
        ----------
        mask_func : Callable
            A masking function to apply to the sensitive point dataset. If using a custom mask,
            it must take the sensitive GeoDataFrame as its first argument, all other arguments as
            keyword arguments, and must return a GeoDataFrame containing the results.
        keep_gdf : bool
            If `False`, the resulting GeoDataFrame will be analyzed and then dropped to save memory.
            Use `gen_gdf` to regenerate the GeoDataFrame.
        keep_candidate : bool
            If `True`, a dictionary containing mask parameters and analysis results is added to
            the candidate list (`Atlas.candidates`, or `Atlas[index]`).
        skip_slow_evaluators : bool
            If `True`, skips any analyses that are known to be slow during mask result
            evaluation. See maskmypy.analysis.evaluate() for more information.
        measure_execution_time : bool
            If `True`, measures the execution time of the mask function and adds it to the
            candidate statistics. Mutually exclusive with `measure_peak_memory`.
        measure_peak_memory : bool
            If `True`, will profile memory usage while the mask function is being applied,
            and will add the value in MB to the candidate statistics. Note that the reported
            value represents *additional* memory used by the mask, and does not include existing
            allocations. Mutually exclusive with `measure_execution_time`.

            Warning: this can significantly slow down execution time.

        """
        if measure_execution_time and measure_peak_memory:
            raise ValueError(
                "`measure_execution_time` and `measure_peak_memory` cannot both be true."
            )

        candidate = {
            "mask": mask_func.__name__,
            "kwargs": self._hydrate_mask_kwargs(**kwargs),
        }

        if "seed" in inspect.getfullargspec(mask_func).args and "seed" not in candidate["kwargs"]:
            candidate["kwargs"]["seed"] = tools.gen_seed()

        if measure_execution_time:
            time_start = default_timer()
        elif measure_peak_memory:
            tracemalloc.start()

        gdf = mask_func(self.sensitive, **candidate["kwargs"])

        if measure_execution_time:
            execution_time = default_timer() - time_start
        elif measure_peak_memory:
            _, mem_peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            mem_peak_mb = mem_peak / 1024 / 1024

        candidate["checksum"] = tools.checksum(gdf)
        candidate["kwargs"] = self._dehydrate_mask_kwargs(**candidate["kwargs"])
        candidate["stats"] = analysis.evaluate(
            sensitive_gdf=self.sensitive,
            candidate_gdf=gdf,
            population_gdf=self.population,
            population_column=self.population_column,
            skip_slow=skip_slow_evaluators,
        )

        if "UNMASKED" in gdf.columns:
            candidate["stats"]["UNMASKED_POINTS"] = gdf["UNMASKED"].sum()

        if measure_execution_time:
            candidate["stats"]["execution_time"] = round(execution_time, 3)
        elif measure_peak_memory:
            candidate["stats"]["memory_peak_mb"] = round(mem_peak_mb, 3)

        if keep_gdf:
            self.layers[candidate["checksum"]] = gdf
        else:
            del gdf

        if keep_candidate:
            self.candidates.append(candidate)

        return candidate

    def gen_gdf(
        self,
        idx: int = None,
        checksum: str = None,
        keep: bool = False,
        custom_mask: Callable = None,
    ):
        """
        Regenerates the GeoDataFrame for a given candidate based on either its position in the
        `Atlas.candidates` list or its checksum.

        Parameters
        ----------
        idx : int
            Index of the candidate in `Atlas.candidates` to regenerate a GeoDataFrame for.
        checksum : str
            Checksum of the candidate in `Atlas.candidates` to regenerate a GeoDataFrame for.
        keep : bool
            If `True`, return the masked GeoDataFrame and store it in `Atlas.layers` for future
            use so it does not need to be regenerated.
        custom_mask : Callable
            If the candidate was generated using a custom masking function from outside MaskMyPy,
            provide the function here.

        """
        # Exactly one of idx or checksum must be provided.
        if (idx is None and checksum is None) or (idx is not None and checksum is not None):
            raise ValueError("Must specify either idx or checksum, but not both.")

        checksum_before = checksum if checksum else self.candidates[idx]["checksum"]

        # Check if layer is already in the store.
        if isinstance(self.layers.get(checksum_before, None), GeoDataFrame):
            return self.layers[checksum_before]

        try:
            candidate = next(
                cand for cand in self.candidates if cand["checksum"] == checksum_before
            )
        except StopIteration:
            # next() found no candidate with a matching checksum.
            raise ValueError(
                f"Could not locate candidate with checksum '{checksum_before}'"
            ) from None

        mask_func = custom_mask or getattr(masks, candidate["mask"])

        candidate_after = self.mask(
            mask_func, keep_candidate=False, keep_gdf=True, **candidate["kwargs"]
        )

        checksum_after = candidate_after.get("checksum")
        if checksum_before != checksum_after:
            raise ValueError(
                f"Checksum of masked GeoDataFrame ({checksum_after}) does not match that which is on record for this candidate ({checksum_before}). Did any input layers get modified?"
            )

        gdf = self.layers[checksum_after]

        if not keep:
            del self.layers[checksum_after]

        return gdf

    def sort(self, by: str, desc: bool = False):
        """
        Sorts the list of candidates (`Atlas.candidates`) based on a given statistic.

        Example:
        ```
        # Sort candidate list in ascending order based on maximum displacement distance.
        atlas.sort(by="displacement_max")

        # Sort candidate list in descending order based on minimum k-anonymity.
        atlas.sort(by="k_min", desc=True)
        ```

        Parameters
        ----------
        by : str
            Name of the statistic to sort by.
        desc : bool
            If `True`, sort in descending order.

        """
        if by in self.candidates[0]["stats"].keys():
            self.candidates.sort(key=lambda x: x["stats"][by], reverse=desc)
        else:
            raise ValueError(f"Could not find '{by}' in candidate statistics.")

    def prune(self, by: str, min: float, max: float):
        """
        Prune candidates based on a given statistic. Candidates whose value for that statistic
        is less than `min` or greater than `max` are dropped. Both bounds are inclusive, so
        values equal to `min` or `max` are kept.

        Example:
        ```
        # Prune any candidates with a minimum displacement distance below 50 or above 500.
        atlas.prune(by="displacement_min", min=50, max=500)

        # Prune any candidates with minimum k-anonymity values below 10 or above 50.
        atlas.prune(by="k_min", min=10, max=50)
        ```

        Parameters
        ----------
        by : str
            Name of the candidate statistic to prune by.
        min : float
            Minimum value of the statistic. If below `min`, the candidate is pruned from the
            candidates list. If the statistic is equal to or greater than `min` but not
            greater than `max` it is kept in the list.
        max : float
            Maximum value of the statistic. If above `max`, the candidate is pruned from the
            candidates list. If the statistic is equal to or less than `max` but not less
            than `min` it is kept in the list.
        """
        if by in self.candidates[0]["stats"].keys():
            self.candidates = [
                c for c in self.candidates if c["stats"][by] >= min and c["stats"][by] <= max
            ]
        else:
            raise ValueError(f"Could not find '{by}' in candidate statistics.")

    def to_json(self, file: Path):
        """
        Saves candidates to a JSON file. As long as the input GeoDataFrames are
        also preserved by the user*, this JSON file can be used to later reconstruct
        the atlas using `Atlas.from_json()`, including all resulting candidate GeoDataFrames.

        * Warning: if Street masking is used, there is a chance that a candidate will not be able
        to be regenerated if OpenStreetMap data changes. This will be addressed in a future version
        of MaskMyPy.

        Parameters
        ----------
        file : Path
            File path indicating where the JSON file should be saved.
        """
        with open(file, "w") as f:
            json.dump(self.candidates, f)

    @classmethod
    def from_json(
        cls,
        sensitive: GeoDataFrame,
        candidate_json: Path,
        population: GeoDataFrame = None,
        population_column: str = "pop",
        layers: list = None,
    ):
        """
        Recreate an Atlas from a candidate JSON file previously generated using `Atlas.to_json()`
        as well as the original GeoDataFrames. Masked GeoDataFrames can then be regenerated using
        `Atlas.gen_gdf()`.

        * Warning: if Street masking is used, there is a chance that a candidate will not be able
        to be regenerated if OpenStreetMap data changes. This will be addressed in a future version
        of MaskMyPy.

        Parameters
        ----------
        sensitive : GeoDataFrame
            The original sensitive point layer.
        candidate_json : Path
            Path to a candidate JSON file previously generated using `Atlas.to_json()`.
        population : GeoDataFrame
            The original population layer, if one was specified.
        population_column : str
            If a polygon-based population layer was used, the name of the population column.
        layers : List[GeoDataFrame]
            A list of additional GeoDataFrames used in the original Atlas. For instance,
            any containers used during donut masking.
        """
        with open(candidate_json) as f:
            candidates = json.load(f)

        atlas = cls(
            sensitive=sensitive,
            candidates=candidates,
            population=population,
            population_column=population_column,
        )
        if layers:
            atlas.add_layers(*layers)
        return atlas

    def as_df(self):
        """
        Return a pandas DataFrame describing each candidate.
        """
        df = DataFrame(data=self.candidates)
        df = concat([df.drop(["kwargs"], axis=1), df["kwargs"].apply(Series)], axis=1)
        df = concat([df.drop(["stats"], axis=1), df["stats"].apply(Series)], axis=1)
        return df

    def scatter(self, a: str, b: str):
        """
        Return a scatter plot of candidates across two given statistics.

        Parameters
        ----------
        a : string
            Name of the candidate statistic to plot.
        b : string
            Name of the candidate statistic to plot.
        """
        df = self.as_df()
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(df[a], df[b], c="#1f77b4")
        ax.set_xlabel(a)
        ax.set_ylabel(b)
        for i, label in enumerate(df["checksum"]):
            ax.annotate(label, (df.loc[i, a], df.loc[i, b]))
        return fig

    def _hydrate_mask_kwargs(self, **mask_kwargs: dict) -> dict:
        """
        Find any keyword arguments that contain context layer checksums and
        attempt to restore the layer from `Atlas.layers`.
        """
        for key, value in mask_kwargs.items():
            if isinstance(value, str) and value.startswith("context_"):
                checksum = value.split("_")[1]
                try:
                    mask_kwargs[key] = self.layers[checksum]
                except KeyError as e:
                    raise KeyError(
                        f"Error: cannot find context layer for '{key}, {checksum}', "
                        f"try loading it first using Atlas.add_layers(). {e}"
                    )
        return mask_kwargs

    def _dehydrate_mask_kwargs(self, **mask_kwargs: dict) -> dict:
        """
        Search mask kwargs for any GeoDataFrames and replace them with their checksums.
        """
        for key, value in mask_kwargs.items():
            if isinstance(value, GeoDataFrame):
                self.add_layers(value)
                mask_kwargs[key] = "_".join(["context", tools.checksum(value)])
        return mask_kwargs

add_layers(*gdf)

Add GeoDataFrames to the layer store (Atlas.layers).

When regenerating masked GeoDataFrames using Atlas.gen_gdf(), any context layers that were used in creating the associated candidate must be present in the layer store. If they are, they will be automatically found and used as needed.

Note that layers are stored according to their checksum value (see maskmypy.tools.checksum()) to provide both deduplication and integrity checking.
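Keying a store by content checksum is what provides deduplication: adding the same data twice writes to the same key. The sketch below shows the idea with plain Python objects. Both toy_checksum and the LayerStore class are hypothetical stand-ins, and the real maskmypy.tools.checksum() may compute its value differently.

```python
import hashlib
import json


def toy_checksum(rows):
    """Hypothetical stand-in for a layer checksum: hash a canonical JSON dump."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:8]


class LayerStore:
    """Minimal content-addressed store, loosely modelled on Atlas.layers."""

    def __init__(self):
        self.layers = {}

    def add_layers(self, *layers):
        for layer in layers:
            # Identical content hashes to the same key, so duplicates collapse.
            self.layers[toy_checksum(layer)] = layer


addresses = [{"id": 1, "x": 10.0, "y": 20.0}, {"id": 2, "x": 11.0, "y": 21.0}]
addresses_copy = [dict(row) for row in addresses]  # equal content, different object
containers = [{"id": "A", "pop": 120}]

store = LayerStore()
store.add_layers(addresses, addresses_copy, containers)
print(len(store.layers))  # two distinct layers are stored, not three
```

The integrity side follows from the same property: a layer that has been modified produces a different checksum, so a lookup by the old checksum no longer finds it.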

Parameters:

  • gdf (GeoDataFrame, default ()): GeoDataFrames to be added to the layer store.

Source code in maskmypy/atlas.py
def add_layers(self, *gdf: GeoDataFrame):
    """
    Add GeoDataFrames to the layer store (`Atlas.layers`).

    When regenerating masked GeoDataFrames using `Atlas.gen_gdf()`, any context layers
    that were used in creating the associated candidate must be present in the layer store.
    If they are, they will be automatically found and used as needed.

    Note that layers are stored according to their checksum value (see
    `maskmypy.tools.checksum()`) to provide both deduplication and integrity
    checking.

    Parameters
    ----------
    gdf : GeoDataFrame
        GeoDataFrames to be added to the layer store.
    """
    for x in gdf:
        tools._validate_crs(self.sensitive.crs, x.crs)
        self.layers[tools.checksum(x)] = x

as_df()

Return a pandas DataFrame describing each candidate.

Source code in maskmypy/atlas.py
def as_df(self):
    """
    Return a pandas DataFrame describing each candidate.
    """
    df = DataFrame(data=self.candidates)
    df = concat([df.drop(["kwargs"], axis=1), df["kwargs"].apply(Series)], axis=1)
    df = concat([df.drop(["stats"], axis=1), df["stats"].apply(Series)], axis=1)
    return df
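
The two concat calls above move the nested kwargs and stats dictionaries into top-level columns. The same flattening can be sketched for a single candidate without pandas. Here flatten_candidate is a hypothetical helper and the candidate values are made up.

```python
def flatten_candidate(candidate):
    """Hypothetical helper: lift nested kwargs and stats to top-level keys."""
    row = {k: v for k, v in candidate.items() if k not in ("kwargs", "stats")}
    row.update(candidate["kwargs"])
    row.update(candidate["stats"])
    return row


candidate = {
    "mask": "donut",
    "kwargs": {"low": 50, "high": 500, "seed": 1234},
    "checksum": "abc123",
    "stats": {"k_min": 7, "displacement_max": 498.2},
}

row = flatten_candidate(candidate)
print(list(row))
# ['mask', 'checksum', 'low', 'high', 'seed', 'k_min', 'displacement_max']
```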

from_json(sensitive, candidate_json, population=None, population_column='pop', layers=None) classmethod

Recreate an Atlas from a candidate JSON file previously generated using Atlas.to_json() as well as the original GeoDataFrames. Masked GeoDataFrames can then be regenerated using Atlas.gen_gdf().

Warning: if Street masking is used, a candidate may fail to regenerate if the underlying OpenStreetMap data has changed. This will be addressed in a future version of MaskMyPy.

Parameters:

  • sensitive (GeoDataFrame, required): The original sensitive point layer.
  • candidate_json (Path, required): Path to a candidate JSON file previously generated using Atlas.to_json().
  • population (GeoDataFrame, default None): The original population layer, if one was specified.
  • population_column (str, default 'pop'): If a polygon-based population layer was used, the name of the population column.
  • layers (List[GeoDataFrame], default None): A list of additional GeoDataFrames used in the original Atlas. For instance, any containers used during donut masking.

Source code in maskmypy/atlas.py
@classmethod
def from_json(
    cls,
    sensitive: GeoDataFrame,
    candidate_json: Path,
    population: GeoDataFrame = None,
    population_column: str = "pop",
    layers: list = None,
):
    """
    Recreate an Atlas from a candidate JSON file previously generated using `Atlas.to_json()`
    as well as the original GeoDataFrames. Masked GeoDataFrames can then be regenerated using
    `Atlas.gen_gdf()`.

    * Warning: if Street masking is used, there is a chance that a candidate will not be able
    to be regenerated if OpenStreetMap data changes. This will be addressed in a future version
    of MaskMyPy.

    Parameters
    ----------
    sensitive : GeoDataFrame
        The original sensitive point layer.
    candidate_json : Path
        Path to a candidate JSON file previously generated using `Atlas.to_json()`.
    population : GeoDataFrame
        The original population layer, if one was specified.
    population_column : str
        If a polygon-based population layer was used, the name of the population column.
    layers : List[GeoDataFrame]
        A list of additional GeoDataFrames used in the original Atlas. For instance,
        any containers used during donut masking.
    """
    with open(candidate_json) as f:
        candidates = json.load(f)

    atlas = cls(
        sensitive=sensitive,
        candidates=candidates,
        population=population,
        population_column=population_column,
    )
    if layers:
        atlas.add_layers(*layers)
    return atlas
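
Since candidates are plain dictionaries, saving and loading reduce to json.dump() and json.load(), as the source above shows. A minimal round trip with the standard library, using a made-up candidate and a temporary file, might look like this:

```python
import json
import tempfile
from pathlib import Path

# Illustrative candidate; the values are not real MaskMyPy output.
candidates = [
    {"mask": "donut", "kwargs": {"low": 50, "high": 500, "seed": 1234},
     "checksum": "abc123", "stats": {"k_min": 7, "displacement_max": 498.2}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "candidates.json"

    # Equivalent of Atlas.to_json(): dump the candidate list as-is.
    with open(path, "w") as f:
        json.dump(candidates, f)

    # Equivalent of the first step of Atlas.from_json(): load the list back.
    with open(path) as f:
        restored = json.load(f)

print(restored == candidates)
```

Only the candidate metadata is saved. The sensitive layer, the population layer, and any context layers still have to be supplied again when rebuilding the Atlas.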

gen_gdf(idx=None, checksum=None, keep=False, custom_mask=None)

Regenerates the GeoDataFrame for a given candidate based on either its position in the Atlas.candidates list or its checksum.

Parameters:

  • idx (int, default None): Index of the candidate in Atlas.candidates to regenerate a GeoDataFrame for.
  • checksum (str, default None): Checksum of the candidate in Atlas.candidates to regenerate a GeoDataFrame for.
  • keep (bool, default False): If True, return the masked GeoDataFrame and store it in Atlas.layers for future use so it does not need to be regenerated.
  • custom_mask (Callable, default None): If the candidate was generated using a custom masking function from outside MaskMyPy, provide the function here.

Source code in maskmypy/atlas.py
def gen_gdf(
    self,
    idx: int = None,
    checksum: str = None,
    keep: bool = False,
    custom_mask: Callable = None,
):
    """
    Regenerates the GeoDataFrame for a given candidate based on either its position in the
    `Atlas.candidates` list or its checksum.

    Parameters
    ----------
    idx : int
        Index of the candidate in `Atlas.candidates` to regenerate a GeoDataFrame for.
    checksum : str
        Checksum of the candidate in `Atlas.candidates` to regenerate a GeoDataFrame for.
    keep : bool
        If `True`, return the masked GeoDataFrame and store it in `Atlas.layers` for future
        use so it does not need to be regenerated.
    custom_mask : Callable
        If the candidate was generated using a custom masking function from outside MaskMyPy,
        provide the function here.

    """
    # Exactly one of idx or checksum must be provided.
    if (idx is None and checksum is None) or (idx is not None and checksum is not None):
        raise ValueError("Must specify either idx or checksum, but not both.")

    checksum_before = checksum if checksum else self.candidates[idx]["checksum"]

    # Check if layer is already in the store.
    if isinstance(self.layers.get(checksum_before, None), GeoDataFrame):
        return self.layers[checksum_before]

    try:
        candidate = next(
            cand for cand in self.candidates if cand["checksum"] == checksum_before
        )
    except StopIteration:
        # next() found no candidate with a matching checksum.
        raise ValueError(
            f"Could not locate candidate with checksum '{checksum_before}'"
        ) from None

    mask_func = custom_mask or getattr(masks, candidate["mask"])

    candidate_after = self.mask(
        mask_func, keep_candidate=False, keep_gdf=True, **candidate["kwargs"]
    )

    checksum_after = candidate_after.get("checksum")
    if checksum_before != checksum_after:
        raise ValueError(
            f"Checksum of masked GeoDataFrame ({checksum_after}) does not match that which is on record for this candidate ({checksum_before}). Did any input layers get modified?"
        )

    gdf = self.layers[checksum_after]

    if not keep:
        del self.layers[checksum_after]

    return gdf
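
Regeneration relies on two things visible in the source above: the stored kwargs (including the seed) are replayed through the mask, and the checksum of the new output is compared with the one on record. The sketch below imitates that flow with plain Python. The names shift_mask, toy_checksum, and regenerate are hypothetical, and lists of tuples stand in for GeoDataFrames.

```python
import hashlib
import random


def shift_mask(points, max_shift=10.0, seed=None):
    """Hypothetical seeded mask: shift each point by a random offset."""
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-max_shift, max_shift), y + rng.uniform(-max_shift, max_shift))
        for x, y in points
    ]


def toy_checksum(points):
    """Hypothetical stand-in for tools.checksum()."""
    return hashlib.sha256(repr(points).encode("utf-8")).hexdigest()[:8]


def regenerate(sensitive, candidate):
    """Replay the stored kwargs and verify the result against the stored checksum."""
    result = shift_mask(sensitive, **candidate["kwargs"])
    if toy_checksum(result) != candidate["checksum"]:
        raise ValueError("Checksum mismatch: did any input layers get modified?")
    return result


sensitive = [(0.0, 0.0), (100.0, 50.0)]
kwargs = {"max_shift": 25.0, "seed": 7}
candidate = {
    "mask": "shift_mask",
    "kwargs": kwargs,
    "checksum": toy_checksum(shift_mask(sensitive, **kwargs)),
}

masked = regenerate(sensitive, candidate)  # succeeds: same inputs, same seed

try:
    regenerate([(1.0, 0.0), (100.0, 50.0)], candidate)  # input layer was modified
except ValueError as err:
    print(err)
```

This is why the seed is stored with every candidate: without it, a replay would produce different coordinates and the checksum comparison would always fail.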

mask(mask_func, keep_gdf=False, keep_candidate=True, skip_slow_evaluators=True, measure_execution_time=True, measure_peak_memory=False, **kwargs)

Execute a given mask, analyze the result, and add it to the Atlas.

Parameters:

  • mask_func (Callable, required): A masking function to apply to the sensitive point dataset. If using a custom mask, it must take the sensitive GeoDataFrame as its first argument, all other arguments as keyword arguments, and must return a GeoDataFrame containing the results.
  • keep_gdf (bool, default False): If False, the resulting GeoDataFrame will be analyzed and then dropped to save memory. Use Atlas.gen_gdf() to regenerate the GeoDataFrame.
  • keep_candidate (bool, default True): If True, a dictionary containing mask parameters and analysis results is added to the candidate list (Atlas.candidates, or Atlas[index]).
  • skip_slow_evaluators (bool, default True): If True, skips any analyses that are known to be slow during mask result evaluation. See maskmypy.analysis.evaluate() for more information.
  • measure_execution_time (bool, default True): If True, measures the execution time of the mask function and adds it to the candidate statistics. Mutually exclusive with measure_peak_memory.
  • measure_peak_memory (bool, default False): If True, profiles memory usage while the mask function is being applied and adds the peak value in MB to the candidate statistics. The reported value represents additional memory used by the mask and does not include existing allocations. Mutually exclusive with measure_execution_time; because that option defaults to True, set it to False when enabling this one. Warning: this can significantly slow down execution.

Source code in maskmypy/atlas.py
def mask(
    self,
    mask_func: Callable,
    keep_gdf: bool = False,
    keep_candidate: bool = True,
    skip_slow_evaluators: bool = True,
    measure_execution_time: bool = True,
    measure_peak_memory: bool = False,
    **kwargs,
):
    """
    Execute a given mask, analyze the result, and add it to the Atlas.

    Parameters
    ----------
    mask_func : Callable
        A masking function to apply to the sensitive point dataset. If using a custom mask,
        it must take the sensitive GeoDataFrame as its first argument, all other arguments as
        keyword arguments, and must return a GeoDataFrame containing the results.
    keep_gdf : bool
        If `False`, the resulting GeoDataFrame will be analyzed and then dropped to save memory.
        Use `gen_gdf` to regenerate the GeoDataFrame.
    keep_candidate : bool
        If `True`, a dictionary containing mask parameters and analysis results is added to
        the candidate list (`Atlas.candidates`, or `Atlas[index]`).
    skip_slow_evaluators : bool
        If `True`, skips any analyses that are known to be slow during mask result
        evaluation. See maskmypy.analysis.evaluate() for more information.
    measure_execution_time : bool
        If `True`, measures the execution time of the mask function and adds it to the
        candidate statistics. Mutually exclusive with `measure_peak_memory`.
    measure_peak_memory : bool
        If `True`, will profile memory usage while the mask function is being applied,
        and will add the value in MB to the candidate statistics. Note that the reported
        value represents *additional* memory used by the mask, and does not include existing
        allocations. Mutually exclusive with `measure_execution_time`.

        Warning: this can significantly slow down execution time.

    """
    if measure_execution_time and measure_peak_memory:
        raise ValueError(
            "`measure_execution_time` and `measure_peak_memory` cannot both be true."
        )

    candidate = {
        "mask": mask_func.__name__,
        "kwargs": self._hydrate_mask_kwargs(**kwargs),
    }

    if "seed" in inspect.getfullargspec(mask_func).args and "seed" not in candidate["kwargs"]:
        candidate["kwargs"]["seed"] = tools.gen_seed()

    if measure_execution_time:
        time_start = default_timer()
    elif measure_peak_memory:
        tracemalloc.start()

    gdf = mask_func(self.sensitive, **candidate["kwargs"])

    if measure_execution_time:
        execution_time = default_timer() - time_start
    elif measure_peak_memory:
        _, mem_peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        mem_peak_mb = mem_peak / 1024 / 1024

    candidate["checksum"] = tools.checksum(gdf)
    candidate["kwargs"] = self._dehydrate_mask_kwargs(**candidate["kwargs"])
    candidate["stats"] = analysis.evaluate(
        sensitive_gdf=self.sensitive,
        candidate_gdf=gdf,
        population_gdf=self.population,
        population_column=self.population_column,
        skip_slow=skip_slow_evaluators,
    )

    if "UNMASKED" in gdf.columns:
        candidate["stats"]["UNMASKED_POINTS"] = gdf["UNMASKED"].sum()

    if measure_execution_time:
        candidate["stats"]["execution_time"] = round(execution_time, 3)
    elif measure_peak_memory:
        candidate["stats"]["memory_peak_mb"] = round(mem_peak_mb, 3)

    if keep_gdf:
        self.layers[candidate["checksum"]] = gdf
    else:
        del gdf

    if keep_candidate:
        self.candidates.append(candidate)

    return candidate
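
The timing and memory branches in the source above use default_timer (assumed here to be timeit.default_timer) and the standard-library tracemalloc module. The sketch below isolates that pattern in a hypothetical helper, run_measured, with a dummy workload in place of a mask. As in Atlas.mask(), only one of the two measurements is taken per call.

```python
import tracemalloc
from timeit import default_timer


def run_measured(func, *, measure_time=True, measure_memory=False):
    """Hypothetical helper: run func() and report either seconds or peak MB."""
    if measure_time and measure_memory:
        raise ValueError("measure_time and measure_memory cannot both be true.")

    stats = {}
    if measure_time:
        start = default_timer()
    elif measure_memory:
        tracemalloc.start()

    result = func()

    if measure_time:
        stats["execution_time"] = round(default_timer() - start, 3)
    elif measure_memory:
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        stats["memory_peak_mb"] = round(peak / 1024 / 1024, 3)
    return result, stats


def workload():
    # Dummy stand-in for a mask function.
    return [i * i for i in range(200_000)]


_, time_stats = run_measured(workload)
_, mem_stats = run_measured(workload, measure_time=False, measure_memory=True)
print(time_stats, mem_stats)
```

Because tracemalloc only counts allocations made after tracemalloc.start(), the peak reflects memory added during the call rather than what was already allocated, which matches the description of measure_peak_memory.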

prune(by, min, max)

Prune candidates based on a given statistic. Candidates whose value for that statistic is less than min or greater than max are dropped. Both bounds are inclusive, so values equal to min or max are kept.

Example:

# Prune any candidates with a minimum displacement distance below 50 or above 500.
atlas.prune(by="displacement_min", min=50, max=500)

# Prune any candidates with a minimum k-anonymity value below 10 or above 50.
atlas.prune(by="k_min", min=10, max=50)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| by | str | Name of the candidate statistic to prune by. | required |
| min | float | Minimum value of the statistic. If below min, the candidate is pruned from the candidates list. If the statistic is equal to or greater than min but not greater than max it is kept in the list. | required |
| max | float | Maximum value of the statistic. If above max, the candidate is pruned from the candidates list. If the statistic is equal to or less than max but not less than min it is kept in the list. | required |

Source code in maskmypy/atlas.py
def prune(self, by: str, min: float, max: float):
    """
    Prune candidates based on a given statistic. A candidate is dropped if its value for that
    statistic is less than `min` or greater than `max`. Both bounds are inclusive, so values
    equal to `min` or `max` are kept.

    Example:
    ```
    # Prune any candidates with a minimum displacement distance below 50 or above 500.
    atlas.prune(by="displacement_min", min=50, max=500)

    # Prune any candidates with a minimum k-anonymity value below 10 or above 50.
    atlas.prune(by="k_min", min=10, max=50)
    ```

    Parameters
    ----------
    by : str
        Name of the candidate statistic to prune by.
    min : float
        Minimum value of the statistic. If below `min`, the candidate is pruned from the
        candidates list. If the statistic is equal to or greater than `min` but not
        greater than `max` it is kept in the list.
    max : float
        Maximum value of the statistic. If above `max`, the candidate is pruned from the
        candidates list. If the statistic is equal to or less than `max` but not less
        than `min` it is kept in the list.
    """
    if by in self.candidates[0]["stats"].keys():
        self.candidates = [
            c for c in self.candidates if c["stats"][by] >= min and c["stats"][by] <= max
        ]
    else:
        raise ValueError(f"Could not find '{by}' in candidate statistics.")
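
The inclusive bounds are easiest to see on a small list of plain dictionaries. The following is a minimal sketch, not the library's code: `prune_candidates`, the `lower`/`upper` parameter names, and the sample checksums and values are invented for illustration. The empty-list guard is an addition made by this sketch.

```python
def prune_candidates(candidates, by, lower, upper):
    """Keep only candidates whose statistic lies within [lower, upper], bounds included."""
    if not candidates:
        return []  # guard added in this sketch; nothing to prune
    if by not in candidates[0]["stats"]:
        raise ValueError(f"Could not find '{by}' in candidate statistics.")
    return [c for c in candidates if lower <= c["stats"][by] <= upper]


# Made-up candidates carrying only the fields this sketch needs.
candidates = [
    {"checksum": "a1", "stats": {"k_min": 5}},
    {"checksum": "b2", "stats": {"k_min": 10}},
    {"checksum": "c3", "stats": {"k_min": 30}},
    {"checksum": "d4", "stats": {"k_min": 50}},
    {"checksum": "e5", "stats": {"k_min": 51}},
]

kept = prune_candidates(candidates, by="k_min", lower=10, upper=50)
print([c["checksum"] for c in kept])  # ['b2', 'c3', 'd4']
```

The candidates sitting exactly on 10 and 50 survive, while 5 and 51 are dropped.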

scatter(a, b)

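Return a scatter plot (a Matplotlib figure) comparing candidates across two given statistics, with a on the x-axis and b on the y-axis. Each point is labelled with its candidate's checksum.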

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| a | str | Name of the candidate statistic to plot on the x-axis. | required |
| b | str | Name of the candidate statistic to plot on the y-axis. | required |

Source code in maskmypy/atlas.py
def scatter(self, a: str, b: str):
    """
    Return a scatter plot of candidates across two given statistics.

    Parameters
    ----------
    a : str
        Name of the candidate statistic to plot on the x-axis.
    b : str
        Name of the candidate statistic to plot on the y-axis.
    """
    df = self.as_df()
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(df[a], df[b], c="#1f77b4")
    ax.set_xlabel(a)
    ax.set_ylabel(b)
    for i, label in enumerate(df["checksum"]):
        ax.annotate(label, (df.loc[i, a], df.loc[i, b]))
    return fig

sort(by, desc=False)

Sorts the list of candidates (Atlas.candidates) based on a given statistic.

Example:

# Sort candidate list in ascending order based on maximum displacement distance.
atlas.sort(by="displacement_max")

# Sort candidate list in descending order based on minimum k-anonymity.
atlas.sort(by="k_min", desc=True)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| by | str | Name of the statistic to sort by. | required |
| desc | bool | If True, sort in descending order. | False |

Source code in maskmypy/atlas.py
def sort(self, by: str, desc: bool = False):
    """
    Sorts the list of candidates (`Atlas.candidates`) based on a given statistic.

    Example:
    ```
    # Sort candidate list in ascending order based on maximum displacement distance.
    atlas.sort(by="displacement_max")

    # Sort candidate list in descending order based on minimum k-anonymity.
    atlas.sort(by="k_min", desc=True)
    ```

    Parameters
    ----------
    by : str
        Name of the statistic to sort by.
    desc : bool
        If `True`, sort in descending order.

    """
    if by in self.candidates[0]["stats"].keys():
        self.candidates.sort(key=lambda x: x["stats"][by], reverse=desc)
    else:
        raise ValueError(f"Could not find '{by}' in candidate statistics.")
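
The sort happens in place on the candidate list, keyed on one entry of each candidate's `stats` dictionary. The following is a minimal sketch on plain dictionaries, assuming made-up checksums and values; `sort_candidates` is a hypothetical stand-in, not the library's method.

```python
def sort_candidates(candidates, by, desc=False):
    """Sort the list in place by the chosen statistic."""
    candidates.sort(key=lambda c: c["stats"][by], reverse=desc)


# Made-up candidates carrying only the fields this sketch needs.
candidates = [
    {"checksum": "a1", "stats": {"displacement_max": 480.0, "k_min": 12}},
    {"checksum": "b2", "stats": {"displacement_max": 120.5, "k_min": 3}},
    {"checksum": "c3", "stats": {"displacement_max": 300.0, "k_min": 25}},
]

sort_candidates(candidates, by="displacement_max")
ascending = [c["checksum"] for c in candidates]
print(ascending)  # ['b2', 'c3', 'a1']

sort_candidates(candidates, by="k_min", desc=True)
descending = [c["checksum"] for c in candidates]
print(descending)  # ['c3', 'a1', 'b2']
```

Because the list is reordered in place, positional access such as `candidates[0]` refers to a different candidate after sorting.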

to_json(file)

Saves candidates to a JSON file. Provided the user also preserves the input GeoDataFrames, this file can later be used to reconstruct the atlas with Atlas.from_json(), including all resulting candidate GeoDataFrames.

Warning: if Street masking is used, a candidate may fail to regenerate if the underlying OpenStreetMap data has changed. This will be addressed in a future version of MaskMyPy.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file | Path | File path indicating where the JSON file should be saved. | required |

Source code in maskmypy/atlas.py
def to_json(self, file: Path):
    """
    Saves candidates to a JSON file. Provided the user also preserves the input
    GeoDataFrames, this file can later be used to reconstruct the atlas with
    `Atlas.from_json()`, including all resulting candidate GeoDataFrames.

    Warning: if Street masking is used, a candidate may fail to regenerate if the
    underlying OpenStreetMap data has changed. This will be addressed in a future version
    of MaskMyPy.

    Parameters
    ----------
    file : Path
        File path indicating where the JSON file should be saved.
    """
    with open(file, "w") as f:
        json.dump(self.candidates, f)
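
Since the method writes only the candidate dictionaries, the saved file is ordinary JSON. The following is a minimal round-trip sketch using the standard library. The mask name, kwargs, checksum, and statistics are placeholder values rather than real MaskMyPy output, and the file is read back with `json.load` instead of `Atlas.from_json()`.

```python
import json
import tempfile
from pathlib import Path

# Placeholder candidate following the documented structure (mask, kwargs, checksum, stats).
candidates = [
    {
        "mask": "example_mask",
        "kwargs": {"distance": 100, "seed": 1234},
        "checksum": "abc123",
        "stats": {"k_min": 12, "displacement_mean": 87.5},
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "candidates.json"

    # Write the candidate list, mirroring the json.dump call shown above.
    with open(json_path, "w") as f:
        json.dump(candidates, f)

    # Read it back to confirm nothing was lost in the round trip.
    with open(json_path) as f:
        restored = json.load(f)

print(restored == candidates)  # True
```

The masked GeoDataFrames themselves are not in the file, which is why the input GeoDataFrames must be kept alongside it.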