Masking
The SDK provides a variety of masks that can be used to anonymize data. This tutorial walks through an example of how to use the masks, then lists the available masks and their options. All of the masks are available in both Spark and pandas, and the example will show both.
All SDK masks are classes and their usage is straightforward. Once initialised, masking is applied to the data by calling `transform()`. The `transform()` method returns a new dataset with the specified column masked. Some of the masks are also invertible; where this is the case, the `inverse_transform()` method performs the inverse transformation on the specified column. The usage with Spark and pandas dataframes is identical!
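As a quick orientation, the general pattern for an invertible mask looks like the sketch below (it uses the FormatPreservingEncryptionMask covered later in this tutorial; the key, tweak and column name are purely illustrative):
import secrets
import string
import synthesized_datasets as sd
from synthesized3.mask import FormatPreservingEncryptionMask
# Initialise the mask once, then call transform() to obtain a new, masked dataset
data = sd.ALL.healthcare.load()
mask = FormatPreservingEncryptionMask(
    key=secrets.token_hex(16), tweak=secrets.token_hex(8), alphabet=string.ascii_letters
)
masked = mask.transform(data, col='first_name')
# Invertible masks can recover the original column with inverse_transform()
recovered = mask.inverse_transform(masked, col='first_name')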
Basic Usage
Dataset
The example dataset we will use is the 'claim_prediction' dataset, a dataset of insurance claims with attributes of claimants, available in the `synthesized_datasets` package. The mask will be used to anonymize the 'charges' column of the dataset.
import synthesized_datasets as sd
data = sd.ALL.claim_prediction.load()
data.head()
....
age sex bmi children smoker region charges insuranceclaim
0 19 0 27.900 0 1 3 16884.92400 1
1 18 1 33.770 1 0 2 1725.55230 1
2 28 1 33.000 3 0 2 4449.46200 0
3 33 1 22.705 0 0 1 21984.47061 0
4 32 1 28.880 0 0 1 3866.85520 1
....
Masking Usage
As an example, let's use the `BucketingMask`, which buckets numerical data in order to de-identify columns where the precise numerical value could be personally identifiable.
The `BucketingMask` can bucket the data into buckets of equal size within a specified range. To use this mode of operation, pass the arguments `bucket_size`, `lower_bound` and `upper_bound` when initialising the mask.
The examples below show the same operation in both pandas and Spark.
Pandas
from synthesized3.mask import BucketingMask
# Create a mask that will bucket the data into buckets of size 1000 between 5000 and 50000
mask_equal_buckets = BucketingMask(bucket_size=1000, lower_bound=5000, upper_bound=50000)
# The masked data is returned by calling `transform()`. The original data is not modified.
mask_equal_buckets.transform(data, col='charges')
....
age sex bmi children smoker region charges insuranceclaim
0 19 0 27.900 0 1 3 16000.0:17000.0 1
1 18 1 33.770 1 0 2 <5000.0 1
2 28 1 33.000 3 0 2 <5000.0 0
3 33 1 22.705 0 0 1 21000.0:22000.0 0
4 32 1 28.880 0 0 1 <5000.0 1
... ... ... ... ... ... ... ... ...
....
Spark
We will convert the pandas dataframe to a Spark dataframe for the Spark example, working with a local Spark session.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[4]") \
.appName("synthesized") \
.getOrCreate()
# Convert the pandas dataframe to a spark dataframe:
data = spark.createDataFrame(data)
data.show(5)
....
+---+---+------+--------+------+------+-----------+--------------+
|age|sex| bmi|children|smoker|region| charges|insuranceclaim|
+---+---+------+--------+------+------+-----------+--------------+
| 19| 0| 27.9| 0| 1| 3| 16884.924| 1|
| 18| 1| 33.77| 1| 0| 2| 1725.5523| 1|
| 28| 1| 33.0| 3| 0| 2| 4449.462| 0|
| 33| 1|22.705| 0| 0| 1|21984.47061| 0|
| 32| 1| 28.88| 0| 0| 1| 3866.8552| 1|
+---+---+------+--------+------+------+-----------+--------------+
only showing top 5 rows
....
# Create a mask that will bucket the data into buckets of size 1000 between 5000 and 50000
mask_equal_buckets = BucketingMask(bucket_size=1000, lower_bound=5000, upper_bound=50000)
# The masked data is returned by calling `transform()`. The original data is not modified.
masked_data = mask_equal_buckets.transform(data, col='charges')
masked_data.show(5)
....
+---+---+------+--------+------+------+---------------+--------------+
|age|sex| bmi|children|smoker|region| charges|insuranceclaim|
+---+---+------+--------+------+------+---------------+--------------+
| 19| 0| 27.9| 0| 1| 3|16000.0:17000.0| 1|
| 18| 1| 33.77| 1| 0| 2| <5000.0| 1|
| 28| 1| 33.0| 3| 0| 2| <5000.0| 0|
| 33| 1|22.705| 0| 0| 1|21000.0:22000.0| 0|
| 32| 1| 28.88| 0| 0| 1| <5000.0| 1|
+---+---+------+--------+------+------+---------------+--------------+
only showing top 5 rows
....
Available Masks
The following masks are available in the SDK:
Bucketing
Masks exact numerical values in a column by bucketing the data into equal-sized buckets or into user-specified buckets.
Details
This can be used in two modes:
- Bucketing into equal-sized buckets: in this mode, pass the arguments `bucket_size`, `lower_bound` and `upper_bound` when initialising the mask. The data will be bucketed into buckets of size `bucket_size` between `lower_bound` and `upper_bound`.
- User-specified bucketing: in this mode, define specific bucket ranges and names for the buckets via the `bucket_config` argument. The `bucket_config` is a list of dictionaries. Each dictionary specifies a bucket and must have the following keys:
  - 'min': the minimum value of the bucket
  - 'max': the maximum value of the bucket
  - 'replacement_value': the value to replace values in the range with
All values in the column being masked must fall into one of the buckets specified in the `bucket_config`. Examples for both modes of operation are shown below.
Example:
from synthesized3.mask import BucketingMask
import synthesized_datasets as sd
data = sd.ALL.claim_prediction.load()
data.head()
....
age sex bmi children smoker region charges insuranceclaim
0 19 0 27.900 0 1 3 16884.92400 1
1 18 1 33.770 1 0 2 1725.55230 1
2 28 1 33.000 3 0 2 4449.46200 0
3 33 1 22.705 0 0 1 21984.47061 0
4 32 1 28.880 0 0 1 3866.85520 1
....
# Create a mask that will bucket the data into buckets of size 1000 between 5000 and 50000
mask_equal_buckets = BucketingMask(bucket_size=1000, lower_bound=5000, upper_bound=50000)
# The masked data is returned by calling `transform()`. The original data is not modified.
mask_equal_buckets.transform(data, col='charges')
....
age sex bmi children smoker region charges insuranceclaim
0 19 0 27.900 0 1 3 16000.0:17000.0 1
1 18 1 33.770 1 0 2 <5000.0 1
2 28 1 33.000 3 0 2 <5000.0 0
3 33 1 22.705 0 0 1 21000.0:22000.0 0
4 32 1 28.880 0 0 1 <5000.0 1
... ... ... ... ... ... ... ... ...
....
Example:
bucket_config = [
{"min": 2000, "max": 10000, "replacement_value": "Low"},
{"min": 10000, "max": 30000, "replacement_value": "Medium"},
{"min": 30000, "max": 80000, "replacement_value": "High"},
]
mask_config_buckets = BucketingMask(bucket_config=bucket_config)
mask_config_buckets.transform(data, col='charges')
....
age sex bmi children smoker region charges insuranceclaim
0 19 0 27.900 0 1 3 Medium 1
1 18 1 33.770 1 0 2 Low 1
2 28 1 33.000 3 0 2 Low 0
3 33 1 22.705 0 0 1 Medium 0
4 32 1 28.880 0 0 1 Low 1
... ... ... ... ... ... ... ... ...
....
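Since the usage is identical for Spark dataframes, the same user-specified buckets can be applied to a Spark dataframe. A short sketch (it assumes a local SparkSession like the one created in the Basic Usage example above):
import pyspark.sql as ps
# Create a local Spark session and convert the pandas dataframe
spark = ps.SparkSession.builder.master("local[4]").appName("synthesized").getOrCreate()
spark_data = spark.createDataFrame(data)
# The same mask object works on the Spark dataframe
mask_config_buckets.transform(spark_data, col='charges').show(5)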
DateShift
Masks exact date values by shifting dates by a random amount within a specified range. This mask has 3 modes of operation:
- Randomly shift each date in the column by a random number of days. This is the default mode of operation.
- Shift all dates together by a random amount to maintain the intervals between all dates in the column. Toggled by setting `maintain_diff=True`.
- Shift all dates within a group by a random amount, where the group is defined by an 'entity column' of the dataset. Toggled by setting the `entity_column` argument.
Details
Example:
Load example data:
import pandas as pd
from synthesized3.mask import DateShiftMask
import synthesized_datasets as sd
data = sd.ALL.s_and_p_500_5yr.load()
data.head()
....
date open high low close volume Name
0 2013-02-08 15.07 15.12 14.63 14.75 8407500 AAL
1 2013-02-11 14.89 15.01 14.26 14.46 8882000 AAL
2 2013-02-12 14.45 14.51 14.10 14.27 8126000 AAL
3 2013-02-13 14.30 14.94 14.25 14.66 10259500 AAL
4 2013-02-14 14.94 14.96 13.16 13.99 31879900 AAL
... ... ... ... ... ... ... ...
618096 2018-02-01 76.84 78.27 76.69 77.82 2982259 ZTS
618097 2018-02-02 77.53 78.12 76.73 76.78 2595187 ZTS
618098 2018-02-05 76.64 76.92 73.18 73.83 2962031 ZTS
618099 2018-02-06 72.74 74.56 72.13 73.27 4924323 ZTS
618100 2018-02-07 72.70 75.00 72.69 73.86 4534912 ZTS
....
# Ensure that the date column is of type 'datetime'
data['date'] = pd.to_datetime(data['date'])
Using mode 1: Randomly shift the date by a random number of days for all dates in the column.
dateshiftmask = DateShiftMask(lower_bound_days=-30, upper_bound_days=30)
dateshiftmask.transform(data, col='date').head()
....
date open high low close volume Name
0 2013-02-21 15.07 15.12 14.63 14.75 8407500 AAL
1 2013-01-30 14.89 15.01 14.26 14.46 8882000 AAL
2 2013-02-16 14.45 14.51 14.10 14.27 8126000 AAL
3 2013-01-16 14.30 14.94 14.25 14.66 10259500 AAL
4 2013-01-20 14.94 14.96 13.16 13.99 31879900 AAL
....
Using mode 2: Shift all dates together by a random amount to maintain intervals between all dates in the column.
dateshiftmask = DateShiftMask(lower_bound_days=-30, upper_bound_days=30, maintain_diff=True)
dateshiftmask.transform(data, col='date').head()
....
date open high low close volume Name
0 2013-01-26 15.07 15.12 14.63 14.75 8407500 AAL
1 2013-01-29 14.89 15.01 14.26 14.46 8882000 AAL
2 2013-01-30 14.45 14.51 14.10 14.27 8126000 AAL
3 2013-01-31 14.30 14.94 14.25 14.66 10259500 AAL
4 2013-02-01 14.94 14.96 13.16 13.99 31879900 AAL
....
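Because `maintain_diff=True` shifts every date by the same random amount, the intervals between rows are unchanged. A quick check of this (a sketch, assuming the pandas dataframe used above):
shifted = dateshiftmask.transform(data, col='date')
# The row-to-row differences are identical before and after masking
assert shifted['date'].diff().equals(data['date'].diff())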
Using mode 3: Shift all dates within a group by a random amount where the group is defined by an 'entity column' of the dataset.
dateshiftmask = DateShiftMask(lower_bound_days=-30, upper_bound_days=30, maintain_diff=True, entity_column='Name')
# Notice how the dates are shifted by the same amount within each group, but the amounts are different between groups.
dateshiftmask.transform(data, col='date')
....
date open high low close volume Name
0 2013-02-07 15.07 15.12 14.63 14.75 8407500 AAL
1 2013-02-10 14.89 15.01 14.26 14.46 8882000 AAL
2 2013-02-11 14.45 14.51 14.10 14.27 8126000 AAL
3 2013-02-12 14.30 14.94 14.25 14.66 10259500 AAL
4 2013-02-13 14.94 14.96 13.16 13.99 31879900 AAL
... ... ... ... ... ... ... ...
618096 2018-01-31 76.84 78.27 76.69 77.82 2982259 ZTS
618097 2018-02-01 77.53 78.12 76.73 76.78 2595187 ZTS
618098 2018-02-04 76.64 76.92 73.18 73.83 2962031 ZTS
618099 2018-02-05 72.74 74.56 72.13 73.27 4924323 ZTS
618100 2018-02-06 72.70 75.00 72.69 73.86 4534912 ZTS
....
Deterministic Encryption
Uses the AES (Advanced Encryption Standard) algorithm to encrypt data in a column. The encryption is deterministic, meaning that the same value is always encrypted to the same ciphertext, preserving the referential integrity of the data. This means the encrypted data can be used for joins and other operations that require the encrypted values to be comparable. This masking is reversible using the key.
Details
Users must provide a key when initialising the mask. The key must be a 16, 24 or 32 byte string. Additionally, users may choose to add a 16 byte 'tweak' to the key, which enhances security. The encrypted data is returned by calling `transform()`.
Example:
Load example data:
import synthesized_datasets as sd
data = sd.ALL.healthcare.load()
data.head()
....
   Unnamed: 0      gender first_name  last_name  weight     NHS_number  ... postcode  synchronous_tumour_indicator  pathology_investigation_type  lesion_size  number_of_lesions  outcome
0           0      Female      Maude    Jackson      50  256 3138 8154  ...  NP132JL                      0.116682                             0     0.852172                 11        0
1           1      Female     Leanne     Potter      44  640 0311 3044  ...  EH146AE                      0.916627                             1     0.709176                  6        0
2           2        Male     Johnie     Carney      82  113 2535 9715  ...  TW7 6LG                      0.312163                             1     1.533713                 11        1
3           3      Female    Susanne     Joseph      60  807 8602 0184  ...  SW130EH                      0.040120                             0     0.400135                 16        0
4           4  Non-binary       Cora  Blackburn      53  657 6533 0112  ...  PO318HA                      0.143843                             0     0.116941                 24        0
....
Apply masking:
from synthesized3.mask import DeterministicEncryptionMask
import secrets
# Generate a random key
key = secrets.token_hex(16)
deterministic_encryption_mask = DeterministicEncryptionMask(key=key)
masked_data = deterministic_encryption_mask.transform(data, col='first_name')
masked_data.head()
....
   Unnamed: 0      gender                            first_name  last_name  weight  ...  synchronous_tumour_indicator  pathology_investigation_type  lesion_size  number_of_lesions  outcome
0           0      Female      3qonb1A=P2ew8fUFcmyUISbLZyiyiA==    Jackson      50  ...                      0.116682                             0     0.852172                 11        0
1           1      Female      37bmyVSDX8pZTaodaUF3rx0qBAxXEg==     Potter      44  ...                      0.916627                             1     0.709176                  6        0
2           2        Male      6TRz6LmfEnuIREDhFGu4cG+5A71rBw==     Carney      82  ...                      0.312163                             1     1.533713                 11        1
3           3      Female  YYesI7S7Ag==7lobPz+yMKwTU5qdzv/wew==     Joseph      60  ...                      0.040120                             0     0.400135                 16        0
4           4  Non-binary      cU3qyg==iP/YYnl6ethAzOH7JYbT3Q==  Blackburn      53  ...                      0.143843                             0     0.116941                 24        0
[5 rows x 16 columns]
....
Reverse masking:
deterministic_encryption_mask.inverse_transform(masked_data, col='first_name').head()
....
   Unnamed: 0      gender first_name  last_name  weight     NHS_number  ... postcode  synchronous_tumour_indicator  pathology_investigation_type  lesion_size  number_of_lesions  outcome
0           0      Female      Maude    Jackson      50  256 3138 8154  ...  NP132JL                      0.116682                             0     0.852172                 11        0
1           1      Female     Leanne     Potter      44  640 0311 3044  ...  EH146AE                      0.916627                             1     0.709176                  6        0
2           2        Male     Johnie     Carney      82  113 2535 9715  ...  TW7 6LG                      0.312163                             1     1.533713                 11        1
3           3      Female    Susanne     Joseph      60  807 8602 0184  ...  SW130EH                      0.040120                             0     0.400135                 16        0
4           4  Non-binary       Cora  Blackburn      53  657 6533 0112  ...  PO318HA                      0.143843                             0     0.116941                 24        0
....
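Because the encryption is deterministic, repeated values always map to the same ciphertext, which is what keeps joins and groupings on the masked column working. A quick check of this (a sketch using the dataframes above):
# Each distinct first name maps to exactly one ciphertext, so the number of
# distinct values is unchanged by masking
assert masked_data['first_name'].nunique() == data['first_name'].nunique()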
Format Preserving Hashing
Hashes values using the SHA-256 algorithm whilst preserving the format of the original values. The format is preserved by specifying an alphabet of characters that the hashed values can contain. The alphabet can be any string of characters, but it must be at least 2 characters long and must not contain any duplicate characters.
Details
Example:
Load example data:
import synthesized_datasets as sd
data = sd.ALL.healthcare.load()
data = data[['first_name','last_name', 'NHS_number']]
data.head()
....
first_name last_name NHS_number
0 Maude Jackson 256 3138 8154
1 Leanne Potter 640 0311 3044
2 Johnie Carney 113 2535 9715
3 Susanne Joseph 807 8602 0184
4 Cora Blackburn 657 6533 0112
....
Apply masking:
from synthesized3.mask import FormatPreservingHashingMask
# `string.ascii_letters` is a string containing all ascii letters
import string
format_preserving_hashing_mask = FormatPreservingHashingMask(alphabet=string.ascii_letters)
# Masking both first and last name
masked_data = format_preserving_hashing_mask.transform(data, col='first_name')
format_preserving_hashing_mask.transform(masked_data, col='last_name').head()
....
first_name last_name NHS_number
0 EKjQa DsBSXaV 256 3138 8154
1 PsQrzl MldqlS 640 0311 3044
2 kbczST xetAKw 113 2535 9715
3 ZdZoamD uFAAaN 807 8602 0184
4 bBmE FLnKanShx 657 6533 0112
....
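In the output above, each hashed value has the same length as the original and only contains characters from the supplied alphabet. A quick check of the length property (a sketch using the dataframes above):
masked_names = format_preserving_hashing_mask.transform(data, col='first_name')
# The hashed first names have the same string lengths as the originals
assert (masked_names['first_name'].str.len() == data['first_name'].str.len()).all()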
Format Preserving Encryption
Used to encrypt values of the data whilst preserving the format. The masking uses the FF3-1 algorithm. The format is preserved by specifying an alphabet of characters that the encrypted values can contain. The alphabet can be any string of characters, but it must be at least 2 characters long and must not contain any duplicate characters. The FPE mask can be inverted by using the same key, tweak and alphabet and calling the `inverse_transform()` method.
Details
Example:
Load example data:
import synthesized_datasets as sd
data = sd.ALL.healthcare.load()
data = data[['first_name','last_name', 'NHS_number']]
data.head()
....
first_name last_name NHS_number
0 Maude Jackson 256 3138 8154
1 Leanne Potter 640 0311 3044
2 Johnie Carney 113 2535 9715
3 Susanne Joseph 807 8602 0184
4 Cora Blackburn 657 6533 0112
....
Apply masking:
from synthesized3.mask import FormatPreservingEncryptionMask
# Build a key
import secrets
key = secrets.token_hex(16)
tweak = secrets.token_hex(8)
# `string.ascii_letters` is a string containing all ascii letters
import string
format_preserving_encryption_mask = FormatPreservingEncryptionMask(key=key, tweak=tweak, alphabet=string.ascii_letters)
masked_data = format_preserving_encryption_mask.transform(data, col='first_name')
format_preserving_encryption_mask.transform(masked_data, col='last_name').head()
....
first_name last_name NHS_number
0 ZtRcF koIMzpj 256 3138 8154
1 WqMngQ ryDhPl 640 0311 3044
2 lahXhO HKCGDp 113 2535 9715
3 TGQaFOm KKNZNz 807 8602 0184
4 fhGv FISWzrfpB 657 6533 0112
....
Reverse masking using the same key, tweak and alphabet:
new_format_preserving_encryption_mask = FormatPreservingEncryptionMask(key=key, tweak=tweak, alphabet=string.ascii_letters)
new_format_preserving_encryption_mask.inverse_transform(masked_data, col='first_name').head()
....
first_name last_name NHS_number
0 Maude Jackson 256 3138 8154
1 Leanne Potter 640 0311 3044
2 Johnie Carney 113 2535 9715
3 Susanne Joseph 807 8602 0184
4 Cora Blackburn 657 6533 0112
....
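The round trip can also be verified programmatically (a short sketch using the dataframes above):
recovered = new_format_preserving_encryption_mask.inverse_transform(masked_data, col='first_name')
# The recovered first names match the originals exactly
assert recovered['first_name'].equals(data['first_name'])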
Nullify
Replaces values in a column with null values.
Details
Example:
Load example data:
import synthesized_datasets as sd
data = sd.ALL.healthcare.load()
data = data[['first_name','last_name', 'NHS_number']]
data.head()
....
first_name last_name NHS_number
0 Maude Jackson 256 3138 8154
1 Leanne Potter 640 0311 3044
2 Johnie Carney 113 2535 9715
3 Susanne Joseph 807 8602 0184
4 Cora Blackburn 657 6533 0112
....
Apply masking:
from synthesized3.mask import NullMask
null_mask = NullMask()
null_mask.transform(data, col='NHS_number').head()
....
first_name last_name NHS_number
0 Maude Jackson None
1 Leanne Potter None
2 Johnie Carney None
3 Susanne Joseph None
4 Cora Blackburn None
....
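As with the hashing example above, several columns can be nullified by chaining `transform()` calls (a short sketch):
# Chain transform() calls to nullify more than one column
masked = null_mask.transform(data, col='first_name')
masked = null_mask.transform(masked, col='last_name')
masked.head()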
Redact
Removes values in a column. There are several modes of operation for this mask:
1. The default mode is to redact the whole value entirely.
2. Users can choose to redact only a portion of each value by specifying the `portion` argument. By default the redaction starts from the beginning of the value, but users can choose to redact from the end by setting `mask_start=False`.
3. Users can specify a regular expression matching the values to redact by setting the `pattern` argument.
Details
Example:
Load example data:
import synthesized_datasets as sd
data = sd.ALL.healthcare.load()
data = data[['city','postcode']]
data.head()
....
city postcode
0 ABERTILLERY NP132JL
1 CURRIE EH146AE
2 ISLEWORTH TW7 6LG
3 LONDON SW13 SW130EH
4 COWES PO318HA
....
Apply masking:
from synthesized3.mask import RedactionMask
# Default mode of operation
redaction_mask = RedactionMask()
redaction_mask.transform(data, col='postcode').head()
....
city postcode
0 ABERTILLERY
1 CURRIE
2 ISLEWORTH
3 LONDON SW13
4 COWES
....
Redact only a portion of the data:
redaction_mask_portion = RedactionMask(portion=0.3, mask_start=False)
redaction_mask_portion.transform(data, col='postcode').head()
....
city postcode
0 ABERTILLERY NP13
1 CURRIE EH14
2 ISLEWORTH TW7
3 LONDON SW13 SW13
4 COWES PO31
....
Notice how part of the postcode is included in the city column, let’s fix that using a regular expression:
redaction_mask_regex = RedactionMask(pattern=r'\s\w+$')
redaction_mask_regex.transform(data, col='city').head()
....
city postcode
0 ABERTILLERY NP132JL
1 CURRIE EH146AE
2 ISLEWORTH TW7 6LG
3 LONDON SW130EH
4 COWES PO318HA
....
Replace
Replaces values in a column with a specified value. The value, or the portion of each value, to be replaced can be defined by passing a regex pattern. The replacement value can be specified either as a single string or as a list of strings. If a list is passed, the replacement values are chosen at random from the list.
Details
Example:
Load example data:
import synthesized_datasets as sd
data = sd.ALL.healthcare.load()
data = data[['first_name','last_name','NHS_number']]
data.head()
....
first_name last_name NHS_number
0 Maude Jackson 256 3138 8154
1 Leanne Potter 640 0311 3044
2 Johnie Carney 113 2535 9715
3 Susanne Joseph 807 8602 0184
4 Cora Blackburn 657 6533 0112
....
from synthesized3.mask import ReplacementMask
# Default behaviour is to match everything
replacement_mask = ReplacementMask(replacement_value="*")
replacement_mask.transform(data, col='NHS_number').head()
....
first_name last_name NHS_number
0 Maude Jackson **
1 Leanne Potter **
2 Johnie Carney **
3 Susanne Joseph **
4 Cora Blackburn **
....
# Replacing with a sample of values from a list
replacement_name_mask = ReplacementMask(replacement_value=['FIRST_NAME_1', 'FIRST_NAME_2', 'FIRST_NAME_3'])
replacement_name_mask.transform(data, col="first_name").head()
....
first_name last_name NHS_number
0 FIRST_NAME_3 Jackson 256 3138 8154
1 FIRST_NAME_2 Potter 640 0311 3044
2 FIRST_NAME_1 Carney 113 2535 9715
3 FIRST_NAME_3 Joseph 807 8602 0184
4 FIRST_NAME_1 Blackburn 657 6533 0112
....
# Replacing specific parts of the data via regex matching
replace_ones_mask = ReplacementMask(replacement_value="*", pattern="1")
replace_ones_mask.transform(data, col="NHS_number").head()
....
first_name last_name NHS_number
0 Maude Jackson 256 3*38 8*54
1 Leanne Potter 640 03** 3044
2 Johnie Carney **3 2535 97*5
3 Susanne Joseph 807 8602 0*84
4 Cora Blackburn 657 6533 0**2
....
Time Extraction
Extracts time information from a datetime column. The time information can be extracted in the following ways:
- `YEAR`: [0-9999]
- `MONTH`: [1-12]
- `DAY_OF_MONTH`: [1-31]
- `DAY_OF_WEEK`: [1-7]
- `WEEK_OF_YEAR`: [1-53]
- `HOUR_OF_DAY`: [0-23]
- `MINUTE_OF_HOUR`: [0-59]
- `SECOND_OF_MINUTE`: [0-59]
- `MICROSECOND_OF_SECOND`: [0-999999]
These can be extracted by passing one of them as the value of the `time_part` argument when initialising the mask.
Details
Example:
Load example data:
import synthesized_datasets as sd
data = sd.ALL.noaa_isd_weather_additional_dtypes_small.load()
data = data[['datetime', 'longitude','latitude']]
data.head()
....
datetime longitude latitude
0 2019-04-02 17:55:00 -170.212 57.158
1 2019-04-02 14:30:00 -170.212 57.158
2 2019-04-02 08:00:00 -170.212 57.158
3 2019-04-02 09:30:00 -102.774 33.956
4 2019-04-02 14:20:00 -117.526 47.417
....
# Ensure that the date column is of type 'datetime'
import pandas as pd
data['datetime'] = pd.to_datetime(data['datetime'])
from synthesized3.mask import TimeExtractionMask
# Extract only the month
time_extraction_mask = TimeExtractionMask(time_part='MONTH')
time_extraction_mask.transform(data, col='datetime').head()
....
datetime longitude latitude
0 4 -170.212 57.158
1 4 -170.212 57.158
2 4 -170.212 57.158
3 4 -102.774 33.956
4 4 -117.526 47.417
....
# Extract only the minute of the hour
time_extraction_mask = TimeExtractionMask(time_part='MINUTE_OF_HOUR')
time_extraction_mask.transform(data, col='datetime').head()
....
datetime longitude latitude
0 55 -170.212 57.158
1 30 -170.212 57.158
2 0 -170.212 57.158
3 30 -102.774 33.956
4 20 -117.526 47.417
....