Comparer

AddressesComparer

class deepparse.comparer.AddressesComparer(parser: AddressParser)

The address comparer compares addresses with each other and retrieves the differences between them. The addresses are parsed using an address parser based on one of the pretrained seq2seq networks, either FastText or BPEmb.

The address comparer can compare already parsed addresses. The raw address is first recomposed from the source parsing; the address parser then suggests its own tags, and these are compared with the tags of the source parsing.

The address comparer can also compare raw addresses: it first parses them with the address parser and then brings out the differences between the parsed addresses.

Parameters: parser (AddressParser) – The AddressParser used to parse the addresses.

compare_tags(addresses_tags_to_compare: List[tuple] | List[List[tuple]], with_prob: None | bool = None) → List[FormattedComparedAddressesTags] | FormattedComparedAddressesTags

Compare the tags of a source parsing with the parsing produced by the AddressParser. The raw address is first reconstructed from the source parsing; the AddressParser then generates its own tags, and the two parsings are compared.

Parameters:

addresses_tags_to_compare (Union[List[tuple], List[List[tuple]]]) – List of tuples that contain the tags for the address components from the source. Can compare multiple parsings if passed as a list of lists of tuples.
with_prob (Union[None, bool]) – An optional flag indicating whether to include probabilities in the comparison report. The probabilities are not compared; they are only included in the report. The default value is None, which means the probabilities are not taken into account.

Returns:

Either a FormattedComparedAddressesTags or a list of FormattedComparedAddressesTags when there is more than one comparison to make.

Examples

from deepparse.comparer import AddressesComparer
from deepparse.parser import AddressParser

first_parsed_address = [
    ("350", "StreetNumber"),
    ("rue des Lilas", "StreetName"),
    ("Ouest Québec", "Municipality"),
    ("Québec", "Province"),
    ("G1L 1B6", "PostalCode")]
second_parsed_address_with_prob = [
    ('350', ('StreetNumber', 1.0)),
    ('rue', ('StreetName', 0.9987)),
    ('des', ('StreetName', 0.9993)),
    ('Lilas', ('StreetName', 0.8176)),
    ('Ouest', ('Orientation', 0.781)),
    ('Quebec', ('Municipality', 0.9768)),
    ('Quebec', ('Province', 1.0)),
    ('G1L', ('PostalCode', 0.9993)),
    ('1B6', ('PostalCode', 1.0))]

address_parser = AddressParser(model_type="bpemb")
addresses_comparer = AddressesComparer(address_parser)

list_of_compared_addresses = addresses_comparer.compare_tags([first_parsed_address,
                                                              second_parsed_address_with_prob])
list_of_compared_addresses[0].comparison_report()
list_of_compared_addresses[1].comparison_report()
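
The reconstruction step described for compare_tags can be pictured with a small sketch that does not call Deepparse. It assumes the raw address is rebuilt by joining the tagged components with single spaces, and that a tag is either a plain name or a (name, probability) pair as in the example above. The helper names recompose_raw_address and tag_names are hypothetical and are not part of the library.

```python
from typing import List, Tuple, Union

# A tag is either a plain name or a (name, probability) pair, as in the examples above.
Tag = Union[str, Tuple[str, float]]
TaggedAddress = List[Tuple[str, Tag]]


def recompose_raw_address(tagged_address: TaggedAddress) -> str:
    """Join the tagged components, in order, into one raw address string.

    Assumption: components are separated by a single space.
    """
    return " ".join(component for component, _ in tagged_address)


def tag_names(tagged_address: TaggedAddress) -> List[str]:
    """Return only the tag names, dropping the probabilities when present."""
    return [tag[0] if isinstance(tag, tuple) else tag for _, tag in tagged_address]


first_parsed_address = [
    ("350", "StreetNumber"),
    ("rue des Lilas", "StreetName"),
    ("Ouest Québec", "Municipality"),
    ("Québec", "Province"),
    ("G1L 1B6", "PostalCode"),
]
partial_parsing_with_prob = [
    ("350", ("StreetNumber", 1.0)),
    ("rue", ("StreetName", 0.9987)),
]

print(recompose_raw_address(first_parsed_address))
print(tag_names(partial_parsing_with_prob))
```

The recomposed string is what would then be handed to the AddressParser, whose tags are compared with the source tags.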

compare_raw(raw_addresses_to_compare: Tuple[str] | List[Tuple[str]], with_prob: None | bool = None) → List[FormattedComparedAddressesRaw]

Compare raw addresses with each other. It starts by parsing the addresses with the parser and then returns the differences between the parsed address components of the two addresses.

Parameters:

raw_addresses_to_compare (Union[Tuple[str], List[Tuple[str]]]) – A tuple of strings that represent the raw addresses to compare, or a list of such tuples when there is more than one comparison to make.
with_prob (Union[None, bool]) – An optional flag indicating whether to include probabilities in the comparison report. The probabilities are not compared; they are only included in the report. The default value is None, which means the probabilities are not taken into account.

Returns:

Either a FormattedComparedAddressesRaw or a list of FormattedComparedAddressesRaw when given more than one comparison to make.

Examples

raw_address_original = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_identical = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_equivalent = "350  rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_diff_streetNumber = "450 rue des Lilas Ouest Quebec Quebec G1L 1B6"

raw_addresses_multiples_comparisons = addresses_comparer.compare_raw([(raw_address_original,
                                                                       raw_address_identical),
                                                                      (raw_address_original,
                                                                       raw_address_equivalent),
                                                                      (raw_address_original,
                                                                       raw_address_diff_streetNumber)])
raw_addresses_multiples_comparisons[0].comparison_report()
raw_addresses_multiples_comparisons[1].comparison_report()
raw_addresses_multiples_comparisons[2].comparison_report()
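
The second pair above differs only by a doubled space, while the third differs by its street number. The sketch below does not call Deepparse; it only shows how two raw strings can differ character by character and still yield the same sequence of whitespace-separated tokens. Whitespace tokenization is an assumption made for illustration: the actual comparison is made on the components returned by the AddressParser. The helper names are hypothetical.

```python
raw_address_original = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_equivalent = "350  rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_diff_streetNumber = "450 rue des Lilas Ouest Quebec Quebec G1L 1B6"


def same_raw_string(first: str, second: str) -> bool:
    """True only when the two raw strings match character for character."""
    return first == second


def same_tokens(first: str, second: str) -> bool:
    """True when both strings split into the same whitespace-separated tokens."""
    return first.split() == second.split()


# Doubled space: different raw strings, same tokens.
print(same_raw_string(raw_address_original, raw_address_equivalent))  # False
print(same_tokens(raw_address_original, raw_address_equivalent))      # True

# Different street number: the tokens differ as well.
print(same_tokens(raw_address_original, raw_address_diff_streetNumber))  # False
```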

Formatted Compared Addresses

class deepparse.comparer.FormattedComparedAddresses(first_address: FormattedParsedAddress, second_address: FormattedParsedAddress, origin: Tuple[str, str], with_prob: bool)

Abstract class that defines a comparison of addresses returned by the address comparer.

Parameters:

first_address (FormattedParsedAddress) – A formatted parsed address that contains the parsing information for the first address.
second_address (FormattedParsedAddress) – A formatted parsed address that contains the parsing information for the second address.
origin (Tuple[str, str]) – The origin of each parsing (e.g. from the source or from a Deepparse pretrained model).

Example

from deepparse.comparer import AddressesComparer
from deepparse.parser import AddressParser

address_comparer = AddressesComparer(AddressParser())
raw_identical_comparison = address_comparer.compare_raw(
    ("350 rue des Lilas Ouest Quebec city Quebec G1L 1B6",
     "450 rue des Lilas Ouest Quebec city Quebec G1L 1B6"))

property list_of_bool: List

A list that covers all the address component names, with a boolean indicating whether each component is the same for the two addresses.

Returns: A list of the booleans.

property equivalent: bool

Check if the parsing is the same for the two addresses.

Returns: A bool.

property identical: bool

Check if the parsing is the same for the two addresses and if the raw addresses are identical.

Returns: A bool.

comparison_report(nb_delimiters: int | None = None) → None: Print a formatted comparison report of the two addresses.
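
The relationship between list_of_bool, equivalent and identical can be sketched without Deepparse. The code below is an illustration under stated assumptions, not the library's implementation: each parsing is represented as a hand-written dictionary from component name to value (not actual model output), the per-component result is assumed to be a list of (name, bool) pairs, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def component_flags(first: Dict[str, str], second: Dict[str, str]) -> List[Tuple[str, bool]]:
    """Pair every component name with True when both parsings agree on its value."""
    names = list(first) + [name for name in second if name not in first]
    return [(name, first.get(name) == second.get(name)) for name in names]


def is_equivalent(flags: List[Tuple[str, bool]]) -> bool:
    """The parsings are equivalent when every component is the same."""
    return all(same for _, same in flags)


def is_identical(flags: List[Tuple[str, bool]], raw_first: str, raw_second: str) -> bool:
    """Identical means equivalent parsings and identical raw strings."""
    return is_equivalent(flags) and raw_first == raw_second


# Hand-written stand-ins for two parsings that differ only by street number.
first_parsing = {
    "StreetNumber": "350",
    "StreetName": "rue des Lilas",
    "Orientation": "Ouest",
    "Municipality": "Quebec city",
    "Province": "Quebec",
    "PostalCode": "G1L 1B6",
}
second_parsing = dict(first_parsing, StreetNumber="450")

flags = component_flags(first_parsing, second_parsing)
print(flags[0])              # ('StreetNumber', False)
print(is_equivalent(flags))  # False
```

With these definitions, identical is the stricter check: a pair of addresses can be equivalent without being identical, but not the reverse.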

Formatted Compared Addresses Raw

class deepparse.comparer.FormattedComparedAddressesRaw(first_address: FormattedParsedAddress, second_address: FormattedParsedAddress, origin: Tuple[str, str], with_prob: bool): A formatted compared address of two raw (not parsed) addresses.

Formatted Compared Addresses Tags

class deepparse.comparer.FormattedComparedAddressesTags(first_address: FormattedParsedAddress, second_address: FormattedParsedAddress, origin: Tuple[str, str], with_prob: bool): A formatted compared address of two already tagged addresses.