Comparer
AddressesComparer
- class deepparse.comparer.AddressesComparer(parser: AddressParser)
Address comparer that compares addresses with each other and retrieves the differences between them. The addresses are parsed using an address parser based on one of the pretrained seq2seq networks, with either fastText or BPEmb embeddings.
The address comparer can compare already parsed addresses. The address parser first recomposes the raw address from the source parsing, then suggests its own tags. The comparer then compares the tags of the source parsing with those of the newly parsed address.
The address comparer is also able to compare raw addresses by first parsing them with the address parser and then bringing out the differences among the parsed addresses.
- Parameters:
parser (AddressParser) – The AddressParser used to parse the addresses.
- compare_tags(addresses_tags_to_compare: List[tuple] | List[List[tuple]], with_prob: None | bool = None) → List[FormattedComparedAddressesTags] | FormattedComparedAddressesTags
Compare the tags of a source parsing with the parsing from AddressParser. First, the raw address is reconstructed from the source parsing. Then, AddressParser generates its own tags for that address. Finally, the two parsings are compared.
- Parameters:
addresses_tags_to_compare (Union[List[tuple], List[List[tuple]]]) – A list of tuples that contains the tags for the address components from the source. Multiple parsings can be compared if passed as a list of such lists.
with_prob (Union[None, bool]) – An optional flag indicating whether to include the probabilities in the comparison report. The probabilities are not compared, only included in the report. The default value is None, which means the probabilities are not taken into account.
- Returns:
Either a FormattedComparedAddressesTags or a list of FormattedComparedAddressesTags when there is more than one comparison to make.
Examples
from deepparse.comparer import AddressesComparer
from deepparse.parser import AddressParser

# Source parsing without probabilities
first_parsed_address = [
    ("350", "StreetNumber"),
    ("rue des Lilas", "StreetName"),
    ("Ouest Québec", "Municipality"),
    ("Québec", "Province"),
    ("G1L 1B6", "PostalCode"),
]

# Source parsing with a probability attached to each tag
second_parsed_address_with_prob = [
    ('350', ('StreetNumber', 1.0)),
    ('rue', ('StreetName', 0.9987)),
    ('des', ('StreetName', 0.9993)),
    ('Lilas', ('StreetName', 0.8176)),
    ('Ouest', ('Orientation', 0.781)),
    ('Quebec', ('Municipality', 0.9768)),
    ('Quebec', ('Province', 1.0)),
    ('G1L', ('PostalCode', 0.9993)),
    ('1B6', ('PostalCode', 1.0)),
]

address_parser = AddressParser(model_type="bpemb")
addresses_comparer = AddressesComparer(address_parser)

# One comparison is returned per source parsing
list_of_compared_addresses = addresses_comparer.compare_tags(
    [first_parsed_address, second_parsed_address_with_prob]
)
list_of_compared_addresses[0].comparison_report()
list_of_compared_addresses[1].comparison_report()
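The reconstruction step described for compare_tags can be sketched in plain Python. This is only an illustration and not deepparse's implementation: the helper names rebuild_raw_address and strip_probabilities are hypothetical, and the sketch assumes that components are joined with single spaces.

```python
from typing import List, Tuple, Union

# A tag is either a plain label or a (label, probability) pair.
Tag = Union[str, Tuple[str, float]]
ParsedAddress = List[Tuple[str, Tag]]


def rebuild_raw_address(parsed_address: ParsedAddress) -> str:
    """Join the address components, in order, into one raw string (hypothetical helper)."""
    return " ".join(component for component, _ in parsed_address)


def strip_probabilities(parsed_address: ParsedAddress) -> List[Tuple[str, str]]:
    """Keep only the label of each tag, dropping the probability when present (hypothetical helper)."""
    return [
        (component, tag[0] if isinstance(tag, tuple) else tag)
        for component, tag in parsed_address
    ]


without_prob = [("350", "StreetNumber"), ("rue des Lilas", "StreetName")]
with_prob = [("350", ("StreetNumber", 1.0)), ("rue", ("StreetName", 0.9987))]

raw_without_prob = rebuild_raw_address(without_prob)
raw_with_prob = rebuild_raw_address(with_prob)
labels_only = strip_probabilities(with_prob)
```

Both input shapes from the example above yield a raw string that a parser could tag again, and the probabilities can be set aside before the labels are compared.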
- compare_raw(raw_addresses_to_compare: Tuple[str] | List[Tuple[str]], with_prob: None | bool = None) → List[FormattedComparedAddressesRaw]
Compare raw addresses with each other. The addresses are first parsed with the parser that was set on the comparer, and the differences between the address components retrieved by the model are then returned.
- Parameters:
raw_addresses_to_compare (Union[Tuple[str], List[Tuple[str]]]) – A tuple of strings, or a list of such tuples, that represent the raw addresses to compare.
with_prob (Union[None, bool]) – An optional flag indicating whether to include the probabilities in the comparison report. The probabilities are not compared, only included in the report. The default value is None, which means the probabilities are not taken into account.
- Returns:
Either a FormattedComparedAddressesRaw or a list of FormattedComparedAddressesRaw when given more than one comparison to make.
Examples
# Reuses the addresses_comparer built in the compare_tags example
raw_address_original = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_identical = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_equivalent = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_diff_streetNumber = "450 rue des Lilas Ouest Quebec Quebec G1L 1B6"

# Each tuple holds the two raw addresses of one comparison
raw_addresses_multiples_comparisons = addresses_comparer.compare_raw(
    [
        (raw_address_original, raw_address_identical),
        (raw_address_original, raw_address_equivalent),
        (raw_address_original, raw_address_diff_streetNumber),
    ]
)
raw_addresses_multiples_comparisons[0].comparison_report()
raw_addresses_multiples_comparisons[1].comparison_report()
raw_addresses_multiples_comparisons[2].comparison_report()
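To make "the differences between the address components" concrete, here is a minimal sketch that compares two parsings tag by tag. It is not deepparse's algorithm. The hand-written parsings stand in for the parser output, and group_by_tag and tag_differences are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

ParsedAddress = List[Tuple[str, str]]


def group_by_tag(parsed_address: ParsedAddress) -> Dict[str, str]:
    """Merge the components that share a tag into one space-joined value."""
    grouped: Dict[str, List[str]] = {}
    for component, tag in parsed_address:
        grouped.setdefault(tag, []).append(component)
    return {tag: " ".join(parts) for tag, parts in grouped.items()}


def tag_differences(
    first: ParsedAddress, second: ParsedAddress
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each tag whose value differs to its (first, second) values."""
    first_by_tag = group_by_tag(first)
    second_by_tag = group_by_tag(second)
    # dict.fromkeys keeps first-seen order while removing duplicate tags.
    all_tags = dict.fromkeys([*first_by_tag, *second_by_tag])
    return {
        tag: (first_by_tag.get(tag), second_by_tag.get(tag))
        for tag in all_tags
        if first_by_tag.get(tag) != second_by_tag.get(tag)
    }


# Hand-written stand-ins for what a parser might return (assumed, not model output).
parsed_original = [("350", "StreetNumber"), ("rue", "StreetName"), ("des", "StreetName"),
                   ("lilas", "StreetName"), ("quebec", "Municipality")]
parsed_other = [("450", "StreetNumber"), ("rue", "StreetName"), ("des", "StreetName"),
                ("lilas", "StreetName"), ("quebec", "Municipality")]

differences = tag_differences(parsed_original, parsed_other)
no_differences = tag_differences(parsed_original, parsed_original)
```

Grouping by tag first means a street name split over several tokens is compared as one value rather than token by token.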
Formatted Compared Addresses
- class deepparse.comparer.FormattedComparedAddresses(first_address: FormattedParsedAddress, second_address: FormattedParsedAddress, origin: Tuple[str, str], with_prob: bool)
Abstract class that defines a comparison of addresses returned by the address comparer.
- Parameters:
first_address (FormattedParsedAddress) – A formatted parsed address that contains the parsing information for the first address.
second_address (FormattedParsedAddress) – A formatted parsed address that contains the parsing information for the second address.
origin (Tuple[str, str]) – The origin of each parsing (e.g. from the source or from a deepparse pretrained model).
Example
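from deepparse.comparer import AddressesComparer
from deepparse.parser import AddressParser

address_comparer = AddressesComparer(AddressParser())

# A single comparison is passed as one tuple of two raw addresses
raw_identical_comparison = address_comparer.compare_raw(
    (
        "350 rue des Lilas Ouest Quebec city Quebec G1L 1B6",
        "450 rue des Lilas Ouest Quebec city Quebec G1L 1B6",
    )
)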
- property list_of_bool: List
A list that contains all the address component names, each with a boolean indicating whether that component is the same in the two addresses.
- Returns:
The list of booleans.
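The idea behind list_of_bool can be illustrated with a short sketch. The documentation only states that the return type is List, so the shape used here (one (component name, boolean) pair per component) is an assumption, and component_equality is a hypothetical helper operating on plain dictionaries rather than on FormattedParsedAddress objects.

```python
from typing import Dict, List, Optional, Tuple


def component_equality(
    first: Dict[str, Optional[str]], second: Dict[str, Optional[str]]
) -> List[Tuple[str, bool]]:
    """Pair each component name of the first address with whether both addresses agree on it."""
    return [(name, value == second.get(name)) for name, value in first.items()]


# Plain dictionaries standing in for two parsed addresses (assumed layout).
first_address = {"StreetNumber": "350", "StreetName": "rue des lilas", "Unit": None}
second_address = {"StreetNumber": "450", "StreetName": "rue des lilas", "Unit": None}

equality = component_equality(first_address, second_address)
all_identical = all(is_same for _, is_same in equality)
```

Keeping the component name next to each boolean lets a report point at the exact component that differs instead of only signalling that something does.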
Formatted Compared Addresses Raw
- class deepparse.comparer.FormattedComparedAddressesRaw(first_address: FormattedParsedAddress, second_address: FormattedParsedAddress, origin: Tuple[str, str], with_prob: bool)
A formatted comparison of two raw (not parsed) addresses.