Comparer
AddressesComparer
- class deepparse.comparer.AddressesComparer(parser: AddressParser)
Address comparer is used to compare addresses with each other and retrieve the differences between them. The addresses are parsed using an address parser based on one of the seq2seq pretrained networks, either with FastText or BPEmb.
The address comparer can compare already parsed addresses. It first recomposes the raw address from the source parsing, then the address parser suggests its own tags; finally, the tags of the source parsing are compared with those of the newly parsed address.
The address comparer can also compare raw addresses by first parsing them with the address parser and then bringing out the differences among the parsed addresses.
- Parameters:
parser (AddressParser) – the AddressParser used to parse the addresses.
- compare_tags(addresses_tags_to_compare: List[tuple] | List[List[tuple]], with_prob: None | bool = None) → List[FormattedComparedAddressesTags] | FormattedComparedAddressesTags
Compare the tags of a source parsing with the parsing from AddressParser. First, the raw address is reconstructed from the source parsing; then AddressParser generates its own tags for that address, and the two parsings are compared.
- Parameters:
addresses_tags_to_compare (Union[List[tuple], List[List[tuple]]]) – List of tuples that contain the tags for the address components from the source. Can compare multiple parsings if passed as a list of tuples.
with_prob (Union[None, bool]) – An optional flag indicating whether to include the probabilities in the comparison report. The probabilities are not compared; they are only included in the report. The default value is None, which means the probabilities are not taken into account.
- Returns:
Either a FormattedComparedAddressesTags or a list of FormattedComparedAddressesTags when there is more than one comparison to make.
Examples
from deepparse.comparer import AddressesComparer
from deepparse.parser import AddressParser

# Source parsing without probabilities: (component, tag) pairs.
first_parsed_address = [
    ("350", "StreetNumber"),
    ("rue des Lilas", "StreetName"),
    ("Ouest Québec", "Municipality"),
    ("Québec", "Province"),
    ("G1L 1B6", "PostalCode"),
]

# Source parsing with probabilities: (component, (tag, probability)) pairs.
second_parsed_address_with_prob = [
    ('350', ('StreetNumber', 1.0)),
    ('rue', ('StreetName', 0.9987)),
    ('des', ('StreetName', 0.9993)),
    ('Lilas', ('StreetName', 0.8176)),
    ('Ouest', ('Orientation', 0.781)),
    ('Quebec', ('Municipality', 0.9768)),
    ('Quebec', ('Province', 1.0)),
    ('G1L', ('PostalCode', 0.9993)),
    ('1B6', ('PostalCode', 1.0)),
]

address_parser = AddressParser(model_type="bpemb")
addresses_comparer = AddressesComparer(address_parser)

# Each source parsing is compared with the parser's own parsing of the same address.
list_of_compared_addresses = addresses_comparer.compare_tags(
    [first_parsed_address, second_parsed_address_with_prob]
)

list_of_compared_addresses[0].comparison_report()
list_of_compared_addresses[1].comparison_report()
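The reconstruction step described for compare_tags can be pictured with a small standalone sketch. It is not deepparse's implementation: the helpers `rebuild_raw_address` and `strip_probabilities` are hypothetical names, and the sketch only assumes the two tuple layouts shown in the example (a tag given as a plain string, or as a `(tag, probability)` pair).

```python
from typing import List, Tuple, Union

# A tag is either a plain string or a (tag, probability) pair.
Tag = Union[str, Tuple[str, float]]


def rebuild_raw_address(parsed_address: List[Tuple[str, Tag]]) -> str:
    """Join the components back into one raw string (hypothetical helper)."""
    # Only the component text matters here; tags and probabilities are ignored.
    return " ".join(component for component, _tag in parsed_address)


def strip_probabilities(parsed_address: List[Tuple[str, Tag]]) -> List[Tuple[str, str]]:
    """Reduce (component, (tag, prob)) pairs to (component, tag) pairs (hypothetical helper)."""
    return [
        (component, tag if isinstance(tag, str) else tag[0])
        for component, tag in parsed_address
    ]


source_parsing = [
    ("350", "StreetNumber"),
    ("rue des Lilas", "StreetName"),
    ("G1L 1B6", "PostalCode"),
]
parsing_with_prob = [
    ("350", ("StreetNumber", 1.0)),
    ("rue", ("StreetName", 0.9987)),
]

print(rebuild_raw_address(source_parsing))      # 350 rue des Lilas G1L 1B6
print(strip_probabilities(parsing_with_prob))   # [('350', 'StreetNumber'), ('rue', 'StreetName')]
```

The rebuilt string is what would then be handed to the parser, so that its tags can be set against the source tags.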
- compare_raw(raw_addresses_to_compare: Tuple[str] | List[Tuple[str]], with_prob: None | bool = None) → List[FormattedComparedAddressesRaw]
Compare raw addresses with each other. It starts by parsing the addresses with the parser and then returns the differences between the parsed address components of the two addresses.
- Parameters:
raw_addresses_to_compare (Union[Tuple[str], List[Tuple[str]]]) – A tuple of strings that represent the raw addresses to compare, or a list of such tuples when there are several comparisons to make.
with_prob (Union[None, bool]) – An optional flag indicating whether to include the probabilities in the comparison report. The probabilities are not compared; they are only included in the report. The default value is None, which means the probabilities are not taken into account.
- Returns:
Either a FormattedComparedAddressesRaw or a list of FormattedComparedAddressesRaw when given more than one comparison to make.
Examples
from deepparse.comparer import AddressesComparer
from deepparse.parser import AddressParser

addresses_comparer = AddressesComparer(AddressParser(model_type="bpemb"))

raw_address_original = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_identical = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_equivalent = "350 rue des Lilas Ouest Quebec Quebec G1L 1B6"
raw_address_diff_streetNumber = "450 rue des Lilas Ouest Quebec Quebec G1L 1B6"

# Each tuple is one pair of raw addresses to compare.
raw_addresses_multiples_comparisons = addresses_comparer.compare_raw(
    [
        (raw_address_original, raw_address_identical),
        (raw_address_original, raw_address_equivalent),
        (raw_address_original, raw_address_diff_streetNumber),
    ]
)

raw_addresses_multiples_comparisons[0].comparison_report()
raw_addresses_multiples_comparisons[1].comparison_report()
raw_addresses_multiples_comparisons[2].comparison_report()
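To illustrate what "differences between the parsed address components" can mean, here is a minimal sketch in plain Python. It assumes each parsing is a list of `(token, tag)` tuples; the helpers `group_by_tag` and `diff_parsings` are hypothetical and do not reflect how deepparse builds its comparison report.

```python
from typing import Dict, List, Tuple

Parsing = List[Tuple[str, str]]


def group_by_tag(parsing: Parsing) -> Dict[str, str]:
    """Merge the tokens that share a tag into one space-separated component."""
    grouped: Dict[str, List[str]] = {}
    for token, tag in parsing:
        grouped.setdefault(tag, []).append(token)
    return {tag: " ".join(tokens) for tag, tokens in grouped.items()}


def diff_parsings(first: Parsing, second: Parsing) -> Dict[str, Tuple[str, str]]:
    """Return, for each tag, the pair of values that differ between two parsings."""
    first_components = group_by_tag(first)
    second_components = group_by_tag(second)
    differences: Dict[str, Tuple[str, str]] = {}
    # Walk the sorted union of tags so a component missing on one side is reported too.
    for tag in sorted(set(first_components) | set(second_components)):
        first_value = first_components.get(tag, "")
        second_value = second_components.get(tag, "")
        if first_value != second_value:
            differences[tag] = (first_value, second_value)
    return differences


first = [("350", "StreetNumber"), ("rue", "StreetName"), ("des", "StreetName"), ("Lilas", "StreetName")]
second = [("450", "StreetNumber"), ("rue", "StreetName"), ("des", "StreetName"), ("Lilas", "StreetName")]

print(diff_parsings(first, second))  # {'StreetNumber': ('350', '450')}
```

Grouping tokens by tag before comparing keeps a multi-word street name from being reported as several separate differences.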
Formatted Compared Addresses
- class deepparse.comparer.FormattedComparedAddresses(first_address: FormattedParsedAddress, second_address: FormattedParsedAddress, origin: Tuple[str, str], with_prob: bool)
Abstract class that defines a comparison for addresses returned by the address comparer.
- Parameters:
first_address (FormattedParsedAddress) – A formatted parsed address that contains the parsing information for the first address.
second_address (FormattedParsedAddress) – A formatted parsed address that contains the parsing information for the second address.
origin (Tuple[str, str]) – The origin of each parsing (e.g. from the source or from a Deepparse pretrained model).
Example
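from deepparse.comparer import AddressesComparer
from deepparse.parser import AddressParser

address_comparer = AddressesComparer(AddressParser())

# A single tuple of two raw addresses is one comparison.
raw_identical_comparison = address_comparer.compare_raw(
    (
        "350 rue des Lilas Ouest Quebec city Quebec G1L 1B6",
        "450 rue des Lilas Ouest Quebec city Quebec G1L 1B6",
    )
)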
- property list_of_bool: List
A list that contains the names of all the address components and, for each one, a boolean indicating whether it is the same for the two addresses.
- Returns:
A list of booleans.
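As a rough illustration of that shape, the sketch below pairs each address component name with a boolean telling whether the two addresses agree on it. It assumes the parsed components are available as plain dictionaries; `components_equal` is a hypothetical helper, and the exact structure returned by `list_of_bool` should be checked against the library itself.

```python
from typing import Dict, List, Optional, Tuple


def components_equal(
    first: Dict[str, Optional[str]],
    second: Dict[str, Optional[str]],
) -> List[Tuple[str, bool]]:
    """Pair each component name with True when both addresses hold the same value."""
    # Ordered union of the keys, so a component present on only one side is still listed.
    names = list(dict.fromkeys([*first, *second]))
    return [(name, first.get(name) == second.get(name)) for name in names]


first_address = {"StreetNumber": "350", "StreetName": "rue des Lilas", "PostalCode": "G1L 1B6"}
second_address = {"StreetNumber": "450", "StreetName": "rue des Lilas", "PostalCode": "G1L 1B6"}

result = components_equal(first_address, second_address)
print(result)
# [('StreetNumber', False), ('StreetName', True), ('PostalCode', True)]

# The two addresses are identical only when every component matches.
print(all(same for _name, same in result))  # False
```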
Formatted Compared Addresses Raw
- class deepparse.comparer.FormattedComparedAddressesRaw(first_address: FormattedParsedAddress, second_address: FormattedParsedAddress, origin: Tuple[str, str], with_prob: bool)
A formatted compared address of two raw (not parsed) addresses.