Here is Deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning.

Use deepparse to

  • parse multinational addresses using one of our pretrained models, with or without an attention mechanism,

  • parse addresses directly from the command line without writing any code,

  • parse addresses with our out-of-the-box FastAPI parser,

  • retrain our pretrained models on new data to improve parsing of country-specific address patterns,

  • retrain our pretrained models with new prediction tags easily,

  • retrain our pretrained models with or without freezing some layers,

  • train a new Seq2Seq address parsing model easily using a new model configuration.
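
As a minimal sketch of the first use case in the list above, the snippet below wraps Deepparse's `AddressParser` in a small helper. The helper name `parse_addresses` and its default arguments are illustrative assumptions, not part of the library. The import is done inside the function so the snippet can be loaded without Deepparse installed; building the parser for the first time downloads the pretrained weights.

```python
from typing import Dict, List, Optional


def parse_addresses(
    addresses: List[str],
    model_type: str = "bpemb",
    attention_mechanism: bool = False,
) -> List[Dict[str, Optional[str]]]:
    """Parse raw address strings into one {tag: value} dict per address.

    `parse_addresses` is a hypothetical helper written for this example.
    """
    # Imported lazily: creating the parser loads (and may download) a model.
    from deepparse.parser import AddressParser

    parser = AddressParser(
        model_type=model_type,
        attention_mechanism=attention_mechanism,
        device="cpu",  # assumption: CPU inference; use a GPU index if available
    )

    parsed = parser(addresses)
    # The parser returns a single object when given exactly one address.
    if not isinstance(parsed, list):
        parsed = [parsed]
    return [address.to_dict() for address in parsed]
```

For instance, `parse_addresses(["350 rue des Lilas Ouest Québec Québec G1L 1B6"])` would return a list holding one dictionary keyed by the address tags of the chosen model.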

Deepparse is compatible with the latest version of PyTorch and Python >= 3.8.

Countries and Results

We evaluate our models on two forms of address data:

  • clean data, which refers to addresses containing elements from four categories: a street name, a municipality, a province and a postal code,

  • incomplete data, which is made up of addresses missing at least one of these categories.
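
The distinction between the two forms can be sketched as a simple check on a tagged address. This is only an illustration: the tag names in `REQUIRED_CATEGORIES` and the function `data_form` are assumptions made for the example, and the tag set of a given model may differ.

```python
from typing import Mapping, Optional

# Assumed tag names for the four categories listed above (hypothetical).
REQUIRED_CATEGORIES = ("StreetName", "Municipality", "Province", "PostalCode")


def data_form(tagged_address: Mapping[str, Optional[str]]) -> str:
    """Return "clean" if all four categories are present, else "incomplete"."""
    missing = [c for c in REQUIRED_CATEGORIES if not tagged_address.get(c)]
    return "clean" if not missing else "incomplete"


full = {
    "StreetName": "rue des Lilas Ouest",
    "Municipality": "Québec",
    "Province": "Québec",
    "PostalCode": "G1L 1B6",
}
partial = {"StreetName": "rue des Lilas Ouest", "Municipality": "Québec"}

print(data_form(full))     # clean
print(data_form(partial))  # incomplete
```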

You can get our dataset here.

Clean Data

The following table presents the accuracy of both our models on clean data from the 20 countries used during training. Attention mechanisms improve performance by around 0.5% for all countries.

Country         FastText (%)  BPEmb (%)
Norway          99.06         98.3
Austria         99.21         97.82
Italy           99.65         98.93
Mexico          99.49         98.9
United Kingdom  99.58         97.62
Switzerland     98.9          98.38
Germany         99.72         99.4
Denmark         99.71         99.55
France          99.6          98.18
Brazil          99.31         97.69
Netherlands     99.47         99.54
Australia       99.68         98.44
Poland          99.64         99.52
Czechia         99.48         99.03
United States   99.56         97.69
Canada          99.76         99.03
South Korea     99.97         99.99
Russia          98.9          96.97
Spain           99.73         99.4
Finland         99.77         99.76

We have also made a zero-shot evaluation of our models using clean data from 41 other countries; the results are shown in the next table.

Country         FastText (%)  BPEmb (%)
Latvia          89.29         68.31
Faroe Islands   71.22         64.74
Colombia        85.96         68.09
Singapore       86.03         67.19
Réunion         84.3          78.65
Indonesia       62.38         63.04
Japan           36.26         34.97
Portugal        93.09         72.01
Algeria         86.32         70.59
Belgium         93.14         86.06
Malaysia        83.14         89.64
Ukraine         93.34         89.42
Estonia         87.62         70.08
Bangladesh      72.28         65.63
Slovenia        89.01         83.96
Hungary         51.52         37.87
Bermuda         83.19         59.16
Romania         90.04         82.9
Philippines     63.91         57.36
Belarus         93.25         78.59
Bosnia          88.54         67.46
Moldova         89.22         57.48
Lithuania       93.28         69.97
Paraguay        96.02         87.07
Croatia         95.8          81.76
Argentina       81.68         71.2
Ireland         80.16         54.44
Kazakhstan      89.04         76.13
Greece          87.08         38.95
Bulgaria        91.16         65.76
Serbia          92.87         76.79
New Caledonia   94.45         94.46
Sweden          73.13         86.85
Venezuela       79.23         70.88
New Zealand     91.25         75.57
Iceland         83.7          77.09
India           70.3          63.68
Uzbekistan      85.85         70.1
Cyprus          89.64         89.47
Slovakia        78.34         68.96
South Africa    95.68         74.829

We also tested whether an attention mechanism further improves zero-shot performance on those countries; the results are shown in the next table.

Country         FastText (%)  FastTextAtt (%)  BPEmb (%)  BPEmbAtt (%)
Ireland         80.16         89.11            54.44      81.84
Serbia          92.87         95.88            76.79      91.4
Uzbekistan      85.85         87.24            70.1       76.71
Ukraine         93.34         94.58            89.42      92.65
South Africa    95.68         97.25            74.82      97.95
Paraguay        96.02         97.08            87.07      97.36
Greece          87.08         86.04            38.95      58.79
Algeria         86.32         87.3             70.59      84.56
Belarus         93.25         97.4             78.59      97.49
Sweden          73.13         89.24            86.85      93.53
Portugal        93.09         94.92            72.01      93.76
Hungary         51.52         51.08            37.87      24.48
Iceland         83.7          96.54            77.09      96.63
Colombia        85.96         90.08            68.09      88.52
Latvia          89.29         93.14            68.31      73.79
Malaysia        83.14         74.62            89.64      91.14
Bosnia          88.54         87.27            67.46      89.02
India           70.3          75.31            63.68      80.56
Réunion         84.3          97.74            78.65      94.27
Croatia         95.8          95.32            81.76      85.99
Estonia         87.62         88.2             70.08      77.32
New Caledonia   94.45         99.61            94.46      99.77
Japan           36.26         46.91            34.97      49.48
New Zealand     91.25         97.0             75.57      95.7
Singapore       86.03         89.92            67.19      88.17
Romania         90.04         95.38            82.9       93.41
Bangladesh      72.28         78.21            65.63      77.09
Slovakia        78.34         82.29            68.96      96.0
Argentina       81.68         88.59            71.2       86.8
Kazakhstan      89.04         92.37            76.13      96.08
Venezuela       79.23         95.47            70.88      96.38
Indonesia       62.38         66.87            63.04      71.17
Bulgaria        91.16         91.73            65.76      93.28
Cyprus          89.64         97.44            89.47      98.01
Bermuda         83.19         93.25            59.16      93.8
Moldova         89.22         92.07            57.48      89.08
Slovenia        89.01         95.08            83.96      96.73
Lithuania       93.28         87.74            69.97      78.67
Philippines     63.91         81.94            57.36      83.42
Belgium         93.14         90.72            86.06      89.85
Faroe Islands   71.22         73.23            64.74      85.39
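
To read the table above in terms of gains, the short sketch below computes the change in accuracy (in percentage points) brought by attention for a handful of rows copied from it. The names `ZERO_SHOT` and `attention_gain` are illustrative only. The selected rows show that the effect is not uniform: some countries gain a lot while others lose accuracy.

```python
# A few rows copied from the table above:
# country -> (FastText, FastTextAtt, BPEmb, BPEmbAtt), accuracy in %.
ZERO_SHOT = {
    "Ireland": (80.16, 89.11, 54.44, 81.84),
    "Greece": (87.08, 86.04, 38.95, 58.79),
    "Hungary": (51.52, 51.08, 37.87, 24.48),
    "Malaysia": (83.14, 74.62, 89.64, 91.14),
}


def attention_gain(row):
    """Return (FastText gain, BPEmb gain) in percentage points."""
    fasttext, fasttext_att, bpemb, bpemb_att = row
    return round(fasttext_att - fasttext, 2), round(bpemb_att - bpemb, 2)


for country, row in ZERO_SHOT.items():
    ft_gain, bp_gain = attention_gain(row)
    print(f"{country:<10} FastText {ft_gain:+.2f}  BPEmb {bp_gain:+.2f}")
```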

Incomplete Data

The following table presents the accuracy of both our models on incomplete data from the 20 countries used during training. We did not test on the other 41 countries: since we did not train on them, we do not expect meaningful performance. Attention mechanisms improve performance by around 0.5% for all countries.

Country         FastText (%)  BPEmb (%)
Norway          99.52         99.75
Austria         99.55         98.94
Italy           99.16         98.88
Mexico          97.24         95.93
United Kingdom  97.85         95.2
Switzerland     99.2          99.47
Germany         99.41         99.38
Denmark         97.86         97.9
France          99.51         98.49
Brazil          98.96         97.12
Netherlands     98.74         99.46
Australia       99.34         98.7
Poland          99.43         99.41
Czechia         98.78         98.88
United States   98.49         96.5
Canada          98.96         96.98
South Korea     91.1          99.89
Russia          97.18         96.01
Spain           99.07         98.35
Finland         99.04         99.52

Cite

Use the following entry for the article:

@misc{yassine2020leveraging,
    title={{Leveraging Subword Embeddings for Multinational Address Parsing}},
    author={Marouane Yassine and David Beauchemin and François Laviolette and Luc Lamontagne},
    year={2020},
    eprint={2006.16152},
    archivePrefix={arXiv}
}

Use this one for the package:

@misc{deepparse,
    author = {Marouane Yassine and David Beauchemin},
    title  = {{Deepparse: A State-Of-The-Art Deep Learning Multinational Addresses Parser}},
    year   = {2020},
    note   = {\url{https://deepparse.org}}
}

Contributing to Deepparse

We welcome user input, whether it concerns bugs found in the library or feature proposals! Have a look at our contributing guidelines for more details.

License

Deepparse is LGPLv3 licensed, as found in the LICENSE file.
