Tag your (Early) Modern French text
You can find more information on this specific model here: https://github.com/e-ditiones/LEM17
Orig | 16 | 17 | 18 | 19 | 20 | All
---|---|---|---|---|---|---
Drama | 94.73 | 97.42 | 97.47 | 98.56 | 97.86 | 97.25
Varia | 96.23 | 98.09 | 98.27 | 98.23 | 97.46 | 97.66
Both | 95.51 | 97.76 | 97.88 | 98.39 | 97.66 | 97.46

Norm | 16 | 17 | 18 | 19 | 20 | All
---|---|---|---|---|---|---
Drama | 97.36 | 98.41 | 98.51 | 98.56 | 97.86 | 98.15
Varia | 98 | 98.4 | 98.54 | 98.23 | 97.46 | 98.13
Both | 97.69 | 98.4 | 98.53 | 98.39 | 97.66 | 98.14

Orig | 16 | 17 | 18 | 19 | 20 | All
---|---|---|---|---|---|---
Drama | 90.34 | 94.47 | 94.64 | 95.03 | 93.71 | 93.69
Varia | 89.85 | 93.44 | 95.98 | 92.24 | 94.03 | 93.14
Both | 90.08 | 93.95 | 95.33 | 93.65 | 93.87 | 93.41

Norm | 16 | 17 | 18 | 19 | 20 | All
---|---|---|---|---|---|---
Drama | 93.69 | 95.75 | 95.61 | 95.03 | 93.71 | 94.76
Varia | 92.52 | 94.81 | 95.98 | 92.24 | 94.03 | 93.94
Both | 93.08 | 95.28 | 95.8 | 93.65 | 93.87 | 94.35
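To compare the Orig and Norm scores column by column, the difference can be computed directly. The following is a minimal sketch that uses the "Both" rows of the first Orig and Norm tables; the variable names are illustrative and not part of any released tooling.

```python
# Scores copied from the "Both" rows of the first Orig and Norm tables.
columns = ["16", "17", "18", "19", "20", "All"]
orig_both = [95.51, 97.76, 97.88, 98.39, 97.66, 97.46]
norm_both = [97.69, 98.4, 98.53, 98.39, 97.66, 98.14]

# Difference (Norm minus Orig) per column, rounded to two decimals
# to hide floating-point noise.
gains = {
    col: round(norm - orig, 2)
    for col, orig, norm in zip(columns, orig_both, norm_both)
}

for col, gain in gains.items():
    print(f"{col}: {gain:+.2f}")
```

In these rows the difference is largest in the 16 column and zero in the 19 and 20 columns, where the Orig and Norm scores are identical.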
The model is trained on normalised (i.e. "translated" into contemporary French) and non-normalised transcriptions.
The model was trained on the following corpora:
The annotations are made according to the following reference lists:
More information on the annotation practice can be found in Simon Gabay, Jean-Baptiste Camps and Thibault Clérice, Manuel d'annotation linguistique pour le français moderne (XVIe-XVIIIe siècles), 2020: https://hal.archives-ouvertes.fr/hal-02571190.
Sample from annotation:
form lemma POS morph
Pour pour PRE MORPH=empty
moi je PROper PERS.=1|NOMB.=s|GENRE=x|CAS=i
je je PROper PERS.=1|NOMB.=s|GENRE=x|CAS=n
suis être VERcjg MODE=ind|TEMPS=pst|PERS.=1|NOMB.=s
toûjours toujours ADVgen MORPH=empty
ici ici ADVgen MORPH=empty
, , PONfbl MORPH=empty
où que PROrel NOMB.=x|GENRE=x|CAS=i
, , PONfbl MORPH=empty
à à PRE MORPH=empty
des un DETndf NOMB.=p|GENRE=m
rumatismes rhumatisme NOMcom NOMB.=p|GENRE=f
près près ADVgen MORPH=empty
, , PONfbl MORPH=empty
je je PROper PERS.=1|NOMB.=s|GENRE=x|CAS=n
me je PROper PERS.=1|NOMB.=s|GENRE=x|CAS=r
suis être VERcjg MODE=ind|TEMPS=pst|PERS.=1|NOMB.=s
assez assez ADVgen MORPH=empty
bien bien ADVgen MORPH=empty
porté porter VERppe NOMB.=s|GENRE=m
. . PONfrt MORPH=empty
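The sample above can be loaded into Python for further processing. The sketch below assumes that the four columns are separated by whitespace (spaces or tabs) and that no token contains a space; `parse_morph` and `parse_annotation` are hypothetical helper names introduced for illustration, not functions provided by the tagger.

```python
from typing import Dict, List

# A few rows copied from the sample annotation above.
SAMPLE = """\
form lemma POS morph
Pour pour PRE MORPH=empty
moi je PROper PERS.=1|NOMB.=s|GENRE=x|CAS=i
suis être VERcjg MODE=ind|TEMPS=pst|PERS.=1|NOMB.=s
"""


def parse_morph(morph: str) -> Dict[str, str]:
    """Split 'KEY=value|KEY=value' into a dict; 'MORPH=empty' gives {}."""
    if morph == "MORPH=empty":
        return {}
    return dict(pair.split("=", 1) for pair in morph.split("|"))


def parse_annotation(text: str) -> List[dict]:
    """Turn the whitespace-separated sample into one dict per token."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()  # form, lemma, POS, morph
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        row["morph"] = parse_morph(row["morph"])
        rows.append(row)
    return rows


tokens = parse_annotation(SAMPLE)
for token in tokens:
    print(token["form"], token["lemma"], token["POS"], token["morph"])
```

Keeping the morphology as a dictionary makes it easy to filter tokens, for example selecting every form whose `PERS.` value is `1`.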
Several versions of the FREEM model have been released
Please remember that corpus creation and software engineering are valid research, so please cite these resources when you use this lemmatizer in your research: this includes the wonderful original research by E. Manjavacas, M. Kestemont and Á. Kádár as well as the software wrapper built to handle pre- and post-processing.
For each model, a bibliography is given, along with other citable works such as models and datasets where available.
@software{thibault_clerice_2020_3883590,
  author = {Clérice, Thibault},
  title = {Pie Extended, an extension for Pie with pre-processing and post-processing},
  month = jun,
  year = 2020,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.3883589},
  url = {https://doi.org/10.5281/zenodo.3883589}
}

@inproceedings{manjavacas-etal-2019-improving,
  title = "Improving Lemmatization of Non-Standard Languages with Joint Learning",
  author = "Manjavacas, Enrique and K{\'a}d{\'a}r, {\'A}kos and Kestemont, Mike",
  booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
  month = jun,
  year = "2019",
  address = "Minneapolis, Minnesota",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/N19-1153",
  doi = "10.18653/v1/N19-1153",
  pages = "1493--1503",
}

@software{clerice_thibault_2019_3237455,
  author = {Gabay, Simon and Clérice, Thibault and Camps, Jean-Baptiste and Tanguy, Jean-Baptiste and Gille-Levenson, Matthias},
  title = {Deucalion, Modèle Français moderne (0.1.0)},
  month = jun,
  year = 2020,
  publisher = {GitHub},
  version = {v1.0.0},
  url = {https://github.com/e-ditiones/LEM17/releases/tag/v1}
}
This lemmatizer is provided to you thanks to the data of the LASLA, the software of Enrique Manjavacas and Mike Kestemont, and some engineering from the École nationale des chartes. If you want to cite them: