 
 
The stemming algorithm

Letters in French include the following accented forms,
    â     à     ç     ë     é     ê     è     ï     î     ô     û     ù
The following letters are vowels:
    a     e     i     o     u     y     â     à     ë     é     ê     è     ï     î     ô     û     ù
Assume the word is in lower case. Then put into upper case u or i preceded
and followed by a vowel, and y preceded or followed by a vowel. u after q is
also put into upper case. For example,
    jouer     ->     joUer
    ennuie    ->     ennuIe
    yeux      ->     Yeux
    quand     ->     qUand
(The upper case forms are not then classed as vowels - see note on vowel
marking.)
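A minimal Python sketch of the marking rule, assuming the input is a single French word. The names VOWELS and mark_vowels are hypothetical. Letters are scanned left to right, so a letter that has just been upper-cased no longer counts as a vowel for its right-hand neighbour.

```python
# Vowels as listed above; the upper-case markers U, I and Y are deliberately absent.
VOWELS = set("aeiouyâàëéêèïîôûù")


def mark_vowels(word: str) -> str:
    """Upper-case the u, i and y letters selected by the marking rule."""
    chars = list(word.lower())
    for i, ch in enumerate(chars):
        # chars[i - 1] may already be upper case, and then it is not a vowel.
        prev_is_vowel = i > 0 and chars[i - 1] in VOWELS
        next_is_vowel = i + 1 < len(chars) and chars[i + 1] in VOWELS
        if ch in "ui" and prev_is_vowel and next_is_vowel:
            chars[i] = ch.upper()
        elif ch == "y" and (prev_is_vowel or next_is_vowel):
            chars[i] = "Y"
        elif ch == "u" and i > 0 and chars[i - 1] == "q":
            chars[i] = "U"
    return "".join(chars)


if __name__ == "__main__":
    for w in ["jouer", "ennuie", "yeux", "quand"]:
        print(w, "->", mark_vowels(w))
```

Run as a script, it prints the four pairs from the table above.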
 
 If the word begins with two vowels, RV is the region after the third
letter, otherwise the region after the first vowel not at the beginning of
the word, or the end of the word if these positions cannot be found.
 
 For example,
 
    a i m e r     a d o r e r     v o l e r
         |...|         |.....|       |.....|
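The RV rule can be sketched the same way. rv_start is a hypothetical helper that returns the index where RV begins, with len(word) standing for an empty RV; the word is assumed to be vowel-marked already.

```python
VOWELS = set("aeiouyâàëéêèïîôûù")


def rv_start(word: str) -> int:
    """Index where RV begins in a vowel-marked word; len(word) means RV is empty."""
    if len(word) >= 2 and word[0] in VOWELS and word[1] in VOWELS:
        # Two initial vowels: RV is the region after the third letter.
        return min(3, len(word))
    # Otherwise RV follows the first vowel that is not the first letter.
    for i in range(1, len(word)):
        if word[i] in VOWELS:
            return i + 1
    return len(word)


if __name__ == "__main__":
    for w in ["aimer", "adorer", "voler"]:
        print(w, "->", w[rv_start(w):])  # er, rer, ler
```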
R1 is the region after the first non-vowel following a vowel, or the end of
the word if there is no such non-vowel.
R2 is the region after the first non-vowel following a vowel in R1, or the
end of the word if there is no such non-vowel.
(See note on R1 and R2.)
 For example:
 
    f a m e u s e m e n t
         |......R1.......|
               |...R2....|
Note that R1 can contain RV (adorer), and RV can contain R1 (voler).
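R1 and R2 come from one search applied twice: once from the start of the word, and once from the start of R1, so that the vowel must itself lie in R1. A sketch with hypothetical helper names, assuming a vowel-marked word:

```python
VOWELS = set("aeiouyâàëéêèïîôûù")


def after_vowel_nonvowel(word: str, start: int = 0) -> int:
    """Index just past the first non-vowel that follows a vowel at or after `start`."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)  # no such non-vowel: the region is empty


def r1_r2(word: str) -> tuple:
    r1 = after_vowel_nonvowel(word)
    r2 = after_vowel_nonvowel(word, r1)  # the vowel must itself lie in R1
    return r1, r2


if __name__ == "__main__":
    w = "fameusement"
    r1, r2 = r1_r2(w)
    print(w[r1:], w[r2:])  # eusement ement
```

For adorer this gives R1 = orer, which contains RV (rer); for voler it gives R1 = er, which RV (ler) contains.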
 Below, ‘delete if in R2’ means that a found suffix should be removed if it
lies entirely in R2, but not if it overlaps R2 and the rest of the word.
‘delete if in R1 and preceded by X’ means that X itself does not have to
come in R1, while ‘delete if preceded by X in R1’ means that X, like the
suffix, must be entirely in R1.
 
 Start with step 1
 
 Step 1: Standard suffix removal
    Search for the longest among the following suffixes, and perform the
    action indicated.

ance     iqUe     isme     able     iste     eux     ances     iqUes     ismes     ables     istes
        delete if in R2

atrice     ateur     ation     atrices     ateurs     ations
        delete if in R2
        if preceded by ic, delete if in R2, else replace by iqU

logie     logies
        replace with log if in R2

usion     ution     usions     utions
        replace with u if in R2

ence     ences
        replace with ent if in R2

ement     ements
        delete if in RV
        if preceded by iv, delete if in R2 (and if further preceded by at,
        delete if in R2), otherwise,
        if preceded by eus, delete if in R2, else replace by eux
          if in R1, otherwise,
        if preceded by abl or iqU, delete if in R2, otherwise,
        if preceded by ièr or Ièr, delete if in RV

ité     ités
        delete if in R2
        if preceded by abil, delete if in R2, else replace by abl,
        otherwise,
        if preceded by ic, delete if in R2, else replace by iqU, otherwise,
        if preceded by iv, delete if in R2

if     ive     ifs     ives
        delete if in R2
        if preceded by at, delete if in R2 (and if further preceded by ic,
        delete if in R2, else replace by iqU)

eaux
        replace with eau

aux
        replace with al if in R1

euse     euses
        delete if in R2, else replace by eux if in R1

issement     issements
        delete if in R1 and preceded by a non-vowel

amment
        replace with ant if in RV

emment
        replace with ent if in RV

ment     ments
        delete if preceded by a vowel in RV

In steps 2a and 2b all tests are confined to the RV region.

Do step 2a if either no ending was removed by step 1, or if one of the
endings amment, emment, ment, ments was found.
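A compact sketch of step 1, assuming a vowel-marked word. The group names and the names GROUPS, regions and step1 are hypothetical. A region test compares the index where a string starts with the index where the region starts, which is what "lies entirely in" amounts to for text at the end of the word. The ièr/Ièr branch follows the rule exactly as written above. The returned flag says whether step 2a should follow.

```python
VOWELS = set("aeiouyâàëéêèïîôûù")


def regions(w: str):
    """Start indices (RV, R1, R2) of an already vowel-marked word."""
    if len(w) >= 2 and w[0] in VOWELS and w[1] in VOWELS:
        rv = min(3, len(w))
    else:
        rv = next((i + 1 for i in range(1, len(w)) if w[i] in VOWELS), len(w))

    def after_vowel_nonvowel(start: int) -> int:
        for i in range(start + 1, len(w)):
            if w[i] not in VOWELS and w[i - 1] in VOWELS:
                return i + 1
        return len(w)

    r1 = after_vowel_nonvowel(0)
    return rv, r1, after_vowel_nonvowel(r1)


# Suffix lists copied from the step 1 table, keyed by hypothetical group names.
GROUPS = {
    "r2_delete": "ance iqUe isme able iste eux ances iqUes ismes ables istes",
    "ateur": "atrice ateur ation atrices ateurs ations",
    "logie": "logie logies", "ution": "usion ution usions utions",
    "ence": "ence ences", "ement": "ement ements", "ite": "ité ités",
    "if": "if ive ifs ives", "eaux": "eaux", "aux": "aux",
    "euse": "euse euses", "issement": "issement issements",
    "amment": "amment", "emment": "emment", "ment": "ment ments",
}
SUFFIX_GROUP = {s: g for g, names in GROUPS.items() for s in names.split()}


def step1(w: str):
    """Return (new word, whether step 2a should be done next)."""
    rv, r1, r2 = regions(w)
    found = [s for s in SUFFIX_GROUP if w.endswith(s)]
    if not found:
        return w, True
    suffix = max(found, key=len)  # the longest matching suffix wins
    group, start, original = SUFFIX_GROUP[suffix], len(w) - len(suffix), w

    def ic(stem: str) -> str:
        # "if preceded by ic, delete if in R2, else replace by iqU"
        if stem.endswith("ic"):
            return stem[:-2] if len(stem) - 2 >= r2 else stem[:-2] + "iqU"
        return stem

    if group == "r2_delete" and start >= r2:
        w = w[:start]
    elif group == "ateur" and start >= r2:
        w = ic(w[:start])
    elif group in ("logie", "ution", "ence") and start >= r2:
        w = w[:start] + {"logie": "log", "ution": "u", "ence": "ent"}[group]
    elif group == "ement" and start >= rv:
        w = w[:start]
        if w.endswith("iv"):
            if len(w) - 2 >= r2:
                w = w[:-2]
                if w.endswith("at") and len(w) - 2 >= r2:
                    w = w[:-2]
        elif w.endswith("eus"):
            if len(w) - 3 >= r2:
                w = w[:-3]
            elif len(w) - 3 >= r1:
                w = w[:-3] + "eux"
        elif w.endswith(("abl", "iqU")) and len(w) - 3 >= r2:
            w = w[:-3]
        elif w.endswith(("ièr", "Ièr")) and len(w) - 3 >= rv:
            w = w[:-3]  # "delete if in RV", as the rule is written above
    elif group == "ite" and start >= r2:
        w = w[:start]
        if w.endswith("abil"):
            w = w[:-4] if len(w) - 4 >= r2 else w[:-4] + "abl"
        elif w.endswith("ic"):
            w = ic(w)
        elif w.endswith("iv") and len(w) - 2 >= r2:
            w = w[:-2]
    elif group == "if" and start >= r2:
        w = w[:start]
        if w.endswith("at") and len(w) - 2 >= r2:
            w = ic(w[:-2])
    elif group == "eaux":
        w = w[:start] + "eau"
    elif group == "aux" and start >= r1:
        w = w[:start] + "al"
    elif group == "euse" and start >= r1:
        w = w[:start] if start >= r2 else w[:start] + "eux"
    elif group == "issement" and start >= r1 and w[start - 1] not in VOWELS:
        w = w[:start]
    elif group in ("amment", "emment") and start >= rv:
        w = w[:start] + ("ant" if group == "amment" else "ent")
    elif group == "ment" and start - 1 >= rv and w[start - 1] in VOWELS:
        w = w[:start]
    return w, w == original or group in ("amment", "emment", "ment")
```

For example, step1("chevaux") gives ("cheval", False), while step1("vraiment") gives ("vrai", True), because a ment ending always leads on to step 2a.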
 
 Step 2a: Verb suffixes beginning i
    Search for the longest among the following suffixes and if found,
    delete if preceded by a non-vowel.

        îmes     ît     îtes     i     ie     ies     ir     ira     irai     iraIent     irais     irait     iras
            irent     irez     iriez     irions     irons     iront     is     issaIent     issais     issait
            issant     issante     issantes     issants     isse     issent     isses     issez     issiez
            issions     issons     it
(Note that the non-vowel itself must also be in RV.)

Do step 2b if step 2a was done, but failed to remove a suffix.
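Step 2a in the same style, assuming a vowel-marked word. Matching only against w[rv:] is one way to keep the suffix inside RV, and the index test on the preceding letter keeps the non-vowel inside RV too. The names STEP2A_SUFFIXES, rv_start and step2a are hypothetical.

```python
VOWELS = set("aeiouyâàëéêèïîôûù")
STEP2A_SUFFIXES = """îmes ît îtes i ie ies ir ira irai iraIent irais irait iras
irent irez iriez irions irons iront is issaIent issais issait issant issante
issantes issants isse issent isses issez issiez issions issons it""".split()


def rv_start(w: str) -> int:
    if len(w) >= 2 and w[0] in VOWELS and w[1] in VOWELS:
        return min(3, len(w))
    return next((i + 1 for i in range(1, len(w)) if w[i] in VOWELS), len(w))


def step2a(w: str):
    """Return (new word, whether a suffix was removed)."""
    rv = rv_start(w)
    # Matching against w[rv:] keeps the whole suffix inside RV.
    found = [s for s in STEP2A_SUFFIXES if w[rv:].endswith(s)]
    if not found:
        return w, False
    start = len(w) - len(max(found, key=len))
    # The preceding non-vowel must exist and also lie in RV.
    if start - 1 >= rv and w[start - 1] not in VOWELS:
        return w[:start], True
    return w, False
```

step2a("finissons") gives ("fin", True); step2a("ennuIe") finds nothing, because the marked I does not match ie.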
 
 Step 2b: Other verb suffixes
    Search for the longest among the following suffixes, and perform the
    action indicated.

ions
        delete if in R2

é     ée     ées     és     èrent     er     era     erai     eraIent     erais     erait     eras     erez
        eriez     erions     erons     eront     ez     iez
        delete

âmes     ât     âtes     a     ai     aIent     ais     ait     ant     ante     antes     ants     as     asse
        assent     asses     assiez     assions
        delete
        if preceded by e, delete
(Note that the e that may be deleted in this last step must also be in
    RV.)

If the last step to be obeyed - either step 1, 2a or 2b - altered the word,
do step 3
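Step 2b sketched under the same assumptions (vowel-marked input, hypothetical names). Only the ions rule needs R2, and the optional e is removed only when it also lies in RV.

```python
VOWELS = set("aeiouyâàëéêèïîôûù")
DELETE = """é ée ées és èrent er era erai eraIent erais erait eras erez eriez
erions erons eront ez iez""".split()
DELETE_THEN_E = """âmes ât âtes a ai aIent ais ait ant ante antes ants as asse
assent asses assiez assions""".split()


def rv_and_r2(w: str):
    if len(w) >= 2 and w[0] in VOWELS and w[1] in VOWELS:
        rv = min(3, len(w))
    else:
        rv = next((i + 1 for i in range(1, len(w)) if w[i] in VOWELS), len(w))

    def after_vowel_nonvowel(start: int) -> int:
        for i in range(start + 1, len(w)):
            if w[i] not in VOWELS and w[i - 1] in VOWELS:
                return i + 1
        return len(w)

    return rv, after_vowel_nonvowel(after_vowel_nonvowel(0))


def step2b(w: str):
    """Return (new word, whether a suffix was removed)."""
    rv, r2 = rv_and_r2(w)
    found = [s for s in ["ions", *DELETE, *DELETE_THEN_E] if w[rv:].endswith(s)]
    if not found:
        return w, False
    suffix = max(found, key=len)
    start = len(w) - len(suffix)
    if suffix == "ions":
        return (w[:start], True) if start >= r2 else (w, False)
    w = w[:start]
    # After the third group, a preceding e inside RV is deleted as well.
    if suffix in DELETE_THEN_E and w.endswith("e") and len(w) - 1 >= rv:
        w = w[:-1]
    return w, True
```

For instance, mangeant loses ant and then its e, giving mang.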
 
 Step 3
    Replace final Y with i or final ç with c

Alternatively, if the last step to be obeyed did not alter the word, do
step 4
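Step 3 is a one-letter substitution. The sketch below also wires steps 1 to 4 together according to the transitions given in the text. The step functions are passed in as parameters, and their (word, flag) return convention and all names here are assumptions of this sketch, not part of the description above.

```python
from typing import Callable, Tuple


def step3(w: str) -> str:
    """Replace a final Y with i, or a final ç with c."""
    if w.endswith("Y"):
        return w[:-1] + "i"
    if w.endswith("ç"):
        return w[:-1] + "c"
    return w


def steps_1_to_4(
    w: str,
    step1: Callable[[str], Tuple[str, bool]],   # (word, whether to do step 2a)
    step2a: Callable[[str], Tuple[str, bool]],  # (word, whether a suffix was removed)
    step2b: Callable[[str], Tuple[str, bool]],  # (word, whether a suffix was removed)
    step4: Callable[[str], str],
) -> str:
    w, do_2a = step1(w)
    if not do_2a:
        return step3(w)  # step 1 removed a standard suffix
    w, removed = step2a(w)
    if removed:
        return step3(w)
    w, removed = step2b(w)  # step 2a was done but removed nothing
    return step3(w) if removed else step4(w)
```

With stand-ins for the other steps, envoYer losing er in step 2b continues to step 3 and becomes envoi.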
 
 Step 4: Residual suffix
    If the word ends s, not preceded by a, i, o, u, è or s, delete it.
    In the rest of step 4, all tests are confined to the RV region.

    Search for the longest among the following suffixes, and perform the
    action indicated.

ion
        delete if in R2 and preceded by s or t

ier     ière     Ier     Ière
        replace with i

e
        delete

ë
        if preceded by gu, delete
(So note that ion is removed only when it is in R2 - as well as being
    in RV - and preceded by s or t which must be in RV.)

Always do steps 5 and 6.
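Step 4 sketched in Python, assuming a vowel-marked word and hypothetical names. For simplicity the regions are recomputed from the word passed in; computing them once on the marked word before step 1 gives the same indices for the examples used here.

```python
VOWELS = set("aeiouyâàëéêèïîôûù")


def rv_and_r2(w: str):
    if len(w) >= 2 and w[0] in VOWELS and w[1] in VOWELS:
        rv = min(3, len(w))
    else:
        rv = next((i + 1 for i in range(1, len(w)) if w[i] in VOWELS), len(w))

    def after_vowel_nonvowel(start: int) -> int:
        for i in range(start + 1, len(w)):
            if w[i] not in VOWELS and w[i - 1] in VOWELS:
                return i + 1
        return len(w)

    return rv, after_vowel_nonvowel(after_vowel_nonvowel(0))


def step4(w: str) -> str:
    # The s rule looks at the whole word, not only at RV.
    if len(w) >= 2 and w.endswith("s") and w[-2] not in "aiouès":
        w = w[:-1]
    rv, r2 = rv_and_r2(w)
    suffixes = ("ion", "ier", "ière", "Ier", "Ière", "e", "ë")
    found = [s for s in suffixes if w[rv:].endswith(s)]  # suffix must lie in RV
    if not found:
        return w
    suffix = max(found, key=len)
    start = len(w) - len(suffix)
    if suffix == "ion":
        # In R2, and preceded by an s or t that is itself in RV.
        if start >= r2 and start - 1 >= rv and w[start - 1] in "st":
            return w[:start]
    elif suffix == "e":
        return w[:start]
    elif suffix == "ë":
        if start - 2 >= rv and w[start - 2:start] == "gu":
            return w[:start]
    else:  # ier, ière, Ier, Ière
        return w[:start] + "i"
    return w
```

Here explosion loses ion (in R2, after an s in RV), while nation keeps it because ion is not in R2.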
 
 Step 5: Undouble
    If the word ends enn, onn, ett, ell or eill, delete the last letter

Step 6: Un-accent
    If the word ends é or è followed by at least one non-vowel, remove
    the accent from the e.

And finally:
    Turn any remaining I, U and Y letters in the word back into lower case.
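Steps 5 and 6 and the final clean-up, again as hypothetical Python helpers. Step 6 skips back over the trailing non-vowels (the marked I, U and Y count as non-vowels here) and then checks for é or è.

```python
VOWELS = set("aeiouyâàëéêèïîôûù")


def step5(w: str) -> str:
    """Undouble: drop the last letter of a final enn, onn, ett, ell or eill."""
    if w.endswith(("enn", "onn", "ett", "ell", "eill")):
        return w[:-1]
    return w


def step6(w: str) -> str:
    """Un-accent: an é or è followed only by one or more non-vowels becomes e."""
    i = len(w)
    while i > 0 and w[i - 1] not in VOWELS:
        i -= 1  # skip back over the trailing non-vowels
    if 0 < i < len(w) and w[i - 1] in "éè":
        return w[: i - 1] + "e" + w[i:]
    return w


def finish(w: str) -> str:
    """Turn any remaining I, U and Y markers back into lower case."""
    return w.translate(str.maketrans("IUY", "iuy"))


if __name__ == "__main__":
    for stem in ["bonn", "cruell", "complèt", "ennuI"]:
        print(stem, "->", finish(step6(step5(stem))))
```

complèt (what step 4 leaves of complète) becomes complet, and ennuI becomes ennui.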