1 """
2 Finite state transducers.
3
4 A finite state trasducer, or FST, is a directed graph that is used to
5 encode a mapping from a set of I{input strings} to a set of I{output
6 strings}. An X{input string} is a sequence of immutable values (such
7 as integers, characters, or strings) called X{input symbols}.
8 Similarly, an C{output string} is a sequence of immutable values
9 called X{output symbols}. Collectively, input strings and output
10 strings are called X{symbol strings}, or simply X{strings} for short.
11 Note that this notion of I{string} is different from the python string
12 type -- symbol strings are always encoded as tuples of input or output
13 symbols, even if those symbols are characters. Also, note that empty
14 sequences are valid symbol strings.
15
16 The nodes of an FST are called X{states}, and the edges are called
17 X{transition arcs} or simply X{arcs}. States may be marked as
18 X{final}, and each final state is annotated with an output string,
19 called the X{finalizing string}. Each arc is annotated with an input
20 string and an output string. An arc with an empty input string is
21 called an I{epsilon-input arc}; and an arc with an empty output
22 string is called an I{epsilon-output arc}.
23
24 The set of mappings encoded by the FST are defined by the set of paths
25 through the graph, starting at a special state known as the X{initial
26 state}, and ending at a final state. In particular, the FST maps an
27 input string X to an output string Y iff there exists a path from the
28 initial state to a final state such that:
29
30 - The input string X is formed by concatenating the input strings
31 of the arcs along the path (in order).
32 - The output string Y is formed by concatenating the output strings
33 of the arcs along the path (in order), plus the final state's
34 output string.
35
36 The following list defines some terms that apply to finite state
37 transducers.
38
39 - The X{transduction} defined by a FST is the mapping from input
40 strings to output strings.
41
42 - An FST X{encodes a deterministic transduction} if each input
43 string maps to at most one output string. An FST X{encodes a
44 nondeterministic transduction} if any input string maps to more
45 than one output string.
46
47 - An FST is X{deterministic} if it every state contains at most one
48 outgoing arc that is consistent with any input string; otherwise,
49 the FST is X{nondeterministic}. If an FST is deterministic, then
50 it necessarily encodes a deterministic transduction; however, it
51 is possible to define an FST that is nondeterministic but that
52 encodes a deterministic transduction.
53
54 - An FST is X{sequential} if each arc is labeled with exactly one
55 input symbol, no two outgoing arcs from any state have the same
56 input symbol, and all finalizing strings are empty. (Sequential
57 implies deterministic).
58
59 - An FST is I{subsequential} if each arc is labeled with exactly
60 one input symbol, and no two outgoing arcs from any state have
61 the same input symbol. (Finalizing strings may be non-empty.)
62
63 An FSA can be represented as an FST that generates no output symbols.
64
65 The current FST class does not provide support for:
66
67 - Weighted arcs. (However, weights can be used as, or included
68 in, the output symbols. The total weight of a path can then
69 be found after transduction by combining the weights. But
70 there's no support for e.g., finding the path with the minimum
71 weight.
72
73 - Multiple initial states.
74
75 - Initializing strings (an output string associated with the initial
76 state, which is always generated when the FST begins).
77
78 Possible future changes:
79
80 - Define several classes, in a class hierarchy? E.g., FSA is a base
81 class, FST inherits from it. And maybe a further subclass to add
82 finalizing sequences. I would need to be more careful to only
83 access the private variables when necessary, and to usually go
84 through the accessor functions.
85 """
86
87 import re, os, random, tempfile
88 from subprocess import Popen, PIPE
89 from nltk_lite.draw import *
90 from nltk_lite.contrib.fst.draw_graph import *
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
115 """
116 A finite state transducer. Each state is uniquely identified by a
117 label, which is typically a string name or an integer id. A
118 state's label is used to access and modify the state. Similarly,
119 each arc is uniquely identified by a label, which is used to
120 access and modify the arc.
121
122 The set of arcs pointing away from a state are that state's
123 I{outgoing} arcs. The set of arcs pointing to a state are that
124 state's I{incoming} arcs. The state at which an arc originates is
125 that arc's I{source} state (or C{src}), and the state at which it
126 terminates is its I{destination} state (or C{dst}).
127
128 It is possible to define an C{FST} object with no initial state.
129 This is represented by assigning a value of C{None} to the
130 C{initial_state} variable. C{FST}s with no initial state are
131 considered to encode an empty mapping. I.e., transducing any
132 string with such an C{FST} will result in failure.
133 """
135 """
136 Create a new finite state transducer, containing no states.
137 """
138 self.label = label
139 """A label identifying this FST. This is used for display &
140 debugging purposes only."""
141
142
143 self._initial_state = None
144 """The label of the initial state, or C{None} if this FST
145 does not have an initial state."""
146
147 self._incoming = {}
148 """A dictionary mapping state labels to lists of incoming
149 transition arc labels."""
150
151 self._outgoing = {}
152 """A dictionary mapping state labels to lists of outgoing
153 transition arc labels."""
154
155 self._is_final = {}
156 """A dictionary mapping state labels to boolean values,
157 indicating whether the state is final."""
158
159 self._finalizing_string = {}
160 """A dictionary mapping state labels of final states to output
161 strings. This string should be added to the output
162 if the FST terminates at this state."""
163
164 self._state_descr = {}
165 """A dictionary mapping state labels to (optional) state
166 descriptions."""
167
168
169
170 self._src = {}
171 """A dictionary mapping each transition arc label to the label of
172 its source state."""
173
174 self._dst = {}
175 """A dictionary mapping each transition arc label to the label of
176 its destination state."""
177
178 self._in_string = {}
179 """A dictionary mapping each transition arc label to its input
180 string, a (possibly empty) tuple of input symbols."""
181
182 self._out_string = {}
183 """A dictionary mapping each transition arc label to its output
184 string, a (possibly empty) tuple of input symbols."""
185
186 self._arc_descr = {}
187 """A dictionary mapping transition arc labels to (optional)
188 arc descriptions."""
189
190
191
192
193
194
196 """Return an iterator that will generate the state label of
197 each state in this FST."""
198 return iter(self._incoming)
199
201 """Return true if this FST contains a state with the given
202 label."""
203 return label in self._incoming
204
206 return self._initial_state
208 if label is not None and label not in self._incoming:
209 raise ValueError('Unknown state label %r' % label)
210 self._initial_state = label
211 initial_state = property(_get_initial_state, _set_initial_state,
212 doc="The label of the initial state (R/W).")
213
215 """Return an iterator that will generate the incoming
216 transition arcs for the given state. The effects of modifying
217 the FST's state while iterating are undefined, so if you plan
218 to modify the state, you should copy the incoming transition
219 arcs into a list first."""
220 return iter(self._incoming[state])
221
223 """Return an iterator that will generate the outgoing
224 transition arcs for the given state. The effects of modifying
225 the FST's state while iterating are undefined, so if you plan
226 to modify the state, you should copy the outgoing transition
227 arcs into a list first."""
228 return iter(self._outgoing[state])
229
231 """Return true if the state with the given state label is
232 final."""
233 return self._is_final[state]
234
236 """Return the output string associated with the given final
237 state. If the FST terminates at this state, then this string
238 will be emitted."""
239
240
241 return self._finalizing_string.get(state, ())
242
244 """Return the description for the given state, if it has one;
245 or None, otherwise."""
246 return self._state_descr.get(state)
247
248
249
250
251
253 """Return an iterator that will generate the arc label of
254 each transition arc in this FST."""
255 return iter(self._src)
256
257 - def src(self, arc):
258 """Return the state label of this transition arc's source
259 state."""
260 return self._src[arc]
261
262 - def dst(self, arc):
263 """Return the state label of this transition arc's destination
264 state."""
265 return self._dst[arc]
266
268 """Return the given transition arc's input string, a (possibly
269 empty) tuple of input symbols."""
270 return self._in_string[arc]
271
273 """Return the given transition arc's output string, a
274 (possibly empty) tuple of output symbols."""
275 return self._out_string[arc]
276
278 """Return the description for the given transition arc, if it
279 has one; or None, otherwise."""
280 return self._arc_descr.get(arc)
281
283 """Return a tuple (src, dst, in_string, out_string) for the
284 given arc, where:
285 - C{src} is the label of the arc's source state.
286 - C{dst} is the label of the arc's destination state.
287 - C{in_string} is the arc's input string.
288 - C{out_string} is the arc's output string.
289 """
290 return (self._src[arc], self._dst[arc],
291 self._in_string[arc], self._out_string[arc])
292
293
294
295
296
304
317
318
319
320
321
322 - def add_state(self, label=None, is_final=False,
323 finalizing_string=(), descr=None):
324 """
325 Create a new state, and return its label. The new state will
326 have no incoming or outgoing arcs. If C{label} is specified,
327 then it will be used as the state's label; otherwise, a new
328 unique label value will be chosen. The new state will be
329 final iff C{is_final} is true. C{descr} is an optional
330 description string for the new state.
331
332 Arguments should be specified using keywords!
333 """
334 label = self._pick_label(label, 'state', self._incoming)
335
336
337 self._incoming[label] = []
338 self._outgoing[label] = []
339 self._is_final[label] = is_final
340 self._state_descr[label] = descr
341 self._finalizing_string[label] = tuple(finalizing_string)
342
343
344 return label
345
347 """
348 Delete the state with the given label. This will
349 automatically delete any incoming or outgoing arcs attached to
350 the state.
351 """
352 if label not in self._incoming:
353 raise ValueError('Unknown state label %r' % label)
354
355
356 for arc in self._incoming[label]:
357 del (self._src[arc], self._dst[arc], self._in_string[arc],
358 self._out_string[arc], self._arc_descr[arc])
359 for arc in self._outgoing[label]:
360 del (self._src[arc], self._dst[arc], self._in_string[arc],
361 self._out_string[arc], self._arc_descr[arc])
362
363
364 del (self._incoming[label], self._otugoing[label],
365 self._is_final[label], self._state_descr[label],
366 self._finalizing_string[label])
367
368
369 if label == self._initial_state:
370 self._initial_state = None
371
373 """
374 If C{is_final} is true, then make the state with the given
375 label final; if C{is_final} is false, then make the state with
376 the given label non-final.
377 """
378 if state not in self._incoming:
379 raise ValueError('Unknown state label %r' % state)
380 self._is_final[state] = is_final
381
383 """
384 Set the given state's finalizing string.
385 """
386 if not self._is_final[state]:
387 raise ValueError('%s is not a final state' % state)
388 if state not in self._incoming:
389 raise ValueError('Unknown state label %r' % state)
390 self._finalizing_string[state] = tuple(finalizing_string)
391
393 """
394 Set the given state's description string.
395 """
396 if state not in self._incoming:
397 raise ValueError('Unknown state label %r' % state)
398 self._state_descr[state] = descr
399
400 - def dup_state(self, orig_state, label=None):
401 """
402 Duplicate an existing state. I.e., create a new state M{s}
403 such that:
404 - M{s} is final iff C{orig_state} is final.
405 - If C{orig_state} is final, then M{s.finalizing_string}
406 is copied from C{orig_state}
407 - For each outgoing arc from C{orig_state}, M{s} has an
408 outgoing arc with the same input string, output
409 string, and destination state.
410
411 Note that if C{orig_state} contained self-loop arcs, then the
412 corresponding arcs in M{s} will point to C{orig_state} (i.e.,
413 they will I{not} be self-loop arcs).
414
415 The state description is I{not} copied.
416
417 @param label: The label for the new state. If not specified,
418 a unique integer will be used.
419 """
420 if orig_state not in self._incoming:
421 raise ValueError('Unknown state label %r' % src)
422
423
424 new_state = self.add_state(label=label)
425
426
427 if self.is_final(orig_state):
428 self.set_final(new_state)
429 self.set_finalizing_string(new_state,
430 self.finalizing_string(orig_state))
431
432
433 for arc in self._outgoing[orig_state]:
434 self.add_arc(src=new_state, dst=self._dst[arc],
435 in_string=self._in_string[arc],
436 out_string=self._out_string[arc])
437
438 return new_state
439
440
441
442
443
444 - def add_arc(self, src, dst, in_string, out_string,
445 label=None, descr=None):
446 """
447 Create a new transition arc, and return its label.
448
449 Arguments should be specified using keywords!
450
451 @param src: The label of the source state.
452 @param dst: The label of the destination state.
453 @param in_string: The input string, a (possibly empty) tuple of
454 input symbols. Input symbols should be hashable
455 immutable objects.
456 @param out_string: The output string, a (possibly empty) tuple
457 of output symbols. Output symbols should be hashable
458 immutable objects.
459 """
460 label = self._pick_label(label, 'arc', self._src)
461
462
463 if src not in self._incoming:
464 raise ValueError('Unknown state label %r' % src)
465 if dst not in self._incoming:
466 raise ValueError('Unknown state label %r' % dst)
467
468
469 self._src[label] = src
470 self._dst[label] = dst
471 self._in_string[label] = tuple(in_string)
472 self._out_string[label] = tuple(out_string)
473 self._arc_descr[label] = descr
474
475
476 self._incoming[dst].append(label)
477 self._outgoing[src].append(label)
478
479
480 return label
481
483 """
484 Delete the transition arc with the given label.
485 """
486 if label not in self._src:
487 raise ValueError('Unknown arc label %r' % src)
488
489
490 self._incoming[self._dst[label]].remove(label)
491 self._outgoing[self._src[label]].remove(label)
492
493
494 del (self._src[label], self._dst[label], self._in_string[label],
495 self._out_string[label], self._arc_descr[label])
496
497
498
499
500
502 """Swap all in_string/out_string pairs."""
503 fst = self.copy()
504 fst._in_string, fst._out_string = fst._out_string, fst._in_string
505 return fst
506
508 """Reverse the direction of all transition arcs."""
509 fst = self.copy()
510 fst._incoming, fst._outgoing = fst._outgoing, fst._incoming
511 fst._src, fst._dst = fst._dst, fst._src
512 return fst
513
515 fst = self.copy()
516
517 if fst.initial_state is None:
518 raise ValueError("No initial state!")
519
520
521
522 queue = [fst.initial_state]
523 path_from_init = set(queue)
524 while queue:
525 state = queue.pop()
526 dsts = [fst.dst(arc) for arc in fst.outgoing(state)]
527 queue += [s for s in dsts if s not in path_from_init]
528 path_from_init.update(dsts)
529
530
531
532 queue = [s for s in fst.states() if fst.is_final(s)]
533 path_to_final = set(queue)
534 while queue:
535 state = queue.pop()
536 srcs = [fst.src(arc) for arc in fst.incoming(state)]
537 queue += [s for s in srcs if s not in path_to_final]
538 path_to_final.update(srcs)
539
540
541
542 for state in list(fst.states()):
543 if not (state in path_from_init and state in path_to_final):
544 fst.del_state(state)
545
546 return fst
547
548 - def relabeled(self, label=None, relabel_states=True, relabel_arcs=True):
549 """
550 Return a new FST that is identical to this FST, except that
551 all state and arc labels have been replaced with new labels.
552 These new labels are consecutive integers, starting with zero.
553
554 @param relabel_states: If false, then don't relabel the states.
555 @param relabel_arcs: If false, then don't relabel the arcs.
556 """
557 if label is None: label = '%s (relabeled)' % self.label
558 fst = FST(label)
559
560
561
562 state_ids = self._relabel_state_ids(self.initial_state, {})
563 if len(state_ids) < len(self._outgoing):
564 for state in self.states():
565 if state not in state_ids:
566 state_ids[state] = len(state_ids)
567
568
569
570 arcs = sorted(self.arcs(), key=self.arc_info)
571 arc_ids = dict([(a,i) for (i,a) in enumerate(arcs)])
572
573 for state in self.states():
574 if relabel_states: label = state_ids[state]
575 else: label = state
576 fst.add_state(label, is_final=self.is_final(state),
577 finalizing_string=self.finalizing_string(state),
578 descr=self.state_descr(state))
579
580 for arc in self.arcs():
581 if relabel_arcs: label = arc_ids[arc]
582 else: label = arc
583 src, dst, in_string, out_string = self.arc_info(arc)
584 if relabel_states:
585 src = state_ids[src]
586 dst = state_ids[dst]
587 fst.add_arc(src=src, dst=dst, in_string=in_string,
588 out_string=out_string,
589 label=label, descr=self.arc_descr(arc))
590
591 if relabel_states:
592 fst.initial_state = state_ids[self.initial_state]
593 else:
594 fst.initial_state = self.initial_state
595
596 return fst
597
599 """
600 A helper function for L{relabel()}, which decides which new
601 label should be assigned to each state.
602 """
603 if state in ids: return
604 ids[state] = len(ids)
605 for arc in sorted(self.outgoing(state),
606 key = lambda a:self.in_string(a)):
607 self._relabel_state_ids(self.dst(arc), ids)
608 return ids
609
611 """
612 Return a new FST which defines the same mapping as this FST,
613 but is determinized.
614
615 The algorithm used is based on [...].
616
617 @require: All arcs in this FST must have exactly one input
618 symbol.
619 @require: The mapping defined by this FST must be
620 deterministic.
621 @raise ValueError: If the determinization algorithm was unable
622 to determinize this FST. Typically, this happens because
623 a precondition is not met.
624 """
625
626 for arc in self.arcs():
627 if len(self.in_string(arc)) != 1:
628 raise ValueError("All arcs must have exactly one "
629 "input symbol.")
630
631
632
633
634 if label is None: label = '%s (determinized)' % self.label
635 new_fst = FST(label)
636
637 initial_state = frozenset( [(self.initial_state,())] )
638 new_fst.add_state(initial_state)
639 new_fst.initial_state = initial_state
640
641 queue = [initial_state]
642 while queue:
643 new_fst_state = queue.pop()
644
645
646
647
648
649
650
651
652 finalizing_strings = [w+self.finalizing_string(s)
653 for (s,w) in new_fst_state
654 if self.is_final(s)]
655 if len(set(finalizing_strings)) > 0:
656 if not self._all_equal(finalizing_strings):
657
658 raise ValueError("Determinization failed")
659 new_fst.set_final(new_fst_state)
660 new_fst.set_finalizing_string(new_fst_state,
661 finalizing_strings[0])
662
663
664
665 arc_table = {}
666 for (s,w) in new_fst_state:
667 for arc in self.outgoing(s):
668 sym = self.in_string(arc)[0]
669 dst = self.dst(arc)
670 residual = w + self.out_string(arc)
671 arc_table.setdefault(sym,{}).setdefault(dst,set())
672 arc_table[sym][dst].add(residual)
673
674
675
676
677
678
679
680
681
682
683
684 for sym in arc_table:
685 for dst in arc_table[sym]:
686 if len(arc_table[sym][dst]) > 1:
687
688
689 raise ValueError("Determinization failed")
690
691
692 dst_residual_pairs = [(dst, arc_table[sym][dst].pop())
693 for dst in arc_table[sym]]
694
695
696
697
698
699
700 residuals = [res for (dst, res) in dst_residual_pairs]
701 prefix = self._common_prefix(residuals)
702
703
704
705
706
707 new_arc_dst = frozenset([(dst, res[len(prefix):])
708 for (dst,res) in dst_residual_pairs])
709
710
711
712 if not new_fst.has_state(new_arc_dst):
713 new_fst.add_state(new_arc_dst)
714 queue.append(new_arc_dst)
715
716
717 new_fst.add_arc(src=new_fst_state, dst=new_arc_dst,
718 in_string=(sym,), out_string=prefix)
719 return new_fst
720
722 """Return true if all elements in the list are equal"""
723 for item in lst[1:]:
724 if item != lst[0]: return False
725 return True
726
728 """Return the longest sequence that is a prefix of all of the
729 given sequences."""
730 prefix = sequences[0]
731 for seq in sequences[1:]:
732
733
734 prefix = prefix[:len(seq)]
735
736
737
738 for i in range(len(prefix)):
739 if seq[i] != prefix[i]:
740 prefix = prefix[:i]
741 break
742 return prefix
743
744
745
746
747
748 - def copy(self, label=None):
749
750 if label is None: label = '%s-copy' % self.label
751 fst = FST(label)
752
753
754 fst._initial_state = self._initial_state
755 fst._incoming = self._incoming.copy()
756 fst._outgoing = self._outgoing.copy()
757 fst._is_final = self._is_final.copy()
758 fst._finalizing_string = self._finalizing_string.copy()
759 fst._state_descr = self._state_descr.copy()
760 fst._src = self._src.copy()
761 fst._dst = self._dst.copy()
762 fst._in_string = self._in_string.copy()
763 fst._out_string = self._out_string.copy()
764 fst._arc_descr = self._arc_descr.copy()
765 return fst
766
790
791 @staticmethod
792 - def load(filename):
795
796 @staticmethod
798 fst = FST(label)
799 prev_src = None
800 lines = s.split('\n')[::-1]
801 while lines:
802 line = lines.pop().split('#')[0].strip()
803 if not line: continue
804
805
806 m = re.match(r'->\s*(\S+)$', line)
807 if m:
808 label = m.group(1)
809 if not fst.has_state(label): fst.add_state(label)
810 fst.initial_state = label
811 continue
812
813
814 m = re.match('(\S+)\s*->\s*(?:\[([^\]]*)\])?$', line)
815 if m:
816 label, finalizing_string = m.groups()
817 if not fst.has_state(label): fst.add_state(label)
818 fst.set_final(label)
819 if finalizing_string is not None:
820 finalizing_string = finalizing_string.split()
821 fst.set_finalizing_string(label, finalizing_string)
822 continue
823
824
825 m = re.match('(\S+)$', line)
826 if m:
827 label = m.group(1)
828 if not fst.has_state(label): fst.add_state(label)
829 continue
830
831
832 m = re.match(r'descr\s+(\S+?):\s*(.*)$', line)
833 if m:
834 label, descr = m.groups()
835
836 while lines and re.match(r'\s+\S', lines[-1]):
837 descr = descr.rstrip()+' '+lines.pop().lstrip()
838 if not fst.has_state(label): fst.add_state(label)
839 fst.set_descr(label, descr)
840 continue
841
842
843 m = re.match(r'(\S+)?\s*->\s*(\S+)\s*'
844 r'\[(.*?):(.*?)\]$', line)
845 if m:
846 src, dst, in_string, out_string = m.groups()
847 if src is None: src = prev_src
848 if src is None: raise ValueError("bad line: %r" % line)
849 prev_src = src
850 if not fst.has_state(src): fst.add_state(src)
851 if not fst.has_state(dst): fst.add_state(dst)
852 in_string = tuple(in_string.split())
853 out_string = tuple(out_string.split())
854 fst.add_arc(src, dst, in_string, out_string)
855 continue
856
857 raise ValueError("bad line: %r" % line)
858
859 return fst
860
862 """
863 Return an AT&T graphviz dot graph.
864 """
865
866 lines = ['digraph %r {' % self.label,
867 'node [shape=ellipse]']
868 state_id = dict([(s,i) for (i,s) in enumerate(self.states())])
869 if self.initial_state is not None:
870 lines.append('init [shape="plaintext" label=""]')
871 lines.append('init -> %s' % state_id[self.initial_state])
872 for state in self.states():
873 if self.is_final(state):
874 final_str = self.finalizing_string(state)
875 if len(final_str)>0:
876 lines.append('%s [label="%s\\n%s", shape=doublecircle]' %
877 (state_id[state], state, ' '.join(final_str)))
878 else:
879 lines.append('%s [label="%s", shape=doublecircle]' %
880 (state_id[state], state))
881 else:
882 lines.append('%s [label="%s"]' % (state_id[state], state))
883 for arc in self.arcs():
884 src, dst, in_str, out_str = self.arc_info(arc)
885 lines.append('%s -> %s [label="%s:%s"]' %
886 (state_id[src], state_id[dst],
887 ' '.join(in_str), ' '.join(out_str)))
888 lines.append('}')
889 return '\n'.join(lines)
890
891
892
893
894
897
899 """
900 This is implemented as a generator, to make it easier to
901 support stepping.
902 """
903 if not self.is_subsequential():
904 raise ValueError('FST is not subsequential!')
905
906
907
908
909
910
911 transitions = {}
912 for arc in self.arcs():
913 src, dst, in_string, out_string = self.arc_info(arc)
914 assert len(in_string) == 1
915 assert (src, in_string[0]) not in transitions
916 transitions[src, in_string[0]] = (dst, out_string, arc)
917
918 output = []
919 state = self.initial_state
920 try:
921 for in_pos, in_sym in enumerate(input):
922 (state, out_string, arc) = transitions[state, in_sym]
923 if step: yield 'step', (arc, in_pos, output)
924 output += out_string
925 yield 'succeed', output
926 except KeyError:
927 yield 'fail', None
928
931
933 """
934 This is implemented as a generator, to make it easier to
935 support stepping.
936 """
937 input = tuple(input)
938 output = []
939 in_pos = 0
940
941
942
943
944
945
946
947
948
949 frontier = []
950
951
952
953 state = self.initial_state
954 while in_pos < len(input) or not self.is_final(state):
955
956 arcs = self.outgoing(state)
957
958
959
960
961
962 for arc in arcs:
963 in_string = self.in_string(arc)
964 if input[in_pos:in_pos+len(in_string)] == in_string:
965 frontier.append( (arc, in_pos, len(output)) )
966
967
968 if len(frontier) == 0:
969 yield 'fail', None
970
971
972 arc, in_pos, out_pos = frontier.pop()
973 if step:
974 yield 'step', (arc, in_pos, output[:out_pos])
975
976
977 state = self.dst(arc)
978 assert out_pos <= len(output)
979 in_pos = in_pos + len(self.in_string(arc))
980 output = output[:out_pos]
981 output.extend(self.out_string(arc))
982
983
984
985 output += self.finalizing_string(state)
986
987 yield 'succeed', output
988
989
990
991
992
993
995 """
996 Helper function for L{add_state} and C{add_arc} that chooses a
997 label for a new state or arc.
998 """
999 if label is not None and label in used_labels:
1000 raise ValueError("%s with label %r already exists" %
1001 (typ, label))
1002
1003 if label is not None:
1004 return label
1005 else:
1006 label = 1
1007 while '%s%d' % (typ[0], label) in used_labels: label += 1
1008 return '%s%d' % (typ[0], label)
1009
1010
1011
1012
1013
1181
1182
1183
1184
1185
1248
1251 self.cf = CanvasFrame(width=580, height=600, background='#f0f0f0')
1252 self.cf._parent.geometry('+650+50')
1253
1254 self.fst_widgets = {}
1255 for fst in fsts:
1256 self.add_fst(fst)
1257
1263
1266 self.top = Tk()
1267 f1 = Frame(self.top)
1268 f2 = Frame(self.top)
1269 f3 = Frame(self.top)
1270 f4 = Frame(self.top)
1271
1272
1273 self.cf = CanvasFrame(f1, width=800, height=400,
1274 background='#f0f0f0',
1275 relief="sunken", border="2")
1276 self.cf.pack(expand=True, fill='both')
1277
1278
1279 self.state_label = Label(f4, font=('bold', -16))
1280 self.state_label.pack(side='top', anchor='sw')
1281 self.state_descr = Text(f4, height=3, wrap='word', border="2",
1282 relief="sunken", font='helvetica',
1283 width=10)
1284 self.state_descr.pack(side='bottom', expand=True, fill='x')
1285
1286
1287 font = ('courier', -16, 'bold')
1288 Label(f2,text=' Input:', font='courier').pack(side='left')
1289 self.in_text = in_text = Text(f2, height=1, wrap='none',
1290 font=font, background='#c0ffc0',
1291
1292 width=10,
1293 highlightcolor='green',
1294 highlightbackground='green',
1295 highlightthickness=1)
1296 in_text.tag_config('highlight', foreground='white',
1297 background='green4')
1298 in_text.insert('end', 'r = reset; <space> = step')
1299 in_text.tag_add('read', '1.0', '1.4')
1300 in_text.pack(side='left', expand=True, fill="x")
1301
1302
1303 Label(f3,text='Output:', font='courier').pack(side='left')
1304 self.out_text = out_text = Text(f3, height=1, wrap='none',
1305 font=font, background='#ffc0c0',
1306
1307 width=10,
1308 highlightcolor='red',
1309 highlightbackground='red',
1310 highlightthickness=1)
1311 out_text.tag_config('highlight', foreground='white',
1312 background='red4')
1313 out_text.pack(side='left', expand=True, fill="x")
1314
1315 f1.pack(expand=True, fill='both', side='top', padx=5, pady=5)
1316 f4.pack(expand=False, fill='x', side='bottom', padx=5, pady=5)
1317 f3.pack(expand=False, fill='x', side='bottom', padx=5, pady=5)
1318 f2.pack(expand=False, fill='x', side='bottom', padx=5, pady=5)
1319
1320 self.top.title('FST')
1321 self.top.geometry('+650+50')
1322 self.top.bind('<Control-p>', lambda e: self.cf.print_to_file())
1323 self.top.bind('<Control-x>', self.destroy)
1324 self.top.bind('<Control-q>', self.destroy)
1325 self.top.bind('<space>', self.step)
1326 self.top.bind('r', lambda e: self.transduce(self.stepper_input))
1327
1328 self.stepper = None
1329
1330 self.graph = None
1331 self.set_fst(fst)
1332
1351
1352 - def step(self, *e):
1353 if self.stepper is None: return
1354
1355
1356 try: result, val = self.stepper.next()
1357 except StopIteration: return
1358
1359 if result == 'fail':
1360 self.stepper = None
1361 self.out_text.insert('end', ' (Failed!)')
1362 elif result == 'succeed':
1363 self.stepper = None
1364 self.out_text.delete('1.0', 'end')
1365 self.out_text.insert('end', ' '.join(val))
1366 self.out_text.tag_add('highlight', '1.0', 'end-1c')
1367 self.out_text.insert('end', ' (Finished!)')
1368 elif result == 'backtrack':
1369 self.out_text.insert('end', ' (Backtrack)')
1370 for state, widget in self.graph.state_widgets.items():
1371 if state == val: self.graph.mark_state(state, '#f0b0b0')
1372 else: self.graph.unmark_state(state)
1373 else:
1374 (arc, in_pos, output) = val
1375
1376
1377 in_pos += len(fst.in_string(arc))
1378 output = list(output)+list(fst.out_string(arc))
1379 self.in_text.delete('1.0', 'end')
1380 self.in_text.insert('end', ' '.join(self.stepper_input[:in_pos]))
1381 self.in_text.tag_add('highlight', '1.0', 'end-1c')
1382 if in_pos > 0:
1383 self.in_text.insert('end', ' ')
1384 l,r= self.in_text.xview()
1385 if (r-l) < 1:
1386 self.in_text.xview_moveto(1.0-(r-l)/2)
1387 self.in_text.insert('end', ' '.join(self.stepper_input[in_pos:]))
1388
1389
1390 self.out_text.delete('1.0', 'end')
1391 self.out_text.insert('end', ' '.join(output))
1392 self.out_text.tag_add('highlight', '1.0', 'end-1c')
1393
1394
1395 self.state_label['text'] = 'State = %s' % fst.dst(arc)
1396 self.state_descr.delete('1.0', 'end')
1397 state_descr = fst.state_descr(fst.dst(arc))
1398 self.state_descr.insert('end', state_descr or '')
1399
1400
1401 for state, widget in self.graph.state_widgets.items():
1402 if state == fst.dst(arc):
1403 self.graph.mark_state(state, '#00ff00')
1404 elif state == fst.src(arc):
1405 self.graph.mark_state(state, '#b0f0b0')
1406 else: self.graph.unmark_state(state)
1407
1408
1409 for a, widget in self.graph.arc_widgets.items():
1410 if a == arc: self.graph.mark_arc(a)
1411 else: self.graph.unmark_arc(a)
1412
1413
1414 l,r= self.out_text.xview()
1415 if (r-l) < 1:
1416 self.out_text.xview_moveto(1.0-(r-l)/2)
1417 self.out_text.insert('end', ' '*100)
1418
1426
1428 if self.top is None: return
1429 self.top.destroy()
1430 self.top = None
1431
1432 - def mainloop(self, *args, **kwargs):
1433 self.top.mainloop(*args, **kwargs)
1434
1435
1436
1437
1438
1439 if __name__ == '__main__':
1440
1441
1442 fst = FST.parse("test", """
1443 -> start
1444 start -> vp [john:john]
1445 start -> vp [mary:mary]
1446
1447 # Delay production of the determiner until we know the gender.
1448 start -> subj_noun [the:]
1449 subj_noun -> vp [dog:le chien]
1450 subj_noun -> vp [cow:la vache]
1451
1452 vp -> obj [eats:mange]
1453 obj -> obj_noun [the:]
1454 obj -> obj_noun [:]
1455 obj_noun -> end [grass:de l'herbe]
1456 obj_noun -> end [bread:du pain]
1457 end ->
1458 """)
1459
1460 print "john eats the bread ->"
1461 print ' '+ ' '.join(fst.transduce("john eats the bread".split()))
1462 rev = fst.inverted()
1463 print "la vache mange de l'herbe ->"
1464 print ' '+' '.join(rev.transduce("la vache mange de l'herbe".split()))
1465
1466 demo = FSTDemo(fst)
1467 demo.transduce("the cow eats the bread".split())
1468 demo.mainloop()
1469