text-hyphen

Description

Text::Hyphen is a Ruby library that hyphenates words in various languages using Ruby-fied versions of TeX hyphenation patterns. It hyphenates each word according to the rules of the language the word is written in. The algorithm is based on that of the TeX typesetting system by Donald E. Knuth.

Text::Hyphen was originally based on the Perl implementation of TeX::Hyphen and the Ruby port. The language hyphenation pattern files are based on the sources available from CTAN as of 2004.12.19 and have been manually translated by Austin Ziegler.
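
To give a feel for how TeX-style patterns drive hyphenation, here is a minimal sketch of the pattern-matching idea (Liang's algorithm) in plain Ruby. It is not the code used inside Text::Hyphen: the method names parse_pattern and break_points, the constant SAMPLE_PATTERNS, and the tiny pattern list are illustrative assumptions, chosen so that one classic example word works. A digit in a pattern scores the gap it sits in, the highest score for each gap wins, and odd scores mark permitted breaks.

```ruby
# Illustrative sketch only; not Text::Hyphen's implementation.
# A handful of English-style patterns, enough for the word "hyphenation".
SAMPLE_PATTERNS = %w[hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n].freeze

# Split a pattern such as "hy3ph" into its letters ("hyph") and one score
# per gap ([0, 0, 3, 0, 0]); gaps without a digit score 0.
def parse_pattern(pattern)
  letters = String.new
  values = [0]
  pattern.each_char do |ch|
    if ch =~ /\d/
      values[-1] = ch.to_i
    else
      letters << ch
      values << 0
    end
  end
  [letters, values]
end

# Return the offsets in +word+ where a break is allowed.
def break_points(word, patterns)
  padded = ".#{word.downcase}."            # dots mark the word boundaries
  scores = Array.new(padded.length + 1, 0) # scores[i] is the gap before padded[i]

  patterns.each do |pattern|
    letters, values = parse_pattern(pattern)
    start = 0
    while (idx = padded.index(letters, start))
      values.each_with_index do |value, k|
        scores[idx + k] = value if value > scores[idx + k]
      end
      start = idx + 1
    end
  end

  # The gap before word[j] is scores[j + 1] because of the leading dot.
  (1...word.length).select { |j| scores[j + 1].odd? }
end

p break_points("hyphenation", SAMPLE_PATTERNS) # => [2, 6], i.e. hy-phen-ation
```

Real pattern files are far larger than this sample, and this sketch ignores the minimum left and right widths.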

This is a small feature release adding Russian language support and fixing a bug in the custom hyphen support introduced in the previous version. This release supports both Ruby 1.8.7 and Ruby 1.9.2 (but please read below). In short, Ruby 1.8 support is deprecated and I will not be providing any bug fixes related to Ruby 1.8. New features will be developed and tested against Ruby 1.9 only.

Synopsis

require 'text/hyphen'
hh = Text::Hyphen.new(:language => 'en_us', :left => 2, :right => 2)
# Defaults to the above
hh = Text::Hyphen.new

word = "representation"
points = hh.hyphenate(word)  #=> [3, 5, 8, 10]
puts hh.visualize(word)      #=> rep-re-sen-ta-tion

Both the visualize and hyphenate_to methods accept a custom hyphen:

puts hh.visualize(word, "­") #=> rep­re­sen­ta­tion
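
The offsets returned by hyphenate and the string returned by visualize describe the same break points. The sketch below shows that relationship in plain Ruby without requiring the gem. Both helpers, insert_hyphens and split_within, are hypothetical names that are not part of Text::Hyphen, and split_within is only one plausible way a width-limited split could use the offsets, not the documented behaviour of hyphenate_to.

```ruby
# Hypothetical helper: rebuild the visualized form of +word+ from its
# break offsets, joining the pieces with +hyphen+.
def insert_hyphens(word, points, hyphen = "-")
  pieces = []
  start = 0
  points.sort.each do |point|
    pieces << word[start...point]
    start = point
  end
  pieces << word[start..-1]
  pieces.join(hyphen)
end

# Hypothetical helper: split +word+ at the last break offset whose first
# part, including the hyphen, fits within +width+ characters.
# Returns [nil, word] when no break offset fits.
def split_within(word, points, width, hyphen = "-")
  point = points.select { |pt| pt + hyphen.length <= width }.max
  return [nil, word] if point.nil?

  [word[0...point] + hyphen, word[point..-1]]
end

word   = "representation"
points = [3, 5, 8, 10]                   # offsets from the synopsis above

puts insert_hyphens(word, points)        # rep-re-sen-ta-tion
puts insert_hyphens(word, points, "\u00AD").length
p split_within(word, points, 7)          # => ["repre-", "sentation"]
```

Passing "\u00AD" (the soft hyphen) yields a string that looks unbroken in most renderers but still carries the break points.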

Text::Hyphen is truly multilingual, with 29 languages or language variants supported. As an example, consider the difference between the following:

require 'text/hyphen'
# Using left and right minimum values of 0 ensures that you will see all
# possible hyphenation points, not just those that meet the minimum width
# requirements.
en = Text::Hyphen.new(:left => 0, :right => 0)
fr = Text::Hyphen.new(:language => "fr", :left => 0, :right => 0)

puts en.visualise("organiser")      #=> or-gan-iser
puts fr.visualise("organiser")      #=> or-ga-ni-ser

As you can see, the two hyphenators break the same word differently. Additional improvements over TeX::Hyphen include thread safety (except for debug control) and support for UTF-8 under Ruby 1.9.
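
The :left and :right minimums used in the example can be pictured as a filter over the candidate break offsets. The sketch below assumes the usual TeX-style meaning (at least left characters before a break and at least right characters after it) and is not the library's code. apply_minimums is a hypothetical helper, and the offsets are read off the visualise output shown above.

```ruby
# Hypothetical helper: keep only the break offsets that leave at least
# +left+ characters before the hyphen and +right+ characters after it.
def apply_minimums(word, points, left, right)
  points.select { |point| point >= left && word.length - point >= right }
end

word      = "organiser"
en_points = [2, 5]     # or-gan-iser
fr_points = [2, 4, 6]  # or-ga-ni-ser

p apply_minimums(word, en_points, 2, 2)  # => [2, 5]
p apply_minimums(word, fr_points, 3, 4)  # => [4]
```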

Install

gem install text-hyphen

A Note About Ruby 1.8 Support

I’ve been saying for a couple of releases that “this is the last major release supporting Ruby 1.8 interpreters. Future versions will only work with Ruby 1.9 or later interpreters.” Let me clarify my position on this, because removing Ruby 1.8 support requires effort that I am not yet putting in.

Developers

After checking out the source, run:

$ rake newb

This task will install any missing dependencies, run the tests/specs, and generate the RDoc.

License

Licensing for Text::Hyphen is unfortunately complex because of the various copyrights and licenses of the source hyphenation files that have been converted to Ruby format. Some of these files are available only under the TeX license, others only under the GNU GPL, and others are in the public domain. Each language file has its license embedded within the file. Please consult each file’s license to ensure that it is compatible with your application.

The Text::Hyphen library software, the application ruby-hyphen, and the library (gem) as a compilation are licensed under the terms of the MIT license. The files in this distribution covered by this license are in the list below called “Library Files”.

Individual language hyphenation files (in the list called “Language Files”) are maintained under the license described in the language file itself. The copyright for these original files is held by the original authors; any mistakes made in converting these files to Ruby are attributable to the contributors of the Text::Hyphen package only. If license information is not present in a given Language File, it should be considered under the terms of TeX.

Library License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

The copyright on the Text::Hyphen application/library and the Ruby translations of hyphenation files belongs to Austin Ziegler. All other copyrights on original versions still stand; Text::Hyphen is a derivative work of these and other projects.

Library Files

Note that while this list appears to include language files, these are “loader” files only and do not contain the hyphenation patterns themselves.

Language Files