Class: I2CE Hyphen

From IHRIS Wiki

Revision as of 23:43, 16 October 2009

This article describes the class I2CE_Hyphen.

A PHP script implementing Knuth's and Liang's hyphenation algorithm, as described at http://lingucomponent.openoffice.org/hyphenator.html. In particular, it uses the 'mashed up' dictionary files.

Note: Internally, by default, all strings are encoded as UTF-8. This is highly recommended so that the Unicode preg functions work quickly (without having to convert to UTF-8 and then back).

Note: Does not (yet) support the non-standard hyphenation of Hungarian, Swedish, etc.
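
As a rough illustration of the algorithm named above, the following sketch applies Liang-style patterns to a single word. It is not the I2CE_Hyphen implementation: the function name liangPoints(), the pre-parsed pattern layout, and the tiny hand-picked pattern set are all assumptions made for this example, and it handles ASCII words only. Each pattern maps a letter sequence to one level per gap; the highest level wins at each gap, and an odd level marks a break.

```php
<?php
/**
 * Toy Liang-style hyphenation (hypothetical helper, ASCII only).
 * $patterns maps a letter sequence to its gap levels, one more level than letters.
 * Returns the start offset of each subword, so 0 is always included.
 */
function liangPoints(string $word, array $patterns): array
{
    $padded = '.' . $word . '.';          // '.' marks the word boundaries
    $n = strlen($padded);
    $levels = array_fill(0, $n + 1, 0);   // $levels[j] is the gap before $padded[j]

    for ($i = 0; $i < $n; $i++) {
        for ($len = 1; $i + $len <= $n; $len++) {
            $key = substr($padded, $i, $len);
            if (!isset($patterns[$key])) {
                continue;
            }
            // Keep the highest level proposed for each gap.
            foreach ($patterns[$key] as $k => $level) {
                $levels[$i + $k] = max($levels[$i + $k], $level);
            }
        }
    }

    $points = [0];
    for ($p = 1; $p < strlen($word); $p++) {
        // The gap before word offset $p sits at padded index $p + 1.
        if ($levels[$p + 1] % 2 === 1) {
            $points[] = $p;
        }
    }
    return $points;
}

// Hand-picked patterns for illustration only, not a real dictionary.
$patterns = [
    'hyph'  => [0, 0, 3, 0, 0],     // hy3ph
    'hen'   => [0, 0, 2, 0],        // he2n
    'hena'  => [0, 0, 0, 0, 4],     // hena4
    'henat' => [0, 0, 0, 5, 0, 0],  // hen5at
    'na'    => [1, 0, 0],           // 1na
    'nat'   => [0, 2, 0, 0],        // n2at
    'tio'   => [1, 0, 0, 0],        // 1tio
    'io'    => [2, 0, 0],           // 2io
    'on'    => [0, 2, 0],           // o2n
];

print_r(liangPoints('hyphenation', $patterns)); // offsets 0, 2 and 6: hy-phen-ation
```

The sketch leaves out minimum prefix and suffix lengths and does no encoding handling. The class description above says the real class stores strings in a configurable encoding.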

Variables

$enc

The encoding (an I2CE_Encoding) used for internal storage of strings

  • Type: protected $enc

$patterns

An associative array containing the hyphenation patterns

  • Type: protected $patterns

$trans

  • Type: protected $trans

Methods

HyphenateWord()

Hyphenates a word according to the loaded dictionary

  • Signature: public function HyphenateWord($word,$supress)
  • Parameters:
    • string $word
      the word to be hyphenated. WARNING: the word is assumed to consist only of letters. If you need something more general, see getWordParts().
    • bool $supress
      true (default) to suppress hyphenation points at the beginning/end of a word.
      • Default Value: true
  • Returns: array
    of int containing the hyphenation points. The hyphenation points are the offsets of the beginning of each subword, so 0 is always a hyphenation point.
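
To make the return value concrete, here is a minimal sketch that turns such an array of offsets back into subwords. The helper splitAtOffsets() is hypothetical and not part of I2CE_Hyphen. The offsets in the example are made up, and the sketch assumes they count characters rather than bytes.

```php
<?php
/**
 * Split a word into subwords, given the start offset of each subword.
 * Hypothetical helper; assumes the offsets are character offsets.
 */
function splitAtOffsets(string $word, array $points): array
{
    // Split into UTF-8 characters so multi-byte letters count as one position.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    $count = count($points);
    for ($i = 0; $i < $count; $i++) {
        $start = $points[$i];
        // A subword runs up to the next point, or to the end of the word.
        $end = ($i + 1 < $count) ? $points[$i + 1] : count($chars);
        $parts[] = implode('', array_slice($chars, $start, $end - $start));
    }
    return $parts;
}

// Made-up offsets; real ones depend on the loaded dictionary.
print_r(splitAtOffsets('hyphenation', [0, 2, 6])); // hy, phen, ation
```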

LoadHyphenDictionary()

Load the hyphenation dictionary. The file is expected to be a 'mashed up' version of a .tex hyphenation dictionary generated by using substrings.pl, as in the stand-alone hyphenation code of http://lingucomponent.openoffice.org/hyphenator.html

  • Signature: public function LoadHyphenDictionary($file)
  • Parameters:
    • string $file
      file containing the dictionary
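
TeX-style hyphenation patterns interleave letters with digits, for example hy3ph. The sketch below shows one way a single pattern could be split into its letters and per-gap levels. It assumes the dictionary entries follow that convention; parsePattern() and its return layout are hypothetical and do not describe how LoadHyphenDictionary() actually stores patterns.

```php
<?php
/**
 * Parse one TeX-style pattern such as "hy3ph" (hypothetical helper).
 * Returns the letters and one level per gap; gaps without a digit get level 0.
 */
function parsePattern(string $pattern): array
{
    $chars = preg_split('//u', $pattern, -1, PREG_SPLIT_NO_EMPTY);
    $letters = '';
    $levels = [0]; // level of the gap before the first letter
    foreach ($chars as $ch) {
        if (preg_match('/^[0-9]$/', $ch) === 1) {
            // A digit sets the level of the current gap.
            $levels[count($levels) - 1] = (int) $ch;
        } else {
            // A letter (or the boundary marker '.') opens a new gap after it.
            $letters .= $ch;
            $levels[] = 0;
        }
    }
    return ['letters' => $letters, 'levels' => $levels];
}

print_r(parsePattern('hy3ph')); // letters "hyph", levels 0, 0, 3, 0, 0
```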

Visualize()

Visualize a hyphenation for a word

  • Signature: public function Visualize($word,$supress)
  • Parameters:
    • string $word
      the word that is to be hyphenated. WARNING: the word is assumed to be a single word with no whitespace, periods, digits, or other special characters (unless they are already in your hyphenation dictionary).
    • bool $supress
      true (default) to suppress hyphenation points at the beginning/end of a word.
      • Default Value: TRUE
  • Returns: string
    the hyphenated word
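
The following sketch shows what such a visualization could look like, given hyphenation points in the form HyphenateWord() returns. The helper visualizeOffsets(), the '-' separator, and the example offsets are assumptions for illustration; the separator actually used by Visualize() is not stated here.

```php
<?php
/**
 * Insert a separator at every hyphenation point except offset 0.
 * Hypothetical helper; assumes character offsets.
 */
function visualizeOffsets(string $word, array $points, string $sep = '-'): string
{
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $breaks = array_flip($points); // offset => index, for quick lookup
    $out = '';
    foreach ($chars as $i => $ch) {
        if ($i > 0 && isset($breaks[$i])) {
            $out .= $sep;
        }
        $out .= $ch;
    }
    return $out;
}

// Made-up offsets; real ones depend on the loaded dictionary.
echo visualizeOffsets('dictionary', [0, 3, 7]), "\n"; // dic-tion-ary
```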

__construct()

  • Signature: public function __construct($enc)
  • Parameters:
    • I2CE_Encoding $enc
      the encoding to use for the internal storage of this hyphenation dictionary.

getWordParts()

Get the parts of a word, broken at hyphenation points and at any non-letter.

  • Signature: public function getWordParts($word,$supress)
  • Parameters:
    • string $word
      the word we wish to break up
    • bool $supress
      true (default) to suppress hyphenation points at the beginning/end of a word.
      • Default Value: true
  • Returns: array
    The associative array has a string 'Subword' (the subword itself), an int 'Offset' (where the subword starts), an int 'Length' (the length of the subword), and a boolean 'IsLetter' (whether the subword is composed of letters, by the Unicode convention).
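
To illustrate the shape of this return value, the sketch below splits a string into runs of letters and non-letters and describes each run with the four keys listed above. It is not the method itself: splitLetterRuns() is a hypothetical helper, it does not apply any hyphenation patterns, and it assumes one associative array per subword with offsets and lengths counted in characters.

```php
<?php
/**
 * Split text into alternating runs of letters and non-letters.
 * Hypothetical helper; breaks only at non-letters, not at hyphenation points.
 */
function splitLetterRuns(string $text): array
{
    // \p{L} matches any Unicode letter, \P{L} anything else.
    preg_match_all('/\p{L}+|\P{L}+/u', $text, $matches);
    $parts = [];
    $offset = 0;
    foreach ($matches[0] as $run) {
        // Count UTF-8 characters, not bytes.
        $length = count(preg_split('//u', $run, -1, PREG_SPLIT_NO_EMPTY));
        $parts[] = [
            'Subword'  => $run,
            'Offset'   => $offset,
            'Length'   => $length,
            'IsLetter' => preg_match('/^\p{L}/u', $run) === 1,
        ];
        $offset += $length;
    }
    return $parts;
}

print_r(splitLetterRuns('well-known')); // "well", "-", "known"
```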