Class: I2CE Hyphen

From IHRIS Wiki

Revision as of 20:22, 16 October 2009

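This article describes the class I2CE_Hyphen. It is contained in the module textlayout in the package TextLayout Tools (https://launchpad.net/textlayout).

The class is defined in the file: lib/I2CE_Hyphen.php (http://bazaar.launchpad.net/~intrahealth+informatics/textlayout/4.0.0-release/files/head:/lib/I2CE_Hyphen.php)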

A PHP implementation of Knuth's and Liang's hyphenation algorithm, as described in http://lingucomponent.openoffice.org/hyphenator.html

In particular, it uses the 'mashed up' dictionary files.


Note: Internally, by default, all strings are encoded as UTF-8. This is highly recommended so that the Unicode preg functions work quickly (without having to convert to UTF-8 and then back).


Note: Does not (yet) support the non-standard hyphenation of Hungarian, Swedish, etc.


@package I2CE

@subpackage TextLayout

@author Carl Leitner <litlfred@ibiblio.org>


@version 0.1

@access public

Variables

$enc

The encoding used for internal storage of strings (an I2CE_Encoding object).

  • Type: protected $enc

$patterns

An associative array containing the hyphenation patterns.

  • Type: protected $patterns

$trans

  • Type: protected $trans

Methods

HyphenateWord()

Hyphenates a word according to the loaded dictionary. WARNING: the word is assumed to consist only of letters. If you need something more general, see getWordParts().

  • Signature: public function HyphenateWord($word,$supress)
  • Returns: array of int containing the hyphenation points. The hyphenation points are the offsets of the beginning of each subword; 0 is always a hyphenation point.

Parameters:

  • string $word
    the word to be hyphenated
  • bool $supress
    true (default) to suppress hyphenation points at the beginning/end of a word.
    • Default Value: true
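
As an illustration of how hyphenation points can be derived from patterns, here is a minimal sketch of Liang-style pattern matching. It is not the code of I2CE_Hyphen. The function name hyphenationPoints(), the layout of the $patterns array, the small set of English patterns, and the reading of the suppress flag (no break closer than two letters to either end of the word) are all assumptions made for this example. The word is assumed to be lowercase letters encoded as UTF-8.

```php
<?php
// Toy pattern table (assumed layout): letters => levels, where levels[i] is
// the digit standing before letter i and the last entry is the digit after
// the final letter. The original pattern is shown in the comment.
$patterns = [
    'hyph'  => [0, 0, 3, 0, 0],    // hy3ph
    'hen'   => [0, 0, 2, 0],       // he2n
    'hena'  => [0, 0, 0, 0, 4],    // hena4
    'henat' => [0, 0, 0, 5, 0, 0], // hen5at
    'na'    => [1, 0, 0],          // 1na
    'nat'   => [0, 2, 0, 0],       // n2at
    'tio'   => [1, 0, 0, 0],       // 1tio
    'io'    => [2, 0, 0],          // 2io
    'on'    => [0, 2, 0],          // o2n
];

// Hypothetical helper, not part of I2CE_Hyphen.
function hyphenationPoints(string $word, array $patterns, bool $suppress = true): array
{
    // Wrap the word in dots so patterns can anchor at the word boundaries,
    // then split into characters (not bytes) with the /u modifier.
    $chars = preg_split('//u', '.' . $word . '.', -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    $levels = array_fill(0, $n + 1, 0);

    // Try every substring; where a pattern matches, keep the highest level
    // seen at each position.
    for ($start = 0; $start < $n; $start++) {
        $key = '';
        for ($end = $start; $end < $n; $end++) {
            $key .= $chars[$end];
            if (!isset($patterns[$key])) {
                continue;
            }
            foreach ($patterns[$key] as $i => $level) {
                $levels[$start + $i] = max($levels[$start + $i], $level);
            }
        }
    }

    // An odd level allows a break. Offset $k in the word is position $k + 1
    // in the dotted word. Offset 0 is always reported.
    $len = $n - 2;
    $min = $suppress ? 2 : 1; // assumed meaning of the suppress flag
    $points = [0];
    for ($k = $min; $k <= $len - $min; $k++) {
        if ($levels[$k + 1] % 2 === 1) {
            $points[] = $k;
        }
    }
    return $points;
}

echo implode(', ', hyphenationPoints('hyphenation', $patterns)), "\n"; // 0, 2, 6
```

With these patterns the offsets 0, 2 and 6 mark the subwords "hy", "phen" and "ation".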

LoadHyphenDictionary()

Load the hyphenation dictionary.

The file is expected to be a 'mashed up' version of a .tex hyphenation dictionary generated by using substrings.pl, as in the stand-alone hyphenation code of http://lingucomponent.openoffice.org/hyphenator.html

  • Signature: public function LoadHyphenDictionary($file)

Parameters:

  • string $file
    file containing the dictionary
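
To make the idea of loading patterns concrete, here is a rough sketch of reading a pattern file into an associative array. It is not the actual LoadHyphenDictionary() code. The function name loadPatterns() is hypothetical, and the file layout is an assumption: a charset name on the first line, then one pattern per line, with lines starting with '%' treated as comments. The file is assumed to be UTF-8.

```php
<?php
// Hypothetical loader, not part of I2CE_Hyphen.
function loadPatterns(string $file): array
{
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read dictionary: $file");
    }
    array_shift($lines); // assumed charset line, e.g. "UTF-8"

    $patterns = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '%') {
            continue; // skip blank lines and assumed comment lines
        }
        // Separate a pattern such as "hy3ph" into its letters ("hyph") and
        // the level before each letter and after the last ([0, 0, 3, 0, 0]).
        $letters = '';
        $levels = [0];
        foreach (preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
            if (ctype_digit($ch)) {
                $levels[count($levels) - 1] = (int) $ch;
            } else {
                $letters .= $ch;
                $levels[] = 0;
            }
        }
        $patterns[$letters] = $levels;
    }
    return $patterns;
}
```

Keying the array by the bare letters makes each lookup during hyphenation a single isset() call on a substring of the word.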

Visualize()

Visualize a hyphenation for a word. WARNING: the word is assumed to be a single word with no whitespace, periods, digits, or other special characters (unless they are already in your hyphenation dictionary).

  • Signature: public function Visualize($word,$supress)
  • Returns: string the hyphenated word

Parameters:

  • string $word
    the word that is to be hyphenated
  • bool $supress
    true (default) to suppress hyphenation points at the beginning/end of a word.
    • Default Value: TRUE
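
The step from a list of hyphenation points to a visible hyphenated string can be sketched as follows. This is only an illustration: the helper visualizePoints() and its separator argument are hypothetical, and the offsets are assumed to count characters rather than bytes.

```php
<?php
// Hypothetical helper, not part of I2CE_Hyphen: join the subwords that start
// at the given offsets with a separator.
function visualizePoints(string $word, array $points, string $sep = '-'): string
{
    // Split into characters with the /u modifier so that multi-byte UTF-8
    // letters count as one position each.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $out = '';
    foreach ($chars as $i => $ch) {
        // Offset 0 starts the word, so no separator is written before it.
        if ($i > 0 && in_array($i, $points, true)) {
            $out .= $sep;
        }
        $out .= $ch;
    }
    return $out;
}

echo visualizePoints('hyphenation', [0, 2, 6]), "\n"; // hy-phen-ation
```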

__construct()

Constructor. Sets the encoding used for internal storage to the specified encoding.

  • Signature: public function __construct($enc)

Parameters:

  • I2CE_Encoding $enc
    specifies the encoding used for the internal storage of this hyphenation dictionary

getWordParts()

Get the parts of a word, broken along hyphenation points and at any non-letter.

  • Signature: public function getWordParts($word,$supress)
  • Returns: an array describing the subwords. Each subword is given by an associative array with a string 'Subword', which tells what the subword is; an int 'Offset', which tells where the subword started; an int 'Length', the length of the subword; and a boolean 'IsLetter', which tells whether the subword is composed of letters (by the Unicode convention) or not.

Parameters:

  • string $word
    the word we wish to break up
  • bool $supress
    true (default) to suppress hyphenation points at the beginning/end of a word.
    • Default Value: true
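
The splitting at non-letters can be sketched with the Unicode letter class of the preg functions. This is only an illustration of that one step: the function splitLetterRuns() is hypothetical, it does not split letter runs further at hyphenation points as getWordParts() is described to do, and counting 'Offset' and 'Length' in characters rather than bytes is an assumption.

```php
<?php
// Hypothetical helper, not part of I2CE_Hyphen: split UTF-8 text into runs of
// letters and runs of non-letters, described with the keys named above.
function splitLetterRuns(string $text): array
{
    // \p{L} matches any Unicode letter, \P{L} anything else; /u makes the
    // pattern and subject be treated as UTF-8.
    preg_match_all('/\p{L}+|\P{L}+/u', $text, $matches);

    $parts = [];
    $offset = 0;
    foreach ($matches[0] as $run) {
        // Count characters, not bytes.
        $length = count(preg_split('//u', $run, -1, PREG_SPLIT_NO_EMPTY));
        $parts[] = [
            'Subword'  => $run,
            'Offset'   => $offset,
            'Length'   => $length,
            'IsLetter' => preg_match('/^\p{L}/u', $run) === 1,
        ];
        $offset += $length;
    }
    return $parts;
}

print_r(splitLetterRuns('naïve-test 42'));
```

A letter run found this way is the kind of input HyphenateWord() expects, since that method assumes its word consists only of letters.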