Importing Data: Difference between revisions

From IHRIS Wiki
No edit summary
No edit summary
Line 63: Line 63:


The getDate() function is a utility function to get the current data and format it, so that we can use it in the name of the bad records that we will create later for storing unsuccessful imports.
The getDate() function is a utility function to get the current data and format it, so that we can use it in the name of the bad records that we will create later for storing unsuccessful imports.
<pre>
Carl Leitner: ok.  you wanted to discuss importing?
[10:30:20] Sovello Hildebrand Mgani: yes
[10:30:45] Carl Leitner: ok.  what's best?  do you want to look at the import script for infinium and talk our way through it?
[10:31:39] Sovello Hildebrand Mgani: yes. that would do. but if you have time, we could take one from tanzania http://bazaar.launchpad.net/~ihris-tanzania/pmoralg/aggregate/view/head:/tools/importCSV.php
[10:31:47] Sovello Hildebrand Mgani: and then later move to Infinium
[10:32:09] Sovello Hildebrand Mgani: as am familiar with Tanzania data, it would make more sense and the lesson clearer before we jump into that
[10:32:30] Carl Leitner: the infinium has some new useful features.
[10:32:36] Carl Leitner: i think it is also clearer.
[10:32:55] Carl Leitner: (useful features include: only process a row once even if the script is run multiple times)
[10:33:17] Sovello Hildebrand Mgani: ok! let's do it, no problem! i'll try my best not to get lost
[10:33:37] Sovello Hildebrand Mgani: you are referring to this, http://bazaar.launchpad.net/~ihris+botswana/ihris-botswana/trunk/view/head:/tools/import_infinium.php
[10:33:48] Carl Leitner: yep
[10:34:46] Carl Leitner: first let's do a step through of each "chunk" of code to explain what it is doing.
[10:34:58] Sovello Hildebrand Mgani: yes!
[10:35:11] Carl Leitner: line 1 means that we can run this a script from the command line.
[10:36:04] Carl Leitner: so if we have done "chmod +x import_infinfium.php" we would be able to do "./import_infinium.php /path/to/excel.xls"  instead of "php import_infinium.php /path/to/excel.xls"
[10:36:20] Carl Leitner: you may have seen something this for "#!/bin/bash"
[10:36:28] Sovello Hildebrand Mgani: yes. that is clear
[10:36:56] Carl Leitner: line 2 starts the php script, line 3-32 are comments we can ignore.
[10:37:08] Carl Leitner: line $34 is for "test mode"
[10:37:31] Sovello Hildebrand Mgani: why setting that to true?
[10:37:42] Carl Leitner: it should be false.
[10:38:07] Carl Leitner: take a look at line 335-350
[10:38:22] Carl Leitner: if test mode is false the save function doesn't do anything.
[10:38:33] Carl Leitner: so I am wondering if Montty has not actually been importing data
[10:38:50] Sovello Hildebrand Mgani: i can see that
[10:39:07] Carl Leitner: let me check something...
[10:40:58] Carl Leitner: hmm.  because port 22 is closed they have been doing things "differently"
[10:41:10] Carl Leitner: and their production server may be out of sync.
[10:41:23] Carl Leitner: let me see what i can understand.... just a minute
[10:41:57] Sovello Hildebrand Mgani: ok!
[10:42:29] Carl Leitner: oh.  i think i see the issue, i haven't pushed changes to launchpad (i think)
[10:42:34] Carl Leitner: please hold on.
[10:44:11] Carl Leitner: ok.  here we go  http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_infinium.php
[10:44:19] Carl Leitner: we were looking at the wrong branch
[10:45:09] Carl Leitner: ok.  starting at line 35
[10:45:12] Sovello Hildebrand Mgani: ooh!
[10:45:21] Sovello Hildebrand Mgani: yes, we are now in sync.
[10:45:22] Carl Leitner: we are requiting "import_base.php"
[10:45:36] Sovello Hildebrand Mgani: seen that
[10:45:47] Carl Leitner: this defines the Processor class and sets up some stuff for us.
[10:46:11] Carl Leitner: http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_base.php
[10:46:26] Carl Leitner: let's just look at this quickly
[10:46:54] Carl Leitner: line 36 -  67 look almost like what you have seen in index.php right?
[10:47:26] Sovello Hildebrand Mgani: yes
[10:48:03] Carl Leitner: the only difference is that this script  (import_infinium really) is intended to be run in the "tools" directory rather than the pages directory
[10:48:33] Carl Leitner: so lines 36-37 handles changing directory to the pages directory and keeping all the paths in order for the configuration files
[10:48:36] Sovello Hildebrand Mgani: yes
[10:49:18] Carl Leitner: ok.  so once we have connected to the database, on line 71 we include CLI.php (which is in I2Ce/tools/CLI.php)
[10:49:31] Carl Leitner: CLI.php provides the functions to prompt the user for input
[10:50:00] Sovello Hildebrand Mgani: i think i passed through this CLI.php and it made some sense
[10:50:19] Carl Leitner: it's here http://bazaar.launchpad.net/~intrahealth+informatics/i2ce/4.1-dev/view/head:/tools/CLI.php  but we don't need to go into it now.
[10:50:41] Sovello Hildebrand Mgani: ok
[10:51:17] Carl Leitner: now there are two sets of classes that are defined.  one to handle the import data files and one to be the processor of those files
[10:51:43] Carl Leitner: line 75-210 is for the data file, and after that is the processor
[10:51:56] Carl Leitner: there are three classes defined for the data files.
[10:52:30] Carl Leitner: first is an abstract class DataFile, and then classes extending this.  one to handle CSV files, one to handle Excel files
[10:52:41] Sovello Hildebrand Mgani: i have seen those
[10:53:08] Carl Leitner: the DataFile class has three sets of functions:
[10:53:26] Carl Leitner: 1) get the header row data
[10:53:40] Carl Leitner: 2) iterate through the non-header rows
[10:53:51] Carl Leitner: 3) get the name of the file
[10:54:23] Sovello Hildebrand Mgani: 2) isn't clear
[10:54:57] Carl Leitner: In the processor, what we want to do is 1) get the header row for the spreadsheet to make sure it is what is expected (that all the columns we need are there)
[10:54:58] Sovello Hildebrand Mgani: cos I see one checks to see if - hasDataRow() and get the headers
[10:55:30] Carl Leitner: then 2) loop through each row of the spreadsheet and process the data from that row
[10:55:43] Carl Leitner: lets take a look at where this is in the processor:
[10:56:18] Carl Leitner: line 268-285 in the function run()
[10:56:40] Sovello Hildebrand Mgani: yes
[10:56:45] Carl Leitner: (in the import_infinium.php we define a processor for the spreadsheet and then call its run() function)
[10:56:52] Carl Leitner: so this run() is the main looping logic.
[10:57:32] Carl Leitner: when run() is called, we have already checked that the data file is valid, so we do a "while" loop while the spreadsheet hasDataRow()
[10:57:51] Carl Leitner: when we reach the end of the file, hasDataRow() will return false
[10:58:00] Sovello Hildebrand Mgani: ok, now I get it!
[10:59:03] Carl Leitner: hold on a minute please
[11:00:04] Carl Leitner: ok. line 271-277
[11:00:48] Carl Leitner: that is there because sometimes spreadsheets from gnumeric were saved with 65,000 rows with only the first 100 or so having data (it would populate all the rest of the rows with blank data)
[11:01:06] Sovello Hildebrand Mgani: ok!
[11:01:15] Carl Leitner: if it runs across 10 blank rows, it will ask the user if they want to stop processing.
[11:01:42] Carl Leitner: line 278-283 is where we actually process the data row.
[11:01:52] Sovello Hildebrand Mgani: yes. the simple_prompt() is in the CLI.php for this case
[11:02:30] Carl Leitner: so at his point we know we have a row of data from the spreadsheet and we call the processRow() funciton in the Processor class
[11:02:46] Carl Leitner: this is defined on lines 294-330
[11:03:10] Carl Leitner: first on line 295-270 , we have a (redundant) check to make sure there is data available
[11:03:46] Carl Leitner: line 298 is where we get the acutal data from the DataFile class by calling the getDataRow() function
[11:04:20] Sovello Hildebrand Mgani: yes
[11:04:55] Carl Leitner: take a look at where these are defined in the CSVDataFile ... i will be just a minute
[11:06:19] Sovello Hildebrand Mgani: yes
[11:06:23] Sovello Hildebrand Mgani: done.
[11:07:20] Carl Leitner: ok.
[11:07:37] Carl Leitner: so that is simply the fgetcsv function
[11:07:52] Carl Leitner: which gets a row of data from a CSV file and returns it as an array
[11:08:18] Carl Leitner: here is the reference http://www.php.net/manual/en/function.fgetcsv.php
[11:08:35] Sovello Hildebrand Mgani: a minute: what does this property hold $in_file_sep ?
[11:09:18] Carl Leitner: it is the third argument, if you look at the link I just sent, it is the delimiter between columns
[11:09:53] Carl Leitner: $this->in_file_sep is where we hold the value of it, which is probably a comma ","
[11:10:21] Carl Leitner: where this is set is actually in the getHeaders()
[11:10:43] Carl Leitner: look at lines 125-133
[11:11:08] Carl Leitner: we try to read in the header row using "\t" then "," and then ";" as the column separator
[11:11:35] Sovello Hildebrand Mgani: oook. i now have seen it
[11:11:40] Sovello Hildebrand Mgani: line 131
[11:11:53] Sovello Hildebrand Mgani: i was confused cos it was initialized to false
[11:12:01] Sovello Hildebrand Mgani: line 103 and 123
[11:12:06] Carl Leitner: yes.  we try each of those column separators and ask the user to ensure that the headers look correct
[11:13:11] Carl Leitner: (as an aside, you can see that the getHeaders() is called by the mapHeaders() function which is called by the __construct() function, the Process object constructor)
[11:13:44] Carl Leitner: (we will get to mapHeaders() later)
[11:14:15] Sovello Hildebrand Mgani: ok. i was about to ask
[11:14:30] Carl Leitner: ok.  back to the getHeaders() in CSVDataFile
[11:14:52] Carl Leitner: on line 134-135, if we don't find the headers, we die()
[11:15:13] Carl Leitner: (this is bad programming,  i really should throw an exception here, but let's not worry about that)
[11:15:20] Sovello Hildebrand Mgani: ok
[11:15:28] Sovello Hildebrand Mgani: only that it works
[11:15:57] Carl Leitner: line 137-139 will clean any whitespace on the headers so that "Title " becomes "Title" for example
[11:16:23] Sovello Hildebrand Mgani: yes. am familiar to this!
[11:16:48] Carl Leitner: I think you have seen "foreach ($data as $value)" before
[11:16:59] Carl Leitner: have you seen "foreach ($data as &$value)" ?
[11:17:12] Sovello Hildebrand Mgani: yes i use it quite often
[11:17:28] Sovello Hildebrand Mgani: why do we unset($header) ine 140 then?
[11:17:45] Carl Leitner: OK.... Let's do two examples.
[11:17:48] Carl Leitner: Example 1)
[11:18:20] Carl Leitner: $data = array(2,3);
foreach ($data as $value) {
  $value = $value + 1;
}
print_r($data)
[11:18:47] Carl Leitner: if we did this, we would have  the print_r statment print out
  array(0=>2,1=>3)
[11:19:21] Carl Leitner: now let's do the same but with the &:
foreach ($data as &$value) {
  $value = $value +1;
}
print_r($data)
[11:19:47] Carl Leitner: then the print_r statment would print out:
array (0=>3, 1=>2)
[11:20:36] Carl Leitner: what is going on here is that $value is simple the value of the array element,  the way PHP is set up is if we modify $value, then it creates a copy of it and then changes it
[11:21:04] Carl Leitner: on the other hand &$value contains a "reference" to the $value.
[11:21:41] Sovello Hildebrand Mgani: like pointers in C
[11:21:45] Carl Leitner: So when do the "$value = $value + 1;"  we do not make a copy of value.  instead we lookup the refence to where it is stored in memory and modify that
[11:21:47] Carl Leitner: yes.
[11:22:37] Carl Leitner: http://php.net/manual/en/language.references.php
[11:22:54] Carl Leitner: getting back to your question... what does unset($header) do here?
[11:23:06] Sovello Hildebrand Mgani: this concept wan't not clear yet!
[11:23:37] Carl Leitner: it this case, unset($header) does nothing.  however it is good programming practice in case you have a nested foreach loop
[11:24:02] Carl Leitner: or if you accidently reuse the $header variable later.
[11:24:07] Sovello Hildebrand Mgani: i think we don't need it because the values in $headers are already changed. it was used like a "place holder"
[11:24:07] Carl Leitner: two examples of this:
[11:25:09] Carl Leitner: foreach ($data as &$value) {
  $value = $value + 1;
}
//do some stuff.  forget that we had used the variable $value
$value = '25';
print_r($data)
[11:25:31] Carl Leitner: if we do this, we do not get array( 0=>2, 1=>3) as we did last time.
[11:26:02] Carl Leitner: instead we would get:  array(0=>2, 1=>'25')
[11:26:39] Carl Leitner: that is because the variable $value is still a reference to the variable/place in memory of $data[1]
[11:26:51] Carl Leitner: If we did instead:
[11:27:36] Carl Leitner: foreach ($data as &$value) {
  $value = $value + 1;
}
unset($value);
//do some stuff. we can forget we had used the variable $value without worries
$value = '25';
print_r($data)
[11:27:50] Carl Leitner: then we would still have array(0=>2,1=>3)
[11:28:08] Carl Leitner: the unset($value) "de-refererences" $value from the memory location of $data[1]
[11:28:23] Sovello Hildebrand Mgani: thanks. i'll hack deeper into this
[11:28:28] Sovello Hildebrand Mgani: we can proceed i think
[11:29:07] Carl Leitner: ok.  we have done getHeaders() and getDataRow() for DataFileCSV.
[11:29:17] Carl Leitner: do you want to look at it for DataFileExcel?
[11:29:53] Sovello Hildebrand Mgani: i think they should look more or less the same and apply the same logic
[11:30:45] Carl Leitner: yes.... the DataFile class is a wrapper so that the Processor class does not have to care if the file is coming from CSV or Excel
[11:31:49] Carl Leitner: the Excel one uses the PHPExcel class which is a bit annoying to use.
[11:32:15] Carl Leitner: in any case, the Processor class only cares about the header row and being able to loop through the data rows
[11:32:39] Carl Leitner: in this way, in the import_infinium script, we don't have to worry if it is an excel spreadsheet or a CSV file
[11:32:58] Sovello Hildebrand Mgani: ok!
[11:33:12] Carl Leitner: do you want to look now at the Processor class?
[11:33:38] Sovello Hildebrand Mgani: yes, just to get an overview!
[11:34:22] Carl Leitner: it is abstract class because there are two functions which are not defined: getExpectedHeaders() and _processRow()
[11:35:16] Carl Leitner: these two functions will be defined based on the data we are importing.  for example in infinium they are defined on lines:
[11:35:49] Carl Leitner: 68 for getExpectedHeaders and 123 for _processRow()
[11:36:32] Carl Leitner: these are the minimum functions that any sub-class of Processor would need to define in order to define a valid processor of an import file.
[11:37:22] Carl Leitner: the getExpectedHeaders() is used to describe the columns that we expect to have in our data file, and _processRow() is what takes that row and transforms in into form and fields as needed
[11:38:05] Carl Leitner: let's look quickly at line 68-95 of import_infinium.php so we can see what getExpectedHeaders() returns
[11:38:54] Carl Leitner: it is an associative array.  the keys are a way of referencing the columns easily in code.  the values are the titles of the columns it expects to find header row of the spreadsheet
[11:39:10] Sovello Hildebrand Mgani: ok.
[11:39:24] Carl Leitner: for example on line 91 you see surname.
[11:39:27] Sovello Hildebrand Mgani: so iti s just for comparison to make sure all the expected headers are present
[11:39:39] Carl Leitner: yes.  and to easily refernce the data in the _processRow()
[11:40:01] Sovello Hildebrand Mgani: ok
[11:40:11] Carl Leitner: on line 140 of import_infinium you see for example:
[11:40:16] Carl Leitner:  $this->mapped_data['surname']
[11:41:00] Carl Leitner: this is in the _processRow() function and it is the value of the  data in the "Surname" column for that row
[11:41:01] Sovello Hildebrand Mgani: yes
[11:41:41] Carl Leitner: One reason to do this in this way is if the export changed suddenly to have "Last Name" instead of "Surname" in the header row, we would only need to make a change in the getExpectedHeaders()
[11:42:02] Carl Leitner: and all of the $this->mapped_data['surname'] would still be fine
[11:42:07] Sovello Hildebrand Mgani: ok!
[11:42:21] Sovello Hildebrand Mgani: ahaaa!
[11:42:49] Carl Leitner: OK.  So now that we have this example in mind, let's go back to Processor to see where getExpectedHeaders() is used and how the data is mapped into $this->mapped_data
[11:43:02] Sovello Hildebrand Mgani: ok
[11:43:43] Carl Leitner: first, let's look at the constructor for Processor
[11:44:08] Carl Leitner: on line 238-254
[11:44:54] Carl Leitner: this is called when you do somehting like:
  $my_processor = new InfiniumProcessor('/path/to/my/spreadsheet.xls');
[11:45:14] Sovello Hildebrand Mgani: yes.. this is clear
[11:45:27] Carl Leitner: line 239  just records the file path sent in the constructor for later reference
[11:45:48] Carl Leitner: line 241-245 uses the file's extension to guess its data type
[11:46:00] Sovello Hildebrand Mgani: ok
[11:46:22] Carl Leitner: if it is csv file it creates an instanceof CSVDataFile for the file in order to loop through the data rows as we discussed above
[11:46:41] Carl Leitner: otherwise it assumes it is an excel file
[11:46:50] Sovello Hildebrand Mgani: yes
[11:46:56] Carl Leitner: and creates an ExcelDataFile
[11:47:42] Carl Leitner: note: that the PHPExcel library can also handle multiple file types, including CSV, but we should try a specific tool if we think it is that
[11:48:14] Sovello Hildebrand Mgani: ok, i can also see you commented that!
[11:48:33] Carl Leitner: (i am not reading my own comments!) line 247 creates an instance of I2CE_User..  it will be the admin with user id = 0 (i think)
[11:49:07] Carl Leitner: this is used when ever we save a form object:  for example: $personObj->save($this->user);
[11:49:20] Carl Leitner: (that's b/c I2CE keeps a record of who modifies the data)
[11:49:47] Carl Leitner: line 248 just sets up a reference to the database in case we need to use it
[11:50:12] Carl Leitner: line 249 is the form factory, the thing we use to create the form objects, such as the $personObj
[11:50:38] Carl Leitner: you will see it used like:  $personObj = $this->ff->createContainer('person')
[11:50:48] Carl Leitner: which will create a new person with blank data.
[11:51:12] Carl Leitner: line 250 is the call to mapHeaders() which we will go into in a moment
[11:51:34] Carl Leitner: line 251 sets up the file we use to store "rejects" of the data import
[11:51:44] Carl Leitner: the "bad file"
[11:52:41] Carl Leitner: next, we ask the user if this is a test run (no saves will happen, but we will see if we can process the file)  this is used for debugging purposes.  but you should also probably also do this each time you want to run the import script to identify any issues ahead of time
[11:53:18] Carl Leitner: last, on line 253, we create a "logger"  which will keep track (in the database) of each row that we have processed.
[11:53:40] Carl Leitner: this is useful so that if the script is rerun on the sample spreadsheet we don't re-import the same data.
[11:54:03] Carl Leitner: we will see how this is used in the processRow() in a bit
[11:54:13] Carl Leitner: Is that OK for the constructor?
[11:54:18] Sovello Hildebrand Mgani: yes
[11:54:33] Sovello Hildebrand Mgani: except for line 249
[11:54:37] Sovello Hildebrand Mgani: it is not very clear
[11:54:56] Carl Leitner: OK.
[11:55:02] Carl Leitner: let me check something..
[11:56:30] Carl Leitner: I don't see anything already written explaining about the form factory...
[11:56:58] Carl Leitner: let's look at the 'person' example more carefully.
[11:57:23] Carl Leitner: so the 'person' form has a formClass associated to it in magic data right?
[11:57:33] Sovello Hildebrand Mgani: yes
[11:57:33] Carl Leitner:  /modules/forms/forms/person/class = 'iHRIS_Person'
[11:58:05] Carl Leitner: and 'iHRIS_Person' is defined under /modules/forms/formClasses/iHRIS_Person  which contains information about the fields it has (and more)
[11:58:17] Sovello Hildebrand Mgani: yes
[11:58:58] Carl Leitner: so the iHRIS_Person class is defined something like "iHRIS_Person extends I2CE_Form{ /*blah*/ }"
[11:59:08] Carl Leitner: where /*blah*/ may be blank
[11:59:23] Sovello Hildebrand Mgani: yes
[11:59:24] Carl Leitner: in any case, iHRIS_Person is an actual PHP Classs
[11:59:29] Carl Leitner: (but person isn't)
[11:59:49] Carl Leitner: so let's look at what  $this->ff->createContainer('person') does
[12:00:06] Carl Leitner: very roughly it does:
[12:00:26] Carl Leitner:    $formObj = new iHRIS_Person();
[12:00:53] Carl Leitner:      $formObj->addField('firstname',I2CE_String_LINE,'First Name')
[12:01:05] Sovello Hildebrand Mgani: the container here person is the form,
[12:01:07] Carl Leitner:  $formObj->addField('surname',I2CE_STRING_LINE,'Surname')
[12:01:54] Carl Leitner:  yes,  the container is a form (I2CE_Form extends I2CE_FieldContainer.  we are planning on making  reports a subclass of I2CE_FIeldContainer as well)
[12:02:17] Carl Leitner: the above code with the addFields() is not correct, but it gives you a sense of what it is doing.
[12:02:27] Sovello Hildebrand Mgani: ok
[12:02:36] Carl Leitner: then createContainer() returns the $formObj
[12:02:58] Carl Leitner: so you can now do things like:  $personObj->getField('surname')->setValue("Leitner");
[12:03:12] Sovello Hildebrand Mgani: wait:
[12:03:42] Carl Leitner: Oops... time has flown!  I am supposed to be in a meeting.
[12:03:45] Sovello Hildebrand Mgani: so here, iHRIS_Person is instantiated within createContainer() and the fields are added within the same
[12:03:52] Carl Leitner: sorry! sorry!
[12:04:08] Carl Leitner: we will need to continue with mapHeaders() next time.
[12:04:15] Sovello Hildebrand Mgani: but i think you took me to some point, i'll proceed and see
[12:04:22] Carl Leitner: ok.  sounds good.
[12:04:26] Carl Leitner: talk to you later.
[12:04:41] Sovello Hildebrand Mgani: do you think you can have time after the meeting?
[12:05:05] Sovello Hildebrand Mgani: after how long?
[12:05:37] Carl Leitner: probably not today.  we can continue on monday i think.
[12:05:51] Sovello Hildebrand Mgani: OK! same time?
[12:05:53] Carl Leitner: but yes to your question about createContainer()
[12:06:07] Carl Leitner: yes.
[12:06:20] Carl Leitner: can you send an invite so I don't foget?
[12:06:46] Sovello Hildebrand Mgani: OK. I'll do! am going to contact Brooke also for instructions for the trip
[12:08:26] Carl Leitner: for reference:  I2CE_FormFactory extends I2CE_FieldContainter_Factory
[12:08:26] Carl Leitner: http://wiki.ihris.org/wiki/Class:_I2CE_FieldContainer_Factory_%284.1.1%29#createContainer.28.29_2
[12:08:39] Carl Leitner: is where createContainer() is defined
[12:09:31] Sovello Hildebrand Mgani: thanks.
[12:09:54] Carl Leitner: which calls the _createContainer() method.
[12:10:20] Carl Leitner: http://bazaar.launchpad.net/~intrahealth+informatics/i2ce/4.1.1-release/annotate/head:/modules/Forms/lib/I2CE_FormFactory.php#L243
Monday, 25 June 2012
[10:18:24] Carl Leitner: hi
[10:21:41] Sovello Hildebrand Mgani: hello Carl! Hi!
[10:22:05] Carl Leitner: sorry for being a bit late -- meetings!
[10:22:27] Carl Leitner: before we get started, there is one thing you should check.
[10:22:33] Sovello Hildebrand Mgani: don't mind
[10:22:40] Carl Leitner: you should check to see if you need a visa to enter botswana.
[10:22:43] Sovello Hildebrand Mgani: it is not that late anyways...
[10:23:25] Sovello Hildebrand Mgani: i went through the Forms/lib classes and i now have grasped a bit of sense
[10:23:37] Carl Leitner: ok.  good.
[10:24:22] Carl Leitner: http://www.botswanaembassy.org/index.php?page=countries-that-do-not-require-visa
[10:24:37] Sovello Hildebrand Mgani: thanks.
[10:24:42] Carl Leitner: you are probably OK w/ the visa, but you should call the Botswana embassy in Tanzania to be sure.
[10:25:13] Carl Leitner: have you used the new Skype on Ubuntu yet?
[10:25:15] Sovello Hildebrand Mgani: I am travelling to Dar tomorrow, so I'll pass by their. though it seems they don't have an embassy in Dar
[10:25:49] Sovello Hildebrand Mgani: not yet. mine is still 2.2.0.35
[10:26:16] Carl Leitner: some weird stuff happens, but I don't know if its just me.
[10:26:18] Carl Leitner: anyways...
[10:26:30] Carl Leitner: should we look at mapHeaders() ?
[10:26:32] Sovello Hildebrand Mgani: we ended with createContainer();
[10:27:32] Carl Leitner: ok.  do you have any questions on the createContainer() / form factory stuff?
[10:27:51] Sovello Hildebrand Mgani: for not i don't.
[10:28:00] Sovello Hildebrand Mgani: for now i don't have!
[10:28:10] Carl Leitner: ok.  then back to the processor...
[10:28:11] Carl Leitner: http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_base.php#L223
[10:28:12] Sovello Hildebrand Mgani: we can just go through mapHeaders() just quickly
[10:28:25] Carl Leitner: so mapHeaders() is called in the construtor for the processor.
[10:28:38] Carl Leitner: on line 250
[10:29:02] Carl Leitner: and is defined on line 257-372
[10:29:33] Carl Leitner: line 358:  this is where we read in the headers from the DataFile object.
[10:29:53] Sovello Hildebrand Mgani: yes
[10:29:59] Carl Leitner: (we looked at that last time for CSV when we were looking at guessing the column separator)
[10:30:20] Sovello Hildebrand Mgani: yes
[10:30:26] Carl Leitner: $thiso->headers is (probably) just an array with values the name of the columns it read in the header
[10:31:02] Carl Leitner: line 360 we trim the white space and make everything lower case (so we are more likely to match against the expected headers)
[10:31:25] Carl Leitner: you see here the foreach ($headers as &$header)  .... unset($header) just like we talked about last time
[10:31:34] Sovello Hildebrand Mgani: yes
[10:31:55] Carl Leitner: so after line 362, $this->headers should be an array of the column titles in lower case with the space trimmed.
[10:32:35] Carl Leitner: line 363 we are initializing the array which will contain the mapping between column numbers and where the expected headers/columns are going to be
[10:33:02] Carl Leitner: line 364 we are asking for the expected headers.
[10:33:21] Carl Leitner: last time we talked about how each subclass of Processor needs to implement this method,
[10:33:32] Carl Leitner: and for import_infinium, we see this defined here:
[10:33:34] Sovello Hildebrand Mgani: yes
[10:33:51] Carl Leitner: http://bazaar.launchpad.net/~ihris+botswana/ihris-botswana/trunk/view/head:/tools/import_infinium.php#L521
[10:34:12] Carl Leitner: so it is returning an array of "short names" and Column Names.
[10:34:43] Carl Leitner: now looking at line 365, we are looping through each of the expected headers to see if we could find a match in $this->headers
[10:35:17] Carl Leitner: line 367 we die() if we do not find the match
[10:35:37] Carl Leitner: (again this is bad programming, we should really be throwing an exception rather than die())
[10:36:33] Carl Leitner: line 370, we are filling in the header map.  the key is the shortname (the header reference)  the value is the column number at which that column was found
[10:36:56] Carl Leitner: so at the end of this foreach() loop, $this->header_map should look sometihng like:
[10:37:12] Carl Leitner:  array('surname'=>2, 'firstname'=>4,...)
[10:37:26] Carl Leitner: depending on the order of the columns in the spreadsheet.
[10:37:54] Carl Leitner: any questions on what is going here?
[10:38:26] Sovello Hildebrand Mgani: am a little confused with the key=>value pairs here.
[10:38:29] Sovello Hildebrand Mgani: line 366
[10:39:15] Sovello Hildebrand Mgani: when we search, we are searching for the $expected_header in $this_headers which are obtained fromt the $dataFile
[10:39:39] Carl Leitner: correct.
[10:39:43] Sovello Hildebrand Mgani: i was thinking $this->dataFile->getHeaders returns the strings and not the column numbers...
[10:40:50] Carl Leitner: $this->dataFile->getHeaders() will return an array like (0=>'National ID', 1=>'Passprot Number',2=>'Surname',3=>'Other Name',4=>'First Name',....
[10:41:00] Carl Leitner: so the keys of the array are the column numbers.
[10:41:10] Carl Leitner: the values of this array are the column titles (or headers)
[10:41:31] Carl Leitner: then array_search(strtolower($expected_header),$this->headers))
[10:42:03] Carl Leitner: will return the key/column number at which the strtolower("First Name") was found in the $this->headers array
[10:42:33] Sovello Hildebrand Mgani: i missed the point that array_search() returns the key not the value. THANKS. we now can proceed
[10:42:59] Carl Leitner: ok.  so we are setting $header_col to the result of array_search
[10:43:15] Carl Leitner: and note:  array_search returns false if it is not found.
[10:43:25] Sovello Hildebrand Mgani: yes
[10:43:32] Carl Leitner: there is a reason that we do the === here instead of the ==
[10:43:50] Carl Leitner: === also checks the type of the value that is returned.  == does not.
[10:43:52] Sovello Hildebrand Mgani: i think === is used when we compare strings for equality
[10:43:59] Sovello Hildebrand Mgani: yes
[10:44:11] Carl Leitner: some examples:
[10:44:14] Carl Leitner:  0 == false
[10:44:21] Carl Leitner:  false === false
[10:44:24] Carl Leitner:    1 == true
[10:44:33] Carl Leitner: these are all true statements.  but!
[10:44:38] Carl Leitner:  0 === false
[10:44:41] Carl Leitner:  1 === true
[10:44:45] Carl Leitner: are false statments.
[10:44:49] Sovello Hildebrand Mgani: yes
[10:45:29] Carl Leitner: that's because on the ===, 0 is a int, and false is a bool.  on the ==, false is "cast" to be 0 so we are really doing:  0 == (int) false
[10:45:35] Carl Leitner: which is true.
[10:45:41] Carl Leitner: ok.  going on..
[10:45:53] Carl Leitner: are there any more questions on this?
[10:46:01] Sovello Hildebrand Mgani: it is now very clear
[10:46:06] Carl Leitner: ok.
[10:46:19] Carl Leitner: so since we are already here, we should look a the mapData function
[10:46:38] Carl Leitner: defined on line 375
[10:47:03] Sovello Hildebrand Mgani: ok
[10:47:04] Carl Leitner: it is called by the processRow() function which is defined on line 294
[10:47:28] Carl Leitner: in processRow() we have two things happening that are important for us right now:
[10:47:51] Carl Leitner:    $this->data = $this->dataFile->getRow()
[10:48:04] Carl Leitner:  and
[10:48:06] Carl Leitner:  if ( ! ($this->mapped_data = $this->mapData())) {
            return false;
        }
[10:48:26] Carl Leitner: so first we are reading in a row of data from the spreadsheet and storing it in $this->data
[10:48:58] Carl Leitner: then we try to mapData().  if the result of mapData is false, we return false meaning we cannot process the row.
[10:49:22] Carl Leitner: so that's how mapData is used.  now let's look at where it is defined on line 375
[10:49:53] Carl Leitner: first we create an array $mapped_data to return our "mapped data"
[10:50:29] Carl Leitner: next we go through our $this->header_map which looks like array('surname'=>2, 'firstname'=>4)
[10:51:08] Carl Leitner: on line 378-379, if the column is in the mapped data, we add it to the mapped data.
[10:51:12] Carl Leitner: we end up with something like
[10:51:29] Carl Leitner:  $mapped_data = array('surname'=>'Leitner', 'firstname'=>Carl', ....)
[10:51:51] Sovello Hildebrand Mgani: yes
[10:52:14] Carl Leitner: so the $mapped_data we return maps the row of data  so we don't need to use column numbers (which can easiily change) to access the data
[10:52:50] Carl Leitner: so this function is clear now?
[10:52:54] Sovello Hildebrand Mgani: yes
[10:53:14] Carl Leitner: do you want to look at processRow() a bit more now?
[10:53:56] Carl Leitner: or do you want to look at the _processRow() in import_infinium?
[10:54:06] Sovello Hildebrand Mgani: i think I can pass through it later as most of its building blocks are clear now.
[10:54:11] Carl Leitner: ok. cool
[10:54:18] Sovello Hildebrand Mgani: may be we do the _processRow()
[10:54:25] Carl Leitner: ok.
[10:54:52] Carl Leitner: (as an aside, if you want to go back and add in documentaiton to the import_base class, that would be great and would be something you could bill to Botswana)
[10:55:39] Sovello Hildebrand Mgani: put a separate documentation or comment in the import_base.php file?
[10:56:24] Carl Leitner: (if you do so, use the phpdoc style: http://en.wikipedia.org/wiki/PHPDoc  and http://phpdocu.sourceforge.net/howto.php)
[10:56:30] Carl Leitner: comment in import_base.php
[10:56:34] Sovello Hildebrand Mgani: well... i think I'll have separate questions on this.
[10:57:14] Sovello Hildebrand Mgani: we can just proceed!
[10:57:14] Carl Leitner: it would be a good exercise and something that we need to do before we put import_base into I2CE
[10:57:17] Carl Leitner: ok.
[10:57:21] Carl Leitner: _processRow()....
[10:58:16] Carl Leitner: http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_infinium.php#L123
[10:58:34] Carl Leitner: this is where we get into the logic of the infinium import for iHRIS Botswana
[10:58:56] Carl Leitner: have you seen a sample excel spreadsheet that we are importing from?
[10:59:15] Sovello Hildebrand Mgani: maybe not!
[10:59:33] Carl Leitner: OK.  let me see if I can find one to send you as it will help to explain with an example in front of us
[10:59:50] Sovello Hildebrand Mgani: sure!
[11:00:22] * Carl Leitner sent file List MOH events_WITH CITIZEN_NOV3.xls.
[11:01:07] Sovello Hildebrand Mgani: received!
[11:02:57] Carl Leitner: first off... if any of the rows fails to import (when processRow() returns false) , the import script will create a spreadsheet called  something like List MOH Events_with CITIZEN_NOV3.bad.mm_dd_yyyy_hh:mm.xls
[11:03:18] Carl Leitner: (which is why the DataFile class needed to know keep the file name around)
[11:03:35] Carl Leitner: anyways, looking at the .xls there is one important column:
[11:03:46] Carl Leitner: the "Transaction Type"
[11:04:07] Sovello Hildebrand Mgani: yes, seen that
[11:04:31] Carl Leitner: this excel spreadsheet has a whole bunch of "transactions" that orccured in the system.  these can be employee hire, empployee fire, employee detail change etc.
[11:04:51] Carl Leitner: so the logic on how we process each row is first determined by the "Transaction Type" column
[11:05:29] Carl Leitner: so looking at line 131 we are checking $this->mapped_data['transaction_type'] to see what to do
[11:05:39] Carl Leitner: have you see switch/case statements before?
[11:06:00] Sovello Hildebrand Mgani: yes
[11:06:25] Carl Leitner: ok.  so transaction EE is an employeeExit(), ST is an emplyee status change.
[11:06:54] Carl Leitner: PS I don't recall but we can probably figure it out from the code
[11:07:00] Carl Leitner: NE is new employee.
[11:07:16] Carl Leitner: all othere transaction codes are ignored.
[11:08:09] Carl Leitner: we can go into each of these individaul transactions in a moment, but before we do that, take a look at line 124-130
[11:08:41] Carl Leitner: on line 124, we first verifyData, meaning that we ensure that we have all the data we need to process this transaction (we can look at this in a moment)
[11:09:17] Carl Leitner: line 127 gets the "date" of the transaction.  for example, if a person changed position, this would be the data of the position change.
[11:09:32] Carl Leitner: if there is no date, we again fail (return false) in processing the row
[11:09:49] Carl Leitner: is this OK so far?
[11:09:56] Sovello Hildebrand Mgani: yes
[11:10:14] Carl Leitner: ok.  let's take a quick look at verifyData() defined on line 320
[11:10:43] Carl Leitner: first we look at the transaction type, which we store in $transaction for conveinence.
[11:11:16] Carl Leitner: next we look at the required columsn that we need for each transaction to see if they are there -- if not we fail
[11:11:22] Carl Leitner: the required columns are defined here:  http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_infinium.php#L96
[11:11:41] Carl Leitner: for example for a New Employee, NE, we have:
[11:11:42] Carl Leitner:  'NE'=>array('omang','date','surname','forename','new_stat_code','dpsm_code','new_pos_title'),
[11:12:21] Carl Leitner: so we need the omang (which is like the national ID), the data, the surname, forename  the status code (new_stat_code) a dpsm_code and a position title
[11:12:39] Sovello Hildebrand Mgani: yes
[11:13:16] Carl Leitner: (DPSM= Department of Public Sector Mangamenet and is resposbible for all the paid-public sector employees in Botswana.  The code is the facility/department code that DPSM uses for budgetting/establishemnts)
[11:13:27] Carl Leitner: DPSM is kinda like PMORALG
[11:13:36] Sovello Hildebrand Mgani: ok!
[11:13:46] Carl Leitner: so.  verifyData() is OK?
[11:13:56] Sovello Hildebrand Mgani: yes
[11:14:08] Carl Leitner: ok.  so let's look at the processing of a New Employee.
[11:14:24] Carl Leitner: starting at line 164 http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_infinium.php#L164
[11:15:03] Carl Leitner: line 165 tries to create a person object (instance of iHRIS_Person) for the new employee.
[11:15:20] Carl Leitner: (we will look at that in a minute)
[11:15:32] Carl Leitner: if it can't then _processRow() will fail.
[11:16:13] Carl Leitner: (on line 166 the break is a break out of the switch() statment .  we return $sucess which was set to false on line 130)
[11:16:42] Carl Leitner: line 168-172  is to handle the status code
[11:17:25] Sovello Hildebrand Mgani: yes
[11:17:27] Carl Leitner: what is happening here, is we first try to look to see if the status code is arleady in iHRIS Botswana.  If not then we ask the user if they want to add it on line 171)
[11:18:11] Carl Leitner: line 173 is to fail processing the row if for example the status code was not there and the user did not want to add it (there may have been a typo in the imported data or something)
[11:19:07] Carl Leitner: it is intended that the addStatCode() is proably only going to be called the first time or few that the script is run -- once all the used status codes are added it will not ask the user to add a new code
[11:19:28] Carl Leitner: we can come back to the stat codes in a minute.
[11:20:01] Carl Leitner: on line 175 -- we have failed adding the stat code and we call $personObj->cleanup()
[11:20:23] Carl Leitner: the I2CE_Form->cleanup() method is used to free the memory used in creating the person object
[11:20:43] Sovello Hildebrand Mgani: ok
[11:21:44] Carl Leitner: (a I2CE_Form object links to I2CE_FormFields objects, and I2CE_FormField objects link to the I2CE_Form object.  because of this circularity, the PHP garbage collector can not reclaim memory.  the cleanup() method is used to delete the links between I2CE_Form and I2CE_FormField instances)
[11:22:33] Carl Leitner: some details are here: http://www.php.net/manual/en/features.gc.collecting-cycles.php  and it is better in PHP 5.3, but in PHP 5.2 we need to be careful
[11:22:50] Carl Leitner: ok.  anways.
[11:23:21] Carl Leitner: line 178 calls setNewPosition()  which will create a position for that person, craete a person_positoin for that person and link everything together.
[11:24:22] Carl Leitner: notice that we set $success to the value that setNewPosition() returns so if this returns true, we have sucesffully processed the row.  otherwise it goes into our "bad" spreadsheet
[11:25:08] Carl Leitner: so we have three things to look at in more detail: the stat codes, the createPerson()  and the setNewPosition()
[11:25:13] Carl Leitner: which do you want to look at first?
[11:29:23] Sovello Hildebrand Mgani: the createPerson()
[11:30:05] Carl Leitner: ok.
[11:30:51] Carl Leitner: so createNewPerson() is being called without any arguments.
[11:31:22] Carl Leitner: (igore that)
[11:31:29] Carl Leitner: http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_infinium.php#L492
[11:31:36] Carl Leitner: this is where createNewPerson() is defined
[11:32:00] Carl Leitner: first on line 493 we try to find the person by the Omang (national ID)
[11:32:57] Carl Leitner: since this is supposed to be a new person, line 494-497 returns false (so processRow returns false) as we found the person in the system who was not supposed to be there
[11:33:23] Sovello Hildebrand Mgani: ok
[11:33:36] Carl Leitner: line 500-508 is some small trickery.
[11:34:05] Carl Leitner: the column in the spreadsheet labeled Omang, is not just the omang number.
[11:34:16] Carl Leitner: the omang number is for Botswana citizens.
[11:34:57] Carl Leitner: however if the person is not an Botswana citizen, then the Omang column has an EXPATRIATE number
[11:35:14] Carl Leitner: the omang number is only digits.  the exapatriate number has a slash in it
[11:35:35] Sovello Hildebrand Mgani: seen that in the .xls
[11:35:44] Carl Leitner: so on line 500, we check to see if the omang number is really just digits, otherwise we assume it is an expatriate number.
[11:36:10] Carl Leitner: the id_type is the value of the field id_type in the person_id form
[11:36:58] Carl Leitner: omang_id_type has already been defined in iHRIS Botswana  and it happens to be id_type|1 http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_infinium.php#L231
[11:37:03] Carl Leitner: (is this OK)
[11:39:07] Sovello Hildebrand Mgani: yes
[11:39:08] Carl Leitner: i mis-spoke above.  the "excemption" ID is the id number with a slash (see row 238 of the spreadsheet i sent for an example)
[11:39:44] Carl Leitner: but excemptions are for expatriates as well (if i recall) i think it depends if they have registered properly or not.
[11:39:58] Sovello Hildebrand Mgani: ok
[11:40:19] Carl Leitner: anyways.... line 235-243 we try to ensure we have a valid ID.
[11:40:43] Carl Leitner: sorry.  in the wrong place.
[11:41:09] Carl Leitner: i meant line 499-508
[11:41:23] Sovello Hildebrand Mgani: ok
[11:41:54] Carl Leitner: if we have a valid id, we can go a head and create a person object for that row
[11:42:08] Carl Leitner: this is what happens on 510-512
[11:42:40] Sovello Hildebrand Mgani: we create a person object including those with  exemption ids (line 506) no?
[11:43:20] Carl Leitner: line 506 (and line 501) is just recording the id type we have for the person.
[11:43:57] Carl Leitner: $id_type is used on line 521-523 where we create the person_id form for the person we just created
[11:44:45] Carl Leitner: 521:  we create the person_id form object using the form factory createContainer()
[11:45:41] Carl Leitner: line 522: we set the id_type field of the person_id form to be array('id_type',1)  for the omang number or array('id_type',2) for an excempetion id
[11:46:12] Carl Leitner: line 523:  we the the value of the id_number field of the person_id form to be whatever was in the "Omang" column for that row
[11:48:02] Carl Leitner: note:  id_type is a I2CE_FormField_MAP (or MAP) field.  it stores its value  in PHP as an array with two values.  the first value is the name of the form it is being mapped to, and the second is the value of the id for that form we are mapping to
[11:49:05] Carl Leitner: and on line 522 since $id_type = 'id_type|1' or $id_type = 'id_type|2',  we have that explode('|',$id_type) = array('id_type',1)
[11:49:23] Carl Leitner: http://no.php.net/manual/en/function.explode.php
[11:49:26] Carl Leitner: is this OK?
[11:49:41] Sovello Hildebrand Mgani: a minute...
[11:52:08] Sovello Hildebrand Mgani: yes
[11:52:13] Sovello Hildebrand Mgani: we now can proceed
[11:52:45] Carl Leitner: ok.
[11:53:11] Carl Leitner: so jumping back to line 514-518.
[11:53:28] Carl Leitner: we try to set the nationality if the nationality column exists.
[11:53:33] Carl Leitner: if not, we won't fail.
[11:54:20] Carl Leitner: $this->existing_codes is populated on line 189-196
[11:55:05] Carl Leitner: it contains an array of all of the two letter country codes in the country form.  these two letter codes are also how infinium exports the country.  it is a stanard:
[11:55:19] Carl Leitner: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
[11:56:43] Carl Leitner: so the $this->existing_codes is a map between the two digirt alpha codes (stored in the alpha_two field of the country form) and the id ( for example "country|1") of the country form with that code.
[11:57:23] Carl Leitner: so on line 516, we do an array search see if we can get the country|XXXX for the two letter alpha code
[11:57:37] Carl Leitner: is this OK?
[11:58:19] Carl Leitner: sorry.  the keys of the existing_codes are array are the XXX not he country|XXX
[11:58:20] Sovello Hildebrand Mgani: ye
[11:58:22] Sovello Hildebrand Mgani: yes
[11:58:43] Carl Leitner: so one line 517:  we set the value to be array('country', $countr_id)
[11:59:00] Carl Leitner: where $country_id = 1  from the array search.
[11:59:19] Sovello Hildebrand Mgani: ok
[11:59:22] Carl Leitner: ok.  so we now have set the values of all the fields of the two forms that we created.
[11:59:26] Carl Leitner: we need to do two things:
[11:59:41] Carl Leitner: 1) save the forms, 2) make sure that we set person_id as a child form of person
[12:00:14] Carl Leitner: so let's talk a minute about what happens when we make a form a child form of another form....
[12:00:36] Carl Leitner: what is happening is there is a "special" field called 'parent' for each form.
[12:00:45] Sovello Hildebrand Mgani: yes
[12:00:46] Carl Leitner: which work very much like a MAP field.
[12:01:14] Carl Leitner: it's value looks like "$form|$id"  for the parent form
[12:01:51] Carl Leitner: so at this point, just after line 524, we created the form objects but not saved them.
[12:02:13] Carl Leitner: for objects do not get unqiue ids until you save them
[12:02:23] Carl Leitner: (they all have an id of 0 until you save them)
[12:02:49] Carl Leitner: so on line 525, we save the $personObj which will create an id for the person.
[12:03:33] Carl Leitner: on line 526, we set the parent of the $idObj to be that coming from $personObj (which now has a non-zero id)
[12:04:00] Sovello Hildebrand Mgani: ok
[12:04:03] Carl Leitner: on line 527: now that we have set the parent of the $idObj, we can go ahead and save the $idObj
[12:05:00] Carl Leitner: we then return the $personObj we createed and which is used in the processRow()
[12:05:05] Carl Leitner: is this logic OK?
[12:11:06] Sovello Hildebrand Mgani: yes
[12:11:25] Carl Leitner: ok.
[12:12:00] Carl Leitner: there is one more thing we should explore in the createNewPerson(), the findPersonByOmang()
[12:12:22] Carl Leitner: which is used in a couple other places as well and is defined here:
[12:12:23] Carl Leitner:  http://bazaar.launchpad.net/~ihris+botswana/ihris-manage/4.0/view/head:/tools/import_infinium.php#L233
[12:13:01] Carl Leitner: line 234-243 gets the omang number and tries to figure out if it is Omang or Excemption just as we did before
[12:13:26] Carl Leitner: line 265 is where we search for a person with the given id.
[12:13:54] Carl Leitner:    $persons = I2CE_FormStorage::listFields('person_id',array('parent'),true,$where);
[12:14:05] Carl Leitner: here 'person_id' means we are looking for the peron_id form
[12:14:34] Carl Leitner: the array('parent') means we do the search, we want to return the values of the 'parent' field
[12:14:50] Carl Leitner: so listFields() is returning an array that looks like:
[12:15:49] Carl Leitner: array( 'person_id|1312' => array('parent'=>'person|231) , 'person_id|4534'=>array('parent'=>'person|88'))
[12:17:09] Sovello Hildebrand Mgani: ok
[12:17:15] Carl Leitner: if  listFields() returns an empty arary (with 0 things in it) the person was not found so we want to return false.  if the arry return by listFields contains more than one thing (as was with this example) that means more than one person was found with that id.  so that's bad too and return false
[12:17:35] Carl Leitner: so on line 267 we check that we have exactly one match.
[12:18:40] Carl Leitner: line 244-264 is where we defined the search limits sent to listFields() in the $where variable.  here we want to limit it to the id number and type that we read in from the spreadsheet
[12:19:51] Carl Leitner: is this OK?
[12:20:31] Carl Leitner: (here is the API for listFields() http://wiki.ihris.org/wiki/Class:_I2CE_FormStorage_%284.1.1%29#listFields.28.29 )
[12:22:10] Carl Leitner: the $where array is defined like here http://wiki.ihris.org/wiki/Limiting_Forms
[12:22:19] Sovello Hildebrand Mgani: yes, and I suggest we end up here.
</pre>

Revision as of 23:23, 6 January 2013

This article describes the import tools developed for Botswana. The same can be adopted for other projects and be edited as necessary depending on the structure of the data you have. The import tools for Botswana can be accessed in launchpad from here. Here there are many files, however we have one important file the import_base.php which forms the basis for all the other tools. We will refer to this file in this article as simply the Import Base. The Import Base defines the classes which are then implemented by all the other tools.

Import Base

This file can be organized into different sections. The first section being lines 39-74. This is the part where we set the system wide variables like database credentials, where we find the other tools like I2CE and Command Line Interfacing tools (line 74). If you check closely, this first part resembles the site index.php file.

DataFile Class

The DataFile class is the second part of import base. The DataFile class (lines 85-114) defines the abstract methods for the real classes to implement. It defines methods like getDataRow(), hasDataRow(), getHeaders() to fetch one data row at a time, to check if the file we are reading has data rows in it and to get the headers from the file respectively. We are assuming the file we are going to work with has its data organized into rows. The constructor initializes the file that we will be working on and we have a method to fetch the file name and to close the file after we have finished reading it.

CSVDataFile Class

This is the first implementation of the DataFile class defined in the second part of the import base.

It implements the DataFile class depending on the features of the CSV File. Its constructor calls the parent constructor to initialize the file we are going to import data from, reads the size of this file and makes sure the file is open for reading otherwise it fails. (lines 123-125).

The hasDataRow() methods checks to see if there is some data in the file we just opened in the constructor. ([php.net/manual/en/function.ftell.php ftell() method]). Should there be no data, we return false otherwise we return true that the file has data in it.

Then we go and read the headers from this file. We assume the headers to be in the first row of the file [php.net/manual/en/function.fseek.php fseek() method) and that the separators could be one of a tab, comma or semicolon. (lines 147 and 148)

Then we read the header fields into a variable $headers line 149. Lines 150-153 we check if the $headers array has no data in it, or there are only two headers and if the user (one running the script) replies no the the displayed headers for confirmation, then we remain at the first line of the file and skip.

We assign the separator found in the file to the class feature 'in_file_sep' and stop checking for more. Lines 157 - 159 we are making sure we have the separator used in the CSV file and then we remove any white spaces before and after each of the headers we have found. ([php.net/manual/en/function.trim.php trim() method]).

Then finally the function returns the headers we have found from the file.

The getDataRow() (lines 167-169) returns an array of data read from the file based on the separator we found above. ([php.net/manual/en/function.fgetcsv.php fgetcsv() function]). After we have done everything we close the file we opened.

ExcelDataFile Class

We also assume if the data file we are reading is not a CSV then it could be an Excel spreadsheet. This is accomplished using the PHPExcel library which must be installed before this is done lines 187-189.

What this does in the constructor is initialization of the file we are going to work on like creating the reader resource, setting the file to readonly as we process, loading the file and the active sheet from the worksheet, and setting the row iterator in this worksheet.

The other methods perform the same function like in the CSVDataFile class above, except we have added a helper method _readRow, which reads the data from a row into an array and later this is used by the getHeaders() and getDataRow() methods. [This is due to the limited versatility of the PHPExcel library to process stuff more easily). More about PHPExcel library can be read from [phpexcel.codeplex.com/ here].

Processor Class

This is the main class which does all the processing of the importing. Whenever you define a class to import some data, make sure you implement this class.

The convert() function (lines 263-267) is utility function to convert file file sizes into bytes, megabytes, gigabytes etc. depending on the size value we have.

The constructor of the class reads the file and determines whether it is a CSV or an Excel spreadsheet and initializes them accordingly lines 287-294 of the import base. Then it initializes the database user and the Form Factory instance, the bad file and test mode and creates the import logger. The import logger is a database table which keeps track of where the last import ended in case we didn't finish. The logger is somewhat for a specific file so that if we started processing a file and something happened in between before we finish we can proceed from where we ended.

The getCurrentRow() fetches one row at a time from the file and returns it for processing, the hasDataRow() makes sure we have rows in the file we are reading and returns true on success.

The run() method handles all the import processes giving the user choices whether to run the script in test or production mode giving options to skip some of the rows. It also checks to see if there are more than ten consecutive empty rows and asks the user whether to proceed or stop processing. It also checks to see, if the script is run in test mode (meaning no data is saved in the database) it echoes back Success on Test meaning if we were running it in production mode this data from this row would be imported successfully into the database.

The getStats() is one of the utility method to collect statistics for the import process. At the end of the import process, the script gives statistics for the success and failed imports.

The processRow() processes each row encountered in the file and returns true on success and false on failure. If a row is processed (lines 393-395) we return a success (true) and proceed. Lines 399-404 we check if processing of the row has been successful we mark that as being processed and set that in the import logger (markProcessed()) and return true, otherwise we return false: failure.

The setTestMode() is a helper function to help set the mode of the script whether to run in test or production mode.

The save() method saves the data into the database and echoes back the row number and the form where the data is being saved and the memory used so far. If the script is run in test mode then nothing gets saved into the database lines 429-431 otherwise it saves the data with the user who saved that data and asks whether to continue or stop.

The getHeaderMap() method does some mapping of the headers from the file we are importing data from and the internal headers that you will have defined for reference. 'Check its use here. The indices of the array are the internal references to the headers found in the file to be imported. And the mapHeaders() this now maps the internal references to the column headers in the file.

The initBadFile() method (lines 520-528) initializes a file where we keep record of the import process. Every row we fail to process gets saved in this file together with the reasons for failure by adding a column at the end of the import file we are working on.

The addBadRecord() method adds all the failures into the file which holds all the unsuccessful imports.

The createLogger() method creates the database table into which we keep track of all the rows that have been processed so that next time we run the same script on the same file we proceed from where we last ended.

The alreadProcessed() method confirms that the row we are now working on has been processed or not.

The markProcessed() method marks each row as processed into the logger table.

The getDate() function is a utility function to get the current data and format it, so that we can use it in the name of the bad records that we will create later for storing unsuccessful imports.