Applied regular expressions in PHP: Provisioning the Linksys PAP2T


Author: Colin Beckingham

The Linksys PAP2T is an analog telephone adapter (ATA) widely used in VoIP applications to connect an analog phone to a digital IP network. Some PAP2T units are locked and dedicated to a particular VoIP service. Others are capable of using a process called provisioning to ensure that important parameters remain fixed despite local attempts to change them. By employing open source tools such as PHP and MySQL, you can manage these latter kinds of units while they are out in the field.

Linksys is helpful in giving the overall approach to provisioning these units. The process consists of creating a special XML file that sits on a server, and instructing the unit to provision itself from that location. The XML file specifies the parameters that control the unit’s operation, and their values.

It sounds simple, but there are a couple of hurdles to be cleared. First, the parameter names need to be retrieved from the unit, and secondly there are more than 500 of them, so the process could be very tedious. We’ll use the scripting power of PHP and regular expressions to extract the parameters from the PAP2T unit along with the current values, store the values in a MySQL database, then use the database to generate the XML files required for units to reprovision themselves.

Extracting the parameters

Linksys instructs administrators to examine the source code of the HTTP interface to the unit for the parameter names. The names as displayed there are not exactly in the format we need in an XML context, so our script will have to do some editing. The source code also gives us the current values selected in the unit for each of the parameters. To extract them we will use PHP from the command line, although you can achieve the same effect with another scripting language.

First we need to cut out only the section of the HTML source from the ATA that refers to the parameters. We can get this by opening the unit interface in a browser, navigating to the Administrator and Advanced views, right-clicking on the page, and selecting the “View source” option for your browser. On my unit the relevant section begins at:

<TD align=right bgcolor=#0 height="25px"><FONT class=labelft>System Information</FONT>

and ends at:

<td>Ring On No New VM:<td><select class="inputw" name="59247"><option value="1">yes</option><option selected value="0">no</option></select>

If you can find the first string above but not the second, then one thing to check is that you are in Administrator and Advanced mode. We can extract the code, retaining whole HTML tags, from opening angle bracket to closing angle bracket, and save it in a file named cbata.txt. “cbata” will become a marker for this unit, but as a label it is quite arbitrary.
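
The cut can also be made in code rather than by hand. The following is only a sketch: slice_between() is a hypothetical helper, not part of the original script, and the sample page is a shortened stand-in for the real HTML source. It returns everything from the start marker through the end of the end marker, or null when either marker is missing (for example, when the page was not saved in Administrator and Advanced mode).

```php
<?php
// Hypothetical helper: return the text from $startMarker through the end of
// $endMarker, or null if either marker cannot be found.
function slice_between(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    // Look for the end marker only after the start marker.
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start + strlen($endMarker));
}

// Shortened, made-up page content for illustration only.
$page = '<html><TD>junk</TD><FONT class=labelft>System Information</FONT>'
      . '<td>DHCP:<td>Disabled'
      . '<td>Ring On No New VM:<td><select class="inputw" name="59247"></select></html>';

$section = slice_between(
    $page,
    '<FONT class=labelft>System Information',
    '<select class="inputw" name="59247"></select>'
);
echo $section, "\n";
```

With the real page, the two marker strings shown above for the beginning and end of the section would be passed in, and the result written to cbata.txt with file_put_contents().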

The script proceeds in a number of steps:

  1. Read the file contents into a string variable
  2. Remove all the labeling information
  3. Remove unnecessary fixed strings
  4. Remove unnecessary HTML, tidy up the result, and deal with data in two columns
  5. Change the parameter names to the format required by the unit and output the file to be read into MySQL

After each step the string gets smaller until it contains only the critical data, one parameter per line. The result will look something like:

NULL~DHCP~~Disabled~cbata
NULL~Current_IP~~192.168.0.3~cbata
NULL~Host_Name~~LinksysPAP~cbata
.......
NULL~VMWI_Ring_Splash_Len_2_~59439~0~cbata
NULL~VMWI_Ring_Policy_2_~59055~New VM Available~cbata
NULL~Ring_On_No_New_VM_2_~59247~no~cbata

The fields are a record ID, the name of the parameter, the parameter ID number or tag (if specified), the value set in the unit for that parameter, and a marker for the unit. The tilde character (“~”) separates the fields.
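
To make the record layout concrete, here is a small sketch that splits one such line into named fields. The function parse_record() and the field keys are assumptions chosen for illustration; the original script never reads its own output back in.

```php
<?php
// Hypothetical helper: split one tilde-delimited output line into its
// five fields (record ID, name, tag, value, unit marker).
function parse_record(string $line): array
{
    $parts = explode('~', rtrim($line, "\r\n"));
    if (count($parts) !== 5) {
        throw new InvalidArgumentException('Expected 5 fields, got ' . count($parts));
    }
    return array_combine(['id', 'name', 'tag', 'value', 'unit'], $parts);
}

$rec = parse_record("NULL~VMWI_Ring_Policy_2_~59055~New VM Available~cbata");
echo $rec['name'], ' = ', $rec['value'], ' (tag ', $rec['tag'], ")\n";
```

A line such as NULL~DHCP~~Disabled~cbata parses the same way, with an empty string in the tag field.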

You can see the entire script at the end of the article. You can save it on your machine as process_ata.php and run it with the command php process_ata.php.

Breaking the script into pieces to see what it does, here is the part for the first stage, which does nothing more than read the file into a variable. I have also shown the end of the script, where the output file is generated. By calling the endnow() function at different places in the script you can halt the script early to check that the result at that point is as expected:

<?php
// Extract parameters from PAP2T - Colin Beckingham 2008
$stem = "cbata";
echo "start\n";
// step 1: read the saved HTML fragment into a string
$newstr = file_get_contents($stem.".txt");
...
//--- more script to come in here ---
...
endnow(); // slide this function call up and down the script as required to end the process early

// function section
function endnow() {
  global $stem, $newstr;
  // write the string in its current state to <stem>out.txt, then stop
  if ($by = file_put_contents($stem."out.txt",$newstr)) {
    echo "OK $by \n";
  } else {
    echo "problem\n";
  }
  die("end\n");
}
?>

This section uses simple string replacements for section titles:

// take out section titles
$titles = array(
  'System Information','Product Information','System Status','Line 1 Status','Line 2 Status',
  'System Configuration','Internet Connection Type','Optional Network Configuration',
  'SIP Parameters','SIP Timer Values (sec)','Response Status Code Handling','RTP Parameters',
  'SDP Payload Types','NAT Support Parameters',
  'Configuration Profile','Firmware Upgrade','General Purpose Parameters',
  'Call Progress Tones','Distinctive Ring Patterns','Distinctive Call Waiting Tone Patterns',
  'Distinctive Ring/CWT Pattern Names','Ring and Call Waiting Tone Spec','Control Timer Values (sec)',
  'Vertical Service Activation Codes','Vertical Service Announcement Codes','Outbound Call Codec Selection Codes',
  'Miscellaneous',
  'Streaming Audio Server (SAS)','NAT Settings','Network Settings','SIP Settings','Call Feature Settings',
  'Proxy and Registration','Subscriber Information','Supplementary Service Subscription','Audio Configuration',
  'Dial Plan</FONT>','FXS Port Polarity Configuration',
  'Selective Call Forward Settings','Call Forward Settings','Speed Dial Settings','Supplementary Service Settings',
  'Distinctive Ring Settings','Ring Settings'
);
foreach ($titles as $title) {
  $pattern = $title;
  $replacement = '';
  $newstr = str_replace($pattern, $replacement, $newstr);
}

Apart from section titles there are other fixed strings that we don’t need:

// take out irrelevant fixed strings
$strs = array(
  '&nbsp;','</option>','</TR>','<TD>','</TD>','</font>','</FONT>',' colspan="3"','<COLGROUP>','</COLGROUP>',
  '</table>','</div>',' maxlength=255',' maxlength=2047',' size="15"',' size="50"',
  '<TD width=8 align=left height="100%" background="/UI_05.gif">',
  '<TD colspan=2 align=right bgcolor=#e7e7e7 height="100%">',
  '<TD align=right bgcolor=#0 height="25px">','<TD bgcolor=#0>',
  '<TD width=142 bgColor=#6666cc height="100%">');
foreach ($strs as $str) {
  $pattern = $str;
  $replacement = '';
  $newstr = str_replace($pattern, $replacement, $newstr);
}

Now we can take out the other unnecessary HTML. This section uses regular expressions in PHP to find the strings to be removed. We have to be careful here, since even though the HTML property “selected” shows where the default value for each parameter lies, the position of the “selected” property varies according to context. I am using some combinations of special characters such as “@@” and “%%” to act as placeholders; they will eventually be removed. I have also chosen the tilde as the field delimiter, since this character does not seem to be used in the parameter names or values as actual data. If this character were important in the data, we would require a different delimiter. This section also does some cleanup with regard to extra newlines:

// take out html and tidy up
$pattern = '/<(IMG|COL|TR|FONT|TABLE|DIV)[^>]*>/i';
$replacement = ''; // line A
$newstr = preg_replace($pattern, $replacement, $newstr);
// mark the end of each select list
$pattern = "</select>";
$replacement = "%%";
$newstr = str_replace($pattern, $replacement, $newstr);
// the cell boundary after a label becomes the field delimiter
$pattern = ":<td>";
$replacement = "~";
$newstr = str_replace($pattern, $replacement, $newstr);
// every remaining cell starts a new line
$pattern = "/<td>/";
$replacement = "\n";
$newstr = preg_replace($pattern, $replacement, $newstr);
// mark the start of the parameter ID
$pattern = '/<(input|select) class="input(c|w)" name="/';
$replacement = "##";
$newstr = preg_replace($pattern, $replacement, $newstr);
// mark the start of an input value
$pattern = '" value="';
$replacement = "%%%";
$newstr = str_replace($pattern, $replacement, $newstr);
// drop everything up to and including the selected option tag
$pattern = '/"><option.*selected[^>]*>/';
$replacement = "@@";
$newstr = preg_replace($pattern, $replacement, $newstr);
// drop the unselected options that follow the selected value
$pattern = '/<option.*%%/';
$replacement = "$$";
$newstr = preg_replace($pattern, $replacement, $newstr);
// collapse runs of newlines and whitespace-only lines
$pattern = "/(\n)+/";
$replacement = "\n";
$newstr = preg_replace($pattern, $replacement, $newstr);
$pattern = "/\n\s*\n/";
$replacement = "\n";
$newstr = preg_replace($pattern, $replacement, $newstr);

As an example of a regular expression, line A in the above code looks for all the IMG, COL, TR, FONT, TABLE, and DIV opening tags. Each match starts at the initial left angle bracket and the tag name, which can be followed by any number of other characters as long as none of them is a right angle bracket. At the first right angle bracket the substring ends, and the whole thing is replaced with a zero-length string. The final ‘i’ outside the pattern delimiters “/…/” makes the match case-insensitive, so the tag names can be in either upper or lower case.
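
A quick way to get a feel for this pattern is to run it against a short string. The sample HTML below is made up for illustration and is not taken from the unit:

```php
<?php
// Made-up sample row; only the pattern itself comes from the script.
$sample = '<TR bgcolor="#dcdcdc"><td>DHCP:<td><font color="darkblue">Disabled</font><IMG src="/x.gif">';

$stripped = preg_replace('/<(IMG|COL|TR|FONT|TABLE|DIV)[^>]*>/i', '', $sample);

echo $stripped, "\n"; // prints: <td>DHCP:<td>Disabled</font>
```

The TR, font, and IMG opening tags disappear, while the td tags survive because TD is not in the list. The closing font tag also survives, since the character after its left angle bracket is a slash rather than a tag name; that is why closing tags are removed in the earlier fixed-string step.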

The final step deals with temporary placeholders and edits the parameter names to replace spaces with underscores. Since the PAP2T has facilities and therefore parameters for two lines and two users, it also deals with these issues by adding suffixes to the parameter names where necessary:

// placeholders that become field delimiters
$tempsa = array('@@','~##','%%%');
foreach ($tempsa as $ta) {
  $pattern = $ta;
  $replacement = '~';
  $newstr = str_replace($pattern, $replacement, $newstr);
}
// placeholders and leftovers that are simply removed
$tempsb = array('">','$$','%%');
foreach ($tempsb as $tb) {
  $pattern = $tb;
  $replacement = '';
  $newstr = str_replace($pattern, $replacement, $newstr);
}
$arr = explode("\n",$newstr);
// discard the first and last lines
$temp = array_shift($arr);
$temp = array_pop($arr);
$brr = array();
$line = 0;
$user = 0;
$suff = ''; // no suffix until the first line section starts
foreach ($arr as $a) {
  $pos = strpos($a,"~");
  $aa = substr($a,0,$pos);
  $aa = str_replace(" ","_",$aa);
  // "Line_Enable" opens a new line section
  if ($aa == "Line_Enable") {
    echo "LE\n";
    $line++;
  }
  // "Cfwd_All_Dest" opens a new user section
  if ($aa == "Cfwd_All_Dest") {
    echo "CAD\n";
    $line = 0;
    $user++;
  }
  if ($line > 0) $suff = "_".$line."_";
  if ($user > 0) $suff = "_".$user."_";
  if (strpos(substr($a,$pos + 1),"~")) {
    // a second tilde means the line carries a parameter ID
    $brr[] = "NULL~".$aa.$suff.substr($a,$pos)."~$stem";
  } else {
    // no parameter ID: leave the third field empty
    $brr[] = "NULL~".$aa.$suff."~".substr($a,$pos)."~$stem";
  }
}
$newstr = implode("\n",$brr);
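
The suffix rule is easier to follow in isolation. The sketch below restates it as a standalone function; add_suffixes() is a hypothetical name, and the short list of parameter names is only a stand-in for the several hundred names the unit actually reports.

```php
<?php
// Hypothetical helper mirroring the suffix rule of the script:
// "Line_Enable" starts a new line section, "Cfwd_All_Dest" starts a new
// user section, and every name inside a section gets that section's number.
function add_suffixes(array $names): array
{
    $line = 0;
    $user = 0;
    $suffix = '';
    $out = [];
    foreach ($names as $name) {
        if ($name === 'Line_Enable') {
            $line++;
        }
        if ($name === 'Cfwd_All_Dest') {
            $line = 0;
            $user++;
        }
        if ($line > 0) {
            $suffix = "_{$line}_";
        }
        if ($user > 0) {
            $suffix = "_{$user}_";
        }
        $out[] = $name . $suffix;
    }
    return $out;
}

// Illustrative sequence of names, in the order they would appear on the page.
$names = ['DHCP', 'Line_Enable', 'Proxy', 'Line_Enable', 'Proxy',
          'Cfwd_All_Dest', 'Ring_On_No_New_VM'];
print_r(add_suffixes($names));
```

Names seen before the first Line_Enable keep no suffix, which matches entries such as DHCP and Host_Name in the sample output shown earlier.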

Running this script should output a line-by-line list of the unit’s parameters in the file cbataout.txt. The list will contain two types of lines: those with a parameter ID number in the third field, and those with nothing in that field. For provisioning we are only interested in entries with a parameter number, since only these can be understood by the unit and reset.
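
If you want to inspect just the provisionable entries before loading anything into a database, a filter along these lines would do. It is a sketch only; provisionable_lines() is a hypothetical helper and not part of the original script.

```php
<?php
// Hypothetical helper: keep only the lines whose third field (the
// parameter ID) is not empty.
function provisionable_lines(array $lines): array
{
    return array_values(array_filter($lines, function (string $line): bool {
        $fields = explode('~', $line);
        return isset($fields[2]) && $fields[2] !== '';
    }));
}

$lines = [
    'NULL~DHCP~~Disabled~cbata',
    'NULL~Host_Name~~LinksysPAP~cbata',
    'NULL~Ring_On_No_New_VM_2_~59247~no~cbata',
];
print_r(provisionable_lines($lines));
```

In practice the input array could come from file('cbataout.txt', FILE_IGNORE_NEW_LINES).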

No doubt there are other quicker ways to achieve the same objective in fewer steps, but taking lots of small steps allows us to examine the output in incremental stages to see that the output is as required.

Transferring the data to MySQL

Now that we have the raw data, we can move on to putting it into MySQL. Here is a suggested table outline; we can create this table in a new or existing database according to need:

CREATE TABLE IF NOT EXISTS `pap` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(100) NOT NULL,
  `intid` varchar(20) NOT NULL,
  `value` varchar(255) NOT NULL,
  `ataid` varchar(20) NOT NULL,
  `include` tinyint NOT NULL default 0,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

The fields represent, in order, a unique identifier for the benefit of MySQL, the name of the parameter, the Linksys identification number of the parameter (where there is one), the value the parameter should have, the identifier for the ATA unit, and whether we should include this parameter and value in the output or not, with 0 for no and 1 for yes. Note that the fields apart from the ID are declared as strings. This is because some of the data coming in, while it appears to be consistently numeric, occasionally contains some non-numeric character information that would otherwise not be imported correctly. We can save the above schema in a file such as pap_schema.sql. Now from the bash prompt we run mysql -u xxxxx -p name_of_pap_database, which gets us into the MySQL interactive prompt, and create the table with the above command. Then:

mysql> load data infile '/path..to..file/cbataout.txt' into table pap fields terminated by '~';

loads the data. We may get some warnings generated since we are not explicitly filling the “include” field but allowing the default to enter a zero.

With the data in a table we can edit values as necessary. Also, if other ATA units are examined in the same way and added to the database with a different identifier in the last column, we have a means of subsetting the data as required.

Output to the XML provisioning file

All that remains now is to output the data from the table into an XML file that follows the Linksys format. Since all the database records are marked by default as include=0 (do not include in output file), we have to modify a select few items to include=1 to generate some example entries. My suggestion is to proceed slowly and carefully, including a few parameters at a time in the XML file to see that the effect is as desired. Some general-purpose parameters whose names begin with “GPP” are handy for initial testing. These can be seen from the database with the query:

select * from pap where ataid='cbata' and intid != '' and name like "GPP%";

and edited with your usual database editor, such as phpMyAdmin.

While PHP contains an extensive set of functions for dealing with XML, since the format of the required XML is simple I have taken the straightforward strings route to generate the needed file. I have called this script pap2xml.php:

<?php
// get info from pap ata table and make xml file for provisioning
$ata = 'cbata';
$content = '';
$link = mysqli_connect('server','user','password');
if ($link) {
  if (mysqli_select_db($link, 'name_of_your_pap2t_database')) {
    // fetch only rows where there is a parameter id and we ask for it with include is yes
    $sql = "select * from pap where ataid='$ata' and intid != '' and include=1";
    if ($result = mysqli_query($link, $sql)) {
      $content .= "<?xml version='1.0' encoding='ISO-8859-1'?>\n";
      $content .= "<flat-profile>\n";
      while ($row = mysqli_fetch_array($result)) {
        // slashes and parentheses are not allowed in tag names
        $tagname = preg_replace('/[\/()]/',"_",$row[1]);
        echo $tagname."\n";
        //echo $content."\n";
        $content .= " <".$tagname.">";
        $content .= $row[3]."";
        $content .= "</".$tagname.">\n";
      }
      $content .= "</flat-profile>";
      $ataxml = $ata.".xml";
      if ($by = file_put_contents($ataxml,$content)) {
        echo "Wrote $by bytes to $ataxml\n";
      } else {
        echo "Write problem\n";
      }
    } else {
      die(mysqli_error($link));
    }
  } else {
    die(mysqli_error($link));
  }
} else {
  die(mysqli_connect_error());
}
?>

This script should produce a file in XML format ready to be stored on the server in a publicly accessible place. When we instruct the ATA to read this file on a regular basis, we have achieved autoprovisioning.
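
One thing the string-building approach does not handle is a value containing a character such as “&” or “<”, which would make the XML malformed. The sketch below shows one way to guard against that by separating the XML generation from the database access. It is an illustration under assumptions, not the script’s method: build_flat_profile() is a hypothetical function, and the parameter name containing a slash and parentheses is invented to show the tag-name cleanup.

```php
<?php
// Hypothetical helper: build the provisioning XML from name => value pairs,
// cleaning tag names as the script does and escaping the values.
function build_flat_profile(array $params): string
{
    $xml  = "<?xml version='1.0' encoding='ISO-8859-1'?>\n";
    $xml .= "<flat-profile>\n";
    foreach ($params as $name => $value) {
        // slashes and parentheses are not allowed in tag names
        $tag = preg_replace('/[\/()]/', '_', (string) $name);
        // escape &, < and > so the value cannot break the document
        $safe = htmlspecialchars((string) $value, ENT_XML1 | ENT_NOQUOTES, 'ISO-8859-1');
        $xml .= ' <' . $tag . '>' . $safe . '</' . $tag . ">\n";
    }
    return $xml . '</flat-profile>';
}

$xml = build_flat_profile([
    'GPP_A'            => 'test & check',
    'Example/Name_(x)' => '1',   // invented name, for illustration only
]);
echo $xml, "\n";
```

The rows fetched from the pap table could be collected into such an array (name in column 1, value in column 3) and the returned string written out with file_put_contents() as before.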

Conclusion

PHP regular expressions can be a useful way of reducing a complex string to its essential components. Depending on the complexity of the string, the script needs to be carefully constructed, and the order of events is particularly important. In the case of provisioning the ATA units, with this script in place an administrator has a lot more control over the behavior of ATA units out in the field. It may be possible to extract parameter values from other units in a similar manner.

If you decide to follow this route, check that your result will be as anticipated — I’d hate to have you hold me responsible for “bricking” your ATA! And if you have suggestions for improvement of this approach, let us know with a comment.


Here is the entire first script, process_ata.php:

<?php
// Extract parameters from PAP2T - Colin Beckingham 2008
$stem = "cbata";
echo "start\n";
$newstr = file_get_contents($stem.".txt");

// take out section titles
$titles = array(
  'System Information','Product Information','System Status','Line 1 Status','Line 2 Status',
  'System Configuration','Internet Connection Type','Optional Network Configuration',
  'SIP Parameters','SIP Timer Values (sec)','Response Status Code Handling','RTP Parameters',
  'SDP Payload Types','NAT Support Parameters',
  'Configuration Profile','Firmware Upgrade','General Purpose Parameters',
  'Call Progress Tones','Distinctive Ring Patterns','Distinctive Call Waiting Tone Patterns',
  'Distinctive Ring/CWT Pattern Names','Ring and Call Waiting Tone Spec','Control Timer Values (sec)',
  'Vertical Service Activation Codes','Vertical Service Announcement Codes','Outbound Call Codec Selection Codes',
  'Miscellaneous',
  'Streaming Audio Server (SAS)','NAT Settings','Network Settings','SIP Settings','Call Feature Settings',
  'Proxy and Registration','Subscriber Information','Supplementary Service Subscription','Audio Configuration',
  'Dial Plan</FONT>','FXS Port Polarity Configuration',
  'Selective Call Forward Settings','Call Forward Settings','Speed Dial Settings','Supplementary Service Settings',
  'Distinctive Ring Settings','Ring Settings'
);
foreach ($titles as $title) {
  $pattern = $title;
  $replacement = '';
  $newstr = str_replace($pattern, $replacement, $newstr);
}

// take out irrelevant fixed strings
$strs = array(
  '&nbsp;','</option>','</TR>','<TD>','</TD>','</font>','</FONT>',' colspan="3"','<COLGROUP>','</COLGROUP>',
  '</table>','</div>',' maxlength=255',' maxlength=2047',' size="15"',' size="50"',
  '<TD width=8 align=left height="100%" background="/UI_05.gif">',
  '<TD colspan=2 align=right bgcolor=#e7e7e7 height="100%">',
  '<TD align=right bgcolor=#0 height="25px">','<TD bgcolor=#0>',
  '<TD width=142 bgColor=#6666cc height="100%">');
foreach ($strs as $str) {
  $pattern = $str;
  $replacement = '';
  $newstr = str_replace($pattern, $replacement, $newstr);
}

// take out html
$pattern = '/<(IMG|COL|TR|FONT|TABLE|DIV)[^>]*>/i';
$replacement = '';
$newstr = preg_replace($pattern, $replacement, $newstr);

// make adjustments to find data
$pattern = "</select>";
$replacement = "%%";
$newstr = str_replace($pattern, $replacement, $newstr);
$pattern = ":<td>";
$replacement = "~";
$newstr = str_replace($pattern, $replacement, $newstr);
$pattern = "/<td>/";
$replacement = "\n";
$newstr = preg_replace($pattern, $replacement, $newstr);
$pattern = '/<(input|select) class="input(c|w)" name="/';
$replacement = "##";
$newstr = preg_replace($pattern, $replacement, $newstr);
$pattern = '" value="';
$replacement = "%%%";
$newstr = str_replace($pattern, $replacement, $newstr);
$pattern = '/"><option.*selected[^>]*>/';
$replacement = "@@";
$newstr = preg_replace($pattern, $replacement, $newstr);
$pattern = '/<option.*%%/';
$replacement = "$$";
$newstr = preg_replace($pattern, $replacement, $newstr);
// collapse runs of newlines and whitespace-only lines
$pattern = "/(\n)+/";
$replacement = "\n";
$newstr = preg_replace($pattern, $replacement, $newstr);
$pattern = "/\n\s*\n/";
$replacement = "\n";
$newstr = preg_replace($pattern, $replacement, $newstr);

// placeholders that become field delimiters
$tempsa = array('@@','~##','%%%');
foreach ($tempsa as $ta) {
  $pattern = $ta;
  $replacement = '~';
  $newstr = str_replace($pattern, $replacement, $newstr);
}
// placeholders and leftovers that are simply removed
$tempsb = array('">','$$','%%');
foreach ($tempsb as $tb) {
  $pattern = $tb;
  $replacement = '';
  $newstr = str_replace($pattern, $replacement, $newstr);
}
$arr = explode("\n",$newstr);
// discard the first and last lines
$temp = array_shift($arr);
$temp = array_pop($arr);
$brr = array();
$line = 0;
$user = 0;
$suff = ''; // no suffix until the first line section starts
foreach ($arr as $a) {
  $pos = strpos($a,"~");
  $aa = substr($a,0,$pos);
  $aa = str_replace(" ","_",$aa);
  if ($aa == "Line_Enable") {
    echo "LE\n";
    $line++;
  }
  if ($aa == "Cfwd_All_Dest") {
    echo "CAD\n";
    $line = 0;
    $user++;
  }
  if ($line > 0) $suff = "_".$line."_";
  if ($user > 0) $suff = "_".$user."_";
  if (strpos(substr($a,$pos + 1),"~")) {
    // a second tilde means the line carries a parameter ID
    $brr[] = "NULL~".$aa.$suff.substr($a,$pos)."~$stem";
  } else {
    // no parameter ID: leave the third field empty
    $brr[] = "NULL~".$aa.$suff."~".substr($a,$pos)."~$stem";
  }
}
$newstr = implode("\n",$brr);
endnow();

function endnow() {
  global $stem, $newstr;
  // write the result to <stem>out.txt, then stop
  if ($by = file_put_contents($stem."out.txt",$newstr)) {
    echo "OK $by \n";
  } else {
    echo "problem\n";
  }
  die("end\n");
}
?>
