Using Perl and Regular Expressions to Process HTML Files – Part 2
In this article we will look at how to modify the contents of an HTML file by running a Perl script on it.
The file we are going to process is named file1.htm:
Note: To make sure that the code is displayed correctly, the example code shown in this article uses square brackets '[..]' in HTML tags instead of angle brackets '<..>'.
[head][title]Sample HTML File[/title]
[p]Welcome to the world of Perl and regular expressions[/p]
[table border="1" width="400"]
[tr][th colspan="2"]Programming Languages[/th][/tr]
[tr][td]Perl[/td][td]Processing HTML files[/td][/tr]
Imagine that we need to change both occurrences of [h1]heading[/h1] to [h1 class="massive"]heading[/h1]. It is not a big change, and it could easily be done manually or with a simple search and replace. But we are just getting started here.
To do this, we could use the following Perl script (script1.pl):
1 open (IN, "file1.htm");
2 open (OUT, ">new_file1.htm");
3 while ($line = <IN>) {
4 $line =~ s/[h1]/[h1 class="massive"]/;
5 print OUT $line;
6 }
7 close (IN);
8 close (OUT);
Note: You don't need to enter the line numbers. I have included them only so that I can refer to individual lines in the script.
Let's look at each line of the script.
Line 1
In this line file1.htm is opened so that it can be processed by the script. In order to process the file, Perl uses something called a filehandle, which provides a kind of link between the script and the operating system and holds information about the file being processed. I have called this input filehandle 'IN', but I could have used almost any name within reason. Filehandles are usually written in capitals.
Line 2
This line creates a new file called 'new_file1.htm', which is written to through another filehandle, OUT. The '>' before the filename indicates that the file is opened for writing. If a file with that name already exists, its contents are overwritten.
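A common variation on Lines 1 and 2 is sketched below. It is not the script from this article: it assumes a reasonably modern Perl, uses the three-argument form of open with lexical filehandles ($in and $out, names chosen here for illustration) instead of the barewords IN and OUT, and checks that each open succeeded. It works in a temporary directory with a stand-in input file so that it can be run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so the sketch does not touch real files.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = "$dir/file1.htm";
my $out_path = "$dir/new_file1.htm";

# Create a small input file to read from (stand-in content, not the real file1.htm).
open(my $seed, '>', $in_path) or die "Cannot create $in_path: $!";
print {$seed} "<p>Welcome</p>\n";
close($seed);

# '<' opens for reading; '>' creates the file (or empties an existing one) for writing.
open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
open(my $out, '>', $out_path) or die "Cannot create $out_path: $!";

my $first_line = <$in>;     # read one line through the input filehandle
print {$out} $first_line;   # write it through the output filehandle

close($in);
close($out);
```

Checking the result of open is worth the extra typing: without it, a misspelled filename produces a script that silently reads nothing.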
Line 3
This line sets up a loop in which each line in file1.htm is examined individually.
Line 4
This is the regular expression. It searches for one occurrence of [h1] on each line of file1.htm and, if it finds it, changes it to [h1 class="massive"].
Looking at Line 4 in more detail:
- $line is a variable that contains a line of text. It is modified if the substitution succeeds.
- =~ is the binding operator. It tells Perl to apply the substitution to $line.
- s is the substitution operator.
- [h1] is what needs to be substituted (replaced).
- [h1 class="massive"] is what [h1] has to be changed to.
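The pieces of Line 4 can be tried out on a single string, as in the sketch below. It uses real angle brackets rather than this article's square-bracket display convention, because in an actual Perl pattern [h1] would be a character class matching a single 'h' or '1'. The sample string and the variable names $once, $all and $count are made up for illustration.

```perl
use strict;
use warnings;

my $line = "<h1>First</h1> and <h1>Second</h1>\n";

# Without /g, only the first match on the line is replaced.
(my $once = $line) =~ s/<h1>/<h1 class="massive">/;

# With /g, every match on the line is replaced.
(my $all = $line) =~ s/<h1>/<h1 class="massive">/g;

# s/// returns the number of substitutions made (a false value if none).
my $count = ($line =~ s/<h1>/<h1 class="massive">/g);

print $once, $all, "replacements: $count\n";
```

The script in this article does not use the /g modifier, so it changes at most one [h1] tag per line, which is enough when each heading sits on its own line.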
Line 5
This line takes the contents of the $line variable and, via the OUT filehandle, writes the line to new_file1.htm.
Line 6
This line closes the 'while' loop. The loop is repeated until all the lines in file1.htm have been examined.
Lines 7 and 8
These two lines close the two filehandles that have been used in the script. If you left these two lines out the script would still work, but it's good programming practice to close filehandles, which frees up the filehandle names so that they can be used, for example, for another file.
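Putting the eight lines together, a stricter version of the same idea might look like the sketch below. It is an illustration under assumptions rather than the script from this article: the helper name add_h1_class is hypothetical, real angle brackets are used so the pattern matches actual tags, and the input is a three-line stand-in file written to a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $in_path to $out_path, adding the class to <h1> tags.
sub add_h1_class {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot create $out_path: $!";
    my $changed = 0;
    while (my $line = <$in>) {
        # Count the lines on which the substitution succeeded.
        $changed++ if $line =~ s/<h1>/<h1 class="massive">/;
        print {$out} $line;
    }
    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return $changed;
}

# Try it on a small stand-in file in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/file1.htm") or die "Cannot create sample file: $!";
print {$fh} "<h1>One</h1>\n<p>Text</p>\n<h1>Two</h1>\n";
close($fh);

my $changed = add_h1_class("$dir/file1.htm", "$dir/new_file1.htm");
print "Lines changed: $changed\n";
```

Returning the number of changed lines gives a quick way to confirm that the script did what was expected before opening the output file.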
Running the Script
As the aim of this article is to explain how to use regular expressions to process HTML files, and not necessarily how to use Perl, I don't want to spend too long describing how to run Perl scripts. Suffice to say that you can run them in various ways, for example, from within a text editor such as TextPad, by double-clicking the Perl script (script1.pl), or by running the script from an MS-DOS window.
(The location of the Perl interpreter needs to be in your PATH statement so that you can run Perl scripts from any location on your computer and not just from within the directory where the interpreter (perl.exe) itself is installed.)
So, to run our script we could open an MS-DOS window and navigate to the location where the script and the HTML file are stored. To keep life simple I have assumed that these two files are in the same folder (or directory). The command to run the script is:
perl script1.pl
If the script works (and hopefully it will), a new file (new_file1.htm) is created in the same folder as file1.htm. If you open the file you'll see that the two lines that contained [h1] tags have been changed so that they now read [h1 class="massive"].
In Part 3 we'll look at how to deal with multiple files.