Migrating Code from Perl to Python


Getting to work on small perl scripts

Intro to the series

This is starting out as a journal of converting my large collection of perl code into Python. Over time, I also intend it to become more and more useful as a guide for others.

At the beginning, at least, there will be a lot of plodding through manual conversions of various programs, small ones at first, developing a methodology as I go along.

Let's start with the program LOGS0:

    #!/usr/local/bin/perl -w
    my $LogName = $0;       # $0 is the name by which the program was called,
                            #   possibly /usr/local/bin/{daily|Notebook|reading}
    $LogName =~ s{.*/}{};   # This reduces it to the basename.
    exec "/usr/local/bin/LOGS $LogName @ARGV";
        # Invoke (e.g.) $LB/LOGS Notebook, where Notebook is the basename.

There are several links to LOGS0, representing logs for different purposes.

A quick look shows that this file has 4 real (not symbolic) links. As I had begun to suspect, it is one of a family of programs. (In UNIX, this was the original practice for mv, cp, and ln, so as to give them much the same argument syntax.) A program that has several real links can tell, by looking at $0, which name it was called by.

    $ ls -li $LB/LOGS0
    3178007 -rwxrwxr-x 4 hal hal 110 Nov  3 19:31 /usr/local/bin/LOGS0
    [hal@DeWitt 22]$ ls -li $LB|grep 3178007
    3178007 -rwxrwxr-x  4 hal  hal      110 Nov  3 19:31 daily
    3178007 -rwxrwxr-x  4 hal  hal      110 Nov  3 19:31 LOGS0
    3178007 -rwxrwxr-x  4 hal  hal      110 Nov  3 19:31 Notebook
    3178007 -rwxrwxr-x  4 hal  hal      110 Nov  3 19:31 reading
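The same `ls -li | grep INODE` check can be done from Python. The sketch below is a minimal version that assumes a POSIX filesystem; the function name `hard_link_names` is hypothetical. It compares both `st_ino` and `st_dev`, because inode numbers are unique only within a single filesystem.

```python
import os
from typing import List


def hard_link_names(directory: str, path: str) -> List[str]:
    """Return the names in `directory` that are hard links to the file at `path`.

    A Python take on `ls -li DIR | grep INODE`: two names refer to the same
    file when both the inode number and the device number match.
    """
    target = os.stat(path)
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False: a symbolic link has its own inode, so it won't match
            st = entry.stat(follow_symlinks=False)
            if (st.st_ino, st.st_dev) == (target.st_ino, target.st_dev):
                names.append(entry.name)
    return sorted(names)
```

`os.stat(path).st_nlink` gives the link count that ls prints in its third column (4 for LOGS0).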
First, perl2python fails completely, revealing an embarrassing bug:

    $ perl2python LOGS0
    Use of uninitialized value $_ in pattern match (m//) at /usr/local/bin/perl2python line 179, <> line 6.

Here is the referenced line and some of its context:

    175 #ARRAY HANDLING
    176 #array conversion
    177         }elsif ($line =~ /^\s*(.*)\s*\@(.*)\s*(.*)\s*;$/) { #array in the line
    178                 my $arrayName = $2;
    179                 if (/^\s*(.*)\s*\@(.*)\s*=\s*((.*))\s*;$/) { #declaring the array # <--- <---
    180                         $line =~ s/\@//;

Looking closely (and maybe with the aid of some debugging tricks), we see:

    177         }elsif ($line =~ /^\s*(.*)\s*\@(.*)\s*(.*)\s*;$/) { #array in the line

then

    179                 if (/^\s*(.*)\s*\@(.*)\s*=\s*((.*))\s*;$/) { #declaring the array

It's odd to apply an RE explicitly to $line (with the $line =~ /RE/ idiom) and then invoke a very similar RE IMPLICITLY on the default variable, $_. And the error message plainly says $_ was uninitialized. Was that an oversight? Should line 179 look more like line 177?

Stepping through the code reveals that l.177 did something reasonable, while at l.179 we were applying an RE to an uninitialized $_. The fix might be, and is, to add "$line =~ " just after the "(" on line 179.
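Python has no implicit `$_`, so a hand translation of lines 177 and 179 must name its target string every time, and this class of bug cannot arise. Below is a sketch. The name `classify_array_line` is hypothetical. The two patterns are copied from perl2python, with `\@` written as a plain `@` because Python does not interpolate arrays.

```python
import re
from typing import Optional

ARRAY_IN_LINE = re.compile(r"^\s*(.*)\s*@(.*)\s*(.*)\s*;$")       # perl2python line 177
ARRAY_DECL = re.compile(r"^\s*(.*)\s*@(.*)\s*=\s*((.*))\s*;$")    # perl2python line 179


def classify_array_line(line: str) -> Optional[str]:
    """Return "declaration", "use", or None for one line of Perl source."""
    if not ARRAY_IN_LINE.match(line):   # Perl: $line =~ /.../
        return None
    if ARRAY_DECL.match(line):          # `line` must be named again: no implicit $_
        return "declaration"
    return "use"
```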

Now we get a more reasonable approximation of what we'd want:

    #!/usr/bin/python2.7 -u

    #my $LogName = $0;

    LogName =~ s{.*//}{}

    exec "/usr/local/bin/LOGS $LogName ARGV"

BUT "my $LogName = $0" is commented out. THIS MEANS that perl2python threw up its hands, probably at "$0".
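For comparison, a hand translation of LOGS0 might look like the sketch below. It is written in Python 3, not the python2.7 that perl2python targets. `sys.argv[0]` plays the role of `$0`, `os.path.basename` replaces `s{.*/}{}`, and `os.execv` replaces `exec`. The function names are my own; `/usr/local/bin/LOGS` comes from the Perl original.

```python
import os
import sys
from typing import List

LOGS = "/usr/local/bin/LOGS"   # the real program, as in the Perl version


def log_name(argv0: str) -> str:
    """Basename of the name we were invoked by. Perl: $LogName =~ s{.*/}{}"""
    return os.path.basename(argv0)


def logs_command(argv: List[str]) -> List[str]:
    """Argument vector for LOGS: program path, log name, then our own arguments."""
    return [LOGS, log_name(argv[0])] + argv[1:]


def main() -> None:
    cmd = logs_command(sys.argv)
    # Replace this process with LOGS, as Perl's exec does. The list form skips
    # word splitting, so an argument containing spaces stays in one piece,
    # unlike the one-string exec in the Perl version.
    os.execv(cmd[0], cmd)
```

Called from an `if __name__ == "__main__":` block and installed under the names daily, Notebook, and reading, each name would run LOGS with its own log.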

I've examined a lot of cases and found nothing that perl2python can really translate. For example, in SortPG:

  • $/ not understood.
  • <> not understood, or perhaps understood only in some contexts.
  • my not understood; at least in the cases I've seen, it is retained verbatim.
  • The next 2 lines are commented out:

      #$Lines[-1] .= "\n" unless $Lines[-1] =~ /\n\n\Z/;
      #print('"', $Lines[-1], '"'); exit;

  • REs not understood.
  • The LAST LINE is OK, maybe (print Lines).
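The two lines that perl2python commented out are easy to translate by hand. The sketch assumes that `@Lines` becomes a Python list of paragraph strings; the function names are hypothetical. The first line appends a newline to the last paragraph unless that paragraph already ends in a blank line. The second is a debugging print followed by an exit.

```python
import sys
from typing import List


def pad_last_paragraph(lines: List[str]) -> None:
    r"""Perl: $Lines[-1] .= "\n" unless $Lines[-1] =~ /\n\n\Z/;

    Appends a newline to the last paragraph unless it already ends in a blank
    line. Perl's \Z can also match just before a final newline, but a string
    that matches that way ends in three newlines, so endswith("\n\n") agrees.
    """
    if lines and not lines[-1].endswith("\n\n"):   # the `lines and` guard is my addition
        lines[-1] += "\n"


def show_last_and_exit(lines: List[str]) -> None:
    """Perl: print('"', $Lines[-1], '"'); exit;  (a debugging aid)"""
    print('"', lines[-1], '"', sep="", end="")     # Perl's print adds no separator or newline
    sys.exit()
```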

SO, is there any point in trying to salvage such a project?


OK, I'm going to look among my scripts for good candidates for manual conversion, starting with the easy cases:

pggrep

    #!/usr/bin/perl -w

    $/ = '';        # Record unit is a paragraph. SEE POWERFUL PYTHON.

    my $REGEX = shift;

    while(<>) {
            print if /$REGEX/m;
    }

It selects PARAGRAPHS, rather than lines, that match the pattern. A paragraph is a run of text containing no empty line; empty lines demarcate paragraphs. I am already working on a regression test for this in $BD/PERL2PYTHON.
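A hand translation of pggrep might look like the sketch below (Python 3). `paragraphs` only approximates Perl's paragraph mode: it treats truly empty lines as separators, collapses runs of them, and ends each paragraph with one blank line. `re.MULTILINE` stands in for `/m`. All function names are hypothetical.

```python
import fileinput
import re
import sys
from typing import Iterable, Iterator, List


def paragraphs(lines: Iterable[str]) -> Iterator[str]:
    """Group lines into paragraphs, roughly like Perl's $/ = '' (paragraph mode).

    Only truly empty lines separate paragraphs; runs of them count as one,
    and each paragraph is yielded with a single trailing blank line.
    """
    buf: List[str] = []
    for line in lines:
        if line == "\n":                  # an empty line ends the paragraph
            if buf:
                yield "".join(buf) + "\n"
                buf = []
        else:
            buf.append(line)
    if buf:                               # final paragraph, maybe no blank line after it
        yield "".join(buf)


def pggrep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield the paragraphs that match `pattern`; re.MULTILINE plays the role of /m."""
    regex = re.compile(pattern, re.MULTILINE)
    for para in paragraphs(lines):
        if regex.search(para):
            yield para


def main(argv: List[str]) -> None:
    pattern = argv[1]                     # what the top-level `shift` takes from @ARGV
    # Like <>: read the remaining files in turn, or stdin if none are named.
    for para in pggrep(pattern, fileinput.input(argv[2:])):
        sys.stdout.write(para)
```

To use it as a script, call `main(sys.argv)` under an `if __name__ == "__main__":` guard. Python's `re` syntax is close to Perl's but not identical, so some patterns will need adjusting.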

In pggrep, we must deal with 4 tricky things: $/ = '', <>, shift, and /$REGEX/.

<>: The original perl2python handles the special case of while(<>) with this code:

    # Insert in place of "<>" -- under the right conditions, that is
    sub UniversalInput {

    $HeaderOup .=
    '
    import fileinput

    def UniversalInput():
        for line in fileinput.input():
            yield line

    ';

This is an attempt to replace <>, wherever it occurs, with (UniversalInput()), which will (eventually) be an extension of fileinput.input(), after we "import fileinput". Like <>, it will read multiple files in succession, or read from STDIN if there are no files to read from.

$/ = '': $/ is one of those weird perl special variables that modify how input is read; setting it to '' puts perl into paragraph mode, where each record is a paragraph. It seems like we should handle it by modifying how UniversalInput() works.
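One way to let UniversalInput() honor `$/ = ''` is to give it a record-separator argument. In the sketch below, the `rs` and `files` parameters are hypothetical and the version is Python 3. `rs="\n"` mirrors Perl's default of reading one line at a time, and `rs=""` selects paragraph mode. It uses `fileinput.FileInput` rather than `fileinput.input()` so that it does not depend on fileinput's module-level state.

```python
import fileinput
from typing import Iterable, Iterator, List, Optional


def UniversalInput(rs: str = "\n",
                   files: Optional[Iterable[str]] = None) -> Iterator[str]:
    r"""Read records like Perl's <>, honoring two values of $/ (the `rs` argument).

    rs="\n": one line at a time (Perl's default).
    rs="":   paragraph mode; runs of empty lines end a record, and each
             record comes back with a single trailing blank line.
    files=None reads the files in sys.argv[1:], or stdin if there are none.
    """
    if rs not in ("\n", ""):
        raise NotImplementedError("only two values of $/ are sketched here")
    with fileinput.FileInput(files) as lines:
        if rs == "\n":
            yield from lines
            return
        record: List[str] = []
        for line in lines:
            if line == "\n":               # an empty line ends the record
                if record:
                    yield "".join(record) + "\n"
                    record = []
            else:
                record.append(line)
        if record:                         # last record, maybe without a blank line
            yield "".join(record)
```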

/$REGEX/ will be a major challenge. Maybe one way around the weird complexities of perl is to add helpful comments for the parser to read. For example, if /.../ or s/.../.../ appears and it is not an RE match, we could mark it "# NOT RE", and then treat the '/'s as division signs or whatever they are. Then there are variants like s!fr/om!to! for cases where the RE or the replacement field contains a '/'. And the even more exotic:

    m{...}, where instead of {} there can be <>, (), or [] (at least!)
    s{...}{...}
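Recognizing the alternative delimiters is at least mechanical. The sketch below defines two hypothetical helpers. `split_match` takes a Perl match operator (`/.../`, `m!...!`, `m{...}` and the other bracket pairs) and returns the pattern and its trailing modifiers. `compile_match` turns the result into a Python regex. The sketch assumes the operator has already been isolated from the surrounding code, so the division-sign ambiguity is not addressed. It handles nesting and backslash escapes only crudely and ignores `s{...}{...}`.

```python
import re
from typing import Tuple

# Bracketing delimiters close with their partner and may nest: m{a{2}} means a{2}.
PAIRS = {"{": "}", "(": ")", "[": "]", "<": ">"}
# Perl modifiers with a direct re counterpart; others (g, o, e, ...) are not handled.
FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def split_match(op: str) -> Tuple[str, str]:
    """Split a Perl match such as /.../i, m!...!, or m{...}x into (pattern, modifiers)."""
    if op.startswith("/"):
        body = op
    elif op.startswith("m") and len(op) > 1 and not (op[1].isalnum() or op[1].isspace()):
        body = op[1:]
    else:
        raise ValueError(f"not a match operator: {op!r}")
    opener = body[0]
    closer = PAIRS.get(opener, opener)    # e.g. ! and / close with themselves
    depth = 0
    i = 1
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 2                        # skip the escaped character
            continue
        if ch == opener and closer != opener:
            depth += 1                    # a nested bracket pair inside the pattern
        elif ch == closer:
            if depth == 0:
                return body[1:i], body[i + 1:]
            depth -= 1
        i += 1
    raise ValueError(f"unterminated match operator: {op!r}")


def compile_match(op: str) -> "re.Pattern[str]":
    """Turn a Perl match operator into a compiled Python regex (flags i, m, s, x only)."""
    pattern, modifiers = split_match(op)
    flags = 0
    for mod in modifiers:
        flags |= FLAGS[mod]               # KeyError for modifiers with no equivalent
    return re.compile(pattern, flags)
```

Escapes such as `\/` are left in the pattern; Python's `re` accepts them and matches a literal '/'.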

shift means one thing inside a subroutine, namely to pick off the first element of @_. When NOT inside a subroutine, it means to pick off the next element of @ARGV.
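In a hand translation, the two meanings of shift come out quite differently. At top level, shift becomes a pop from the front of the argument list. Inside a subroutine, the `@_` handling usually disappears into ordinary parameters. The sketch shows both; all function names are hypothetical.

```python
from typing import List


def top_level_shift(argv: List[str]) -> str:
    """Perl outside a sub: my $REGEX = shift;  takes $ARGV[0], i.e. argv[1]."""
    # argv[0] is the program name, which Perl keeps in $0 rather than @ARGV.
    # Unlike shift on an empty @ARGV (which yields undef), pop raises IndexError.
    return argv.pop(1)


def literal_sub(*args: str) -> str:
    """Perl: sub literal_sub { my $first = shift; ... }  shift takes $_[0]."""
    rest = list(args)            # a mutable stand-in for @_
    first = rest.pop(0)
    return f"{first} (+{len(rest)} more)"


def idiomatic_sub(first: str, *rest: str) -> str:
    """The usual Python spelling: the shifted value is simply a parameter."""
    return f"{first} (+{len(rest)} more)"
```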

OK, next victim. What will perl2python make of this?

    1.  $" = '","';
    2.  chomp (my @SetMembers = <>);
    3.  print '("', "@SetMembers", '")';

ANSWER:

    1.  " = '","'
    2.  chomp [my SetMembers = (UniversalInput(]))
    3.  print '("', "SetMembers", '")'

Line 1 sets another magic special variable, $", which specifies the separator placed between elements when an array like @SetMembers is interpolated inside double quotes (or the equivalent). Given the input

    ab
    cd
    ef

we want the output ("ab","cd","ef").
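A hand translation makes the role of `$"` explicit: interpolating `"@SetMembers"` is just a join with `$"` as the separator. The sketch targets Python 3.9+ (for `str.removesuffix`); `set_literal` and `main` are hypothetical names.

```python
import fileinput
import sys
from typing import Iterable


def set_literal(lines: Iterable[str]) -> str:
    """Perl: $" = '","'; chomp(my @SetMembers = <>); print '("', "@SetMembers", '")';"""
    members = [line.removesuffix("\n") for line in lines]   # chomp drops one trailing "\n"
    return '("' + '","'.join(members) + '")'                 # $" becomes the join separator


def main() -> None:
    # Perl's print adds no trailing newline here, so neither does write().
    sys.stdout.write(set_literal(fileinput.input()))
```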

In a perl print statement with multiple items separated by commas, it looks as if we should substitute '+' for the commas. EXCEPT that we'll have to turn any number like 100 into str(100); in general, if we find a number in string context, we should wrap it in str(). Easier said than done.
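One way to sidestep the str() problem is sketched below as an alternative, not as what perl2python currently does. Instead of rewriting the commas as '+', keep them and let Python's print convert each item. print already calls str() on every argument, so numbers need no special detection. Passing sep="" and end="" reproduces Perl's default behavior of putting nothing between items (`$,` is empty) and adding no newline (`$\` is empty). The helper name `perl_print` is hypothetical.

```python
from typing import Any, Optional, TextIO


def perl_print(*items: Any, file: Optional[TextIO] = None) -> None:
    """Emulate Perl's print LIST with the default (empty) $, and $\\ separators.

    Python's print already calls str() on each item, so numbers need no
    special treatment; sep="" and end="" drop the space and newline that
    Python would otherwise add. file=None means sys.stdout, as in print().
    """
    print(*items, sep="", end="", file=file)
```

So `print "n=", 100, "\n";` would become `perl_print("n=", 100, "\n")`. Floating-point values can still stringify differently in the two languages.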