Cameron Laird's personal notes on "Regular Expressions"
Let's distinguish three meanings of "regular expressions" (REs):
- Kleene [Church, Turing, ...] ... [I still need to explain (categorically!)]
- related pattern grammar used in
computing languages, especially
as realized by such pioneers as
Henry
Spencer; standard
references include
Incidentally,
there are a few interesting and useful tools
for practicing or understanding specific REs:
- Expresso [.NET?]
- a handful of [JavaScript]-oriented tools, including:
- jsregex
- Kodos is coded
in Python
- Komodo's Rx Toolkit
- redet
- regex101
- Proprietary
RegexBuddy
is good enough
to support such testimonials as
this and
this (both from the same person, as it happens).
RegexPal
is a Web-based version of RegexBuddy, oriented to the JavaScript
(XRegExp) RE syntax. There is an associated
highlighter.
- Regex
Tester tests, debugs, visualizes, and explains regular expressions,
with a selection for JavaScript, Perl-PHP, Python, or Ruby dialects.
It seems to be well-maintained, as of 2020. Notice that sponsor
ExtendsClass
provides dozens of other useful online tools for developers.
- regextester
is an entirely different on-line application with a far-too-similar
name.
- James Hague, author of
"Programming
in the 21st Century", recommends
RegEx Coach.
- RegExExpress
is a Web application, as is
- Regex Hero. The latter
emphasizes ".NET regular expressions", code generation, and
a lot of interactivity.
- Regular Expressions Checker is an add-on
for Chrome.
- Flex-based
RegExr
- regexplained
- Regular
Expressions Tester is an add-on for Firefox.
Here are comments on use of the 3.0 release.
- Regulator
- The Regulazy Web announcement
has clever displays.
- Rubular
- The motto of strfriend
is to "visualize regular expressions simply".
- Wes Bailey's tkREM
- TREV
"demonstrates how a regular expression matches text ..."
- txt2regex
- txt2re:
headache relief for programmers
- Visual REGEXP
- ....
A related, but distinct, question is,
"What tutorials for REs
are available?"
I might review these, along with the tools,
some day. In the meantime, the question
often arises, "Which is best?" The short answer: they all are.
Every one of these has advantages in one situation or another ...
Still more categories of reference we'll add here
soon:
- clever regex tricks;
- regex on Twitter, including the daily RegexTip;
- ...
- The name of a mostly-monthly column on
scripting languages and
effective programming
written since 1998 mostly by
Cameron Laird
and Kathryn Soraiz.
The column does not focus exclusively on REs, although
there's usually one installment devoted to the topic every
second year or so. Why the name, then? It seemed like a
good idea at the time: we publish periodically or regularly,
we think a lot about the expressiveness of different
computing languages, and most of the languages we cover happen
to employ REs prominently.
There are also, as it happens, several of
"Irregular Expressions":
The column began as "Regular Expressions" [tell the story
of the name selection] for
SunWorld Online beginning in mid-1998.
SunWorld Online, arguably the first professional
Web magazine, folded into UnixInsider in mid-2000.
UnixInsider changed its business model several times,
most dramatically in early 2001, and last ran an installment of
the column in March 2001.
ITworld.com picked up three final installments in April and May 2001.
We took a
vacation from "Regular Expressions" for the summer, then relaunched
the column
in August 2001 for
UnixReview.com.
UnixReview changed its business model in mid-2007, and
we took a sabbatical until
Linux
Developer Network began to host
"RE" at the end of summer 2008.
UnixReview.com did a good job
indexing the RE columns it published.
SunWorld Online
maintained--somewhat erratically--a minimal
index
to the twice-a-month column. Or maybe it doesn't; the
previous URL seems to have gone badly stale, and
this
isn't much of a substitute. In any case,
Jean-Claude Wippler
agitated for a more extensively annotated table of contents
to the column [IMPORTANT! As of April 2001, several of these
hyperlinks have gone bad. I'm aware of it, and doggedly working to
restore them.
Write me
if there's one in particular you need]:
- September 2009: ...
- August 2009: "Simple Programmable Splash Screen";
- July 2009: "The
Latest JavaScript";
- June 2009:
"Untaught
XML Schema" and
"Learn
SQL";
- May 2009: this month seems to be about things that can go wrong, with
"The
Importance of Being Valid", on RNC and XSD as
improvements on DTD, as well as
"Database
Defects";
- April 2009: "More than a Whim";
- February 2009: "D-Bus makes for application teamwork" and
"E-mail Disintegrating";
- January 2009: "Debuggers and Debugging" and
"Amplify Web Apps with Æjaks";
- December 2008: "High-level
device-driver development" and
"News
from the world of Python",
mention of which also appeared
here;
- November 2008: "Scripting
Languages Play Role in LSB" and
singleton applications;
- October 2008: "Python
gets physical" and
"OIP
an Electrical Engineer's Open-Source Dream";
- September 2008: "What's wrong with Erlang?" and
"Tcl Simplifies Kernel Programming", about
tcl-fuse;
- July 2007-August 2008: we took a break;
- June 2007: "Python's Mechanization" mentions
Mechanize
as an example of good object orientation;
- May 2007: "Good
Works with Real Databases";
- April 2007: "Tuple spaces help organize concurrency solutions";
- March 2007: "Stored Data Need Protection";
- February 2007: "Tcl
Scores High in RE Performance" is another debunking attempt; this
one illustrates several of the mistaken beliefs circulating
around regular expressions;
- January 2007: "Sprints" also touches on data quality;
- December 2006: "Reliably multithread with Erlang";
- November 2006: "An Imagined JavaScript Conversation";
- October 2006: "Tile makes Tcl look good";
- September 2006: "PHP
multi-tasks". Note that improvements on the code and
more detailed explanations appear in
"Develop
multitasking applications with PHP V5";
- August 2006: "wxPython makes GUI programming simple and fun";
- July 2006: "Hello, world";
- June 2006: "Dictionary Skills";
- May 2006: "High-level languages are configuration languages";
- April 2006: "CherryPy proves its worth" introduces
a Python-based Web framework and templater;
- March 2006: "Experience Teaches Lessons for Team Projects"
emphasizes that "real-world" projects need logging, introspection,
and configurability more than your project partners will realize;
- February 2006: "Subtleties of good style", on writing language X in language Y;
- January 2006: "Rexx
Still Going Strong";
- November 2005: "Networking's Easier than Programmers Realize";
- October 2005: "Getting Started with SCons";
- September 2005: "Two Easy Steps Better Than One Hard One", on "partial
evaluation" or templating or ...;
- August 2005: "Don't fear reliability";
- June 2005: "Giving New Life to Legacy Code Using SWIG", on which Miki Tebeka
took the lead;
- April 2005: "Software by the People, for the People" reflects on PyConDC2005;
- March 2005: "Ten
Years for Overnight Success" mentions "Ajax" and other
modern uses of PHP and JavaScript;
- February 2005: "Resource Management";
- January 2005 and December 2004 saw no installments;
- November 2004: "Better in Every Way"
proclaims the "strong form" of Scripting;
- October 2004: "Ethnographic fallibility";
- September 2004: "As Time Goes By";
- August 2004: "Three
Reasons to Pay Attention to Applescript", what one reader called
"... a useful intro to AS";
- July 2004: "Object-Oriented
Tcl";
- June 2004: "Pyrex
Gives Best of Two Worlds";
- May 2004: "Lua
Shines" comments on The Lua Book;
- April 2004: "Rapid
Development of An Assembler Using Python", co-authored
with Miki Tebeka;
- February 2004: "Programming Down to the Silicon" introduces
the HLA assembly-language system;
- October 2003: "Fit
makes for good tests", on Dave Thomas' Rubified implementation of the
XP-oriented testing framework Fit;
- September 2003: "Tcl renewal", on Tcl'2003, the Tenth Tcl Conference;
- August 2003: "Primitive uploads";
- July 2003: "Is
factorization of scripts different?";
- June 2003: "Think about form, write better applications", about
small-scale application architecture;
- May 2003: "Inline Web images";
- April 2003: "Low-cost PDF", on the PDF::API2 from Perl's CPAN;
- March 2003: "Catching up", a collection of references
to items on ...;
- February 2003: "Web Scraping is Easy";
- January 2003: "Compromises", on how to do things wrong;
- December 2002: "Yorick Plays a Role";
- November 2002: "The
Limits of Regular Expressions", read by some as an
amplification of Jamie Zawinski's famous 1999 Wildean
observation
about how REs create problems;
- October 2002: "Be Good to Your Objects", on making the
most of object orientation, by using it minimally;
- September 2002: "What is embedding?";
- August 2002: "Yes You Can"
is about technologies that can improve your presentations;
- July 2002: "Economy of Means" focuses attention on software engineer
Richard Suchenwirth, the Tcl-ers Wiki, and what they can teach
about "light-weight programming";
- June 2002: "PHP Handy off the Web, Too";
- May 2002: "Lua Lights Up Telecom Testing", which hints
at what scripting really means;
- April 2002: "Syntax Checking the Scripting Way",
about static syntax analysis, with particular
attention to Pychecker;
- March 2002: "Erlang is worth a look";
- February 2002: "What You Should Know about Tk";
- January 2002: "curl Simplifies Web Retrieval";
- November 2001: "What You Should Know About Perl 6";
- September 2001: "What's so special about Python 2.2?";
- August 2001: "Tcl and Database Managers -- A Survey", our first
RE column for UnixReview;
- May 2001: "Manage CORBA with
scripting", on the theme of wrapping complexity
with scripting interfaces;
- mid-April 2001: "Simple Email Filters You Can Script"
introduces
.forward as a key concept for filtering
inbound personal items;
- April 2001: "Simple email server tricks",
automation of e-mail transmissions, including attachments. Also,
an update on TkGS;
- mid-March 2001: "ksh keeps up",
KornShell, Ruby threads, and ActiveState awards;
- March 2001: "More than just English", about starting to
use alphabets other than the Latin one, with Unicode;
- mid-February 2001: "Pulling different threads", which
illustrates the point that languages differ in esoteric
capabilities with an examination of threading models and
implementation in Perl, Python, and Tcl;
- February 2001: "Which language is right for you?";
- mid-January 2001: "Scripted wrappers for legacy applications, Part 3:
Extending scripting languages with C", ...;
- January 2001: "Scripted
wrappers for legacy applications, Part 2",
including comments on progressbars, script decomposition,
and other matters related to long-running (sub)processes;
- mid-December 2000: "Scripted wrappers for legacy applications"
is about controlling existing command-line applications
written in C, Fortran, Java, ... with Perl, Python, Tcl, ...;
- December 2000: "Better living through scripting" praises
David Roth's book on the use of Perl for Win32 system
administrators, and salutes SourceForge's takeover of
open-source culture;
- mid-November 2000: "Scripting systems unite"
provides more details on Scheme, and especially on
Java-scripting Schemes;
- early November 2000: "Specialty scripting languages", that
is, BeanShell, NQL, Simkin, and Slate;
- early October 2000: "Successful Scheme";
- late September 2000: "REBOL rolls forward";
- mid-September 2000: "Scripting Qt"
describes how Perl, Python, and Tcl connect to Qt. Also,
Piper is GNU's answer to .NET;
- September 2000: "Tk footnotes" waves hands in the
direction of what makes Tk programming special.
SWANK is among the names it drops;
- mid-August 2000: "Scripting with C"
compares the benefits of CINT, EiC, ElastiC, ICI, LPC, and Pike;
- August 2000: ".NET is real"
presents Dick Hardt's observations that lead to the
conclusion Microsoft's .NET initiative has real content;
- mid-July 2000: "Option
database options", sometimes titled, "Options for the Tk option
database", the third installment of the series
with Allen Flick. We also enumerate several of the virtues
of M&M's
Effective Tcl/Tk Programming book;
- early July 2000: "Successful
evaluations", on scripting's code-data duality and the
use of
eval (however spelled). Digita Script
makes a brief appearance;
- mid-June 2000: "Making
the most of the option database", on platform
considerations of option database use;
- June 2000: "Individualize
your apps", on the Tk "option database", with guest
columnist Allen Flick;
- mid-May 2000: "Phil
Thompson puts Python and Qt together", a profile of
PyQt's inventor and maintainer;
- May 2000: "Unicode
common to latest scripting language versions",
how scripting languages count (versions), and Unicode;
- mid-April 2000: "The good, the bad, and the beautiful world of scripting", on scripted documents;
- April 2000: "The
ongoing proliferation of scripting languages":
brief remarks on Avail, Elegant, e-speak, GODL, Pnuts, and Sash;
- mid-March 2000, "Reading
Microsoft: Lessons in Redundancy": is SIS a bad joke?
Also, a warm-up to scripted documents;
- March 2000, "Scripting
Languages in the Marketplace": reports
from the PHP, Python and (mostly) Tcl Conferences;
- mid-February 2000, "Getting Control of Push" and
- February 2000, "Pushing the Web" present "push"
techniques in a unified way I've seen nowhere else;
- January 2000, "Can
your favorite scripting language take it to the next level?":
comparisons of languages' (Perl, Python, Tcl, REBOL)
capacities for programming "in the large";
- mid-December 1999, "Python
reaches for stardom": Python news, including Job Board,
Python Consortium, and one of the first technically-grounded
profiles of e-speak. Also, Tk for all languages,
XRexx, Tcl pre-processing, and monoids. Missing item:
more Tk pointers;
- December 1999, "Programming
in the comfort zone": Lingo and why Bruce Epstein's books
are special;
- mid-November 1999, "Pike
delivers peak performance": high-performance Roxen server
platform, interpreted C, Pike;
- November 1999, "Programming
events": language-neutral tutorial on event-based multitasking;
- mid-October 1999, "Let
the REBOLlion Begin", on the REBOL language. There's
also brief mention of XML-RPC, scripting for Apache,
and several indications that
The Enterprise is taking scripting a bit more seriously;
- October 1999, "CVS in
the scripting landscape", recalls a time when
CVS
still seemed like a new idea;
- August 1999, "Cinderella languages":
Tom Poindexter tells about the early days of using
Perl, Tcl and other languages with DBMSs including
Sybase and Oracle. Also, a brief profile of CPU's
IntelliPlant;
- mid-July 1999, "It's
a good time to be a polyglot": getting different
scripting languages to play together nicely.
This installment
mentions such technologies for cross-language development as
SWIG, Lua, XML-RPC, message-passing, TclKit, REBOL, and many
more;
- July 1999, "Serious programming doesn't have to be difficult";
- mid-June 1999, "PHP
and JavaScript make easy work of hard problems",
reactions to the Eighth World Wide Web Conference;
- June 1999, "Unraveling threads"
discusses advances Perl and Tcl are making in supporting
threads, and compares them to Python, Rexx, and Lua;
- [was May 1999 about MetaKit?]
- mid-April 1999, "Lightweight
persistence" profiles Aaron Watters' Gadfly;
- April 1999, "Scripting
with C": Canon's OpenPage is one of several industrial
products that employ Tim Long's ICI. Also, an explanation
of our aims for "Regular Expressions";
- mid-February 1999, "Scripting's
challenges"
concludes ...;
- February 1999, "New
choices for scripting"
introduces Ficl, FIJI, REBOL, Ruby, and WebL;
- mid-December 1998, "Why Eiffel?";
- December 1998, "Batteries
Included", on the previous month's
Python Conference, significantly propagated the meme that
constitutes its title;
- November 1998, "What's
going on with Guile?" presents GNU's Scheme dialect. Also,
one paragraph on the New South Wales Wholesale State Electricity
Market (SEM);
- mid-October 1998, "Catching up with JavaScript and Python";
- October 1998, "The
safety of scripting", on whether scripting languages are safe
for serious use;
- September 1998, "Plenty of headroom left for Perl";
- mid-August 1998, "Report from Pythonia" was published just under five weeks after release of 1.0 of JPython;
- August 1998, "Breakthrough year for scripting" focuses on
Lua, Perl, WSH, and Tcl;
- [over two years of other columns still to index]
Cameron
Laird's personal notes on "Regular
Expressions"/claird@phaseit.net