LinuxDevCenter.com

Unix Power Tools
Looking for Closure

by Tim O'Reilly
02/24/2000

A common problem in text processing is making sure that items that need to occur in pairs actually do so.

Most UNIX text editors include support for making sure that elements of C syntax such as parentheses and braces are closed properly. There's much less support for making sure that textual documents, such as troff source files, have the proper structure. For example, tables must start with a .TS macro, and end with .TE. HTML documents that start a list with <UL> need a closing </UL>.

UNIX provides a number of tools that can help you tackle this problem. Here's a gawk script written by Dale Dougherty that makes sure that .TS and .TE macros come in pairs:

#! /usr/local/bin/gawk -f
BEGIN {
    inTable = 0
    TSlineno = 0
    TElineno = 0
    prevFile = ""
}
# check for unclosed table in first file, when more than one file
FILENAME != prevFile {
    if (inTable)
        # no $0 here: at this point it holds the first line of the new file
        printf ("found .TS at File %s: %d without .TE before end of file\n",
            prevFile, TSlineno)
    inTable = 0
    prevFile = FILENAME
}
# match TS and see if we are in Table
/^\.TS/ {
    if (inTable) {
        printf("%s: nested starts, File %s: line %d and %d\n",
            $0, FILENAME, TSlineno, FNR)
    }
    inTable = 1
    TSlineno = FNR
}
/^\.TE/ {
    if (! inTable)
        printf("%s: too many ends, File %s: line %d and %d\n", 
            $0, FILENAME, TElineno, FNR)
    else
        inTable = 0
    TElineno = FNR
}
# this catches end of input
END {
    if (inTable)
        printf ("found .TS at File %s: %d without .TE before end of file\n",
            FILENAME, TSlineno)
}

You can adapt this type of script to any situation where you need to check that something with a start also has a finish.
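
As one illustration of such an adaptation, here is a minimal sketch for the HTML list case mentioned earlier. It is not part of Dale's script: the function name check_ul, the variables depth and openLine, and the wording of the messages are all invented for this example. It assumes at most one list tag per line, and because HTML lists may legitimately nest, it keeps a depth counter rather than treating a nested start as an error.

```shell
#!/bin/sh
# Hypothetical adaptation of the .TS/.TE checker to HTML <UL> ... </UL>.
# Assumption: at most one list tag per line. Nested lists are legal in
# HTML, so a depth counter takes the place of the inTable flag.
check_ul() {
    awk '
    # first line of each file: report lists left open in the previous file
    FNR == 1 {
        if (depth > 0)
            printf("%s: %d unclosed <UL>, last opened at line %d\n",
                prevFile, depth, openLine[depth])
        depth = 0
        prevFile = FILENAME
    }
    # opening tag, with or without attributes
    /<[Uu][Ll][ >]/ {
        depth++
        openLine[depth] = FNR
    }
    # closing tag
    /<\/[Uu][Ll]>/ {
        if (depth == 0)
            printf("%s: line %d: </UL> without matching <UL>\n", FILENAME, FNR)
        else
            depth--
    }
    # end of input: report lists left open in the last file
    END {
        if (depth > 0)
            printf("%s: %d unclosed <UL>, last opened at line %d\n",
                prevFile, depth, openLine[depth])
    }
    ' "$@"
}
```

With the function defined, something like check_ul index.html about.html (the file names are placeholders) prints one line per problem and nothing at all when every list is closed.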

A more complete syntax checking program could be written with the help of a lexical analyzer like lex. lex is normally used by experienced C programmers, but it can be used profitably by someone who has mastered awk and is just beginning with C, since it combines an awk-like pattern-matching process using regular expression syntax, with actions written in the more powerful and flexible C language. (See O'Reilly & Associates' lex & yacc.)

And of course, this kind of problem could also be tackled very easily in Perl.

