Last week I talked about [Python](http://www.python.org) with [Harry](http://fester.pandemonium.de) and [Lars](http://www.vernetzt.org/lars), independently of each other. Conclusion: Python seems to be “the” scripting language to learn if you want something versatile and hate [Perl](http://www.perl.org).
Today I needed a small script: find all the four-letter family names in the US Census data. Usually I’d simply do that with [awk](http://www.gnu.org/software/gawk/manual/gawk.html). But what better opportunity to actually start using Python?
I have the [family name list](http://www.census.gov/genealogy/names/dist.all.last) in ASCII form. Nicely formatted in columns with whitespace separators. All I need is a small Python script that will loop over stdin and match as required. No huhu:
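    import re
    import sys

    # Exactly four capital letters at the start of the line, followed by a space.
    pattern = re.compile(r'^([A-Z]{4}) ')

    # Iterating over stdin stops by itself at end of input.
    for line in sys.stdin:
        m = pattern.match(line)
        if m is not None:
            print(m.group(1))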
That wasn’t so difficult.
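For comparison, here is a rough sketch of the same filter without a regular expression, closer to how I would have done it in awk: split each line on whitespace and look at the length of the first column. It assumes the name is always the first column. The function name `four_letter_names`, its `length` parameter, and the sample rows are made up for illustration; the numbers in the rows are placeholders, not real census figures. Unlike a regex on `[A-Z]`, this checks only the length, not that the column consists of capital letters.

```python
def four_letter_names(lines, length=4):
    """Yield the first column of each line if it is exactly `length` characters long."""
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if fields and len(fields[0]) == length:
            yield fields[0]


# Made-up sample rows in a "name, then numeric columns" layout (placeholder numbers).
sample = [
    "SMITH  0.0  0.0  1",
    "WOOD   0.0  0.0  2",
    "LEE    0.0  0.0  3",
    "KING   0.0  0.0  4",
]

print(list(four_letter_names(sample)))  # ['WOOD', 'KING']
```

Because the function takes any iterable of lines, passing `sys.stdin` instead of `sample` should make it work on piped input as well.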