Python File Slicing

From Sfvlug

pyFileSlice is a simple utility that chops out the section of a file lying between a common starting tag and ending tag. It grew out of the need to pull the page referrers section out of an Awstats data file for further analysis. Three methods were tried: one that runs a regex pattern check over each element in a list, one that uses startswith(), and one that uses startswith() and doesn't read the file all at once. Of the three, the one presented here ran the fastest and used the least memory.

Source Code

#!/usr/bin/env python
# Simple tool to spit out referrer information from an awstats database
# for later searching and analysis.  A good example of file slicing!

__author__ = "Nick Guy & Brian Guy"
__license__ = "GPL"

import sys

# lolz, no argc it seems.  :P
argc = len(sys.argv)

if argc > 2:
	# Too many arguments: show usage and stop instead of falling through.
	print(sys.argv[0] + " [filename]")
	print("[filename] is optional, leave out to use stdin")
	sys.exit(1)

# variables instantiated here to keep them in file scope.
awsdata = []
infile = False

if argc == 2:
	try:
		infile = open(sys.argv[1], 'r')
	except IOError:
		print("Can't open " + sys.argv[1] + " for reading.")
		sys.exit(1)

if argc == 1:
	infile = sys.stdin

# fastest method.  Note that the strings inside startswith() are
# the start and end block tokens we need.  Note also that the strings
# used to delimit the block we want are NOT included in the final output.
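while True:
	line = infile.readline()
	# readline() returns "" at EOF; stop there so a missing tag can't loop forever.
	if not line or line.startswith("BEGIN_PAGEREFS"):
		break

# Python has no do/while loop, so read inside "while True" and break out.
while True:
	line = infile.readline()
	if not line or line.startswith("END_PAGEREFS"):
		break
	awsdata.append(line.rstrip("\n"))	# remove trailing \n, similar to chomp in perl.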


# send data to stdout.
for line in awsdata:
	print(line)
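
The same slicing idea can also be packaged as a small generator, so the lines between the two tags can be consumed one at a time without building a list first. The following is only a sketch for Python 3: the name slice_between and the sample data are made up for illustration and are not part of the original script or of any real Awstats file.

```python
import io


def slice_between(lines, start_tag, end_tag):
    """Yield the lines found strictly between start_tag and end_tag.

    Hypothetical helper: the tag lines themselves are not yielded, and
    reading stops as soon as the end tag is seen.
    """
    inside = False
    for line in lines:
        if not inside:
            # Still looking for the opening tag; skip everything before it.
            inside = line.startswith(start_tag)
        elif line.startswith(end_tag):
            # Closing tag reached: stop without reading the rest of the input.
            return
        else:
            yield line.rstrip("\n")


# Made-up sample input, only loosely shaped like an Awstats data file.
sample = io.StringIO(
    "HEADER 1\n"
    "BEGIN_PAGEREFS 2\n"
    "http://example.com/a 10 10\n"
    "http://example.com/b 3 3\n"
    "END_PAGEREFS\n"
    "TRAILER\n"
)
refs = list(slice_between(sample, "BEGIN_PAGEREFS", "END_PAGEREFS"))
print(refs)
```

Because the generator iterates over the file object line by line, it keeps the low memory use of the startswith() approach, and an input with no start tag simply yields nothing.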