"…mais ce serait peut-être l'une des plus grandes opportunités manquées de notre époque si le logiciel libre ne libérait rien d'autre que du code…"

Generators

Reference: David Beazley, http://www.dabeaz.com/generators/, http://www.dabeaz.com/generators-uk/

See also PEP 8: http://www.python.org/dev/peps/pep-0008/

# gengrep.py
#
# Grep a sequence of lines that match a re pattern

import re

def gen_grep(pat, lines):
    patc = re.compile(pat)          # compile once, reuse for every line
    for line in lines:
        if patc.search(line):
            yield line

# Example use

if __name__ == '__main__':
    from genfind import gen_find
    from genopen import gen_open
    from gencat import gen_cat

    lognames = gen_find("access-log*", "www")
    logfiles = gen_open(lognames)
    loglines = gen_cat(logfiles)

    # Look for ply downloads (PLY is my own Python package)
    plylines = gen_grep(r'ply-.*\.gz', loglines)
    for line in plylines:
        print(line, end='')         # each line already ends with a newline

# gencat.py
#
# Concatenate multiple generators into a single sequence

def gen_cat(sources):
    for s in sources:
        for item in s:
            yield item

# Example use

if __name__ == '__main__':
    from genfind import  gen_find
    from genopen import  gen_open

    lognames = gen_find("access-log*","www")
    logfiles = gen_open(lognames)
    loglines = gen_cat(logfiles)
    for line in loglines:
        print(line, end='')         # each line already ends with a newline

# genopen.py
#
# Takes a sequence of filenames as input and yields a sequence of file
# objects that have been suitably open

import gzip, bz2

def gen_open(filenames):
    for name in filenames:
        if name.endswith(".gz"):
            yield gzip.open(name, "rt")     # text mode, so lines are str
        elif name.endswith(".bz2"):
            yield bz2.open(name, "rt")      # text mode, so lines are str
        else:
            yield open(name)

# Example use

if __name__ == '__main__':
    from genfind import  gen_find
    lognames = gen_find("access-log*","www")
    logfiles = gen_open(lognames)
    for f in logfiles:
        print(f)

# genfind.py
#
# A function that generates files that match a given filename pattern

import os
import fnmatch

def gen_find(filepat,top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

# Example use

if __name__ == '__main__':
    lognames = gen_find("access-log*","www")
    for name in lognames:
        print(name)

# bytesgen.py
#
# An example of chaining together different generators into a processing
# pipeline.    

from genfind import gen_find
from genopen import gen_open
from gencat import gen_cat
from gengrep import gen_grep

pat    = r'ply-.*\.gz'
logdir = 'www'

filenames = gen_find("access-log*", logdir)
logfiles  = gen_open(filenames)
loglines  = gen_cat(logfiles)
patlines  = gen_grep(pat, loglines)
bytecol   = (line.rsplit(None, 1)[1] for line in patlines)
nbytes    = (int(x) for x in bytecol if x != '-')   # '-' means no bytes were sent

print("Total", sum(nbytes))

David M. Beazley
http://www.dabeaz.com

Presented at PyCon UK 2008, September 12, 2008.

Introduction

This tutorial discusses various techniques for using generator functions and generator expressions in the context of systems programming. This topic loosely includes files, file systems, text parsing, network programming, and programming with threads.

Support Data Files

The support archive (available from the tutorial page referenced above) contains the data files used by the various code samples. Download it to your machine to work through the examples that follow.

The download also includes all of the code samples listed below.

Code Samples

Here are various code samples that are used in the course. You can cut and paste these to your own machine to try them out. The order in which they are listed follows the course outline. These examples are written to run inside the "generators" directory that gets created when you unzip the above file containing the support data.

Part 2: Processing Data Files

  • nongenlog.py. Calculate the number of bytes transferred in an Apache server log using a simple for-loop. Does not use generators.
  • genlog.py. Calculate the number of bytes transferred in an Apache server log using a series of generator expressions.
  • makebig.py. Make a large access-log file for performance testing. This will create a file "big-access-log". For the numbers used in the presentation, I used python makebig.py 2000.
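
The difference between the first two scripts can be sketched as follows. This is a minimal illustration written for Python 3, not the contents of nongenlog.py or genlog.py. The sample log lines are invented, and the sketch assumes the byte count is the last whitespace-separated field of each line, with '-' standing for "no bytes sent" (the same convention bytesgen.py relies on).

```python
# Hypothetical sample lines in Apache common log format (invented for illustration).
SAMPLE_LOG = [
    '81.107.39.38 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238',
    '81.107.39.38 - - [24/Feb/2008:00:09:01 -0600] "GET /favicon.ico HTTP/1.1" 404 133',
    '66.249.65.37 - - [24/Feb/2008:00:11:10 -0600] "GET /index.html HTTP/1.1" 304 -',
]


def total_bytes_loop(lines):
    """Sum the byte column with a plain for-loop."""
    total = 0
    for line in lines:
        last_field = line.rsplit(None, 1)[1]    # text after the last whitespace
        if last_field != '-':
            total += int(last_field)
    return total


def total_bytes_genexp(lines):
    """Same calculation expressed as a chain of generator expressions."""
    bytecolumn = (line.rsplit(None, 1)[1] for line in lines)
    byte_counts = (int(x) for x in bytecolumn if x != '-')
    return sum(byte_counts)                     # nothing runs until sum() pulls items


if __name__ == '__main__':
    print("Loop version:     ", total_bytes_loop(SAMPLE_LOG))
    print("Generator version:", total_bytes_genexp(SAMPLE_LOG))
```

Both functions accept any iterable of lines, so an open file object can be passed in place of the sample list. The generator version never builds an intermediate list, which is what makes it suitable for very large logs.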

Part 3: Fun with Files and Directories

  • genfind.py. A generator function that yields filenames matching a given filename pattern.
  • genopen.py. A generator function that yields open file objects from a sequence of filenames.
  • gencat.py. A generator function that concatenates a sequence of generators into a single sequence.
  • gengrep.py. A generator that greps a series of lines for those that match a regex pattern.
  • bytesgen.py. Example that finds out how many bytes were transferred for a specific file in a whole directory of log files.

Part 4: Parsing and Processing Data

  • retuple.py. Parse a sequence of lines into a sequence of tuples using regular expressions.
  • redict.py. Parse a sequence of lines into a sequence of dictionaries with named fields.
  • fieldmap.py. Remap fields in a sequence of dictionaries.
  • linesdir.py. Generate lines from files in a directory.
  • apachelog.py. Parse an Apache log file.
  • query404.py. Find the set of all documents that are broken (404).
  • largefiles.py. Find all requests that transferred over a megabyte.
  • largest.py. Find the largest document.
  • hosts.py. Find unique host IP addresses.
  • downloads.py. Find number of downloads of a specific file.
  • robots.py. Find out who has been hitting robots.txt.
  • robotsfast.py. Find out who has been hitting robots.txt (faster version).
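
The parsing steps listed above (lines to tuples, tuples to dictionaries, field remapping, then a query) might be chained roughly like this. It is a sketch for Python 3 under stated assumptions, not the code of the listed files: the regular expression, the column names in COLNAMES, and the sample lines are all illustrative choices.

```python
import re

# Assumed pattern for the Apache common log format; group order matches COLNAMES.
LOGPAT = re.compile(
    r'(\S+) (\S+) (\S+) \[(.*?)\] "(\S+) (\S+) (\S+)" (\S+) (\S+)'
)
COLNAMES = ('host', 'ident', 'user', 'datetime',
            'method', 'request', 'proto', 'status', 'bytes')


def field_map(dictseq, name, func):
    """Apply func to field `name` of every dictionary in the sequence."""
    for d in dictseq:
        d[name] = func(d[name])
        yield d


def apache_log(lines):
    """Parse log lines into dictionaries with integer status and bytes fields."""
    matches = (LOGPAT.match(line) for line in lines)
    tuples = (m.groups() for m in matches if m)        # skip lines that do not match
    log = (dict(zip(COLNAMES, t)) for t in tuples)
    log = field_map(log, 'status', int)
    log = field_map(log, 'bytes', lambda s: int(s) if s != '-' else 0)
    return log


# Invented sample data for the demonstration.
SAMPLE_LOG = [
    '81.107.39.38 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238',
    '81.107.39.38 - - [24/Feb/2008:00:09:01 -0600] "GET /favicon.ico HTTP/1.1" 404 133',
    '66.249.65.37 - - [24/Feb/2008:00:11:10 -0600] "GET /index.html HTTP/1.1" 304 -',
]

# Query in the spirit of query404.py: the set of requests that returned 404.
broken = {r['request'] for r in apache_log(SAMPLE_LOG) if r['status'] == 404}

if __name__ == '__main__':
    for request in sorted(broken):
        print(request)
```

Once the log is a stream of dictionaries, the other queries in the list reduce to one-line generator expressions or set comprehensions over the same stream, for example filtering on r['bytes'] or collecting r['host'].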

Part 5: Processing Infinite Data

  • follow.py. Follow a log file in real time, like tail -f in Unix. To run this program, you need a log file to work with. Run the program runservers.py to start a simulated web server, which writes a series of log lines for you to follow.
  • realtime404.py. Print all 404 requests as they happen in real time on a log file.
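
A generator that behaves like tail -f can be sketched in a few lines. This is an illustration for Python 3 rather than the contents of follow.py; the poll_interval parameter is an assumption added here so the polling delay can be tuned.

```python
import time


def follow(thefile, poll_interval=0.1):
    """Yield lines appended to an open text file, waiting for more at end of file."""
    thefile.seek(0, 2)                  # jump to the current end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(poll_interval)   # nothing new yet; wait briefly and retry
            continue
        yield line
```

Because the generator never terminates on its own, it plugs into a pipeline as an endless source: passing follow(open(path)) to a line-parsing stage and filtering on the status field would give the real-time 404 report described above. Here path stands for whichever log file is being written to.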

Part 6: Feeding the Pipeline

Part 7: Extending the Pipeline

  • genpickle.py. Turn sequences of objects into a sequence of pickles.
  • sendto.py. Send a sequence of items to a remote machine via a socket. Uses genpickle above.
  • receivefrom.py. Receive a sequence of items from a socket. Uses genpickle above.
  • genqueue.py. Consume items on a queue.
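
The two ideas in this part, serializing a stream with pickle and handing a stream across a queue, can be sketched as below. This is a Python 3 illustration, not the listed files themselves. The _DONE sentinel and the helper names roundtrip and through_queue are assumptions of the sketch, and an in-memory buffer stands in for the socket.

```python
import io
import pickle
import queue
import threading


def gen_pickle(source):
    """Turn a sequence of objects into a sequence of pickled byte strings."""
    for item in source:
        yield pickle.dumps(item)


def gen_unpickle(infile):
    """Yield objects read from a binary file-like object until it is exhausted."""
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return


_DONE = object()    # sentinel marking the end of the stream (assumption of this sketch)


def sendto_queue(source, q):
    """Producer side: push every item onto the queue, then the sentinel."""
    for item in source:
        q.put(item)
    q.put(_DONE)


def genfrom_queue(q):
    """Consumer side: yield queued items until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def roundtrip(items):
    """Pickle items into one byte stream and read them back."""
    buf = io.BytesIO(b''.join(gen_pickle(items)))
    return list(gen_unpickle(buf))


def through_queue(items):
    """Feed items through a queue from a producer thread to the caller."""
    q = queue.Queue()
    producer = threading.Thread(target=sendto_queue, args=(items, q))
    producer.start()
    received = list(genfrom_queue(q))
    producer.join()
    return received
```

Any binary file-like object can take the place of the io.BytesIO buffer in gen_unpickle. Note that unpickling data from an untrusted source is unsafe, so this pattern only fits machines you control.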

Part 8: Advanced Data Routing

  • genmultiplex.py. Multiplex many different generators into a single real-time stream using threads.
  • broadcast.py. Broadcast a sequence of items to a collection of consumers.
  • netsend.py. Send items to another host on the network. Requires a receiver (use receivefrom.py above).
  • consthread.py. Broadcasting a generator to a collection of consumer threads.
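
Multiplexing and broadcasting are mirror images: one merges many sources into a single stream, the other fans a single stream out to many consumers. The following Python 3 sketch shows both under stated assumptions. The Collector class, the _feed helper, and the _DONE sentinel are hypothetical names introduced here, and the listed files may be organized differently.

```python
import queue
import threading

_DONE = object()    # sentinel a feeder thread sends when its source is exhausted


def _feed(source, q):
    """Run in a thread: copy one source onto the shared queue."""
    for item in source:
        q.put(item)
    q.put(_DONE)


def gen_multiplex(sources):
    """Merge several generators into one stream, using one feeder thread per source."""
    q = queue.Queue()
    threads = [threading.Thread(target=_feed, args=(src, q), daemon=True)
               for src in sources]
    for t in threads:
        t.start()
    remaining = len(threads)
    while remaining:                    # stop once every feeder has sent its sentinel
        item = q.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item


class Collector:
    """Hypothetical consumer; anything with a send(item) method will do."""

    def __init__(self):
        self.items = []

    def send(self, item):
        self.items.append(item)


def broadcast(source, consumers):
    """Deliver every item of the source to every consumer."""
    for item in source:
        for c in consumers:
            c.send(item)
```

A short usage example: broadcast(gen_multiplex([gen_a, gen_b]), [Collector(), Collector()]), where gen_a and gen_b are any two generators. Items from a single source keep their relative order, but the interleaving between sources depends on thread scheduling and is not deterministic.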

Part 9: Various Programming Tricks (And Debugging)

  • gentrace.py. Example of debugging a generator component.
  • storelast.py. Store the last value of a generator (for access later in the processing pipeline).
  • genshutdown.py. Simple example of shutting down a generator.
  • shutdownevt.py. Shutting down a generator with an event.
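
Three of these tricks fit in one short sketch: tracing the items that pass through a stage, remembering the last item produced, and reacting to a shutdown requested with close(). This is a Python 3 illustration with hypothetical names (trace, StoreLast, countdown), not the code of the listed files.

```python
def trace(source, log):
    """Pass items through unchanged, recording each one in `log` for debugging."""
    for item in source:
        log.append(item)
        yield item


class StoreLast:
    """Iterator wrapper that remembers the most recent item it produced."""

    def __init__(self, source):
        self.source = iter(source)
        self.last = None

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.source)
        self.last = item
        return item


def countdown(n, events):
    """Generator that notes in `events` when it is shut down via close()."""
    try:
        while n > 0:
            yield n
            n -= 1
    except GeneratorExit:
        events.append('closed')     # cleanup hook; close() raises GeneratorExit here


if __name__ == '__main__':
    log, events = [], []
    gen = countdown(5, events)
    wrapped = StoreLast(trace(gen, log))
    next(wrapped)
    next(wrapped)
    gen.close()                     # ask the innermost generator to stop
    print(wrapped.last, log, events)
```

Calling close() only works from the thread that is iterating the generator and only while the generator is suspended. Shutting down from another thread needs a shared flag that the generator checks, such as a threading.Event, which is the approach the last item in the list refers to.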

Part 10: Parsing and Printing

Part 11: Co-routines
