pivoting in Python

I needed to do some pre-processing of some data which involved transposing column names to values. The condition was that the value for each respective column (frequency count) had to be > 1.

My input was a csv file, and my goal was an output csv file which would feed into a batch database import process.

ID,DA,NL,PHENOM1,PHENOM2,PHENOM3,PHENOM4
233,99,44,0.00,27.00,12.00,0.00

The other interesting bit was that only a range of columns applied to the condition; the other columns represented ancillary data.

Enter Python:

#!/usr/bin/python

import sys
import csv

# open file and read headers
fPhenomenon = open("phenomenon.txt","r")
sHeaders    = fPhenomenon.readline().replace(r'"','')
aHeaders    = sHeaders.split(",")

# feed the rest to csv
csvIn  = csv.reader(fPhenomenon)
csvOut = csv.writer(sys.stdout)

for sRowIn in csvIn:
    aRowOut = []
    aPhenomenon = []
    aRowOut.append(sRowIn[0]) # procedure ID
    aRowOut.append(sRowIn[1]) # major drainage area ID
    for nIndexTupleVal, tupleVal in enumerate(sRowIn[3:-1]):
        if (float(tupleVal) > 0): # phenomenon measured at least once
            # add phenomenon name to list
            aPhenomenon.append(aHeaders[nIndexTupleVal+3])
        # add phenomenon list to record
        aRowOut.append(",".join(aPhenomenon))
    csvOut.writerow(aRowOut)

Notes

  • hooray for raw strings!
  • enumerate() is great and saves you the trouble of declaring your own counter
  • like any language, modules/libraries makes things so easy to work with
  • I wish the header stuff was a bit cleaner (I should look further into the csv module w.r.t. headers

That’s my hack for the day. Have a good weekend!

UPDATE: ah, the csv module has a .next() method, which can be used instead of the shoemaker attempt I made above to regularize / split / store the header list.

Leave a Comment

Name: (Required)

E-mail: (Required)

Website:

Comment:

Modified: 13 October 2008 12:13:08 EST