pivoting in Python « tommy's scratchpad

pivoting in Python

29 Aug, 2008 open source, technology, website

I needed to do some pre-processing of some data which involved transposing column names to values. The condition was that the value for each respective column (frequency count) had to be > 1.

My input was a csv file, and my goal was an output csv file which would feed into a batch database import process.

ID,DA,NL,PHENOM1,PHENOM2,PHENOM3,PHENOM4
233,99,44,0.00,27.00,12.00,0.00

The other interesting bit was that only a range of columns applied to the condition; the other columns represented ancillary data.

Enter Python:

#!/usr/bin/python

import sys
import csv

# open file and read headers
fPhenomenon = open("phenomenon.txt","r")
sHeaders    = fPhenomenon.readline().replace(r'"','')
aHeaders    = sHeaders.split(",")

# feed the rest to csv
csvIn  = csv.reader(fPhenomenon)
csvOut = csv.writer(sys.stdout)

for sRowIn in csvIn:
    aRowOut = []
    aPhenomenon = []
    aRowOut.append(sRowIn[0]) # procedure ID
    aRowOut.append(sRowIn[1]) # major drainage area ID
    for nIndexTupleVal, tupleVal in enumerate(sRowIn[3:-1]):
        if (float(tupleVal) > 0): # phenomenon measured at least once
            # add phenomenon name to list
            aPhenomenon.append(aHeaders[nIndexTupleVal+3])
        # add phenomenon list to record
        aRowOut.append(",".join(aPhenomenon))
    csvOut.writerow(aRowOut)

Notes

hooray for raw strings!
enumerate() is great and saves you the trouble of declaring your own counter
like any language, modules/libraries makes things so easy to work with
I wish the header stuff was a bit cleaner (I should look further into the csv module w.r.t. headers

That’s my hack for the day. Have a good weekend!

UPDATE: ah, the csv module has a .next() method, which can be used instead of the shoemaker attempt I made above to regularize / split / store the header list.