5 Reading CSV files and transforming data

The basic function of PyChart is to plot sample data in a variety of ways. Sample data are simply a sequence of sequences, where "sequence" is Python jargon for either a tuple (comma-separated numbers or strings enclosed in parentheses, e.g., (5, 10, 15)) or a list (comma-separated numbers or strings enclosed in square brackets, e.g., [5, 10, 15]). Data are given to plots through the "data" attribute of a plot object:

l = line_plot.T(data=[(10,20), (11,38), (12,29)], xcol=0, ycol=1)

In the above example, three sample points will be drawn along with line segments that connect them: (10, 20) - (11, 38) - (12, 29). Attribute xcol tells the location of X values within data (here, the first column of each sample), and ycol similarly tells the location of Y values (here, the last column of each sample). A sample point can contain None, in which case it is ignored.

data = [(10, 20, 21), (11, 38, 22), (13, None, 15), (12, 29, 30)]
l1 = line_plot.T(data=data, xcol=0, ycol=1)
l2 = line_plot.T(data=data, xcol=0, ycol=2)

The above example is equivalent to:

l1 = line_plot.T(data=[(10, 20), (11, 38), (12, 29)], xcol=0, ycol=1)
l2 = line_plot.T(data=[(10, 21), (11, 22), (13, 15), (12, 30)], xcol=0, ycol=1)
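The equivalence above can be sketched in plain Python. The helper name plot_points is hypothetical and is not part of PyChart; it only mimics the documented behavior of skipping a sample whose selected column holds None.

```python
def plot_points(data, xcol, ycol):
    """Return the (x, y) pairs a plot would draw.

    Hypothetical helper, not PyChart API: a sample is skipped when
    its X or Y value is None, as described in the text above.
    """
    return [(row[xcol], row[ycol]) for row in data
            if row[xcol] is not None and row[ycol] is not None]

data = [(10, 20, 21), (11, 38, 22), (13, None, 15), (12, 29, 30)]
l1_points = plot_points(data, 0, 1)  # sample (13, None, 15) is dropped
l2_points = plot_points(data, 0, 2)  # all four samples are kept
```

Only the column selected by ycol matters: the same sample is ignored by l1 but still drawn by l2.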

Module chart_data provides several functions for generating, reading, or transforming samples.

read_csv( path, delim = ',')
This function reads comma-separated values from a file. Parameter path is either a pathname or a file-like object that supports the readline() method.

Empty lines and lines beginning with "#" are ignored. Parameter delim specifies how a line is separated into values. If it does not contain the letter "%", then delim marks the end of a value. Otherwise, this function acts like scanf in C:

chart_data.read_csv('file', '%d,%s:%d')
Parameter delim currently supports only three conversion format specifiers: "d" (int), "f" (float), and "s" (string).
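To illustrate what a scanf-like delim such as '%d,%s:%d' means for a single line, here is a minimal sketch built on the standard re module. The function parse_line and the table _CONVERSIONS are hypothetical names for illustration only; this is not how PyChart implements read_csv, and details such as whitespace handling are assumptions.

```python
import re

# Map each supported conversion letter to a regex fragment and a converter.
_CONVERSIONS = {
    "d": (r"([-+]?\d+)", int),
    "f": (r"([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)", float),
    "s": (r"(.+?)", str),
}

def parse_line(fmt, line):
    """Split one line according to a scanf-like format (illustrative only)."""
    pattern, converters, pos = "", [], 0
    for m in re.finditer(r"%([dfs])", fmt):
        pattern += re.escape(fmt[pos:m.start()])  # literal separator text
        regex, conv = _CONVERSIONS[m.group(1)]
        pattern += regex
        converters.append(conv)
        pos = m.end()
    pattern += re.escape(fmt[pos:])               # trailing literal text
    match = re.fullmatch(pattern, line.strip())
    if match is None:
        raise ValueError("line does not match format: %r" % line)
    return [conv(text) for conv, text in zip(converters, match.groups())]

row = parse_line("%d,%s:%d", "1,foo:3")
pair = parse_line("%f %f", "1.5 2")
```

Here row holds an int, a string, and an int, and pair holds two floats. When delim contains no "%", the simpler behavior described above corresponds roughly to line.split(delim).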

read_str( delim, lines)
This function is similar to read_csv, but it reads data from a list of lines.

fd = open("foo", "r")
data = chart_data.read_str(",", fd.readlines())

write_csv( path, data)
This function writes comma-separated data to path. Parameter path is either a pathname or a file-like object that supports the write() method.

func( f, xmin, xmax, step)
Create sample points from function f, which must be a single-parameter function that returns a number (e.g., math.sin). Parameters xmin and xmax specify the first and last X values, and step specifies the sampling interval.

>>> chart_data.func(math.sin, 0, math.pi * 4, math.pi / 2)
[(0, 0.0), (1.5707963267948966, 1.0), (3.1415926535897931, 1.2246063538223773e-16), (4.7123889803846897, -1.0), (6.2831853071795862, -2.4492127076447545e-16), (7.8539816339744828, 1.0), (9.4247779607693793, 3.6738190614671318e-16), (10.995574287564276, -1.0)]
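The sampling loop can be sketched as follows. The name sample_function is hypothetical, and the stopping rule is an assumption: judging from the example output above, which ends at 10.99... rather than at 4*pi, sampling appears to stop before reaching xmax.

```python
def sample_function(f, xmin, xmax, step):
    """Sample f at xmin, xmin + step, ... while x stays below xmax.

    Illustrative sketch only; excluding xmax is an assumption inferred
    from the documented example, not a statement about PyChart's code.
    """
    points = []
    x = xmin
    while x < xmax:
        points.append((x, f(x)))
        x += step
    return points

squares = sample_function(lambda x: x * x, 0, 2, 0.5)
```

Because x is advanced by repeated addition, floating-point rounding can affect whether a sample very close to xmax is included.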

filter( func, data)
Parameter func must be a single-argument function that takes a sequence (i.e., a sample point) and returns a boolean. This procedure calls func on each element in data and returns a list comprising elements for which func returns True.

>>> data = [[1,5], [2,10], [3,13], [4,16]]
>>> chart_data.filter(lambda x: x[1] % 2 == 0, data)
[[2,10], [4,16]]

extract_rows( data, rows...)
Extract rows specified in the argument list.

>>> chart_data.extract_rows([[10,20], [30,40], [50,60]], 1, 2)
[[30,40],[50,60]]

extract_columns( data, cols...)
Extract columns specified in the argument list.

>>> chart_data.extract_columns([[10,20], [30,40], [50,60]], 0)
[[10],[30],[50]]
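Both extraction functions behave like simple index selections. The sketch below reproduces the two examples above in plain Python; the names extract_rows_sketch and extract_columns_sketch are hypothetical stand-ins, not PyChart functions.

```python
def extract_rows_sketch(data, *rows):
    """Return the samples at the given row indices, in the order given."""
    return [data[r] for r in rows]

def extract_columns_sketch(data, *cols):
    """Return, for every sample, only the values at the given column indices."""
    return [[row[c] for c in cols] for row in data]

data = [[10, 20], [30, 40], [50, 60]]
picked_rows = extract_rows_sketch(data, 1, 2)
picked_cols = extract_columns_sketch(data, 0)
```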

moving_average( data, xcol, ycol, width)
Compute the moving average of YCOL'th column of each sample point in DATA. In particular, for each element I in DATA, this function extracts up to WIDTH*2+1 elements, consisting of I itself, WIDTH elements before I, and WIDTH elements after I. It then computes the mean of the YCOL'th column of these elements, and it composes a two-element sample consisting of XCOL'th element and the mean.

>>> data = [[10,20], [20,30], [30,50], [40,70], [50,5]]
>>> chart_data.moving_average(data, 0, 1, 1)
[(10, 25.0), (20, 33.333333333333336), (30, 50.0), (40, 41.666666666666664), (50, 37.5)]
The above value actually represents:

[(10, (20+30)/2), (20, (20+30+50)/3), (30, (30+50+70)/3), 
  (40, (50+70+5)/3), (50, (70+5)/2)]
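The windowing rule described above can be written out as a short plain-Python sketch. The name moving_average_sketch is hypothetical; the code follows the prose description (WIDTH elements on each side, truncated at the ends of the list) rather than PyChart's source.

```python
def moving_average_sketch(data, xcol, ycol, width):
    """Average the ycol values in a window of up to 2*width + 1 samples."""
    result = []
    for i, row in enumerate(data):
        lo = max(0, i - width)              # window is truncated at the start
        hi = min(len(data), i + width + 1)  # and at the end of the list
        window = [r[ycol] for r in data[lo:hi]]
        result.append((row[xcol], sum(window) / float(len(window))))
    return result

data = [[10, 20], [20, 30], [30, 50], [40, 70], [50, 5]]
averaged = moving_average_sketch(data, 0, 1, 1)
```

The first and last samples are averaged over two elements only, which is why the example shows divisors of 2 at both ends and 3 in the middle.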

median( data, freq_col=1)
Compute the median of the freq_col'th column of the values in data.

>>> chart_data.median([(10,20), (20,4), (30,5)], 0)
20
>>> chart_data.median([(10,20), (20,4), (30,5)], 1)
5

mean_samples( data, xcol, ycollist)
Create a sample list in which each element pairs the xcol'th value of an original sample with the mean of that sample's columns listed in ycollist.

>>> chart_data.mean_samples([ [1, 10, 15], [2, 5, 10], [3, 8, 33] ], 0, (1, 2))
[(1, 12.5), (2, 7.5), (3, 20.5)]
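The per-sample mean in this example can be reproduced with a few lines of plain Python. The name mean_samples_sketch is hypothetical and the code is only a reading of the example above, not PyChart's implementation.

```python
def mean_samples_sketch(data, xcol, ycollist):
    """Pair each sample's X value with the mean of its ycollist columns."""
    result = []
    for row in data:
        ys = [row[c] for c in ycollist]
        result.append((row[xcol], sum(ys) / float(len(ys))))
    return result

means = mean_samples_sketch([[1, 10, 15], [2, 5, 10], [3, 8, 33]], 0, (1, 2))
```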

stddev_samples( data, xcol, ycollist, delta)
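Create a sample list that contains the mean and standard deviation of the original list. Each element in the returned list contains the following values: [X, MEAN, STDDEV, MEAN - STDDEV*delta, MEAN + STDDEV*delta], where X is the xcol'th value of the original sample and the statistics are computed over the columns listed in ycollist.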

>>> chart_data.stddev_samples([ [1, 10, 15, 12, 15], [2, 5, 10, 5, 10], [3, 32, 33, 35, 36], [4,16,66, 67, 68] ], 0, range(1,5))
[(1, 13.0, 2.1213203435596424, 10.878679656440358, 15.121320343559642), (2, 7.5, 2.5, 5.0, 10.0), (3, 34.0, 1.5811388300841898, 32.418861169915807, 35.581138830084193), (4, 54.25, 22.094965489902897, 32.155034510097103, 76.344965489902904)]
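The numbers in this example are consistent with a population standard deviation (dividing by N rather than N - 1). The sketch below reproduces them in plain Python; stddev_samples_sketch is a hypothetical name, and the default delta=1.0 is an assumption made because the example call omits delta.

```python
import math

def stddev_samples_sketch(data, xcol, ycollist, delta=1.0):
    """Return (x, mean, stddev, mean - stddev*delta, mean + stddev*delta).

    Illustrative only; delta=1.0 as the default is an assumption.
    """
    result = []
    for row in data:
        ys = [row[c] for c in ycollist]
        mean = sum(ys) / float(len(ys))
        # Population variance: divide by N, which matches the example output.
        variance = sum((y - mean) ** 2 for y in ys) / len(ys)
        stddev = math.sqrt(variance)
        result.append((row[xcol], mean, stddev,
                       mean - stddev * delta, mean + stddev * delta))
    return result

stats = stddev_samples_sketch(
    [[1, 10, 15, 12, 15], [2, 5, 10, 5, 10]], 0, range(1, 5))
```

For the second sample the values are exact: the mean is 7.5 and the standard deviation is 2.5, giving the band 5.0 to 10.0.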

transform( func, data)
Apply func on each element in data and return the list consisting of the return values from func.

>>> data = [[10,20], [30,40], [50,60]]
>>> chart_data.transform(lambda x: [x[0], x[1]+1], data)
[[10, 21], [30, 41], [50, 61]]

One frequent use of transform is to convert a date string to a number and then back to another string for display. The next example does this: it takes dates in the format "10/5/1983" and labels the graph in a format such as "Oct 05, 83", which is what strftime("%b %d, %y") produces.

../demos/date.py

import sys
import datetime
from pychart import *

def date_to_ordinal(s):
    month, day, year = map(int, s.split("/"))
    return datetime.date(year, month, day).toordinal()

def format_date(ordinal):
    d = datetime.date.fromordinal(int(ordinal))
    return "/a60{}" + d.strftime("%b %d, %y")

data = [["10/5/1983", 10], ["3/5/1984", 15],
        ["11/10/1984", 16], ["2/22/1985", 20]]
data = chart_data.transform(lambda x: [date_to_ordinal(x[0]), x[1]], data)

ar = area.T(x_coord = category_coord.T(data, 0),
            y_range = (0, None),
            x_axis = axis.X(label = "Date", format = format_date),
            y_axis = axis.Y(label = "Value"))
ar.add_plot(bar_plot.T(data = data))
ar.draw()
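The date round trip in this demo can be checked on its own with the standard datetime module, without drawing anything. The sketch below repeats date_to_ordinal from the listing and applies the same strftime format, leaving out the PyChart-specific "/a60{}" prefix; the variable names are illustrative.

```python
import datetime

def date_to_ordinal(s):
    """Convert a "month/day/year" string to a proleptic Gregorian ordinal."""
    month, day, year = map(int, s.split("/"))
    return datetime.date(year, month, day).toordinal()

ordinal = date_to_ordinal("10/5/1983")
recovered = datetime.date.fromordinal(ordinal)
# %d zero-pads the day and %y gives a two-digit year.
label = recovered.strftime("%b %d, %y")
```

With the default C locale, label is "Oct 05, 83". Replacing %y with %Y in the format string would print the four-digit year instead.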
