
Ngl.regline

Calculates the linear regression coefficient between two series.

Available in version 1.3.0 or later.

Prototype

rc,attrs = Ngl.regline(x, y, return_info=True)

Arguments

x, y

One-dimensional numpy or masked arrays of the same length. If either of these is not a masked array, then a fill value of 1e20 will be used for that array.

return_info=True

An optional logical that indicates whether additional calculations should be computed and returned as a separate dictionary (default is True).

Return values

rc, attrs

rc is the scalar regression coefficient. attrs is a dictionary of additional values that is returned only if return_info is True. See the description below.
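A minimal call sketch is shown below. The data are made up for illustration, and the single-value return for return_info=False is inferred from the description above rather than stated in it:

    import numpy
    import Ngl

    x = numpy.array([1., 2., 3., 4., 5.])
    y = numpy.array([1.1, 1.9, 3.2, 3.9, 5.1])

    rc, attrs = Ngl.regline(x, y)                     # return_info=True (default)
    rc_only   = Ngl.regline(x, y, return_info=False)  # assumption: only the scalar rc is returned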

Description

Ngl.regline computes the information needed to construct a regression line: the regression coefficient (trend, slope, ...) and the averages of the x and y values. Ngl.regline is designed to work with one-dimensional x and y arrays. Missing data are allowed.

Ngl.regline also returns the following values as a separate dictionary if return_info is True:

xave (scalar)
    average of x
yave (scalar)
    average of y
tval (scalar)
    t-statistic (assuming the null hypothesis)
rstd (scalar)
    standard error of the regression coefficient
yintercept (scalar)
    y-intercept at x=0
nptxy (scalar, integer)
    number of points used
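For reference, these quantities correspond to the standard least-squares definitions. The sketch below (plain numpy, no missing-value handling, and not the actual PyNGL implementation) shows how each dictionary entry is defined:

    import numpy as np

    def regline_sketch(x, y):
        # Illustrative least-squares formulas only; Ngl.regline itself also
        # handles missing (fill) values, which this sketch does not.
        x = np.asarray(x, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        n = x.size
        xave, yave = x.mean(), y.mean()
        dx = x - xave
        rc = np.sum(dx * (y - yave)) / np.sum(dx**2)      # regression coefficient (slope)
        yintercept = yave - rc * xave                     # y-intercept at x=0
        resid = y - (rc * x + yintercept)
        rstd = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum(dx**2))  # std. error of rc
        tval = rc / rstd                                  # t-statistic for rc=0
        return rc, {"xave": xave, "yave": yave, "tval": tval, "rstd": rstd,
                    "yintercept": yintercept, "nptxy": n}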

Examples

Example 1

The following example was taken from:

    Brownlee
    Statistical Theory and Methodology
    J. Wiley, 1965, pgs. 342-346 (QA276 .B77)

The regression line information for the example below is: (a) rc=0.9746, (b) tval=38.7, (c) nptxy=18, which yields 16 degrees of freedom (df=nptxy-2). To test the null hypothesis (i.e., rc=0) at the two-tailed 95% level, we note that t(16) is 2.120 (table lookup at 0.975). The calculated t-statistic greatly exceeds 2.120, so the null hypothesis is rejected at the 5% level.

Rather than a table lookup, the following could be used to calculate the actual significance level:

       alpha = Ngl.betainc(df/(df+attrs["tval"]**2), df/2.0, 0.5)

or, alternatively,

       prob = 1 - Ngl.betainc(df/(df+attrs["tval"]**2), df/2.0, 0.5)

Note that "Ngl.betainc" has not been implemented yet; a SciPy-based alternative is sketched after the example output below. The example series are:
import numpy
import Ngl

x = numpy.array([ 1190.,1455.,1550.,1730.,1745.,1770.,
                  1900.,1920.,1960.,2295.,2335.,2490.,
                  2720.,2710.,2530.,2900.,2760.,3010. ])

y = numpy.array([ 1115.,1425.,1515.,1795.,1715.,1710.,
                  1830.,1920.,1970.,2300.,2280.,2520.,
                  2630.,2740.,2390.,2800.,2630.,2970. ])

rc,attrs = Ngl.regline(x,y)
print(rc)
print(attrs)

# Note the use of dictionary items.
df   = attrs["nptxy"]-2
tval = attrs["tval"]
yint = attrs["yintercept"]
#prob = 1 - Ngl.betainc(df/(df+tval**2), df/2.0, 0.5)

#yReg = rc*x + yint       # values of the regression line at each x
#print("prob", prob)
#print(yReg)
The first two print statements will yield:
0.974561429694
{'xave': 2165.0, 'rstd': 0.025154607619252603, 'yintercept': 15.352282489361187,
 'tval': 38.74285953673624, 'yave': 2125.2777777777778, 'nptxy': 18}
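As noted above, Ngl.betainc is not available. Assuming SciPy is installed (an assumption; it is not part of PyNGL), the same significance calculation can be done as follows. Note that SciPy's betainc takes its arguments in the order (a, b, x):

    from scipy import special, stats

    df   = attrs["nptxy"] - 2
    tval = attrs["tval"]

    # Two-tailed significance of the t-statistic (the "alpha" above).
    alpha = special.betainc(df/2.0, 0.5, df/(df + tval**2))
    # Equivalent result via the t distribution's survival function.
    alpha_check = 2.0 * stats.t.sf(abs(tval), df)

    prob = 1.0 - alpha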

Note 1: The above assumes that all the points are independent. If this is not the case, then the effective number of independent points (and hence the degrees of freedom used to test for significance) should be reduced.

Note 2: To construct 95% confidence limits for testing the hypothesis that the regression coefficient is one (i.e., rc=1), use rc +/- t(df, 0.975)*rstd. For this example, 0.9746 +/- 2.120*0.0252 gives approximately [0.921, 1.028]; since 1 lies inside this interval, the hypothesis rc=1 cannot be rejected at the 5% level.
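A short sketch of that interval in code, continuing from rc and attrs above and again assuming SciPy for the t quantile:

    from scipy import stats

    df    = attrs["nptxy"] - 2
    tcrit = stats.t.ppf(0.975, df)          # 2.120 for df=16
    rc_lo = rc - tcrit * attrs["rstd"]
    rc_hi = rc + tcrit * attrs["rstd"]
    print("95% confidence interval for rc: [{:.3f}, {:.3f}]".format(rc_lo, rc_hi))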