CS 110X Feb 13 2014
Expected Reading: 368-377
Expected Interactions: Dictionaries
No Clicker Assessment Today
TA Thursday night office hours [6-9] moved to Friday [2-5]
A dictionary is the only place that success comes before work. Hard work is the price we must pay for success. I think you can accomplish anything if you’re willing to pay the price.
Vince Lombardi
1 Dictionaries
1.1 But first, the += operation
You have written many programs that need to modify a variable by adding a value to its current value. Something like:
>>> totalInClass = 8
>>> totalInClass = totalInClass + 5
>>> totalInClass
13
Because this is such a common idiom in programming, many languages offer a special operator to make these statements easier to write. For addition, Python has an Addition Assignment operator that would make the above code simpler to write as:
>>> totalInClass = 8
>>> totalInClass += 5
>>> totalInClass
13
Instead of having to repeat the name of the variable, using += allows you to simply declare that "the value of totalInClass is incremented by 5."
For each of the basic mathematical operators there are equivalent assignment operators. See if you can follow the logic below to achieve the same result.
>>> sample = 5
>>> sample += 3
>>> sample *= 4
>>> sample -= 8
>>> sample /= 6
>>> sample
4
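The session above ends with 4 because dividing two integers truncates in the Python version used in these notes. As a hedged aside, assuming Python 3 instead, /= performs true division and produces a float, while //= is the floor-division assignment operator. The names trueResult and floorResult are made up for this sketch.

```python
# Assumes Python 3: "/" is true division, "//" is floor division.
sample = 5
sample += 3    # 8
sample *= 4    # 32
sample -= 8    # 24

trueResult = sample
trueResult /= 6      # true division always yields a float

floorResult = sample
floorResult //= 6    # floor division of two ints stays an int

print(trueResult)    # 4.0
print(floorResult)   # 4
```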
1.2 Dictionary Type
We are now ready to introduce the final value type for this course. You have seen the basic primitive types of int, float and bool which represent individual values. You have also seen str which represents an immutable sequence of characters.
The list structure in Python is an aggregate that allows you to store a collection of values that can be individually referenced by their index location. Thus for the list x = [5, 9, 10] you can retrieve the middle value using x[1] and you can set the final value with a statement like x[2] = 99.
While describing lists no one bothered to mention that the use of an index position is artificial. For example, you could represent a phone number as a list of three string literals, phone = ['508', '831', '5000'], and you would have to remember that the area code is stored in phone[0], the city prefix is phone[1] and the line number is phone[2].
Wouldn’t it be easier to simply say phone['areaCode']='508'? In this context, the key is the string literal 'areaCode' and the value associated with this key is the string literal '508'.
Using keys means you don’t have to worry about indexing: you no longer need to remember which index location holds which piece of information.
In Python, the type used for this is called a dictionary and you create one using the following notation:
>>> collection = {}
Note the use of the curly braces to signal that an empty dictionary has been created. Once created, you can assign values to the dictionary using the bracket notation dictionary[key] = value.
The above phone number example would look like the following:
>>> phoneNumber = {}
>>> phoneNumber['areaCode']='508'
>>> phoneNumber['prefix']='831'
>>> phoneNumber['line']='5000'
>>> phoneNumber
{'areaCode': '508', 'line': '5000', 'prefix': '831'}
The above dictionary contains three (key, value) pairs. Note that you should not rely on the order of the pairs. The keys were added in the order areaCode, prefix and then line, yet the visual representation of the dictionary shows the ordering as areaCode, line, prefix. (Python 3.7 and later do preserve insertion order, but a dictionary is still accessed by key, never by position.)
All that matters is that there is a unique mapping from a given key to a given value. You can replace the value associated with a key by simply issuing another assignment operation:
>>> phoneNumber['line']='5888'
>>> phoneNumber
{'areaCode': '508', 'line': '5888', 'prefix': '831'}
The values can be any Python type; the keys can be any immutable type, such as str, int, float or bool (a list cannot be used as a key). Let’s say you were given a list containing only boolean values True and False. Here is a function that returns their respective counts using a dictionary:
def countBooleanValues(values):
    """Returns a dictionary with count of True and False values"""
    counts = {}
    counts[False] = 0
    counts[True] = 0
    for val in values:
        counts[val] += 1
    return counts

>>> countBooleanValues([True, False, True, True, False, False])
{False: 3, True: 3}
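countBooleanValues works because both possible keys are known in advance. When the keys are not known ahead of time, one common approach is to test whether a key is already present with the in operator before incrementing. The function name countValues is hypothetical, and this is only a sketch of that idea.

```python
def countValues(values):
    """Returns a dictionary mapping each distinct value to its count"""
    counts = {}
    for val in values:
        if val not in counts:
            counts[val] = 0    # first time this key is seen
        counts[val] += 1
    return counts

result = countValues(['SACRAMENTO', 'ELK GROVE', 'SACRAMENTO'])
print(result['SACRAMENTO'])    # 2
print(result['ELK GROVE'])     # 1
```

Without the in test, the statement counts[val] += 1 would raise a KeyError the first time a new value appears, because += must read the current value before updating it.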
1.3 Use of csv module
You are considering investing in real estate in Sacramento, California and you want to make sure you don’t make any risky purchases. You decide to correlate crime statistics with some existing real estate transactions to evaluate properties.
The following CSV files were retrieved from an Internet search. Increasingly companies are turning to such "Big Data" solutions to identify trends, risks and opportunities.
The Real Estate transaction file has 985 entries of the following column data:
street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude
3526 HIGH ST,SACRAMENTO,95838,CA,2,1,836,Residential,May 21 2008,59222,38.631913,-121.434879
And the Crime Log file has 7,584 entries of the following column data:
cdatetime,address,district,beat,grid,crimedescr,ucr_ncic_code,latitude,longitude
1/1/2006 0:00,3108 OCCIDENTAL DR,3,3C,1115,10851(A)VC TAKE VEH W/O OWNER,2404,38.55042047,-121.3914158
Since both log files have (longitude, latitude) coordinates, it should be possible to compute a correlation. However, since humans process information graphically much more efficiently, you also want to create a 2-dimensional plot that visualizes the information.
The above can provide useful information but it can also be overwhelming. To make sense of this information you need two basic operations.
filter – extract a subset of the rows in a data set
extract – extract a subset of the columns in a data set
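The two operations can be sketched as follows, assuming the data set is held as a list of rows where each row is itself a list of strings. The names filterRows and extractColumns are hypothetical, and the second sample row is invented for illustration; this is not the program developed in lecture.

```python
def filterRows(rows, column, value):
    """Return only the rows whose entry at index {column} equals {value}"""
    result = []
    for row in rows:
        if row[column] == value:
            result.append(row)
    return result

def extractColumns(rows, columns):
    """Return, for every row, just the entries at the given column indexes"""
    result = []
    for row in rows:
        subset = []
        for c in columns:
            subset.append(row[c])
        result.append(subset)
    return result

# Shortened rows (street, city, zip only); the second row is invented.
rows = [['3526 HIGH ST', 'SACRAMENTO', '95838'],
        ['1 EXAMPLE RD', 'ELK GROVE', '95758']]

print(filterRows(rows, 1, 'SACRAMENTO'))
# [['3526 HIGH ST', 'SACRAMENTO', '95838']]
print(extractColumns(rows, [0, 2]))
# [['3526 HIGH ST', '95838'], ['1 EXAMPLE RD', '95758']]
```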
Homework 5 gives you the opportunity to solve problems that depend on using CSV data effectively. Over the next three lectures, I will be presenting a single program that enables users to produce plots such as found above. Along the way I will solve (hopefully) more challenging problems than you will see on the homework assignment, and thereby better prepare you for the homework and the ensuing Exam 2 next week.
1.4 Lists of Lists
In Lab4, which you have just completed, you gained experience in processing a CSV file. No doubt there were hiccups along the way, so I encourage you to review your code in comparison with the solution which will be posted tonight.
The structure of lab4 showed how to investigate the CSV file programmatically. Observe that the extractColumn function was the workhorse of the assignment. Using this function, you were able to extract a single column from the CSV file. However, if you reflect on the final output of the lab4 question, what you really wanted was more than one column of information. In particular, the output of the program showed the player’s name as well as his Home Run Totals (HR) and Batting Average (BA). While the lab succeeded in its limited need, you need to find a way to write an extractPlayers(HR_threshold, BA_threshold) function, which does not print to the screen but rather returns information.
To make this happen, you need to become aware of the ability of Python to store "Lists of Lists." The concept is clear, although the syntax of Python will make this a bit murky.
What if you wanted to keep a list of two points in Cartesian coordinates, namely (3, 7) and (1, 5)? You could create a single list of four values, viz., [3, 7, 1, 5], and somehow remember that the first two values represent the x- and y-coordinates of the first point while the third and fourth values represent the x- and y-coordinates of the second point. However, this will be rather awkward to properly remember and maintain. Rather, you would want to group the (x, y) values together so they don’t get lost.
>>> p1 = [3, 7]
>>> p2 = [1, 5]
>>> points = [p1, p2]
>>> points
[[3, 7], [1, 5]]
Don’t make the mistake of thinking that [p1, p2] somehow concatenates these two lists together. Rather, you have created a "List of Lists". Since the resulting value is a list, you can process individual elements as you have already done in this course.
>>> points[0]
[3, 7]
>>> for pt in points:
...     print (pt)
...
[3, 7]
[1, 5]
This small example demonstrates it is possible to store structured data of arbitrary complexity.
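Because each element of points is itself a list, a second index reaches inside it. The short sketch below reuses the points example; the variable totalX is made up for illustration.

```python
p1 = [3, 7]
p2 = [1, 5]
points = [p1, p2]

# The first index selects a point; the second selects a coordinate within it.
print(points[1][0])    # 1, the x-coordinate of the second point

# Visit every point and add up the x-coordinates.
totalX = 0
for pt in points:
    totalX += pt[0]
print(totalX)          # 4
```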
To return to the baseball example from the lab, lab4 prints information that includes the player name and his HR and BA values. Instead of printing this out, the extractPlayers function should return a list of lists, with one inner list per player.
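A minimal sketch of what such a function might look like follows. It is not the lab solution: it assumes each record is a list of the form [name, HR, BA] holding strings, it takes the records as an extra parameter so the example is self-contained, it treats "meeting a threshold" as greater than or equal, and the sample players are invented.

```python
def extractPlayers(records, HR_threshold, BA_threshold):
    """Return [name, HR, BA] lists for players meeting both thresholds.

    Hypothetical layout: each record is [name, HR, BA] stored as strings,
    the way csv.reader would deliver them.
    """
    results = []
    for record in records:
        hr = int(record[1])
        ba = float(record[2])
        if hr >= HR_threshold and ba >= BA_threshold:
            results.append([record[0], hr, ba])
    return results

# Invented sample data, for illustration only
records = [['Player A', '30', '0.310'],
           ['Player B', '12', '0.295'],
           ['Player C', '41', '0.240']]
print(extractPlayers(records, 25, 0.300))    # [['Player A', 30, 0.31]]
```

Returning the list instead of printing it lets the caller decide what to do next: print it, count it, or pass it to another function.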
Let’s get back to extracting the data from the CSV files. Python provides a helpful module to make this an easy problem to solve.
import csv

def extractAllRecords(fileName):
    """
    Extract all CSV records and return as single LIST-of-RECORDS
    Note that the first element in this list contains the
    description of the columns as defined in the CSV file
    """
    file = open (fileName, 'r')
    results = []
    reader = csv.reader(file)
    for row in reader:
        results.append(row)
    file.close()
    return (results)

Observe how simple this function is; the csv module does the work of splitting each line of the file into a list of its fields.
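Since the first record holds the column names, a dictionary can map each name to its index, so that later code does not have to remember column positions. The sketch below is an illustration under assumptions: the helper columnIndexes is hypothetical, it assumes Python 3, and it reads a two-line excerpt from a string (via io.StringIO) so it runs without the actual file on disk.

```python
import csv
import io

def columnIndexes(header):
    """Return a dictionary mapping each column name to its index"""
    indexes = {}
    for i in range(len(header)):
        indexes[header[i]] = i
    return indexes

# A two-line excerpt of the real estate file, held in a string.
text = ("street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude\n"
        "3526 HIGH ST,SACRAMENTO,95838,CA,2,1,836,Residential,May 21 2008,59222,38.631913,-121.434879\n")
records = list(csv.reader(io.StringIO(text)))

col = columnIndexes(records[0])    # records[0] is the header row
sale = records[1]                  # first actual transaction
print(col['latitude'])                   # 10
print(float(sale[col['longitude']]))     # -121.434879
```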
So let’s get back to plotting the Sacramento data set. After displaying the above image, I thought to go back and make sure that the dots are being properly placed on the map. How to do this? I wrote the following function:
import pylab

# salesFileName and pylabBackground() are not shown in these notes;
# they are assumed to be defined elsewhere in the program.

# Plot 2d scatter plot of real estate vs. crimes
def plotJustOneForAccuracy(num):
    """
    Plot just a single data point to make sure scale is accurate.
    The specific point is determined by the {num} parameter.
    This function prints out the address so you can validate
    on your own.
    """
    realEstate = extractAllRecords(salesFileName)
    sale = realEstate[num]                # index 0 is the header row, so use num >= 1
    salePointsLat = [float(sale[10])]     # column 10 is latitude
    salePointsLong = [float(sale[11])]    # column 11 is longitude
    print (sale[:2])                      # street and city
    pylabBackground()
    pylab.scatter(salePointsLong, salePointsLat)
    pylab.show()
Let’s demonstrate in action. I’ll run this with 150 as the value:
['6507 RIO DE ONAR WAY', 'ELK GROVE']
Review this address in Google to see how close the approximation is.
1.5 Debug Challenge
The following function tries to complete an earlier assessment: it should take in a list of values and modify that list in place so that every negative value is replaced with its absolute value.
Defects
1.6 Skills
CS-10. Understand nesting of for and while loops
SM-9. Know how to slice a string substring
DT-11. Understand the Dictionary type
1.7 Self Assessment
I would like to change some of the in-class dynamics and provide targeted guidance to students trying to learn specific skills.
For tomorrow, I would like everyone to attempt the following problem and we will begin class by reviewing different solutions that students have come up with.
Given a list of numbers and a target value, return an element such
that no other element in the list is closer to the target.
1.8 Version : 2014/02/14
(c) 2014, George Heineman