project 4 specs


Upload: lantea1

Post on 24-Nov-2015


DESCRIPTION

CS 32 PROJECT 4 UCLA WINTER QUARTER THE HARD QUARTER

TRANSCRIPT

  • Project 4 Orakle

    Time due: 9:00 PM Thursday, March 13

    Introduction ... 2
    Anatomy of a Database ... 2
    What Do You Need to Do? ... 5
    What Will We Provide? ... 5
        The Tokenizer Class ... 6
        The HTTP Class (aka But I don't know how to use C++ to access the Internet!) ... 7
    Details: The Classes You Must Write ... 8
        MultiMap and MultiMap::Iterator ... 8
            The MultiMap Class ... 10
            The MultiMap::Iterator Class ... 13
            MultiMap Implementation Notes ... 15
        Database ... 18
            The Data Structures Used in a Database ... 19
    Our Testing Framework ... 34
        The file Command and the url Command ... 35
        The schema Command ... 35
        The add Command ... 36
        Issuing a Query: the qparam, sparam and execute commands ... 36
    Requirements and Other Thoughts ... 38
    What to Turn In ... 39
    Grading ... 40

  • 2

    Introduction

    The NachenSmall Software Corporation has decided to get into the database business and create their own database offering to compete against Oracle and Microsoft. Given that the NachenSmall leadership team consists entirely of UCLA alums, they've decided to offer the job to build this new database to the students of CS32. Lucky you! So, in your last project for CS32 Winter 2014, your goal is to build a simple set of C++ classes that can be used to store and search through (aka query) large amounts of data. If you're able to prove to NachenSmall's reclusive and bizarre CEO, Carey Nachenberg, that you have the programming skills to build the simple database described in this specification, he'll hire you to build the complete project, and you'll be rich and famous.

    Anatomy of a Database

    A database is a piece of software that stores one or more data records. Each data record contains all of the known information about a single entity (e.g., a student, a customer, etc.). The database lets a user search through its many records efficiently in order to find records that match specific criteria (e.g., "show me all students with a last name of Smith with a GPA of 3.0 or higher"). Other software applications (e.g., a website like my.ucla.edu) typically use such a database to hold their data.

    So what is a data record? A data record is a group of related fields about a single entity. For example, if we want our database to store student data records, each student data record might contain the following fields: a first name field, a last name field, a student ID number field, a phone number field, and a GPA field. Here are some examples of student data records:

        Record #1: Carey,Nachenberg,102030405,310-825-4321,3.71
        Record #2: David,Smallberg,304454123,818-666-2323,1.25
        Record #3: David,Copperfield,987654321,424-750-7519,3.99

    As you can see, each data record (also called a row) holds a first name field, a last name field, a student ID number field, a phone number field, and a GPA field. So we can say that our database holds multiple rows of data, and each row is composed of the same five fields (first name, last name, student ID, phone number, GPA).

    When you create a new database from scratch, you must specify a schema that describes what types of fields you will be storing in each data record of the database. For example, if you wanted to create a database that holds student data records, you would provide the following schema to the database:

  • 3

        firstName: not indexed
        lastName: indexed
        studentID: not indexed
        phoneNum: not indexed
        GPA: indexed

    The schema above states that we're going to store five fields of information in each data record/row of the database: a first name, last name, student ID, phone number and GPA. The schema also describes whether each field needs to be indexed or not. If a field is indexed, it means that the database enables the user to efficiently search for records based on the contents of that field across all of the data records. So, given the schema above, which specifies that the lastName and GPA fields must be indexed, the database would let the user efficiently search for all users with a lastName of Ziggy. However, the database would not let the user search for students by their phoneNum, since this field was not designated as indexed.

    Once you have specified a schema for a database, you may then add one or more data records to the database that match that schema. So, for example, after specifying the schema above, we could insert the following rows into our database:

        Row #1: Corey,Wang,100200300,818-555-1212,3.62
        Row #2: David,Smallberg,304454123,310-666-2323,1.25
        Row #3: David,Copperfield,987654321,424-750-7519,1.46
        Row #4: Jill,Bachelor,3453453356,626-999-1111,3.30
        Row #5: Cindy,Wang,005393222,310-555-4545,2.00
        Row #6: Buford,Wang,656999332,909-678-4567,1.80
        Row #7: Rick,Ronzoni,676767545,310-666-2323,1.75
        Row #8: Abel,Salfo,404932223,202-342-2342,1.25
        Row #9: Joe,Smith,000000001,452-332-9492,1.99
        Row #10: Bill,Smith,003004005,818-885-6735,1.99

    Notice that it is possible that different data records contain the same field value: records #2 and #3 both have firstName field values of David, for example, and records #1, #5 and #6 have the same value of Wang for their lastName field. This makes sense: you wouldn't expect that people have unique first names (or last names or GPAs).

    Of course, a database doesn't just hold lots of data. It also enables the user to search through that data in order to find data records that match the user's criteria. In database lingo, this is called querying the database. So once we have added one or more data records to the database, we may then query the database about these data records. A database query, for the purposes of this assignment, has two parts:

    1. Each query specifies a list of field names followed by minimum and maximum acceptable values for each field name.

  • 4

    2. Each query specifies how to order/sort the data records that match your query's criteria when those matching data records are returned to the user.

    Here is an example query (written in pseudo-code for clarity):

        Fields to match:
            lastName,Ronson,Wang
            GPA,1.0,2.0

        Ordering criteria:
            GPA, descending
            lastName, ascending
            firstName, ascending

    The above query indicates that the user wants to find all users whose last names are between Ronson and Wang, inclusive (i.e., [Ronson,Wang]), *and* whose GPA is between 1.0 and 2.0, inclusive (i.e., [1.0,2.0]). Further, the query specifies that any matching data records must be returned to the user in a specific order. The results must first be ordered by the GPA field (in descending order, so 2.0 would come before 1.5, etc.). For those data records with exactly matching GPA values, the results must secondarily be ordered by their lastName field in ascending order (e.g., Branson comes before Coldwell). Finally, for those data records with the same GPA and the same last name, they should further be ordered by the firstName field in ascending order. So for the above query, the database would output the following data records in the following order:

        Row #5: Cindy,Wang,005393222,310-555-4545,2.00
        Row #10: Bill,Smith,003004005,818-885-6735,1.99
        Row #9: Joe,Smith,000000001,452-332-9492,1.99
        Row #6: Buford,Wang,656999332,909-678-4567,1.80
        Row #7: Rick,Ronzoni,676767545,310-666-2323,1.75
        Row #8: Abel,Salfo,404932223,202-342-2342,1.25
        Row #2: David,Smallberg,304454123,310-666-2323,1.25

    As you can see, these seven returned rows all have a lastName field value between Ronson and Wang, inclusive, and have GPA values between 1.0 and 2.0, inclusive. Data records that don't meet BOTH of these criteria are absent from our query results. For example, David Copperfield has a GPA of 1.46, which is between 1.0 and 2.0. However, David's last name of Copperfield does not fall between Ronson and Wang, so David's record has been omitted from the results.

    Notice that the rows have been ordered primarily by their GPA field, in descending order (with the highest GPA of 2.0 at the top, and the lowest GPA of 1.25 at the bottom). Secondarily, these records have been ordered by their lastName field, in ascending order. Notice that Abel Salfo and David Smallberg both had GPAs of 1.25; therefore, they were secondarily ordered by their last name, with Salfo coming before

  • 5

    Smallberg in the results. Finally, notice that two of our users have the same GPA of 1.99. These two data records also have the same value for the lastName field (of Smith). Therefore, these records have been tertiarily ordered by their firstName field, in ascending order, with Bill coming before Joe. For the purposes of this assignment, the user may specify search criteria (e.g., find data records with a GPA of between 1.0 and 2.0) only on fields that have been designated as indexed in the schema. So in the above example, the user could specify search criteria referencing only the lastName and the GPA fields. The user may, however, order their search results (i.e., matching records) by any field(s) they like, regardless of whether the fields have been indexed.
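The filtering and ordering rules for the example query can be sketched in a few lines. This is only an illustration of the semantics, not how your Database must work: StudentRow, runExampleQuery and sampleRows are hypothetical names, the sketch uses STL containers freely, and it assumes GPA values are compared numerically (this part of the spec does not say how field values are compared).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type for illustration only.
struct StudentRow {
    std::string firstName, lastName, studentID, phoneNum;
    double gpa;  // assumption: GPA is compared as a number in this sketch
};

// Keep rows with lastName in [loName, hiName] and GPA in [loGpa, hiGpa], then
// order by GPA descending, lastName ascending, firstName ascending.
std::vector<StudentRow> runExampleQuery(std::vector<StudentRow> rows,
                                        const std::string& loName,
                                        const std::string& hiName,
                                        double loGpa, double hiGpa) {
    rows.erase(std::remove_if(rows.begin(), rows.end(),
                              [&](const StudentRow& r) {
                                  return r.lastName < loName || r.lastName > hiName ||
                                         r.gpa < loGpa || r.gpa > hiGpa;
                              }),
               rows.end());
    std::sort(rows.begin(), rows.end(),
              [](const StudentRow& a, const StudentRow& b) {
                  if (a.gpa != b.gpa) return a.gpa > b.gpa;  // descending
                  if (a.lastName != b.lastName) return a.lastName < b.lastName;
                  return a.firstName < b.firstName;
              });
    return rows;
}

// The ten rows from the example above.
std::vector<StudentRow> sampleRows() {
    return {
        {"Corey", "Wang", "100200300", "818-555-1212", 3.62},
        {"David", "Smallberg", "304454123", "310-666-2323", 1.25},
        {"David", "Copperfield", "987654321", "424-750-7519", 1.46},
        {"Jill", "Bachelor", "3453453356", "626-999-1111", 3.30},
        {"Cindy", "Wang", "005393222", "310-555-4545", 2.00},
        {"Buford", "Wang", "656999332", "909-678-4567", 1.80},
        {"Rick", "Ronzoni", "676767545", "310-666-2323", 1.75},
        {"Abel", "Salfo", "404932223", "202-342-2342", 1.25},
        {"Joe", "Smith", "000000001", "452-332-9492", 1.99},
        {"Bill", "Smith", "003004005", "818-885-6735", 1.99},
    };
}
```

Calling runExampleQuery(sampleRows(), "Ronson", "Wang", 1.0, 2.0) keeps seven rows, starting with Cindy Wang and ending with David Smallberg, in the order listed above.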

    What Do You Need to Do?

    So, at a high level, what do you need to build? You need to build a class called Database:

    1. You need to be able to provide a schema to a database.

    2. You need to be able to add one or more data records to a database, either retrieving these rows of data from a website on the Internet, or letting the user add them one at a time locally.

    3. You need to be able to issue a query to the database and obtain a collection of records that match the query's search criteria, ordered in a manner consistent with the query's sorting criteria.

    You need to build a class called MultiMap, representing a collection of key/value associations; this class must be implemented using a binary search tree.

    1. You can add a new item to a MultiMap.

    2. You can search for items in a MultiMap, getting a MultiMap::Iterator indicating a matching item.

    You need to build an iterator class called MultiMap::Iterator:

    1. You can access the key/value association that is indicated by a MultiMap::Iterator.

    2. You may advance an iterator forward or backward through its MultiMap.

    3. You may check if an iterator is valid.

    What Will We Provide?

    We'll provide a simple main.cpp file and a test.h file that bring your entire program together. The test.h file includes a test framework that will help you test your classes as you build them.

  • 6

    We'll provide an HTTP class that can be used to download a web page from a web server on the Internet (e.g., from http://reddit.com). If you specify the URL for a page, it will download the contents of the page and place them into a string. The section on the HTTP class below has details. You will use this class to import record data (e.g., a list of student data records) from a remote website over the Internet. You must NOT modify this class in any way.

    Finally, we'll provide you with a class called Tokenizer to help you break apart strings and separate them (e.g., break "aaa,bbb,ccc" into "aaa", "bbb", and "ccc"). You may use this class anywhere you like in your project, but you must NOT modify it.

    The Tokenizer Class

    We provide a Tokenizer class for you to use in your program to simplify the process of tokenizing strings. Tokenizing is the process of breaking up a string that is divided by a set of delimiters into a succession of smaller strings. A delimiter is typically a dividing character like a space, period, comma or other punctuation mark. The Tokenizer class can be used to chop up a provided string into parts, with each part separated by delimiters that you specify. Here is the class declaration:

        class Tokenizer
        {
        public:
            Tokenizer(const std::string& text, std::string delimiters);
            bool getNextToken(std::string& token);
        };

    Here's how you might use the class:

        void bar()
        {
            std::string delimiters = " ,.?"; // space, comma, period, question mark
            std::string tokenizeMe = "This isn't a test, is it? Really!";
            Tokenizer t(tokenizeMe, delimiters);
            string w;
            while (t.getNextToken(w))
                cout << "token: " << w << endl;
        }

  • 7

    This prints:

        token: This
        token: isn't
        token: a
        token: test
        token: is
        token: it
        token: Really!

    You may use this class anywhere in your program where tokenization is required. This class is defined in the Tokenizer.h file (which we provide for your use).
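If you're curious how a class with this interface could work, here is a minimal sketch. SketchTokenizer and splitAll are hypothetical names used only for illustration; this is not the provided Tokenizer, and it assumes that a run of consecutive delimiters is skipped rather than producing empty tokens, which the provided class may or may not do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in with the same public interface as the provided class.
class SketchTokenizer {
public:
    SketchTokenizer(const std::string& text, std::string delimiters)
        : m_text(text), m_delims(delimiters), m_pos(0) {}

    // Stores the next token in 'token' and returns true, or returns false
    // (leaving 'token' unchanged) when no tokens remain.
    bool getNextToken(std::string& token) {
        // Assumption: skip any run of delimiters before the token.
        std::size_t start = m_text.find_first_not_of(m_delims, m_pos);
        if (start == std::string::npos)
            return false;
        std::size_t end = m_text.find_first_of(m_delims, start);
        if (end == std::string::npos)
            end = m_text.size();
        token = m_text.substr(start, end - start);
        m_pos = end;
        return true;
    }

private:
    std::string m_text;
    std::string m_delims;
    std::size_t m_pos;  // index where the next search begins
};

// Convenience helper: collect every token into a vector.
std::vector<std::string> splitAll(const std::string& text,
                                  const std::string& delims) {
    SketchTokenizer t(text, delims);
    std::vector<std::string> out;
    std::string w;
    while (t.getNextToken(w))
        out.push_back(w);
    return out;
}
```

With this sketch, splitAll("aaa,bbb,ccc", ",") yields the three strings aaa, bbb and ccc. In your project, use the Tokenizer class we provide rather than a stand-in like this one.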

    The HTTP Class (aka But I don't know how to use C++ to access the Internet!)

    Oh, we knew you were going to say that! Such a whiner! But wouldn't you like to learn how to write a program that interacts with other computers over the Internet? We thought so. So we're going to provide you with a reasonably functional Internet HTTP interface that lets you download pages from the Internet. HTTP is the protocol used by web browsers to download web pages from servers on the Internet into your browser. When you use our interface, you don't have to worry about the details of how to communicate over the Internet yourself. Of course, if you want to see how our interface works, you're welcome to do so, and before you know it, you'll be forming your own start-up Internet company to compete against Google.

    Our HTTP interface's primary public function (get) is as easy to use as this:

        #include "http.h"

        int main()
        {
            string url = "http://en.wikipedia.org/wiki/Bald";
            string page; // to hold the contents of the web page

            // The next line downloads a web page for you. So easy!
            if (HTTP().get(url, page))
                cout

  • 8

        int main()
        {
            HTTP().set("http://a.com", "This is a test page.");
            HTTP().set("http://b.com", "Here is another.");
            HTTP().set("http://c.com", "Everyone loves CS 32");
            string page;
            if (HTTP().get("http://b.com", page))
                cout

  • 9

    so they may appear in any order. As an example, here are some ordered sequences of the associations from the example above, along with one that is not ordered:

    Ordered: Andrea → 6, Bill → 8, Bill → 2, Bill → 2, Bill → 3, Carey → 5, James → 4, Joe → 1, Larry → 7, Larry → 9

    Ordered: Andrea → 6, Bill → 2, Bill → 3, Bill → 2, Bill → 8, Carey → 5, James → 4, Joe → 1, Larry → 9, Larry → 7

    Ordered: Andrea → 6, Bill → 2, Bill → 2, Bill → 3, Bill → 8, Carey → 5, James → 4, Joe → 1, Larry → 7, Larry → 9

    Not ordered: Bill → 2, Andrea → 6, Bill → 3, Bill → 8, Bill → 2, Carey → 5, Joe → 1, James → 4, Larry → 7, Larry → 9

    The last sequence above is not ordered because "Andrea" must precede "Bill", and "James" must precede "Joe".

    An iterator for a multimap is either invalid, or is valid and indicates one of the associations in the multimap. Given a valid iterator, you can retrieve the key and the value of that association. You can tell the iterator to advance to the next association or back up to the previous association in that multimap, where "next" and "previous" are in terms of an ordered sequence of the multimap's associations.

    Here is an example of using the MultiMap and MultiMap::Iterator types you will implement. This code creates a MultiMap, inserts some associations into it, and writes all the associations whose key is greater than or equal to "Bill", in order of the keys:

        void foo()
        {
            MultiMap myMultiMap;
            myMultiMap.insert("Andrea", 6);
            myMultiMap.insert("Bill", 2);
            myMultiMap.insert("Carey", 5);
            myMultiMap.insert("Bill", 8);
            myMultiMap.insert("Batia", 4);
            myMultiMap.insert("Larry", 7);
            myMultiMap.insert("Larry", 9);
            myMultiMap.insert("Bill", 3);

            // Start at the earliest-occurring association with key "Bill"
            MultiMap::Iterator it = myMultiMap.findEqual("Bill");
            while (it.valid())
            {
                cout << it.getKey() << " " << it.getValue() << endl;
                it.next();
            }
        }

  • 10

    One possible output produced by this function is

        Bill 3
        Bill 8
        Bill 2
        Carey 5
        Larry 9
        Larry 7

    The output must contain the three "Bill" lines, followed by the "Carey" line, followed by the two "Larry" lines. The order of the "Bill" lines among themselves is allowed to be different, and the "Larry" lines may be in the other order. The specification below of MultiMap's findEqual member function clarifies why all three "Bill" lines must appear.

    The MultiMap Class

    Your MultiMap class must have the following public interface. You must NOT change or add to the public interface, with two exceptions: (1) if you wish, you may add MultiMap::Iterator constructor(s) with whatever parameters you like, and (2) if the compiler-generated destructor, copy constructor, and assignment operator for MultiMap::Iterator don't behave correctly, you may declare and implement them.

        class MultiMap
        {
        public:
            // You must implement this public nested MultiMap::Iterator class
            class Iterator
            {
            public:
                Iterator(); // You must have a default constructor
                Iterator(/* you may have any parameters you like here */);
                bool valid() const;
                std::string getKey() const;
                unsigned int getValue() const;
                bool next();
                bool prev();
            };

            MultiMap();
            ~MultiMap();
            void clear();
            void insert(std::string key, unsigned int value);
            Iterator findEqual(std::string key) const;
            Iterator findEqualOrSuccessor(std::string key) const;
            Iterator findEqualOrPredecessor(std::string key) const;

        private:
            // To prevent MultiMaps from being copied or assigned, declare these members
            // private and do not implement them.
            MultiMap(const MultiMap& other);
            MultiMap& operator=(const MultiMap& rhs);
        };

  • 11

    Here are the general requirements for your MultiMap class:

    1. Your MultiMap class must use the public interface documented above. You may add only private members to the MultiMap class; you must not add other public members to MultiMap. Doing so will result in a score of ZERO for this part of the project.

    2. The keys of the associations are case-sensitive, so D'Oyly Carte and d'oyly carte are different keys. This makes your implementation task easier.

    3. Your MultiMap class does not need to implement any member function for removing an individual association. (You will, though, have to remove all associations in clear() and the destructor.)

    4. You must not use any STL containers to implement MultiMap (e.g., no map, set, multimap, unordered_map, vector, list, etc.).

    5. As detailed later, you must implement MultiMap using a binary search tree that you build yourself (defining your own private node type, maintaining a root pointer, etc.). The tree does not need to use a balancing algorithm (unless you're masochistic and want to implement one).

    For the descriptions below, we define a valid Iterator as one for which calling valid() on it returns true; an invalid Iterator is one for which it returns false.

    Requirements for MultiMap()

    The default constructor must create a MultiMap containing no associations. This constructor must run in O(1) time.

    Requirements for ~MultiMap()

    The destructor must release all resources held by the MultiMap. For a MultiMap containing N associations, the destructor must run in O(N) time.

    Requirements for void clear()

    The clear method must remove all associations from the MultiMap, resulting in it containing no associations. (You might later insert some new associations.) For a MultiMap containing N associations, clear must run in O(N) time.
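One way to meet the O(N) bound for clear (and the destructor) is a post-order traversal that frees both subtrees before freeing the node itself, so each node is visited once. The sketch below is only an illustration; Node and freeTree are hypothetical names, and your private node type will probably look different.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical private node type holding a single association.
struct Node {
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
};

// Frees every node in the subtree rooted at n (children first, then n).
// Returns the number of nodes freed, which is handy for checking the sketch.
std::size_t freeTree(Node* n) {
    if (n == nullptr)
        return 0;
    std::size_t freedLeft = freeTree(n->left);
    std::size_t freedRight = freeTree(n->right);
    delete n;
    return freedLeft + freedRight + 1;
}
```

A clear method built on this idea would call the helper on the root and then set the root pointer back to null, so that later insertions start from an empty tree. Note that the recursion depth equals the height of the tree, which can be N for a terribly unbalanced tree.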

    Requirements for void insert(std::string key, unsigned int value)

    The insert method must add to the MultiMap a new association with the indicated key and value. Because this is a multimap, it is allowable for this operation to result in more than one association having the same key, or even the same key and value. For a MultiMap containing N associations, insert must run in average case O(log N) time, worst case O(N) time. (Because you are not required to keep the binary search tree you use to implement MultiMap balanced, some insertion orders may result in a terribly unbalanced tree.)
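As an illustration of these requirements, here is a minimal sketch of insertion into an unbalanced binary search tree that stores one association per node and sends equal keys to the left. Node and bstInsert are hypothetical names, the parent pointer is an assumption made for the benefit of iterators, and sending duplicates left is just one of several acceptable choices.

```cpp
#include <cassert>
#include <string>

// Hypothetical private node type; the parent pointer is an assumption that
// makes moving an iterator forward and backward easier.
struct Node {
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
    Node* parent;
};

// Walks down from the root to an empty link and hangs a new node there.
// Keys equal to an existing node's key go into that node's left subtree.
void bstInsert(Node*& root, const std::string& key, unsigned int value) {
    Node* parent = nullptr;
    Node** link = &root;  // the pointer that will end up pointing at the new node
    while (*link != nullptr) {
        parent = *link;
        link = (key <= parent->key) ? &parent->left : &parent->right;
    }
    *link = new Node{key, value, nullptr, nullptr, parent};
}
```

The loop follows one path from the root to a leaf, so its cost is proportional to the height of the tree: about log N on average, and N when the insertion order produces a degenerate tree.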

  • 12

    Requirements for Iterator findEqual(std::string key) const

    If no association in the MultiMap has a key equal to the key parameter, the findEqual method must return an invalid Iterator; otherwise, it must return a valid Iterator. If at least one association in the MultiMap has a key equal to the key parameter, the findEqual method must return a valid Iterator indicating the earliest such association. By "earliest", we mean that in an ordered sequence of all the associations in the MultiMap, the association indicated by the returned Iterator comes before any others with the same key or a greater key. This "earliest" requirement is why the example code on page 9 prints all three "Bill" lines, not two or one of them. For a MultiMap containing N associations, findEqual must run in average case O(log N) time, worst case O(N) time. (Because you are not required to keep the binary search tree you use to implement MultiMap balanced, a terribly unbalanced tree may result in the worst case search time.)
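The "earliest" requirement means the search cannot simply stop at the first node whose key matches. A minimal sketch, assuming a hypothetical Node type with one association per node: remember each match, then keep descending to the left in case an earlier duplicate exists. The function returns a node pointer rather than an Iterator only to keep the sketch short.

```cpp
#include <cassert>
#include <string>

// Hypothetical private node type holding a single association.
struct Node {
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
};

// Returns the node that comes first, in an in-order walk, among all nodes
// whose key equals 'key', or nullptr if there is no such node.
const Node* findEarliestEqual(const Node* root, const std::string& key) {
    const Node* best = nullptr;
    const Node* cur = root;
    while (cur != nullptr) {
        if (key < cur->key) {
            cur = cur->left;
        } else if (key > cur->key) {
            cur = cur->right;
        } else {
            best = cur;       // a match, but an earlier one may lie to the left
            cur = cur->left;
        }
    }
    return best;
}
```

Everything in a node's left subtree precedes that node in an ordered sequence, so any match found after going left is earlier than the one already remembered. The loop still follows a single root-to-leaf path, so the running time stays proportional to the height of the tree.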

    Requirements for Iterator findEqualOrSuccessor(std::string key) const

    If no association in the MultiMap has a key greater than or equal to the key parameter, the findEqualOrSuccessor method must return an invalid Iterator; otherwise, it must return a valid Iterator. If at least one association in the MultiMap has a key greater than or equal to the key parameter, the findEqualOrSuccessor method must return a valid Iterator indicating the earliest such association. By "earliest", we mean that in an ordered sequence of all the associations in the MultiMap, the association indicated by the returned Iterator comes before any others with the same key or a greater key. For a MultiMap containing N associations, findEqualOrSuccessor must run in average case O(log N) time, worst case O(N) time. (Because you are not required to keep the binary search tree you use to implement MultiMap balanced, a terribly unbalanced tree may result in the worst case search time.)
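The same descent idea gives a sketch of findEqualOrSuccessor: whenever the current node's key is greater than or equal to the search key, it is a candidate, and any better (earlier) candidate must be in its left subtree. Node and earliestAtOrAfter are hypothetical names, and the sketch returns a node pointer instead of an Iterator for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical private node type holding a single association.
struct Node {
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
};

// Returns the first node, in in-order sequence, whose key is >= 'key',
// or nullptr if every key in the tree is less than 'key'.
const Node* earliestAtOrAfter(const Node* root, const std::string& key) {
    const Node* best = nullptr;
    const Node* cur = root;
    while (cur != nullptr) {
        if (cur->key >= key) {
            best = cur;        // candidate; look left for an earlier one
            cur = cur->left;
        } else {
            cur = cur->right;  // too small; the answer must be to the right
        }
    }
    return best;
}
```

One consequence worth noticing: a findEqual could be built from this by checking whether the returned node's key is exactly equal to the search key.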

    Requirements for Iterator findEqualOrPredecessor(std::string key) const

    If no association in the MultiMap has a key less than or equal to the key parameter, the findEqualOrPredecessor method must return an invalid Iterator; otherwise, it must return a valid Iterator. If at least one association in the MultiMap has a key less than or equal to the key parameter, the findEqualOrPredecessor method must return a valid Iterator indicating the latest such association. By "latest", we mean that in an ordered sequence of all the associations in the MultiMap, the association indicated by the returned Iterator comes after any others with the same key or a lesser key.

  • 13

    For a MultiMap containing N associations, findEqualOrPredecessor must run in average case O(log N) time, worst case O(N) time. (Because you are not required to keep the binary search tree you use to implement MultiMap balanced, a terribly unbalanced tree may result in the worst case search time.)
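findEqualOrPredecessor is the mirror image of findEqualOrSuccessor. In the sketch below (hypothetical names again, returning a node pointer rather than an Iterator), a node whose key is less than or equal to the search key is a candidate, and any later candidate must be in its right subtree.

```cpp
#include <cassert>
#include <string>

// Hypothetical private node type holding a single association.
struct Node {
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
};

// Returns the last node, in in-order sequence, whose key is <= 'key',
// or nullptr if every key in the tree is greater than 'key'.
const Node* latestAtOrBefore(const Node* root, const std::string& key) {
    const Node* best = nullptr;
    const Node* cur = root;
    while (cur != nullptr) {
        if (cur->key <= key) {
            best = cur;        // candidate; look right for a later one
            cur = cur->right;
        } else {
            cur = cur->left;   // too large; the answer must be to the left
        }
    }
    return best;
}
```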

    The MultiMap::Iterator Class Here are the general requirements for your nested MultiMap::Iterator class:

    1. Your Iterator class must use the public interface documented above. You may add only private members to the Iterator class; you must not add other public members to Iterator, with two exceptions: (1) if you wish, you may add Iterator constructor(s) with whatever parameters you like, and (2) if the compiler-generated destructor, copy constructor, and assignment operator for Iterator don't behave correctly, you may declare and implement them. Adding any other public members will result in a score of ZERO for this part of the project.

    2. If an Iterator is created as a result of an operation on a MultiMap, then after insert(), clear(), or the destructor is called on that MultiMap, the behavior of further operations on that Iterator, except assigning to it or destroying it, is not defined by this spec. Roughly speaking, if a MultiMap's contents change, you can't assume any Iterators currently being used with it are still reliable to use. Notice that since this spec leaves the behavior undefined in this case, your implementation may do whatever it likes in this case, even crashing. Typically, you don't write any special code to detect such a situation (which is often impossible or expensive to do), so you just allow your normal code to do what it does, letting the chips fall where they may.

    For the descriptions below, we talk about an Iterator being in a valid or an invalid state. An iterator is in a valid state if it indicates an association in a MultiMap; otherwise, it is in an invalid state. (As an example, an Iterator in a valid state that indicates the last association in an ordered sequence of a MultiMap's associations goes into an invalid state when you call next() on it, since there is no association after the last one.)

    Requirements for MultiMap::Iterator::Iterator()

    The default constructor must create an Iterator in an invalid state. This constructor must run in O(1) time.

    Requirements for other MultiMap::Iterator constructors

    You may write other Iterator constructors with whatever parameters you like. It is your choice whether the Iterator created by any such constructor is in a valid or invalid state. Any such constructor must run in O(1) time.

  • 14

    Requirements for the MultiMap::Iterator destructor, copy constructor and assignment operator

    The Iterator class must have a public destructor, copy constructor and assignment operator, either declared and implemented by you or left unmentioned so that the compiler will generate them for you. If you design your class well, the compiler-generated versions of these operations will do the right thing. Each of these operations must run in O(1) time.

    Requirements for bool MultiMap::Iterator::valid() const

    The valid method must return true if the Iterator is in a valid state, and false otherwise. The valid method must run in O(1) time.

    Requirements for std::string MultiMap::Iterator::getKey() const

    If the Iterator is in a valid state, the getKey method must return the key from the association indicated by the iterator. This method must run in O(1) time. Notice that this spec does not define the behavior of getKey if the Iterator is in an invalid state, so your implementation may do whatever it likes in that case, even crashing.

    Requirements for unsigned int MultiMap::Iterator::getValue() const

    If the Iterator is in a valid state, the getValue method must return the value from the association indicated by the iterator. This method must run in O(1) time. Notice that this spec does not define the behavior of getValue if the Iterator is in an invalid state, so your implementation may do whatever it likes in that case, even crashing.

    Requirements for bool MultiMap::Iterator::next()

    If the Iterator is in an invalid state, the next method does nothing and returns false. Otherwise, the Iterator is in a valid state, so it indicates an association in a MultiMap. Consider an ordered sequence of the associations contained by that MultiMap. If the association indicated by the Iterator is the last one in that sequence, then the next method puts the Iterator into an invalid state and returns false; otherwise, next makes the Iterator indicate the association in the sequence that comes immediately after the one it currently indicates, and returns true. For a MultiMap containing N associations, next must run in average case O(log N) time, worst case O(N) time. (Because you are not required to keep the binary search tree you use to implement MultiMap balanced, a terribly unbalanced tree may result in the worst case time. On the other hand, whether a tree is balanced or not, it can be proved that for most of the nodes, next can run in O(1) time!)
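Here is a sketch of the step that next has to perform, assuming each node stores a parent pointer (an assumption; this spec does not require one). Node and successor are hypothetical names. If the node has a right subtree, the next association is the leftmost node of that subtree; otherwise it is the nearest ancestor reached by climbing up out of a left subtree.

```cpp
#include <cassert>
#include <string>

// Hypothetical private node type; the parent pointer is an assumption.
struct Node {
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
    Node* parent;
};

// Returns the node that immediately follows n in an in-order walk,
// or nullptr if n is the last node (or n itself is nullptr).
const Node* successor(const Node* n) {
    if (n == nullptr)
        return nullptr;
    if (n->right != nullptr) {
        // The leftmost node of the right subtree comes next.
        n = n->right;
        while (n->left != nullptr)
            n = n->left;
        return n;
    }
    // Otherwise climb until we arrive at a parent from its left child.
    const Node* p = n->parent;
    while (p != nullptr && n == p->right) {
        n = p;
        p = p->parent;
    }
    return p;
}
```

An Iterator that holds a node pointer could implement next by replacing its pointer with the successor and reporting whether the result is non-null. The prev method is the mirror image: swap the roles of left and right throughout.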

    Requirements for bool MultiMap::Iterator::prev()

    If the Iterator is in an invalid state, the prev method does nothing and returns false.


    Otherwise, the Iterator is in a valid state, so it indicates an association in a MultiMap. Consider an ordered sequence of the associations contained by that MultiMap. If the association indicated by the Iterator is the first one in that sequence, then the prev method puts the Iterator into an invalid state and returns false; otherwise, prev makes the Iterator indicate the association in the sequence that comes immediately before the one it currently indicates, and returns true. For a MultiMap containing N associations, prev must run in average case O(log N) time, worst case O(N) time. (Because you are not required to keep the binary search tree you use to implement MultiMap balanced, a terribly unbalanced tree may result in the worst case time. On the other hand, whether a tree is balanced or not, it can be proved that for most of the nodes, prev can run in O(1) time!)

    There is a further behavioral and performance requirement. Suppose that it is a valid Iterator indicating an association in a MultiMap containing N associations. Consider the following code: assert(it.valid()); MultiMap::Iterator p; for ( ; it.valid(); it.prev()) p = it; for ( ; p.valid(); p.next()) cout


    organized based on comparing keys. If the class we're implementing were a Map, not a MultiMap, so that each key is unique, then it would be obvious to have each node also contain the (only) value corresponding to the key. But you are to implement a MultiMap, which allows multiple associations that have the same key. There are several ways to implement a binary search tree that allows duplicate keys.

    One way is to have each node contain the key and value of a single association; if more than one association has the same key, the tree will contain more than one node with the same key. Let's call this the single-value-per-node approach, and see what happens when we execute this code:

        MultiMap mm;
        mm.insert("joe", 5);
        mm.insert("joe", 2);
        mm.insert("bill", 1);
        mm.insert("zoey", 4);
        mm.insert("joe", 7);

    The empty MultiMap would be represented by the empty tree, and inserting the joe → 5 association would result in a tree that has the key "joe" and the value 5 in its one node. Let's assume we're not going to implement any tree-balancing algorithm (since the spec doesn't require us to, and we don't want to spend forever on this project!). Then this first-inserted node will always be the root of the tree.

    Now what happens when we insert the joe → 2 association? By the definition of a binary search tree, the node containing that association can be inserted in either the left or the right subtree of a node with the same key. A simple approach would be to always insert the node in, say, the left subtree of a node with an equal key. If we take this approach, the tree that results from the five insert operations above is

    An Iterator would presumably contain a pointer to the node representing the association that the Iterator indicates. The natural implementation of Iterator's next method would make the Iterator's pointer point to the next node in an inorder traversal of the tree, and MultiMap's findEqual method would return an Iterator pointing to the node with the desired key that comes earliest in an inorder traversal of the tree. If we're not balancing the tree, that would be the deepest node with that key in the tree.


    Another way to implement a MultiMap using a binary tree uses what we'll call the multiple-value-per-node approach. Each node in the tree contains a unique key and a pointer to a linked list of list nodes containing the value parts of all the associations with that key. Using this approach, the tree resulting from this code

        MultiMap mm;
        mm.insert("joe", 7);
        mm.insert("joe", 2);
        mm.insert("bill", 1);
        mm.insert("zoey", 2);
        mm.insert("joe", 5);

    might look like this:

    With this approach, the representation of an Iterator is a little more complicated than a simple pointer to a tree node, because we have to be able to retrieve through the Iterator both the key and the value of the association it indicates. If we repeatedly call the next method on an Iterator, we have to be able to visit all the values associated with a given key, and then proceed to visit the values associated with the next key in an inorder traversal of the tree. You may use any binary search tree-based data structure you like to implement your tree. If you sketch out the algorithms for both of these methods, you'll find that the multiple-value-per-node approach is simpler to implement and can be considerably faster than the single-value-per-node approach. You may choose either approach, though, or another of your own design. Whichever approach you take, your Iterator's next method needs to know how to advance a pointer to a tree node to point to the tree node that would be next in an inorder traversal of the tree. The performance requirements imply that you must not implement next so that each time it's called, it starts going through a full inorder traversal of the tree until it finds the proper node. Here are some hints about one way of implementing it instead:

    1. Consider having your tree nodes contain a parent pointer.

    2. Recognize the two cases you need to deal with:

    a. The current tree node has a right child; in this case, the next node in an inorder traversal is somewhere in the current node's right subtree.

    b. The current tree node does not have a right child; in this case, the next node in an inorder traversal, if there is one, is an ancestor of the current node.


    Reasoning about the prev method is symmetrical. We'll let you figure the rest out on your own. Try drawing some sample trees and see if you can figure out the pattern for locating the next node in the tree for an inorder traversal. Don't make your trees too simple or you may fool yourself into thinking the problem is easier than it is. There are other ways of enabling iteration with next and prev. As long as the implementation you come up with satisfies the spec, you're free to use it.
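    To make the two cases in the hints concrete, here is a minimal sketch of locating the inorder successor using parent pointers. The Node layout and the inorderSuccessor name are assumptions made for illustration only; the spec does not prescribe them, and a real Iterator would also need to handle values (and, in the multiple-value-per-node approach, the per-key list).

```cpp
#include <string>

// Hypothetical node layout for illustration; not prescribed by the spec.
struct Node {
    std::string key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Returns the node that follows n in an inorder traversal,
// or nullptr if n is the last node (or n itself is nullptr).
Node* inorderSuccessor(Node* n) {
    if (n == nullptr)
        return nullptr;

    if (n->right != nullptr) {
        // Case a: the successor is the leftmost node of the right subtree.
        Node* cur = n->right;
        while (cur->left != nullptr)
            cur = cur->left;
        return cur;
    }

    // Case b: climb toward the root until we arrive at a node
    // from its left child; that node is the successor.
    Node* child = n;
    Node* p = n->parent;
    while (p != nullptr && child == p->right) {
        child = p;
        p = p->parent;
    }
    return p;  // nullptr if we climbed past the root
}
```

    Each call walks at most one root-to-leaf path, so its cost is bounded by the height of the tree rather than by a full traversal.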

    Database

    The Database class is responsible for implementing a simple database, and must leverage your MultiMap and MultiMap::Iterator classes to do so. Here's what you can do with a Database:

    1. You can create a new database.

    2. You can specify a schema for a database. This specifies what fields (e.g., first name, last name, phone number, GPA) are present in each record. You may also specify which fields in the schema should be indexed, and therefore may be searched by the user in a query. For example:

        FirstName, indexed
        LastName, indexed
        PhoneNumber, not indexed
        Occupation, not indexed
        Age, indexed

    Note: Specifying a new schema for a database will remove all currently existing records and indexes from the database.

    3. You can add one or more records to the database (e.g. a record with fields joe, smith, 818-555-1212, Engineer, 024).

    4. You can import a web page containing a bunch of records from a specified URL into the database, e.g., from http://some.website.com/data.txt. The web page must have the data stored in a comma-separated format. Here we show data for our schema of first name, last name, phone number, occupation, age:

        Joe,Smith,818-555-1212,forklift operator,019
        Bill,Nachenberg,310-456-7890,college professor,056
        Sally,Smallberg,800-123-4567,organ harvester,025
        Yen,Chen,310-877-3353,advertising intern,028
        Barry,Smith,442-324-2342,unemployed,060
        Sally,Feng,543-234-2342,accountant,024
        Daniel,Chen,310-345-3234,power-programmer,029
        Dan,Nieh,510-656-4643,philosopher,023


    5. You can select records in the database based on various criteria and have the results ordered in a number of different ways (or not at all). For example,

        Find all people whose last name is between Nac and Smart, inclusive, who are between 20 and 23 years old. Order the resulting records by last name in ascending order, and if there are two or more records with the same last name, order those records by the first name in ascending order.

    or

        Find all people whose last name is greater than or equal to Feng and whose first name is between Carl and Eunice. Order the resulting records by first name in ascending order, and if there are two or more records with the same first name, order those records by the age in descending order.

    6. You can remove all records from the database and remove the current schema. After doing this, you may specify a new schema and add one or more new records that adhere to that schema.

    The Data Structures Used in a Database

    Ultimately, a simple database has three primary data structures:

    1. A schema description
    2. A bunch of rows of data (also known as data records)
    3. One or more field indexes

    The Schema Description

    The schema (shown in the diagram above as m_schema) describes what each record in the database must look like. In the diagram above, we can see that the schema indicates that each record has three fields: a user name, a phone number, and the person's age. The user can specify any number of fields they like in a given schema. Notice that in addition to specifying the name of each field, the schema also indicates whether each field should be indexed.

    What does it mean that a field should be indexed? If a field is indexed, this means that the database must use a data structure (such as a multimap) that lets the user efficiently search through the values associated with that field in the database. For example, since the phonenum field is designated as it_indexed in the example above, the user must be able to efficiently (e.g., in log N time) search through the phone number values (e.g., 818-555-1212, 310-234-2342, 310-234-2342, 310-234-2342, 424-676-0202, etc.) of the records in the database to find all rows in the database with a specific phone number, or with a specified range of phone numbers.

    The Bunch of Rows of Data

    Each database holds zero or more rows of data (shown in the diagram above as m_rows). A single row of data holds a collection of values (e.g., a username, phone number and age) that matches the schema. (In our simple database, all values are C++ strings; in a real database, the values could be integers, doubles, strings, timestamps, etc.) As you can see in the diagram above, our database has five rows (numbered 0 through 4). In a real database, there might be billions of rows and they'd be stored on dozens of hard drives. For your database for this project, these rows may be stored in an STL vector.

    One or more field indexes

    Every field in your schema that has been designated as an indexed field must have a dedicated index inside your database. For the purposes of this project, an index is basically a binary search tree-based MultiMap that maps each field value (e.g., 310-234-2342) to the row or rows {2,3,4} where that field value may be found. Your database must have at least one index, and might have many indexes (shown as the m_fieldIndex vector/array in the diagram above), depending on how many fields the schema specifies must be indexed.

    For example, since the username field (the first or 0th field in our schema) was designated as indexed in our schema, m_fieldIndex[0] contains a mapping between every username value (e.g., climberkip, davidsmall, ednatodd, missessmall, smallkid) and a row number in the database of a record whose field equals that value. For example, m_fieldIndex[0] contains a mapping of ednatodd → 4, because a record with a username field of ednatodd may be found in row 4 of m_rows.

    Similarly, in the example above, our phonenum field (the second field in the schema) was also designated as indexed. As such, notice that m_fieldIndex[1] contains a mapping between every phone number value (e.g., 818-555-1212, 310-234-2342, 310-234-2342, 310-234-2342, 424-676-0202) and a row number where that particular value may be found in m_rows. Notice that a given value like 310-234-2342 may be found in multiple rows in your database (that makes sense: multiple people could have the same phone number), so your index needs to allow for this. This is why we use a multimap and not a regular map to implement each index.

    Why have an index? Well, say you want to quickly find all of the people who have a particular phone number, or find all people between 20 and 22 years old? If you have N records and no index data structure, you'd have to use an O(N) linear search algorithm, going through every row of data looking for fields that matched what you were looking for. But with an index, you can quickly (in O(log N) time) locate all matching rows, speeding up your search dramatically; this is exactly what real databases do!

    Ok, so now we know what data structures make up a Database. Let's discuss its public interface. Your Database class must have the following public interface. You must NOT change or add to the public interface, except that if the compiler-generated default constructor and/or destructor behave correctly, you do not have to declare or implement them.

        class Database
        {
        public:
            enum IndexType { it_none, it_indexed };
            enum OrderingType { ot_ascending, ot_descending };

            struct FieldDescriptor
            {
                std::string name;
                IndexType index;
            };

            struct SearchCriterion
            {
                std::string fieldName;
                std::string minValue;
                std::string maxValue;
            };

            struct SortCriterion
            {
                std::string fieldName;
                OrderingType ordering;
            };

            static const int ERROR_RESULT = -1;

            Database();
            ~Database();
            bool specifySchema(const std::vector<FieldDescriptor>& schema);
            bool addRow(const std::vector<std::string>& rowOfData);
            bool loadFromURL(std::string url);
            bool loadFromFile(std::string filename);
            int getNumRows() const;
            bool getRow(int rowNum, std::vector<std::string>& row) const;
            int search(const std::vector<SearchCriterion>& searchCriteria,
                       const std::vector<SortCriterion>& sortCriteria,
                       std::vector<int>& results);

        private:
            // To prevent Databases from being copied or assigned, declare these members
            // private and do not implement them.
            Database(const Database& other);
            Database& operator=(const Database& rhs);
        };

    Here are the general requirements for your Database class:

    1. Your Database class must use the public interface documented above. You may add only private members to the Database class; you must not add other public members to Database. Doing so will result in a score of ZERO for this part of the project.

    2. Strings in the database are case-sensitive, so D'Oyly Carte and d'oyly carte are different strings. This makes your implementation task easier.

    3. You must not use the STL map, multimap, or unordered_map containers, or the nonstandard hash_map or hash_multimap containers, to implement Database. You may use any other STL containers (e.g., vector, list, set, etc.).

    4. For the purpose of indexing data that a schema requires to be indexed, your Database implementation must use a MultiMap.

    5. Your Database class must, at a minimum, contain the following data structures:

       a. m_rows: A vector of data records (e.g., a vector of vector of strings)
       b. m_fieldIndex: A vector of MultiMaps or pointers to MultiMaps

    Requirements for Database()

    The default constructor must create a Database containing no rows and no field descriptions in its schema. This constructor must run in O(1) time.

    Requirements for ~Database()

    The destructor must release all resources held by the Database. For a Database containing F fields in its schema and N rows, the destructor must run in O(FN) time.

    Requirements for bool specifySchema(const std::vector<FieldDescriptor>& schema)

    The specifySchema method is used to specify a new schema for the database. The schema describes what fields will be in every data record and which fields must be indexed by your database. Every time the user calls the specifySchema method, it must first completely reset your database, discarding any existing field descriptions in its schema, any existing rows, and any indexes. It then must install a new schema.


    The details of the new schema are in the vector of Database::FieldDescriptors passed to the specifySchema method. Each FieldDescriptor structure holds two values: the name of a field (e.g., username or phonenum), and a value that specifies whether this field should be indexed or not (Database::it_indexed or Database::it_none). Here's how the specifySchema method might be called:

        bool setSchema(Database& db)
        {
            Database::FieldDescriptor fd1, fd2, fd3;

            fd1.name = "username";
            fd1.index = Database::it_indexed;   // username is an indexed field

            fd2.name = "phonenum";
            fd2.index = Database::it_indexed;   // phone # is an indexed field

            fd3.name = "age";
            fd3.index = Database::it_none;      // age is NOT an indexed field

            std::vector<Database::FieldDescriptor> schema;
            schema.push_back(fd1);
            schema.push_back(fd2);
            schema.push_back(fd3);

            return db.specifySchema(schema);
        }

    The specifySchema method must ensure that the Database object maintains a separate index (implemented using a MultiMap) for every indexed field in the schema. In the above example, two of the fields, username and phonenum, were designated as indexed, so specifySchema must ensure that m_fieldIndex has two initialized MultiMaps to index all values stored in these two fields across all the rows.

    The specifySchema method must return false if there is not at least one indexed field in the schema; if it returns false, it must leave the schema with no field descriptions in it. If there is at least one indexed field, specifySchema returns true.

    Your specifySchema method does not have to worry about being passed an invalid schema (for example, one with an empty string as a field name, or one with two fields with the same name). Given the complexity of this project, you've got better things to worry about. We will not try to trick your code in this manner. If you do want to check for situations like these, you may; if you detect them, have specifySchema return false, leaving the schema with no field descriptions in it.

    Requirements for bool addRow(const std::vector<std::string>& rowOfData)

    The addRow method is used to add a new data record (also known as a row) into your database. A row of data is represented by a vector of string values that correspond to the current schema. (You may not add a row until you've specified a schema.) The row must have the same number of fields as your current schema, and the jth item in the vector corresponds to the jth field in the schema. The addRow method must return true if it is successful, and false otherwise.

    Given the schema shown in the previous section's example (username, phonenum, age), here is how the addRow method might be called to add a new row of data values that matches this schema to a database:

        void addARow(Database& db)   // assumes schema has already been specified
        {
            std::vector<std::string> row;
            row.push_back("ednatodd");       // field 0: username
            row.push_back("424-676-0202");   // field 1: phone number
            row.push_back("0035");           // field 2: age
            db.addRow(row);                  // add the new row to the Database
        }

    An added row of data must have the same number of fields as the schema, and the jth value of the added row corresponds to the jth field of the schema. So ednatodd, the first value in the row vector above, is a username value, since that was the first field name specified in our schema; 424-676-0202, the second value in the row vector, is a phonenum value, since that was the second field name specified in our schema above; and so on.

    If the row the user passes in contains a number of values that is not the number of fields in the Database's current schema (the one installed by the most recent successful call to the specifySchema method), then the new row will not be added, and the addRow method must return false without changing the database. Otherwise, the addRow method must perform the following actions when provided with a valid new row of data:

    1. It must add the new row to the end of the m_rows vector that the Database object maintains. If the m_rows vector already holds N rows in positions 0 through N-1, then a new row added must be placed in position N.

    2. If the new row is being added to position N of m_rows, then for each value at position j in the row of data that is being added, if field j was designated as an indexed field in the schema, then addRow must insert an entry into m_fieldIndex[j] that associates rowOfData[j] with N. (In the example above, it would associate ednatodd with N in m_fieldIndex[0], since field 0 of the schema is indexed.)

    3. The method returns true.

    So, suppose the database looked like this at some time:


    After calling the addARow function above, the database would look like this:

    As you can see, ednatodd's record was added to the end of the m_rows vector into position 4. In addition, since the schema specified that the username and phonenum fields must be indexed, an association has been added to m_fieldIndex[0] mapping ednatodd → 4, the position of the new record in m_rows. Also, a new association has been added to m_fieldIndex[1] mapping 424-676-0202 → 4. When the addRow() method has completed, our new row will have been added and all indexes updated. The method will then return true.
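    The addRow steps above can be sketched in a few lines. This is only an illustration under stated assumptions: ToyMultiMap is a hypothetical stand-in (a plain vector of pairs) used so the sketch compiles on its own, and ToyDatabase, with its isIndexed flags, is likewise invented for the example. Neither satisfies the spec's requirements for MultiMap or Database.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's MultiMap. It is NOT a binary
// search tree and exists only so this sketch is self-contained.
struct ToyMultiMap {
    std::vector<std::pair<std::string, unsigned int>> entries;
    void insert(const std::string& key, unsigned int value) {
        entries.push_back(std::make_pair(key, value));
    }
};

// Hypothetical, stripped-down database showing only the addRow logic.
struct ToyDatabase {
    std::vector<bool> isIndexed;                    // one flag per schema field
    std::vector<std::vector<std::string>> m_rows;   // the data records
    std::vector<ToyMultiMap> m_fieldIndex;          // slot j used only if field j is indexed

    explicit ToyDatabase(const std::vector<bool>& indexed)
        : isIndexed(indexed), m_fieldIndex(indexed.size()) {}

    bool addRow(const std::vector<std::string>& rowOfData) {
        // Reject the row if no schema is installed or the field count differs.
        if (isIndexed.empty() || rowOfData.size() != isIndexed.size())
            return false;

        // Step 1: the new row goes at position N, the current row count.
        unsigned int n = static_cast<unsigned int>(m_rows.size());
        m_rows.push_back(rowOfData);

        // Step 2: for every indexed field j, associate rowOfData[j] with N.
        for (std::size_t j = 0; j < rowOfData.size(); j++)
            if (isIndexed[j])
                m_fieldIndex[j].insert(rowOfData[j], n);

        // Step 3: report success.
        return true;
    }
};
```

    Keeping one index slot per schema field (used or not) lets m_fieldIndex[j] line up with field j, which matches how the examples in this section number the indexes.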

    Requirements for bool loadFromURL(std::string url)
    Requirements for bool loadFromFile(std::string filename)

    The loadFromURL method loads a schema and potentially many data records from a web page. (See the HTTP class section on page 7 to learn how to connect to the Internet.) The loadFromFile method loads a schema and potentially many data records from a data file.


    Both functions work in the same way, except that they get their input from different sources. If N is the number of data records loaded from the input and the schema has F indexed fields, then these methods must run in average case O(FN log N) time. Here's how the methods might be used: void addFromInternet(Database& db) { bool ok = db.loadFromURL("http://www.somesite.com/patient-data/january"); if (ok) cout


        Yen,Chen,310-877-3353,advertising intern,028
        Barry,Smith,442-324-2342,unemployed,060
        Sally,Feng,543-234-2342,accountant,024
        Daniel,Chen,310-345-3234,power-programmer,029
        Dan,Nieh,510-656-4643,philosopher,023

    The data being imported must have the same number of fields on each line (e.g., 5 in the example above) as the schema specified on the top line. If you cannot successfully import the data from the specified URL or file for any reason (the page indicated by the URL can't be fetched, the indicated file can't be opened, the data in the web page or file is not valid, the Internet is not available, the schema is invalid, a data row has the wrong number of fields, etc.), then these methods must return false. Otherwise these methods must import all rows (adding each to the end of the m_rows vector and indexing all relevant fields in m_fieldIndex), and return true.

    If these methods return false, then either the database must be put in a state in which the schema has no field descriptions in it and the database has no rows, or a state in which the schema is valid and the database correctly holds zero or more valid rows from the input (but not all the rows, otherwise you should return true); it's your choice which of these states you leave the database in.

    Requirements for int getNumRows() const

    The getNumRows method returns the number of rows currently in the database. This method must run in O(1) time.

    Requirements for bool getRow(int rowNum, std::vector<std::string>& row) const

    The getRow method puts data from the row at position rowNum in the database into the provided row vector parameter. Any data that row contained prior to the call to getRow must be replaced by the desired row of data from m_rows. This method must run in O(F) time, where F is the number of fields in each row. If rowNum is invalid, then getRow must return false and not change the row parameter. Otherwise, the method returns true.

        void getAndPrintRowNum(Database& db, unsigned int rowNum)
        {
            std::vector<std::string> targetRow;
            bool ok = db.getRow(rowNum, targetRow);
            if (ok)
            {
                // print each field value followed by a space
                for (size_t i = 0; i < targetRow.size(); i++)
                    cout


    cout


        void doAQuery(Database& db)
        {
            std::vector<Database::SearchCriterion> searchCrit;

            Database::SearchCriterion s1;
            s1.fieldName = "username";
            s1.minValue = "albert";
            s1.maxValue = "molly";

            Database::SearchCriterion s2;
            s2.fieldName = "phonenum";
            s2.minValue = "";   // no minimum specified
            s2.maxValue = "310-234-2342";

            searchCrit.push_back(s1);
            searchCrit.push_back(s2);

            // We'll leave our sort criteria empty for now, which means
            // the results may be returned to us in any order
            std::vector<Database::SortCriterion> sortCrit;

            std::vector<int> results;
            int numFound = db.search(searchCrit, sortCrit, results);
            if (numFound == Database::ERROR_RESULT)
                cout


    } else cout


    As an example, one possible sequence of sorting criteria might be:

        username, ascending
        age, descending
        phonenum, ascending

    What does it mean to have three (or more) different items in our ordering criteria? It means that all three ordering rules must be applied to the data. But how can we sort things by username and age and phonenum? Well, all rows must first be ordered by their username field value (the top criterion) in ascending order. That means that any row containing albert would come before a row containing cindy, and that a row containing albert would come after a row with a username of Albert (upper-case ASCII characters have lower values than lower-case ASCII characters).

    If each row had a different value for its username field, all rows would be ordered simply by their username field. However, if two or more rows have the same username, then how should those rows be ordered relative to each other? Well, the next sorting criterion (age, descending, in the example) must be applied to order those rows. In the example above, we'd order all rows with the same username based on each person's age, in descending order (i.e., with older people above younger people).

    And what if two or more rows have identical usernames and ages? Well, then the last sorting criterion (phonenum, ascending) would need to be applied to these rows. In the example above, these rows would be ordered based on their phone number, in ascending order.

    Here's an example to help you understand how to properly order your results. Consider the following schema, used to represent students:

        lastname, indexed
        firstname, indexed
        studentID, not indexed
        GPA, not indexed

    Suppose that searching through the database using some search criteria indicated that the following data rows satisfied the search criteria:

        Smith,James,100300001,3.50
        Nachenberg,Carey,400217123,3.99
        Smallberg,David,000000001,3.99
        Wang,Billy,398764354,2.73
        Feng,Cameron,424567897,3.24
        Wang,Lily,240943234,3.87
        Nachenberg,Simon,001423625,2.15
        Wang,Jeff,592325224,3.76
        Smith,Alice,200300421,3.50
        Wang,Eric,909222524,3.17
        Smith,James,777493762,3.52

    Now, further suppose that the user specified the following sorting criteria:

        Database::SortCriterion c1;
        c1.fieldName = "lastname";
        c1.ordering = Database::ot_ascending;

        Database::SortCriterion c2;
        c2.fieldName = "firstname";
        c2.ordering = Database::ot_ascending;

        Database::SortCriterion c3;
        c3.fieldName = "GPA";
        c3.ordering = Database::ot_descending;

    Then the rows must be returned in the following order:

        Feng,Cameron,424567897,3.24
        Nachenberg,Carey,400217123,3.99
        Nachenberg,Simon,001423625,2.15
        Smallberg,David,000000001,3.99
        Smith,Alice,200300421,3.50
        Smith,James,777493762,3.52
        Smith,James,100300001,3.50
        Wang,Billy,398764354,2.73
        Wang,Eric,909222524,3.17
        Wang,Jeff,592325224,3.76
        Wang,Lily,240943234,3.87

    As you can see, all of the results were first ordered by last name in ascending order. Second, where there was a tie with the last name (as with Nachenberg, Smith and Wang), items were further ordered by their first name in ascending order. Finally, where there was a tie in both last and first name, as with James Smith (a common name), the items were further ordered in descending order by the students' GPAs.

    Now you're probably wondering: How do I sort a bunch of data items based on multiple criteria? Here's a hint that you can adapt (with many changes) to solve this problem. The example below sorts a bunch of student records by lastname (ascending), firstname (ascending) and GPA (descending). The sorting criteria are hard-coded into the program, which isn't what you need for this project.


        #include <algorithm>
        #include <string>

        struct Student
        {
            std::string lastName;
            std::string firstName;
            std::string studentID;
            std::string GPA;
        };

        bool doesABelongBeforeB(const Student& a, const Student& b)
        {
            // return true if a belongs before b, false otherwise
            // assuming an ascending ordering by lastName
            if (a.lastName < b.lastName)
                return true;
            if (a.lastName > b.lastName)
                return false;

            // otherwise the lastnames are the same
            // return true if a belongs before b, false otherwise
            // assuming an ascending ordering by firstName
            if (a.firstName < b.firstName)
                return true;
            if (a.firstName > b.firstName)
                return false;

            // both lastname AND firstname match, try GPA
            // return true if a belongs before b, false otherwise
            // assuming a descending ordering by GPA
            return a.GPA > b.GPA;
        }

        void sortStudents(Student array[], int numStudents)
        {
            std::sort(array, array + numStudents, doesABelongBeforeB);
        }

    Hopefully this will give you an idea of how to solve the problem.

    Big-O requirements

    Assume that the database holds N rows. Assume there are C search criteria that were provided to identify matching rows. Assume that the search criterion for a particular field identifies M matching items. Assume that your query results in R matching rows of data. Assume that the results are to be sorted using S sorting criteria. Here are the time complexity requirements for the search method:

    1. To determine all rows that meet a single search criterion (e.g., find all last names between Nachenberg and Smallberg), search must run in average case O(M log N) time.


    2. To determine which rows meet all criteria and should be returned to the user (e.g., last name is between Nachenberg and Smallberg, AND GPA is between 3.0 and 4.0), search must run in average case O(CM log N) time. (Hint: There's a hash-based version of the set class called unordered_set.)

    3. To order your R matching rows based on the S sort criteria, search must run in average case O(SR log R) time.
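    For the ordering requirement, one possible way to avoid hard-coding the sort criteria is a comparison object that walks a list of (column, direction) pairs at run time. The sketch below is an assumption-laden illustration, not the required design: ColumnOrder, RowComparator, and sortRows are hypothetical names, and it sorts whole rows rather than the row numbers that search returns.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: which column to compare and in which direction.
struct ColumnOrder {
    std::size_t column;
    bool ascending;
};

// Function object for std::sort: applies each criterion in turn and
// lets the first column on which the two rows differ decide.
class RowComparator {
public:
    explicit RowComparator(const std::vector<ColumnOrder>& orders)
        : m_orders(orders) {}

    bool operator()(const std::vector<std::string>& a,
                    const std::vector<std::string>& b) const {
        for (std::size_t i = 0; i < m_orders.size(); i++) {
            const std::string& x = a[m_orders[i].column];
            const std::string& y = b[m_orders[i].column];
            if (x == y)
                continue;  // tie on this criterion; try the next one
            return m_orders[i].ascending ? (x < y) : (x > y);
        }
        return false;  // equal under every criterion: a is not before b
    }

private:
    std::vector<ColumnOrder> m_orders;
};

// Sorts rows in place according to the given criteria.
void sortRows(std::vector<std::vector<std::string>>& rows,
              const std::vector<ColumnOrder>& orders) {
    std::sort(rows.begin(), rows.end(), RowComparator(orders));
}
```

    Each comparison inspects at most S columns, so sorting R rows performs O(R log R) comparisons of O(S) field checks each, in line with the O(SR log R) bound. Translating field names to column positions would be done once, before sorting.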

    Our Testing Framework

    We have built a simple data-driven test framework that you can use to test your project 4 implementation and also test how our solution works. If you simply use our provided main.cpp and test.h files, you'll have a nice system for testing your Database code. So how does our test framework work? Well, when you use our main.cpp and test.h files and compile them with your MultiMap and Database classes, your compiled program will act as a test-bed. You can run your compiled program from the command line with various test scripts, like this:

    This will cause our test-bed to follow the test instructions in the test-script.dat file (which we'll describe in a second), and use these instructions to test your Database class. Alternatively, you can run your compiled program from within Visual Studio or Xcode, and it will start by asking you for the name of the script file you want to use.

    So what does a test script look like? Well, it's basically a bunch of commands, with one command per line. Our test system loads up the script and executes it from top to bottom, one command at a time. It prints out all of the results to cout as it goes. Here are the commands you may use:


    The file Command and the url Command

    These commands may be used to import a schema and one or more rows of data from a data file or from a website via a URL. The command syntax is (case sensitive):

        file:c:\proj4-14\data-file.txt

    or

        file:/Users/fred/cs32p4/mydata

    or

        url:http://cs.ucla.edu/classes/winter14/cs32/Projects/4/Data/census.csv

    Note that there are no superfluous spaces allowed between the command (e.g. file or url), the colon (:) or the argument (the filename or URL). This command will cause the specified file or web page to be loaded from the disk/internet into your database by calling your Database's loadFromFile() or loadFromURL() method. Such a file or web page must contain the schema first, then one or more rows of data. See the loadFromFile() and loadFromURL() sections of this document for more details on the appropriate format for this imported data.

    The schema Command

    This command may be used to initialize the current database and set its schema. The syntax is (case sensitive):

        schema:field1[*],field2[*],...,fieldN[*]

    Each field name may optionally have a * immediately following it, indicating that the field is indexed. Fields that lack a * immediately after their name are non-indexed. (The [*] above indicates an optional *.) Here's an example:

        schema:lastName*,firstName*,age*,occupation*,numKids,SSN

    This designates that your schema has 6 fields named lastName, firstName, age, occupation, numKids, and SSN (social security number). The lastName, firstName, age, and occupation fields are designated as indexed. (This schema would be passed into your Database's specifySchema() method.)


    Notice that no extraneous whitespace is allowed between fieldnames, asterisks, or the colon.
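    To make the schema syntax concrete, here is a minimal sketch of how the argument of a schema command could be broken into field names and indexed flags. The SchemaField struct and the parseSchema function are hypothetical names chosen for illustration; they are not the types that specifySchema() actually takes.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration type: one field of a schema.
struct SchemaField
{
    std::string name;
    bool indexed;
};

// Hypothetical helper: parses "field1[*],field2[*],...,fieldN[*]".
// A trailing '*' marks the field as indexed and is removed from the name.
std::vector<SchemaField> parseSchema(const std::string& arg)
{
    std::vector<SchemaField> fields;
    if (arg.empty())
        return fields;

    std::string::size_type start = 0;
    while (start <= arg.size())
    {
        std::string::size_type comma = arg.find(',', start);
        if (comma == std::string::npos)
            comma = arg.size();  // last field runs to the end of the string

        std::string token = arg.substr(start, comma - start);
        bool indexed = !token.empty() && token.back() == '*';
        if (indexed)
            token.pop_back();

        fields.push_back({token, indexed});
        start = comma + 1;
    }
    return fields;
}
```

    Applied to the example above, this sketch produces six fields, of which the first four are marked as indexed.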

    The add Command

    The add command may be used to add a new row to a database whose schema has previously been set (either by loading the schema from a URL or file, or by specifying it with the schema command). The syntax is:

        add:field1Value,field2Value,...,fieldNValue

    Notice that you must not have extraneous spaces before or after the colon or the separating commas, although you may have spaces within your field values (e.g., software engineer) if you like. Here's how it might be used to add a new row of data consistent with the schema shown in the section above:

        add:Nachenberg,Carey,0042,software engineer,0,765-33-2242

    This command will call your Database's addRow() method with the specified parameters.
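    Since field values may contain spaces, the argument of an add command has to be split on commas alone, without trimming. A minimal sketch, assuming a hypothetical helper named splitRow that is not part of the provided files:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits the argument of an add command on commas.
// Spaces are kept as part of each value, and empty values are preserved.
std::vector<std::string> splitRow(const std::string& arg)
{
    std::vector<std::string> values;
    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type comma = arg.find(',', start);
        if (comma == std::string::npos)
        {
            values.push_back(arg.substr(start));  // final value
            break;
        }
        values.push_back(arg.substr(start, comma - start));
        start = comma + 1;
    }
    return values;
}
```

    For the example row above, this yields six values, the fourth of which is the string software engineer with its space intact.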

    Issuing a Query: the qparam, sparam and execute commands

    If you'd like to issue a query to your database, you can do it by specifying one or more sets of query parameters using the qparam command, then specifying zero or more sets of sorting parameters using the sparam command, and finally using the execute command. Here's the syntax of the qparam command:

        qparam:fieldName,minVal,maxVal
        qparam:fieldName,,maxVal
        qparam:fieldName,minVal,

    You may leave out either minVal or maxVal, but not both. You must always have two commas, even if you leave out a minimum or maximum value! Make sure not to have any extraneous whitespace. Here are some examples:

        qparam:lastName,Aaronson,Albertson

    This command would find all people whose last name is in the inclusive range [Aaronson,Albertson].

        qparam:lastName,Aaronson,Aaronson

    This command would find all people with a last name of Aaronson.

        qparam:age,,30

    This command would find all people whose age is less than or equal to 30, e.g., [00,30].

        qparam:firstName,Gennady,

    This command would find all people whose first name is greater than or equal to Gennady.
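    The qparam rules (two commas always, at least one bound, inclusive ranges) can be sketched as follows. The QueryParam struct and both function names are hypothetical, and the sketch assumes that values are compared as strings, which is what the zero-padded ages in the examples suggest.

```cpp
#include <string>

// Hypothetical illustration type: one parsed qparam command.
struct QueryParam
{
    std::string fieldName;
    std::string minVal;
    std::string maxVal;
    bool hasMin = false;
    bool hasMax = false;
};

// Hypothetical helper: parses "fieldName,minVal,maxVal". Both commas are
// required; either bound may be empty, but not both.
bool parseQparam(const std::string& arg, QueryParam& out)
{
    std::string::size_type c1 = arg.find(',');
    if (c1 == std::string::npos)
        return false;
    std::string::size_type c2 = arg.find(',', c1 + 1);
    if (c2 == std::string::npos)
        return false;

    out.fieldName = arg.substr(0, c1);
    out.minVal = arg.substr(c1 + 1, c2 - c1 - 1);
    out.maxVal = arg.substr(c2 + 1);
    out.hasMin = !out.minVal.empty();
    out.hasMax = !out.maxVal.empty();
    return !out.fieldName.empty() && (out.hasMin || out.hasMax);
}

// Inclusive range test using string comparison (an assumption).
bool inRange(const QueryParam& q, const std::string& value)
{
    if (q.hasMin && value < q.minVal)
        return false;
    if (q.hasMax && value > q.maxVal)
        return false;
    return true;
}
```

    Under this string-comparison assumption, zero padding matters: "015" compares less than "25", but an unpadded "15" would compare greater than "025".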

    Note that you may have multiple qparam commands, one after the other, to specify multiple search criteria for a query (see example below). Here's the syntax of the sparam command:

        sparam:fieldName,{ascending,descending}

    Make sure not to have any extraneous whitespace. The { } means that you must pick either ascending or descending, but not both. Here are some examples:

        sparam:lastName,ascending

    This command specifies that data should be sorted by the values of the lastName field in ascending order.

        sparam:age,descending

    This command specifies that data should be sorted by the values of the age field in descending order.

    Note that you may have multiple sparam commands, one after the other, to specify multiple ordering criteria for a query (see example below). The first sparam command to appear specifies the primary ordering, the second specifies the secondary ordering, and so on. Here's the syntax of the execute command:

        execute

    This command has no parameters, colons, etc. It must be placed after any desired qparam or sparam commands. It will take all of the earlier query parameters and sorting parameters found in the test script (since the last execute command) and pass them into your Database's search() method. It will then get the results from the search() method and print out all field values from each matching row to cout. Once it is done, all former qparam and sparam commands will be discarded, and your next query must specify new query and sorting parameters again. Here's a complete example of a query:


        qparam:lastName,Wang,Zeng
        qparam:age,015,025
        qparam:occupation,student,student
        sparam:lastName,ascending
        sparam:firstName,ascending
        sparam:age,descending
        execute

    And here's what a result might look like:

        Wang,Scott,019,student,0,545-28-2161
        Wang,Taylor,017,student,0,938-11-9273
        Wang,Taylor,016,student,0,735-30-2341
        Yen,Kylie,020,student,0,478-82-9702
        -------------------------------------------------------------

    Notice that all returned records met the requirements of the query:

    1. The names were between Wang and Zeng, inclusive
    2. The ages were between 15 and 25 years old
    3. The occupation was student

    The results were then ordered first by their lastName in ascending order, then by their firstName in ascending order, and finally by their age in descending order.
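    The primary/secondary/tertiary ordering described above amounts to a comparison that consults each sort key in turn and moves to the next key only on a tie. Here is a minimal sketch of that comparison. It uses std::sort and std::vector purely to illustrate the ordering; the SortKey struct, the Row alias, and sortRows are hypothetical names, and this is not a statement about how your Database::search() must be implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration types.
struct SortKey
{
    std::size_t column;  // index of the field to compare
    bool descending;     // false means ascending
};

using Row = std::vector<std::string>;

// Orders rows by the first key; ties are broken by the second key, and so on.
void sortRows(std::vector<Row>& rows, const std::vector<SortKey>& keys)
{
    std::sort(rows.begin(), rows.end(),
              [&keys](const Row& a, const Row& b)
              {
                  for (const SortKey& k : keys)
                  {
                      if (a[k.column] == b[k.column])
                          continue;  // tie on this key: consult the next one
                      bool less = a[k.column] < b[k.column];
                      return k.descending ? !less : less;
                  }
                  return false;  // equal on every key
              });
}
```

    Sorting the four rows of the example with the keys lastName ascending, firstName ascending, and age descending places Wang,Scott first, then the two Wang,Taylor rows with age 017 before 016, and Yen,Kylie last.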

    Requirements and Other Thoughts

    Make sure to read this entire section before beginning your project!

    1. In Visual C++, make sure to change your project from UNICODE to Multi-Byte Character Set, by going to Project → Properties → Configuration Properties → General → Character Set

    2. In Visual C++, make sure to add wininet.lib to the set of input libraries, by going to Project → Properties → Linker → Input → Additional Dependencies; otherwise, you'll get a linker error!

    3. The entire project can be completed in under 500 lines of C++ code beyond what we've already written for you, so if your program is getting much larger than this, talk to a TA; you're probably doing something wrong.

    4. Before you write a line of code for a class, think through what data structures and algorithms you'll need to solve the problem. How will you use these data structures? Plan before you program!

    5. Don't make your program overly complex; use the simplest data structures possible that meet the requirements.

    6. You must not modify any of the code in the files we provide you that you will not turn in; since you're not turning them in, we will not see those changes. We will incorporate the required files that you turn in into a project with special test versions of the other files.

    7. Make sure to implement and test each class independently of the others that depend on it. Once you get the simplest class coded, get it to compile and test it with a number of different unit tests. Only once you have your first class working should you advance to the next class.

    8. We're providing you with working versions of the MultiMap and MultiMap::Iterator classes that use the C++ STL libraries. You can use these classes to build and test your Database class even if you can't figure out how to implement your MultiMap or MultiMap::Iterator classes!

    9. You may use only those STL containers (e.g., vector, list) that are not forbidden by this spec. For MultiMap, this means you must use none at all. For Database, this means you must not use map, multimap, unordered_map, or the nonstandard hash_map; use your MultiMap class if you need a map, for example.

    10. Try your best to meet our big-O requirements for each method in this spec. If you can't figure out how, then solve the problem in a simpler, less efficient way, and move on. Then come back and improve the efficiency of your implementation later if you have time.

    If you don't think you'll be able to finish this project, then take some shortcuts. For example, use the substitute MultiMap class we provide instead of creating your own MultiMap class if necessary to save time. You can still get a good amount of partial credit if you implement most of the project. Why? Because if you fail to complete a class (e.g., MultiMap), we will provide a correct version of that class and test it with the rest of your program. If you implemented the rest of the program properly, it should work perfectly with our version of the MultiMap class, and we can give you credit for those parts of the project you completed.

    But whatever you do, make sure that ALL CODE THAT YOU TURN IN BUILDS without errors with both Visual Studio and either clang++ or g++!

    What to Turn In

    You should turn in five files:

        MultiMap.h      Contains your MultiMap and MultiMap::Iterator declarations
        MultiMap.cpp    Contains your MultiMap and MultiMap::Iterator implementations
        Database.h      Contains your Database class declaration
        Database.cpp    Contains your Database class implementation
        report.doc, report.docx, or report.txt
                        Contains your report

    You are to define your classes' declarations and all member function implementations directly within the specified .h and .cpp files. You may add any #includes or constants you like to these files. You may also add support functions for these classes if you like (e.g., operator