03/22/2013ecs251 winter 2013, final1 ecs251 winter 2013 final name: student id: email: open...

8
03/22/2013 ecs251 winter 2013, final 1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication. Totally 5 questions, 6% each, total 8 pages. Please write precise and clean. Please READ the questions VERY CAREFULLY before putting down the final answer. And, also please mark your answer clearly. You can use the back of the pages. Every page of this exam book needs to be returned back. Some of the questions might not have a standard solution – i.e., you can argue either way (and, the grading will be based on your description/explanation). If we suspect any cheating behavior, we will pass the case to the academic committee immediately. Score: ___/30

Upload: helena-melton

Post on 18-Jan-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

03/22/2013 ecs251 winter 2013, final 1

ecs251 Winter 2013 final

Name:Student ID:Email:

Open book/laptop, but NO Internet, or any type of electronic communication. Totally 5 questions, 6% each, total 8 pages.

Please write precise and clean. Please READ the questions VERY CAREFULLY before putting down the final answer. And, also please mark your answer clearly. You can use the back of the pages. Every page of this exam book needs to be returned back. Some of the questions might not have a standard solution – i.e., you can argue either way (and, the grading will be based on your description/explanation).

If we suspect any cheating behavior, we will pass the case to the academic committee immediately.

Score: ___/30

Page 2: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

With the “backward and serial validation” of OCC (Optimistic Concurrency Control), for the following execution history, which transaction(s) will be aborted and WHY?Q-01

T1:read Z

read W

read Xread Yvalidation

T2:

read W

read X

read Y

write Xwrite Yvalidation

T3:

read X

read Y

write Ywrite Xvalidation

T4:

read Yread Z

read W

write Zwrite Wvalidation

04/21/23 2ecs251 winter 2013, sample final

Page 3: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

(Two Phase Commit) Under exactly which condition, under the 2PC protocol, we have to wait for a crash server to recover before we can continue the commitment protocol?

Q-02

04/21/23 3ecs251 winter 2013, sample final

Page 4: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

(Transaction Support for GFS and HDFS) Under this question, we will consider to build a transaction system on top of GFS/HDFS (instead of on top of a Database). When a transaction needs to access an object (either read or write access), it needs to contact the Master Index server (the Master under GFS or Name node under HDFS, but with some extensions to support transaction) first. For supporting the ACID properties, we can potentially use either 2PL (Two-Phase Locking) or OCC (Optimistic Concurrency Control) to guarantee serilizability. As the system architect, please decide which option (2PL or OCC) is better for GFS/HDFS transactions and WHY?

Q-03

04/21/23 4ecs251 winter 2013, sample final

Page 5: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

(DHT/Chord) Here is a short description of Chord from Wikipedia:

The Chord protocol is one solution for connecting the peers of a P2P network. Chord consistently maps a key onto a node. Both keys and nodes are assigned an m-bit identifier. For nodes, this identifier is a hash of the node's IP address. For keys, this identifier is a hash of a keyword, such as a file name. It is not uncommon to use the words "nodes" and "keys" to refer to these identifiers, rather than actual nodes or keys. There are many other algorithms in use by P2P, but this is a simple and common approach.A logical ring with positions numbered 0 to 2m − 1 is formed among nodes. Key k is assigned to node successor(k), which is the node whose identifier is equal to or follows the identifier of k. If there are N nodes and K keys, then each node is responsible for roughly K / N keys.

When a new node joins or leaves the network, responsibility for O(K / N) keys changes hands.If each node knows only the location of its successor, a linear search over the network could locate a particular key. This is a naive method for searching the network, since any given message could potentially have to be relayed through most of the network. Chord implements a faster search method.Chord requires each node to keep a "finger table" containing up to m entries. The ith entry of node n will contain the address of successor((n + 2i − 1) mod 2m).

With such a finger table, the number of nodes that must be contacted to find a successor in an N-node network is O(logN).

QUESTION: Should the last part be O(logN) or O(m) (i.e., the size of the network versus the size of the ring)? Please justify your answer!

Q-04

04/21/23 5ecs251 winter 2013, sample final

Page 6: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

(Cont.)Q-04

04/21/23 6ecs251 winter 2013, sample final

Page 7: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

Q-05(Map and Reduce) from the wikipedia of MapReduce “… David DeWitt and Michael Stonebraker, computer scientists specializing in parallel databases and shared-nothing architectures, have been critical of the breadth of problems that MapReduce can be used for. They called its interface too low-level and questioned whether it really represents the paradigm shift its proponents have claimed it is. They challenged the MapReduce proponents' claims of novelty, citing Teradata as an example of prior art that has existed for over two decades. They also compared MapReduce programmers to Codasyl programmers, noting both are "writing in a low-level language performing low-level record manipulation.” MapReduce's use of input files and lack of schema support prevents the performance improvements enabled by common database system features such as B-trees and hash partitioning, though projects such as Pig (or PigLatin), Sawzall, Apache Hive, Ysmart, HBase and BigTable are addressing some of these problems.

Greg Jorgensen wrote an article rejecting these views. Jorgensen asserts that DeWitt and Stonebraker's entire analysis is groundless as MapReduce was never designed nor intended to be used as a database.

DeWitt and Stonebraker have subsequently published a detailed benchmark study in 2009 comparing performance of Hadoop's MapReduce and RDBMS approaches on several specific problems. They concluded that databases offer real advantages for many kinds of data use, especially on complex processing or where the data is used across an enterprise, but that MapReduce may be easier for users to adopt for simple or one-time processing tasks. They have published the data and code used in their study to allow other researchers to do comparable studies.”

Question: what is your view about DeWitt/Stonebraker’s opinion (versus Jorgensen’s, e.g.) about Map and Reduce? Please briefly describe the rationale behind your view.

04/21/23 7ecs251 winter 2013, sample final

Page 8: 03/22/2013ecs251 winter 2013, final1 ecs251 Winter 2013 final Name: Student ID: Email: Open book/laptop, but NO Internet, or any type of electronic communication

(Cont.)Q-05

04/21/23 8ecs251 winter 2013, sample final