Incremental Computation Of Common Windowed Holistic Aggregates-Books Pdf

Incremental Computation of Common Windowed Holistic Aggregates
25 Jan 2020 | 28 views | 0 downloads | 12 Pages | 480.85 KB

Share Pdf : Incremental Computation Of Common Windowed Holistic Aggregates

Download and Preview : Incremental Computation Of Common Windowed Holistic Aggregates

Report CopyRight/DMCA Form For : Incremental Computation Of Common Windowed Holistic Aggregates



Transcription

The rest of the paper is organised as follows Section 2. provides some detailed background on window functions ter. minology Sections 3 through 5 provide details of the al. gorithms for count distinct mode and discrete quantile Frame Row. Our experimental results are presented in Section 6 We. then discuss related work in Section 7 and conclude in Sec. Partition 1, 2 BACKGROUND, Window functions were introduced in SQL 2003 to ac. commodate complex analytical tasks because such compu Partition 2. tations are typically extremely difficult if not impossible to. implement in older versions of SQL When it is possible. the implementations often involve features such as recursive. queries and self joins and are quite verbose, In order to appreciate the complexity of windowed aggre. Partition 3, gation functions we need to first describe the computation. environment of different classes of SQL functions including. window functions We also need to understand the com. plexity of different aggregation types because we will be dis Figure 1 The Window Computation Environment. cussing windowed aggregates of the most complex type. 2 1 Computation Environments Frames are specified using a number of tuples preceding. SQL functions can be grouped by their computation en or following the tuple whose value is being computed In. vironment into the following three classes addition to a number of tuples frame edges can also be un. bounded When both edges are unbounded then the frame. Tuple functions whose computation environment is degenerates into the entire partition. a single tuple These functions combine existing at As a concrete example of a windowed aggregate consider. tributes in the current tuple to compute a new at the computation given above by avg Sales over parti. tribute in the same tuple Examples include arithmetic tion by Team order by Date rows between 3 preced. operations such as A B and data manipulation func ing and 3 following This function computes a moving. tions such as year ShipDate, average of sales data for each team over time Partitioning. by team keeps different teams sales data separate Order. Aggregate functions whose computation environment ing by date specifies the time series for the moving average. is a set of tuples defined by a common grouping key The rows preceding following clause defines the aggrega. These functions combine multiple values of a single tion frame and restricts the moving average to a seven day. attribute that share a common set of values for the window. grouping attributes Examples include sum Sales Frames can also be specified using ranges of values relative. group byCopyright, State City, 2003 2016 Tableau Software.
and median Delay Incorporated All Rights, to the currentReserved the single ordering column Range. group by Sensor frames can be converted to row frames by using either for. ward scans for fixed frames or by using binary search for. Window functions whose computation environment is variable frames Framing mode is independent of the type. a set of adjacent tuples These functions compute a of aggregate being computed so we will assume from this. new attribute for each row by combining values from point on that we are using row framing. neighboring rows where neighboring is defined by a Not all window functions specify frames and aggregates. partitioning an ordering and a frame Examples in in particular can be either framed or unframed Neverthe. clude rank over partition by Sport order by less we can implement unframed aggregates by simply using. Score desc and avg Sales over partition by the framed aggregate with a frame specification that is un. Team order by Date rows between 3 preceding bounded at both ends Unframed aggregates produce only. and 3 following one value for the entire frame and as optimising this is triv. ial we will assume from this point on that all aggregates are. The environment for Window functions is by far the most. using bounded framing, complex and consists of three levels as shown in Figure 1. The tuples being processed are first partitioned into. 2 2 Aggregate Complexity, disjoint groups based on 0 or more partition keys Aggregate functions have traditionally 14 been grouped. into three categories based on their complexity, The tuples in each partition are then ordered by sorting. the partition on 0 or more sorting keys Distributive aggregates are those whose computation. can be distributed and recombined Common exam, Finally each window function can specify a subset of ples of distributive aggregates include sum and max.
the partition called a frame that restricts the compu. tation to a consecutive set of rows within the ordered Algebraic aggregates are those whose computation is. partition a simple algebraic function of the data and other ag. gregates Common examples of algebraic aggregates Frame Data Result. include average and variance 0 3 4 3 2 7 2 5 4 3, 1 3 4 3 2 7 2 5 3 4. Holistic aggregates are those whose computation re 2 3 4 3 2 7 2 5 3 3. quires looking at all the data at once and hence their 3 3 4 3 2 7 2 5 3 3. evaluation cannot be decomposed into smaller pieces 4 3 4 3 2 7 2 5 3 4. Common examples of holistic aggregates include count. distinct and median Table 1 Count Distinct Example. Holistic aggregates are the most difficult to compute ef. ficiently because they cannot be decomposed into smaller The block iterated interfaces used in the TDE are simi. pieces All of the aggregates we consider here i e count lar to the interfaces used by HyPer and this design makes. distinct mode and quantile are holistic it easy to share state between consecutive frames for incre. mental evaluation of windowed aggregates, 2 3 Previous Work For historical reasons our sort code is a standard intro. Window functions are a relatively new feature of SQL but sort 21 with parallel sub sorts which has been adapted to. research on their optimisation is growing The first research sort pairs of columns We call out the implications of this. publication on how to optimise window function queries is in the discussion in Section 6. given by Cao et al 9 Their work examines how to optimise. multiple window functions with different window definitions 3 WINDOWED COUNT DISTINCT. They propose multiple physical Window operators based on. The first holistic aggregate we will investigate is count. the partitioning and sorting state of the previous window. distinct Consider the following SQL query which uses. including the traditional Full Sort and two new operators. this aggregate, called Hashed Sort and Segmented Sort Our work is based. on improving the performance of the Hashed Sort operator s e l e c t S t a t e count d i s t i n c t Department. The most relevant previous work is the implementation of from S a l e s. a Window operator proposed by Leis et al 20 and imple group by S t a t e. mented in HyPer 18 an in memory database system They. This query returns the number of unique departments. provided implementations of a number of important window. that made sales in each state States that have a low num. functions using a block style interface where multiple rows. ber may need to have more marketing resources devoted to. are processed in a single call They also present multiple al. them to improve sales, gorithms such as Removable Cumulative Aggregation and. We could implement this functionality in terms of simpler. acceleration structures such as Segment Trees for comput. SQL operations by using a nested query, ing windowed aggregates However the aggregate window.
functions they discuss are limited to distributive and alge s e l e c t S t a t e count. braic aggregates such as sum and average The goal of our from s e l e c t S t a t e Department. work is to extend the class of windowed aggregates to include from S a l e s. the most common holistic aggregates and to show how to group by S t a t e Department. compute them relatively efficiently group by S t a t e. The HyPer Window operator is similar to the Hashed Sort. This two level implementation works well for a single use. operator from 9 and is highly scalable Their implemen. of count distinct but if there is more than one e g we. tation first partitions the windowed table by hashing the. added a count distinct of customers to provide context. partition keys into a fixed number 1024 of bins and then. then each count will have to be computed separately using. sorts each bin on both the partition and sort keys We have. a variant of this two level query and then explicitly joined. based our Window operator on this model, back to the main query on State. 2 4 Our Implementation s e l e c t S t a t e DeptCount CustCount. We implemented our Window operator in the Tableau Data from. Engine TDE 23 a block iterated column store used in s e l e c t S t a t e group by S t a t e as S. the Tableau product suite The TDE uses a traditional l e f t join. Volcano 13 operator tree for execution The physical op s e l e c t S t a t e count as DeptCount. erator family consists of materialising stop and go opera from s e l e c t S t a t e Department. tors called Tables and non materialising iterating opera from S a l e s. tors called Flows The Window operator was implemented as group by S t a t e Department. a Table operator that consumes another Table operator as group by S t a t e as D. input on S S t a t e D S t a t e, During the hash partitioning phase of Window we use 256 l e f t join. hash buckets because we materialise the column containing s e l e c t S t a t e count as CustCount. the hash values and this choice allows us to materialise a from s e l e c t S t a t e Customer. column that is only one byte wide which reduces the mem from S a l e s. ory overhead by a factor of two This value is at the low end group by S t a t e Customer. of the bucket count sweet spot from Figure 9 in 20 and group by S t a t e as C. our evaluation in Section 6 2 shows that this choice does not on S S t a t e C S t a t e. impact performance, Algorithm 1 Na ve Count Distinct Algorithm 2 Incremental Count Distinct. 1 procedure NaiveCountFrame counts values F 1 procedure Increment nonzero counts value. 2 counts 2 if value counts then, 3 for f F do 3 counts value 1. 4 counts values f 1 4 else, 5 end for 5 counts value 1.
6 end procedure 6 nonzero 1, 7 7 end if, 8 procedure NaiveCountD result values f rames 8 end procedure. 9 counts 9, 10 for i 0 result do 10 procedure Decrement nonzero counts value. 11 F f rames i 11 counts value 1, 12 NaiveCountFrame counts values F 12 if counts value 0 then. 13 result i counts 13 nonzero 1, 14 end for 14 end if. 15 end procedure 15 end procedure, This approach generates two hash tables per aggregate 17 procedure IncrementalCountD result values.
one for the group by and one for the join and rapidly be f rames. comes unwieldy Because of this complexity count distinct 18 P 0 0. is sometimes implemented as an aggregate function that uses 19 counts. a single hash set per row as the aggregate state The hash 20 nonzero 0. set tracks the unique elements of the domain that have been 21 for i 0 result do. seen and the final result is simply the size of the hash set 22 F f rames i. This approach only requires one hash table per aggregate 23 if nonzero counts then. reducing complexity and increasing performance It also is 24 NaiveCountFrame counts values F. much easier to implement in a windowing context because 25 nonzero counts. we do not need to create aggregates and joins for every row 26 else. in the output 27 for f F P do, 28 Increment nonzero counts values f. 3 1 Na ve Count Distinct 29 end for, The simplest windowed algorithm for count distinct is 30 for f P F do. to use a hash set as in the aggregate function state imple 31 Decrement nonzero counts values f. mentation and clear it at the start of each frame We call. this algorithm Na ve Count Distinct and it is shown as Al 32 end for. gorithm 1 33 end if, In our implementation we try to reuse the main hash set 34 result i nonzero. memory but the content buckets are freed for each frame 35 P F. 36 end for, 3 2 Incremental Count Distinct 37 end procedure. To compute the next count distinct value we may be. able to reuse the hash table from the previous frame Con. sider the example in Table 1 with a 4 element wide frame are disjoint then we can fall back to the na ve algorithm. There are 3 distinct values in the brackets of Frame 0 but if not we can increment the counts for the added frame. When we move the frame one element to the right to values in P F at Line 28 and then decrement the counts. Frame 1 we remove a 3 on the left but there is still one for removed frame values in F P at Line 31 The result is. remaining and we also added a new value 7 on the right then the number of non zero counts. so the new count of distinct values is 4 The second 3 will. remain in the window for one more frame before being re 3 3 Autocorrelation. moved Suppose now that we have just moved to Frame 3 where. We can model this incremental change by tracking not the second 3 has just been removed In this situation we. only the values but their counts using a hash map When can either delete the hash table bucket or leave it with a. ever we move to the next frame we then only have to update count of 0 If the domain being aggregated is uncorrelated. the hash map counts for the values that are removed and with the ordering di. of aggregate being computed so we will assume from this point on that we are using row framing Not all window functions specify frames and aggregates in particular can be either framed or unframed Neverthe less we can implement unframed aggregates by simply using the framed aggregate with a frame speci cation that is un bounded at both

Related Books

FOR IMMEDIATE RELEASE American International Group

FOR IMMEDIATE RELEASE American International Group

FOR IMMEDIATE RELEASE 1 Contacts Sabra Purtill Investors 212 770 7074 sabra purtill aig com Daniel O Donnell Media 212 770 3141 daniel odonnell aig com Claire Talcott Media 212 458 6343 claire talcott aig com AIG Reports Fourth Quarter and Full Year 2019 Results Net income attributable to AIG common shareholders was 922 million or 1 03 per diluted common share for the fourth

BHARATHI STUDY CENTRE MADURAI TRICHY PH 9894274672

BHARATHI STUDY CENTRE MADURAI TRICHY PH 9894274672

BHARATHI STUDY CENTRE MADURAI TRICHY PH 9894274672 www kalvisolai com Tet Oct 2012 PAPER II Answer Keys 1 In order to motivate children a teacher offers some sweets for the completion of each problem This is a Ans A Primary reinforce 2 A child who is more delighted with success is possessing the need for Ans A affection 3 A teacher who encourages the playway method in his

National Curriculum Baseline Test Key Stage 2

National Curriculum Baseline Test Key Stage 2

Key Stage 2 National Curriculum Baseline Test 6 Write these numbers in order of size starting with the largest largest 7 Write the missing numbers 2 marks 144 hours days 63 days weeks 96 months years Page 9 of 22 8 At the start of January there were 451 houses for sale in the estate agents During January March 146 houses were sold 329 more houses were put on the market

Memory Based Questions of GATE 2020

Memory Based Questions of GATE 2020

Memory Based Questions of GATE 2020 Mechanical Engg Afternoon Session Q 14Q 14 Phase diagram does not represent a heat transfer rate b temperature at which phase change takes place c phase transformation rate d Ans c End of Solution Q 15Q 15 Initial tool geometry ASA 0 9 r e r s c e 0 R After some time 0 9 r e

Grade Book Manual NUST LMS Portal

Grade Book Manual NUST LMS Portal

Document LMS Grade Book Manual Version 1 Department LMS Project Status Final Issued Authors Asma Paracha Reviewer Ms Zunera Zahid Approver Approved from each School Representative and LMS Team Issue Date May 2014 Distributor LMS Team Disclaimer This document contains confidential information It should not be distributed without prior approval from LMS National University of Science and

Introduction to Moodle Lesson 3 Assignments

Introduction to Moodle Lesson 3 Assignments

Moodle Lesson 2 Assignments 1 Version Moodle 1 9 5 Revised Aug 20 2011 Introduction to Moodle Lesson 3 Assignments Suppose you want to assign an essay for students to write using Microsoft Word Part of the students assignment is to upload their completed Word files to Moodle where you can retrieve them for grading

Moodle Lesson Activity ctyou org

Moodle Lesson Activity ctyou org

Updated 5 31 2017 Moodle v3 3 22 MOODLE LESSON ACTIVITY WELCOME TO THE MOODLE LESSON ACTIVITY TUTORIAL In this tutorial you will learn What the Lesson activity is Suggestions for using the Lesson activity How to set up and build a simple linear Lesson If you have not used one of CareerTech s Moodle training tutorials before view the instructions for using it as a self

Table of Contents Oklahoma City Community College

Table of Contents Oklahoma City Community College

Taking Attendance in Moodle 74 The Moodle Gradebook 75 Setup Page 76 Gradebook Calculation 77 Course Grade Settings 82 Entering Student Grades for Offline Activities 84 Personalized Learning Designer 86 ILP Integration Menu 87 Never Attended Report 87 Final Grades Report 88 2 An online reference area that displays and expl expected p Expected Practices for Designing and Teaching Online

TEACHING WITH MOODLE itafe huntertafe edu au

TEACHING WITH MOODLE itafe huntertafe edu au

Using the Moodle gradebook 16 Storing your content 17 Completion tracking 20 Reporting and backup 20 General course maintenance and cleanup 21 Who does what in TAFE Western Moodles 21 Additional resources to help you 22 Frequently Asked Questions 23 Email template to send students with course Moodle details 24 Teaching with Moodle A Guide for TAFE Western Staff Ver 9 Nov 2016 Page 2

Introduction to Moodle Lesson 4 Discussions

Introduction to Moodle Lesson 4 Discussions

Moodle Lesson 3 Discussions 7 Version Moodle 1 9 5 Revised Aug 20 2011 Appendix Adding Discussions to the Moodle Gradebook There are a couple of different ways to handle grading discussions and adding these grades to the Moodle gradebook 1 When a discussion is complete look at the quality and quantity of each student s

COLLIN COUNTY COMMUNITY COLLEGE DISTRICT

COLLIN COUNTY COMMUNITY COLLEGE DISTRICT

BIOL 1406 Lecture Spring 2018 1 COLLIN COUNTY COMMUNITY COLLEGE DISTRICT LECTURE SECTION SYLLABUS COURSE NUMBER BIOL 1406 COURSE TITLE General Biology I INSTRUCTOR S NAME David L McCulloch OFFICE NUMBER I223 OFFICE HOURS MW 11 1 T 9 00 10 00 4 30 5 30 R 9 00 10 00 by appointment CLASS INFORMATION Class Meeting Time MW 9 30 11 00 am TR 10 00 11 15 am T 5 30 8