skyserver database past, present, and future jim gray: microsoft alex szalay (and friends): johns...
TRANSCRIPT
SkyServer Database SkyServer Database Past, Present, and FuturePast, Present, and Future
Jim Gray: MicrosoftJim Gray: MicrosoftAlex Szalay (and friends): Johns HopkinsAlex Szalay (and friends): Johns Hopkins
Help from:Help from:Cathan Cook (personal SkyServer),Cathan Cook (personal SkyServer),Maria A. Nieto-Santisteban (image cutout service)Maria A. Nieto-Santisteban (image cutout service)Peter Kukol, Gyorgy Fekete, Wil O’Mullane (evolved HTM to native C#)Peter Kukol, Gyorgy Fekete, Wil O’Mullane (evolved HTM to native C#)Cesar Galindo-Legaria, Ming-Chuan Wu (statistics for optimizer)Cesar Galindo-Legaria, Ming-Chuan Wu (statistics for optimizer)
Promised AbstractPromised Abstract Can database technology help manage and mine Can database technology help manage and mine
scientific data? That is the question I have been scientific data? That is the question I have been trying to answer with my astronomy colleagues trying to answer with my astronomy colleagues (especially Alex Szalay.) We have had some success (especially Alex Szalay.) We have had some success but still face many problems. I will start by but still face many problems. I will start by describing the odyssey of putting the Sloan Digital describing the odyssey of putting the Sloan Digital Sky Survey online and give some statistics about Sky Survey online and give some statistics about how it is used and what we are doing now. That how it is used and what we are doing now. That segment will end with a discussion of how the segment will end with a discussion of how the integration of SQL with the CLR (common language integration of SQL with the CLR (common language run time) makes it much easier for us to handle run time) makes it much easier for us to handle scientific datatypes and spatial access methods. The scientific datatypes and spatial access methods. The World-Wide Telescope is an attempt to federate all World-Wide Telescope is an attempt to federate all the astronomy archives of the world. I will briefly the astronomy archives of the world. I will briefly describe the architecture of SkyQuery a prototype describe the architecture of SkyQuery a prototype portal to several archives each of which is a web portal to several archives each of which is a web service. Urls: service. Urls: http://research.microsoft.com/~grayhttp://research.microsoft.com/~gray http://skyserver.pha.jhu.edu/dr1/http://skyserver.pha.jhu.edu/dr1/ http://skyQuery.nethttp://skyQuery.net
Instructions from Last YearInstructions from Last Year
Less marketectureLess marketecture More SubstanceMore Substance More demos and codeMore demos and code
SkyServer Overview SkyServer Overview (10 min)(10 min) 10 minute SkyServer tour10 minute SkyServer tour
Pixel space Pixel space http://skyserver.sdss.org/en/http://skyserver.sdss.org/en/
Record space: Record space: http://skyserver.sdss.org/en/tools/explore/obj.asp?id=2255030989160697http://skyserver.sdss.org/en/tools/explore/obj.asp?id=2255030989160697 Doc space: NedDoc space: Ned Set space: Set space: Web & Query Logs: Web & Query Logs: Dr1 WebServiceDr1 WebService
You can download You can download (thanks to Cathan Cook )(thanks to Cathan Cook ) Data + Database code:Data + Database code: Website: Website:
Data Mining the SDSS SkyServer DatabaseData Mining the SDSS SkyServer Database MSR-TR-2002-01MSR-TR-2002-01
select top 10 * from weblog..weblog where yy = 2003 and mm=7 and dd =25 order by seq desc
select top 10 * from weblog..sqlLogorder by theTime Desc
http://skyserver.pha.jhu.edu/dr1/en/tools/chart/navi.asp
http://research.microsoft.com/~gray/SDSS/personal_skyserver.htm
Cutout Service (10 min)Cutout Service (10 min)A typical web serviceA typical web service Show itShow it Show WSDLShow WSDL Show fixing a bugShow fixing a bug Rush through code.Rush through code. You can download it.You can download it.
Maria A. Nieto-Santisteban did most of this (Alex and I started it)Maria A. Nieto-Santisteban did most of this (Alex and I started it)
http://research.microsoft.com/~gray/SDSS/personal_skyserver.htm
SkyQuery: SkyQuery: http://skyquery.net/http://skyquery.net/ Distributed Query tool using a set of web servicesDistributed Query tool using a set of web services Four astronomy archives from Four astronomy archives from
Pasadena, Chicago, Baltimore, Cambridge (England).Pasadena, Chicago, Baltimore, Cambridge (England). Feasibility study, built in 6 weeks Feasibility study, built in 6 weeks
Tanu Malik (JHU CS grad student) Tanu Malik (JHU CS grad student) Tamas Budavari (JHU astro postdoc)Tamas Budavari (JHU astro postdoc) With help from Szalay, Thakar, GrayWith help from Szalay, Thakar, Gray
Implemented in C# and .NETImplemented in C# and .NET Allows queries like:Allows queries like:
SELECT o.objId, o.r, o.type, t.objId FROM SDSS:PhotoPrimary o,
TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<3.5
AND AREA(181.3,-0.76,6.5) AND o.type=3 and (o.I - t.m_j)>2
2MASS
INT
SDSS
FIRST
SkyQueryPortal
ImageCutout
SkyQuery StructureSkyQuery Structure Each SkyNode publishes Each SkyNode publishes
Schema Web ServiceSchema Web Service Database Web ServiceDatabase Web Service
Portal is Portal is Plans Query (2 phase) Plans Query (2 phase) Integrates answersIntegrates answers Is itself a web serviceIs itself a web service
Four Database TopicsFour Database Topics
Sparse tables: Sparse tables: column vs row storecolumn vs row storetag and index tablestag and index tablespivotpivot
Maplist (cross apply)Maplist (cross apply) Bookmark bug Bookmark bug Object Relational has arrived.Object Relational has arrived.
Column Store PyramidColumn Store Pyramid Users see fat base tablesUsers see fat base tables
(universal relation)(universal relation)
Define popular columns index Define popular columns index tag tabletag table 10% ~ 100 columns 10% ~ 100 columns
Make many skinny indices Make many skinny indices 1% ~ 10 columns 1% ~ 10 columns
Query optimizer picks right planQuery optimizer picks right plan Automate definition & useAutomate definition & use Fast read, slow insert/update Fast read, slow insert/update Data warehouseData warehouse
Note: prior to Yukon, Note: prior to Yukon, index had 16 column limit. index had 16 column limit. A bane of my existence. A bane of my existence. Simpl
e
Typical Semi-join Fat quer
y
Obese query
BASEBASE
INDICIESINDICIES
TAGTAG
ExamplesExamples
create table base (id bigint, f1 int primary key,f2 int, …,f1000 int)
create index tag on base (id) include (f1, …, f100)
create index skinny on base(f2,…f17)
Simple
Typical Semi-join
Fat quer
y
Obese query
BASE
INDICIESINDICIES
TAGTAG
A Semi-Join ExampleA Semi-Join Example
create table fat(a int primary key, b int, c int, fat char (988))declare @i int, @j int; set @i = 0again: insert fat values(@i, cast(100*rand() as int), cast (100*rand() as int), ' ') set @i = @i + 1; if (@i < 1000000) goto again
create index ab on fat(a,b)create index ac on fat(a,c)
dbcc dropcleanbuffers with no_infomsgs
select count(*) from fat with(index (0)) where c = b-- Table 'fat'. Scan 3, reads 137,230, CPU : 1.3 s, elapsed 31.1s.
dbcc dropcleanbuffers with no_infomsgs
select count(*) from fat where b=c-- Table 'fat'. Scan 2, reads: 3,482 CPU 1.1 s, elapsed: 1.4 s.
1GB
8MB 8MB
b=c 3.4K IO 1.4 sec
abab acac
b=c 137 K IO
31 sec
Moving From Rows to ColumnsMoving From Rows to ColumnsPivotPivot & & UnPivotUnPivotWhat if the table is sparse? What if the table is sparse? LDAP has 7 mandatory LDAP has 7 mandatory
and 1,000 optional attributesand 1,000 optional attributes
Store row, col, valueStore row, col, value
create table Features ( object varchar , attribute varchar,
value varchar, primary key ( object,
attribute)) select *from (features pivot value on attribute
in (year, color) ) as Twhere object = ‘4PNC450’
Featuresobject attribute value
●●●●4PNC450 year 20004PNC450 color white4PNC450 make Ford4PNC450 model Taurus●●●●
TObject year color 4PNC450 2000 white
Maplist Meets SQL – cross applyMaplist Meets SQL – cross apply
Your table-valued function F(a,b,c) Your table-valued function F(a,b,c) returns all objects related to a,b,c.returns all objects related to a,b,c.
spatial neighbors, spatial neighbors, sub-assemblies, sub-assemblies, members of a group, members of a group, items in a folder,…items in a folder,…
Apply this function to each rowApply this function to each row Classic drill-downClassic drill-down
use use outer applyouter apply if f() may be null if f() may be null
select p.*, q.*from parent as p cross apply f(p.a, p.b, p.c) as qwhere p.type = 1
p1
f(p1)
p2
f(p2)
pn
f(pn)
The Bookmark BugThe Bookmark Bug SQL is a non-procedural language.SQL is a non-procedural language. The compiler/optimizer picks the The compiler/optimizer picks the
procedureprocedurebased on statistics.based on statistics.
If the stats are wrong or missing….If the stats are wrong or missing….Bad things happen.Bad things happen.Queries can run VERY slowly.Queries can run VERY slowly.
Strategy 1: allow users to specify plan.Strategy 1: allow users to specify plan. Strategy 2: make the optimizer smarterStrategy 2: make the optimizer smarter
(and accept hints from the user.)(and accept hints from the user.)
An Example of the ProblemAn Example of the Problem A query selects some fields A query selects some fields
of an index of an index and of huge table.and of huge table.
Bookmark plan: Bookmark plan: look in index for a subset.look in index for a subset. Lookup subset in Fat table.Lookup subset in Fat table.
This is This is great if subset << table.great if subset << table. terrible if subset ~ table.terrible if subset ~ table.
If statistics are wrong, If statistics are wrong, or if predicates not independent,or if predicates not independent,you get the wrong plan.you get the wrong plan.
How to fix the statistics?How to fix the statistics?
Ind
ex
Huge table
A Fix: Let user ask for statsA Fix: Let user ask for stats Create Statistics on View(f1,..,fn)Create Statistics on View(f1,..,fn) Then the optimizer has the right dataThen the optimizer has the right data
Picks the right plan.Picks the right plan.Statistics on ViewsStatistics on Views, , C. Galindo-Legaria, M. Josi, F. Waas, M. Wu, VLDB 2003C. Galindo-Legaria, M. Josi, F. Waas, M. Wu, VLDB 2003, ,
Q3: Q3: Select count(*) from Galaxy Select count(*) from Galaxy where r < 22 and r_extinction > 0.120where r < 22 and r_extinction > 0.120
Bookmark: 34 M random IO, Bookmark: 34 M random IO, 520 minutes520 minutes Create Statistics on Galaxy(objID ) Create Statistics on Galaxy(objID ) Scan: 5 M sequential IOScan: 5 M sequential IO 18 minutes18 minutes
Ultimately this should be automated,Ultimately this should be automated, but for now,… it’s a step in the right direction.but for now,… it’s a step in the right direction.
Object Relational Has ArrivedObject Relational Has Arrived VMs are moving inside the DBVMs are moving inside the DB Yukon includes Common Language Runtime Yukon includes Common Language Runtime
(Oracle & DB2 have similar mechanisms).(Oracle & DB2 have similar mechanisms). So, C++, VB, C# and Java are So, C++, VB, C# and Java are
co-equal with TransactSQL.co-equal with TransactSQL. You can define classes and methodsYou can define classes and methods
SQL will store the instancesSQL will store the instancesAccess them via methodsAccess them via methods
You can put your analysis code You can put your analysis code INSIDEINSIDE the database. the database.
Minimizes data movement.Minimizes data movement.You can’t move petabytes to the clientYou can’t move petabytes to the clientBut we will soon have petabyte databases.But we will soon have petabyte databases.
datacode
datacode
+code
And..And.. Fully-async and synchronous (blocking) callsFully-async and synchronous (blocking) calls
and multi-concurrent-result sets and multi-concurrent-result sets per connection (transaction)per connection (transaction)
Queues built in:Queues built in: Fire-and forget asynchronous processingFire-and forget asynchronous processing
It listens to Port 80 for SOAP calls :It listens to Port 80 for SOAP calls : TP-lite is backTP-lite is back
It’s a web service It’s a web service Notification service and Notification service and
data mining and data mining and olap and olap and reporting and reporting and xml and xml and xquery andxquery and.... ).... )
But, back to OR.But, back to OR.
The HTM code body
The Pre CLR designThe Pre CLR design
Transact SQL sp_HTM(20 lines) 469 lines of
“glue”looking like: // Get Coordinates param datatype, and param length information of if (srv_paraminfo(pSrvProc, 1, &bType1, &cbMaxLen1, &cbActualLen1, NULL, &fNull1) == FAIL) ErrorExit("srv_paraminfo failed...");
// Is Coordinate param a character stringif (bType1 != SRVBIGVARCHAR && bType1 != SRVBIGCHAR &&
bType1 != SRVVARCHAR && bType1 != SRVCHAR)ErrorExit("Coordinate param should be a string.");
// Is Coordinate param non-nullif (fNull1 || cbActualLen1 < 1 || cbMaxLen1 <= cbActualLen1)
ErrorExit("Coordinate param is null.");
// Get pointer to Coordinate parampzCoordinateSpec = (char *) srv_paramdata (pSrvProc, 1);if (pzCoordinateSpec == NULL)
ErrorExit("Coordinate param is null.");pzCoordinateSpec[cbActualLen1] = 0;
// Get OutputVector datatype, and param length information if (srv_paraminfo(pSrvProc, 2, &bType2, &cbMaxLen2, &cbActualLen2, NULL, &fNull2) == FAIL) ErrorExit("Failed to get type info on HTM Vector param...");
The “glue” CLR designThe “glue” CLR designDiscard 450 lines of UGLY Discard 450 lines of UGLY codecode
The HTM code body
C# SQL sp_HTM(50 lines)
using System;using System.Data;using System.Data.SqlServer;using System.Data.SqlTypes;using System.Runtime.InteropServices;namespace HTM {
public class HTM_wrapper {[DllImport("SQL_HTM.dll")] static extern unsafe void * xp_HTM_Cover_get (byte *str);public static unsafe void HTM_cover_RS(string input) {
// convert the input from Unicode (array of 2 bytes) to an array of bytes (not shown) byte * input; byte * output;
// invoke the HTM routine output = (byte *)xp_HTM_Cover_get(input);
// Convert the array to a tableSqlResultSet outputTable = SqlContext.GetReturnResultSet();
if (output[0] == 'O') { // if Output is “OK”uint c = *(UInt32 *)(s + 4); // cast results as datasetInt64 * r = ( Int64 *)(s + 8); // Int64 r[c-1,2]
for (int i = 0; i < c; ++i) { SqlDataRecord newRecord = outputTable.CreateRecord(); newRecord.SetSqlInt64(0, r[0]);
newRecord.SetSqlInt64(1, r[1]); r++;r++;outputTable.Insert(newRecord);
} } // return outputTable;} } }
Thanks!!! To Peter Kukol (who wrote this)
The Clean CLR designThe Clean CLR designDiscard all glue codeDiscard all glue codereturn array cast as tablereturn array cast as table
CREATE ASSEMBLY HTM_AFROM '\\localhost\HTM\HTM.dll' CREATE FUNCTION HTM_cover( @input NVARCHAR(100) )RETURNS @t TABLE ( HTM_ID_START BIGINT NOT NULL PRIMARY KEY, HTM_ID_END BIGINT NOT NULL
)ASEXTERNAL NAME HTM_A:HTM_NS.HTM_C::HTM_cover
using System;using System.Data;using System.Data.Sql;using System.Data.SqlServer;using System.Data.SqlTypes;using System.Runtime.InteropServices;namespace HTM_NS {
public class HTM_C {public static Int64[,2] HTM_cover(string input) {
// invoke the HTM routine return (Int64[,2]) xp_HTM_Cover(input); // the actual HTM C# or C++ or Java or VB code goes here.
} } }
Your/My code
goes here
Performance (Beta1)Performance (Beta1) On a 2.2 Ghz Xeon On a 2.2 Ghz Xeon
Call a Transact SQL functionCall a Transact SQL function
3333μμss
Call a C# functionCall a C# function
5050μμss
Table valued functionTable valued function
not good in not good in ββ11
Array (== table) valued function 200 Array (== table) valued function 200 μμss
+ per row 27 + per row 27 μμs s
CREATE ASSEMBLY ReturnOneAFROM '\\localhost\C:\ReturnOne.dll'GOCREATE FUNCTION ReturnOne_Int( @input INT) RETURNS INTAS EXTERNAL NAME ReturnOneA:ReturnOneNS.ReturnOneC::ReturnOne_IntGO----------------------------------------------- time echo an integerdeclare @i int, @j int, @cpu_seconds float, @null_loop floatdeclare @start datetime, @end datetimeset @j = 0set @i = 10000set @start = current_Timestampwhile(@i > 0) begin
set @j = @j + 1set @i = @i -1
endset @end = current_Timestampset @null_loop = datediff(ms, @start,@end) / 10.0
set @i = 10000set @start = current_Timestampwhile(@i > 0) begin
select @j = dbo.ReturnOne_Int(@i)set @j = @j + 1set @i = @i -1
endset @end = current_Timestamp set @cpu_seconds = datediff(ms, @start,@end) / 10.0 - @null_loopprint 'average cpu time for 1,000 calls to ReturnOne_Int was ' + str(@cpu_seconds,8,2)+ ' micro seconds'
The CodeThe Code
using System;using System.Data;using System.Data.SqlServer;using System.Data.SqlTypes;using System.Runtime.InteropServices;namespace ReturnOneNS { public class ReturnOneC {
public static int ReturnOne_Int(int input) {return input;
} }}
Function written in C# inside the DB
Program in DB in different
language (Tsql) calling function
What Is the Significance?What Is the Significance?
No more inside/outside DB dichotomy.No more inside/outside DB dichotomy. You can put your code near the data.You can put your code near the data. Indeed, we are letting users put Indeed, we are letting users put
personal databases near the data personal databases near the data archive.archive.
This avoids moving large datasets.This avoids moving large datasets. Just move questions and answers.Just move questions and answers.
Meta-MessageMeta-Message
Trying to fit science data into Trying to fit science data into databasesdatabases
When it does not fit, something is When it does not fit, something is wrong.wrong.
Look for solutionsLook for solutions Many solutions come from OR extensionsMany solutions come from OR extensions Some are fundamental engine changesSome are fundamental engine changes
More structure in DBMore structure in DB Richer operator setsRicher operator sets Better statisticsBetter statistics
© 2002 Microsoft Corporation. All rights reserved.© 2002 Microsoft Corporation. All rights reserved.This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.