dbi316. building and maintaining applications with relational and non-relational data is hard...
TRANSCRIPT
Microsoft SQL Server Beyond Relational Landscape: Current and Future
Michael Rys ([email protected], @SQLServerMike)Principal Program Manager, SQL ServerMicrosoft Corp.
DBI316
Beyond Relational Data
Building and Maintaining Applications with relational and non-relational data is hard
Complex integrationDuplicated functionalityCompensation for unavailable services
Pain Points
Goals
Reduce the cost of managing all dataSimplify the development of applications over all dataProvide management and programming services for all data
My favorite Beyond Relational Application
Structured and unstructured Search
Related/”Semantic” Search
What is the Beyond Relational Vision?
Efficient storage for all dataTables, XML, Spatial, Documents, Digital Media, Scientific Records, Factoids…
Rich Data Processing Capabilities for all applicationsData formats and content natively understood for rich application and user experienceConsistent Application Model and Data Constructs to ease application development, migration and long-term retention
Rich Capabilities and Services over all dataProvide rich services, e.g.,
Query and Reason over data and extracted semanticsSearch across structural impedance of different data formatsIntegrated backup/restore for all data
Beyond Relational Story
StructuredData
Query
T-SQL
B-trees
ManageabilityAvailability
Files
Programmability
Beyond Relational Story
StructuredData
Query
T-SQL
B-trees
ManageabilityAvailability
Files
Programmability
Unstructured Data
Search
Beyond Relational Story
StructuredData
Query and Type Operations
T-SQL/Data Types
B-trees
ManageabilityAvailability
Files
Programmability
Unstructured Data
Search
Filestream
Win 32
Semi-structuredData/XML
XML, FTS, SpatialIndices
XQuerySpatial ops
Spatial, XML, HierarchyID
Beyond Relational Story
StructuredData
Query and Type Operations
T-SQL/Data Types
B-trees
Manageability& Availability
Programmability
Unstructured Data
Search
Win 32
Semi-structuredData/XML
Semantic Platform
Efficient Storage for BR Data
Rich Query and Search Services over all Data
Rich Data ProgrammingCapabilities
Files
Filestream
XML, FTS, SpatialIndices
XQuerySpatial ops
Spatial, XML, HierarchyID
Beyond Relational Feature OverviewSQL Server
2005SQL Server 2008
FileTable (Win 32 I/O)Scale-up FileStreamScale-up SearchSearch functionalitySemantic Similarity
Reliable Messaging
Service Broker
Full Text IndexingRich unstructured Data & Services
Remote BLOB Store APIFilestreamIntegrated FTS
Spatial
Fully supported Geometry and Geography data types and Functions
XML Data Type XQueryXML Schema
XML UpgradesLarge UDTsSparse ColumnsWide Table/ColumnSetFiltered IndicesHierarchyID
Semistructured Data & Documents
SQL Server Denali
Reporting Services support
FullGlobe2D ExtensionsPervasive Spatial
over FileStream
R2
Multi-castEnqueue time
Poison-Message handling
with RCSI
Beyond Relational in SQL Server Denali
Address important customer requests for Capabilities and rich services for Rich Unstructured Data (RUDS)
Scale Up for storage and search to 100mio to 500mio documentsEasy use/access to Unstructured data from all applicationsRich insight into unstructured data to make better decisions
We deliver what you asked for to build Spatial-aware Applications
Advanced 2D SpatialMake Spatial pervasive across platform
Rich Unstructured Data & Services Ecosystem
Fulltext Search
Semantic Key Phrase
Extraction
Rich
Serv
ices
Database
Disk1 Disk2 Disk3
Multiple Containers
Sca
le-u
p
Solu
tion
s
Database Applications
Transactional Access
Blobs
DBFileStreDB
FileStreams
Integrated Backup/Replication/AlwaysOn
Integrated AdministrationIntegrated Administration?
Win32 Applications
SMB Share Files/Folder
s
FileStream API
Streaming Win32 AccessStreaming Win32 Access??
Customer Application
Azure lib
Centera lib
SQL FILESTREAM lib
SQL RBS API
Azure Centera SQL DB
Remote BLOB Storage
FileStreamsFileTable
RBS Example Workflow
Application
RBS Client Library
BLOB Store Provider Library
BLOB Store SQL Server
ClaimID ClaimDate PhotoRef
4390 6/5/2007 <Binary(20)>1
2
3
1 Write BLOB(Photo)
Return Blob ID
Write Blob ID to PhotoRef field
2
3
Machine Boundary
RBS Services• Create• Fetch• GC• Delete
demo
Managing documents integrated in SQL Server Denali
FILETABLE Overview…
DB_Directory1(Database1)
DB_Directory2(Database2)
LogFiles (FileTable)
Documents(FileTable)
Media(FileTable)
MSSQLSERVER
\\my_machine\MSSQLSERVER\DBDirectory2\Documents
Filestream Share
Database Directories
FileTable Directories
FileTable Namespace Root Folder Hierarchy
FILESTREAM Read Performance (Remote)
240 KB 480 KB 1 MB 2 MB 4 MB 8 MB0
100
200
300
400
500
600
700
800
900
Filestream Win32 (Filesystem) Access
Filestream T-SQL
Varbinary
Filesystem Win32 Access Gain (%)
Th
rou
gh
pu
t (M
bp
s)
Measured with SQL Server 2008
FILESTREAM Write Performance (Remote)
240 KB 480 KB 1 MB 2 MB 4 MB 8 MB
-200
-100
0
100
200
300
400
500
600Insert
Filestream Win32 (Filesystem) Ac-cess
Filestream T-SQL
Varbinary
Filesystem Win32 Access Gain (%)
Th
rou
gh
pu
t (M
bp
s)
Measured with SQL Server 2008
Some tips…
Reading bigger buffers gives better performance Volumes hosting FILESTREAM/FILETABLE data should have 8.3 name generation and LastAccessTime disabledFILESTREAM/FILETABLE containers to reside on dedicated volumes Have one volume per FILESTREAM/FILETABLE container
enables space management at volume level“Magic” SMB buffer size = ~60KB
Another “good” value is 480KB
ROWGUID unique index for aligned partitioning for FILESTREAMAntiVirus programs should be configured not to delete infected files but to quarantine themIf using compressed volumes, use cluster size 4 KB
Scale-up: FileStreamMultiple Containers
SQL Server 2008 R2Only one storage container/FILESTREAM filegroupLimits storage capacity scaling and I/O scaling
SQL Server DenaliSupport for multiple storage containers/filegroup.
DDL Changes to Create/Alter Database statementsAbility to set max_size for the containersDBCC Shrinkfile Emptyfile support
Scaling FlexibilityStorage scaling by adding additional storage drivesI/O scaling with multiple spindles
Scale-up: Multiple containers
• Use of multiple spindles for achieving better I/O Scalability
Scale-up: FileStream Perf/Scale
Improved performance of T-SQL and File I/O accessVarious enhancements to improve read/write throughput
5 fold increase in Read throughputLinear scaling with large number of concurrent threads
Always On support for FILESTREAM
SQL Server 2008 R2 - No support for data mirroringHeavily requested feature
SQL Server Denali – Full support for SQL Always On New Always On solution will mirror FILESTREAM data alsoCTP1 supports Async mirroring and Readable SecondariesSync and Multiple Secondaries support coming later for Denali
Unstructured Storage In SQL Server 2008 & DenaliFile Stores /
External Blob Stores (CAS)
SQL BLOBs Remote Blob API FILESTREAM FILETABLE
Streaming Performance
Depends on external store
Depends on external store
Win32 App CompatDepends on external store
Depends on external store
Link Level Consistency
Data Level Consistency
Integrated Query & Management
Non-local Windows File Servers
n/a
External Blob Stores n/a
Full Text Search Challenges
In SQL Server 2005 and before:Indexes stored outside SQL Server lead to manageability challengesMixed query performance suffers from having to pull over complete full-text result setIndex creation has scaling issues on big boxes
In SQL Server 2008 and before:Query performance has scaling issuesSome useful search capabilities are not available: Property Search, customizable NEAR among them
Innovation in Search: Semantic similarity
Full Text Search Improvements in SQL Server 2008
Full-Text Engine and Indexes fully integratedCatalog, index and stopword lists now inside the database
Better performance in many common scenariosMake mixed queries perform and scaleOptimizer has knowledge about FT index
Exposing the FT Index content and any given WB behavioursNew word breakers (WB)
Better supportability toolsFrom 23 to over 40 word breakers/locales
SELECT * FROM candidates WHERE CONTAINS(resume,’”SQL Server”’) AND ZipCode = ‘98052’
Semantic SimilarityInput: Text such as varchar, Office, PDF, HTML, email…Output: Rowset functions with standard SQL queries
Illustrating example:
Key Title Document
D1 Annual Budget …
D2 Corporate Earnings …
D3 Marketing Reports …
… … …
---------------------------------------------------------
----------------------------------------------------------------------
----------------------------------------------------------------------
-------------
Source Table
ID Keyword Colid … compDocid CompOc CompPid
K1 revenue 1 … 10,23,123 (1,4),(5,8),(1,34) 2,5,6,8,4,3
K2 growth 1 … 10,23,123 (1,5),(5,9),(1,34) 2,5,6,8,5,4
… … … … … …
Keyword Index (Full-Text)
Keyphrases KeyphraseDocumentsID DocID
T1 (revenue) D1 (Annual Budget)
T2 (growth) D2 (Corporate Earnings)
T3 (Windows) D3 (Marketing Reports)
… …
T1 (revenue) D7 (Finance Report)
… …
T3 (Windows) D11 (Azure Strategy)
T4 (Azure) D11 (Azure Strategy)
ID Keyword
T1 revenue
T2 growth
T3 Windows
T4 Azure
… …
DocumentSimilarityDocID MatchedDocID
D1 (Annual Budget) D2 (Corporate Earnings)
D1 (Annual Budget) D7 (Finance Report)
D3 (Marketing Reports) D11 (Azure Strategy)
… …
Full-Text and Semantic Processing
quarter, record, revenue…
2b
3
2 a1
+ Language Models
demo
Semantic Extraction and RelationshipsFullText Search in SQL Server Denali
Full-Text Property Search
--needed once per database instance to load the office filtersexec sp_fulltext_service 'load_os_resources',1goexec sp_fulltext_service 'restart_all_fdhosts'go
-- use property searchCREATE SEARCH PROPERTY LIST p1;
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH (PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9', PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');
ALTER FULLTEXT INDEX ON fttable SET SEARCH PROPERTY LIST = p1Go
-- Querying DOCUMENT HAVING “John Doe” AS AUTHOR ONLYselect * from fttable where contains(property(ftcol, 'System.Author'), '"John Doe"')
Full-Text Customizable Near
--OLD NEAR SYNTAXselect * from fttable where contains(*, 'test near Space')
--NEW NEAR USAGES:
--SPECIFY DISTANCEselect * from fttable where contains(*, 'near((test, Space), 5,false)')
--REDUCE DISTANCEselect * from fttable where contains(*, 'near((test, Space), 2,false)')
--ORDER OF WORDS IS SPECIFIED AS IMPORTANTselect * from fttable where contains(*, 'near((test, Space), 5,true)')
Indexing Performance
Populating an index of 20 million rows of 1K data on identical hardware (time in minutes)
2 min 1 min
Scale-up: Full-Text Search
Queries over 350M documents database and random DMLs running in background. Beating SQL Server 2005 with a scale factor more than 2x and with avg 60x times better throughput
Scale-up: Full-Text Search
Query avgExecTime (ms) under various number of connections (50 ~ 2000 users) for customer playback benchmark
SQL Server Spatial Data Overview
Storage and retrieval of spatial data using standard SQL syntaxSpatial Data Types (geometry, geography)Spatial Methods (intersects, buffer, etc.)Spatial Indexes
Offers full set of Open Geospatial Consortium components (OGC/SQL MM, ISO 19125)Spatial Builder InterfaceSSMS VisualizationIntegration with Bing MapsIntegration with Reporting Services in SQL Server 2008 R2
Denali Spatial 2D Customer Values and Goals
Add to spatial feature set based on industry trend and customer asksFaster and more predictable Performance
SQL CAT Whitepaper what is new in Denali
New Spatial Type Features
New relational methods added to geographyTo match geometry
Invalid objects are accepted for geographyMakeValid will convert them to valid objects
Bigger than a hemisphere objects for geographyFULLGLOBE spatial type is available“small” holes can be created
Increased precision from 27 to 48 bits for spatial computationsPersisted computed columns of a spatial typeSpatial Aggreators
New Spatial Type Features
Circular arcs segmentsCircularStringsCompoundCurvesCurvePolygons
All methods support circular arcs
Circular arc support on a ellipsoidRed – CircularString with 4 pointsYellow – LineString with the same 4 points
Spatial Performance
New default Index tessellationMore stable performance
for windows of different sizefor data with different spatial density
For default values:Up to 2x faster for longer queries > 500 ms
More efficient primary filter Fewer rows returned
10ms slower for very fast queries < 50 msIncreased tessellate time which is constant
More query plans are supporting spatial index: Nearest NeighborNew Index Hint to select cells for query windowIndex compression is enabled for spatial indexes
Ca. 50% smaller indices, 0-15% slower queries
Optimized spatial query plan for STDistance and STIntersects like queriesFaster point data queriesOptimized STBuffer, lower memory footprint
Spatial Index Performance
New grid gives much stable performance for query windows of different sizeBetter grid coverage gives fewer high peaks
Spatial Nearest Neighbor
Main scenarioGive me the closest 5 Italian restaurants
Execution plan SQL Server 2008/2008 R2: table scanDenali: uses spatial index
Specific query pattern required
SELECT TOP(5) *FROM Restaurants rWHERE r.type = ‘Italian’ AND r.pos.STDistance(@me) IS NOT NULLORDER BY r.pos.STDistance(@me)
demo
Spatial in SQL Server Denali
Nearest Neighbor Performance
NN query vs best current workaround (sort all points in 10km radius)
*Average time for NN query is ~236ms
Find the closest 50 business points (22 million in total)
Key Takeaways
Beyond Relational is targeting non-traditional database workloads that are growing rapidly in the enterprise.
Examples: Content and Collab, Spatial etc.
Beyond Relational is targeting key ISV asks in fast growing markets:
Examples: eDiscovery, Healthcare, GIS, Document management etc.
Beyond Relational is key strategy to enable you to build complex data applications that go beyond relational data!
Related Content
Denali Whitepapers and information:http://www.sqlserverlaunch.com/http://sqlcat.com/whitepapers/archive/2010/11/09/new-spatial-features-in-sql-server-code-named-denali-community-technology-preview-1.aspxSearch Demo: https://connect.microsoft.com/SQLServer/Downloads/DownloadDetails.aspx?DownloadID=35226
Find Me Later At…The SQL Server booth: Tuesday to Thursday in the afternoon On Twitter: @SQLServerMikeEmail: [email protected]: http://sqlblog.com/blogs/michael_rys
Database Platform (DAT) Resources
Try the new SQL Server Mission Critical BareMetal Hand’s on-Labs
Visit the updated website for SQL Server® Code Name “Denali” on www.microsoft.com/sqlserver and sign to be notified when the next CTP is availableFollow the @SQLServer Twitter account to watch for updates
Visit the SQL Server Product Demo Stations in the DBI Track section of the Expo/TLC Hall. Bring your questions, ideas and conversations!
• Microsoft® SQL Server® Security & Management • Microsoft® SQL Server® Optimization and Scalability• Microsoft® SQL Server® Programmability • Microsoft® SQL Server® Data Warehousing• Microsoft® SQL Server® Mission Critical • Microsoft® SQL Server® Data Integration
Resources
www.microsoft.com/teched
Sessions On-Demand & Community Microsoft Certification & Training Resources
Resources for IT Professionals Resources for Developers
www.microsoft.com/learning
http://microsoft.com/technet http://microsoft.com/msdn
Learning
http://northamerica.msteched.com
Connect. Share. Discuss.
Complete an evaluation on CommNet and enter to win!
Scan the Tag to evaluate this session now on myTech•Ed Mobile
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to
be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS
PRESENTATION.