
Accelerated, Threaded XML Parsing: loading your documents quicker …
Matúš Kukan <[email protected]>, Michael Meeks <[email protected]>
matus & mmeeks, #libreoffice-dev, irc.freenode.net


  • Accelerated, Threaded XML Parsing

    loading your documents quicker …

    Matúš Kukan <[email protected]>
    Michael Meeks <[email protected]>

    matus & mmeeks, #libreoffice-dev, irc.freenode.net

  • Big data needs Document Load optimization

  • Parallelized Loading ...

    ● Desktop CPU cores are often idle.
    ● XML parsing:
      ● The ideal application of parallelism
    ● SAX parsers: “Sucking icAche eXperience” parsers
      – read, parse a tiny piece of XML & emit an event … punch that deep into the core of the APP logic, and return ..
      – Parse another tiny piece of XML.
    ● Better APIs and impl's needed: Tokenizing, Namespace handling etc. (see the sketch below)
    ● Luckily easy to retro-fit threading ...
    ● Dozens of performance wins in XFastParser.
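    A minimal C++ sketch of the tokenized callback style this implies; the interface, token values and names below are illustrative, not LibreOffice's actual css::xml::sax::XFastContextHandler API:

      #include <cstdint>
      #include <string>

      // Hypothetical tokenized SAX callback interface: element names are turned
      // into small integer tokens once, and each event is punched straight into
      // the application logic.
      struct TokenizedSaxHandler
      {
          virtual ~TokenizedSaxHandler() = default;
          virtual void startElement(int32_t nElementToken) = 0;
          virtual void characters(const std::string& rChars) = 0;
          virtual void endElement(int32_t nElementToken) = 0;
      };

      // The classic SAX rhythm: read a tiny piece of XML, emit an event, return,
      // then parse the next tiny piece.
      void emitOneCell(TokenizedSaxHandler& rHandler)
      {
          const int32_t nCellToken = 42; // made-up token for a <c> (cell) element
          rHandler.startElement(nCellToken);
          rHandler.characters("3.14");
          rHandler.endElement(nCellToken);
      }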

  • XML format lameness ...

    ● Spreadsheets have a great way of expressing repeated formulae:
      ● R1C1 notation: =SUM($A$1:$A$5)-A1
        – → =SUM(R1C1:R5C1)-R[-2]C[-1]
      ● Looks ugly, but it's constant down a column.
      ● Lunatic standardizers for ODF (& OOXML) ignored me on this …
    ● Formulae are hard and expensive to parse, so don't …
    ● Predictive generation down a column & comparison (see the sketch below).
      – Removes tons of token allocations etc.
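    A sketch of "predictive generation down a column" under the assumption that it means: generate the formula string we expect in the next row, compare it with the incoming one, and if they match reuse the already-tokenized formula instead of re-parsing. The helper name and regex are hypothetical; real cell-reference syntax is richer than this:

      #include <cstddef>
      #include <regex>
      #include <string>

      // Bump every *relative* row number in the previous row's formula by one.
      // Absolute rows ($A$1) are left untouched, so constant ranges stay constant.
      std::string predictNextRowFormula(const std::string& rPrevFormula)
      {
          static const std::regex aCellRef(R"((\$?[A-Z]+)(\$?)(\d+))");
          std::string aPrediction;
          std::size_t nLast = 0;
          for (auto it = std::sregex_iterator(rPrevFormula.begin(), rPrevFormula.end(), aCellRef);
               it != std::sregex_iterator(); ++it)
          {
              const std::smatch& rMatch = *it;
              aPrediction.append(rPrevFormula, nLast, rMatch.position() - nLast);
              aPrediction += rMatch[1].str();   // column part stays as-is
              aPrediction += rMatch[2].str();   // keep an absolute-row '$'
              int nRow = std::stoi(rMatch[3].str());
              aPrediction += std::to_string(rMatch[2].length() ? nRow : nRow + 1);
              nLast = rMatch.position() + rMatch.length();
          }
          aPrediction.append(rPrevFormula, nLast, std::string::npos);
          return aPrediction;
      }

      // e.g. predictNextRowFormula("=SUM($A$1:$A$5)-A1") yields "=SUM($A$1:$A$5)-A2":
      // if that equals the next row's formula, its token array can be shared.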

  • Parallelised load (boxes in the diagram are threads)

    ● Split XML Parse & Sheet populate
    ● Parallelised Sheet Loading … (see the sketch below)
    ● Parallel to GPU compilation

    [Diagram: per sheet, Thread 1 (Unzip, XML Parse, Tokenize) feeds Thread 2 (Populate Sheet Data Structures) … etc.; in parallel, =COVAR(A1:A300,B1:B300) → OpenCL code → ready-to-execute kernels; plus a separate progress-bar thread]

    Tools->Options->Advanced->“Experimental Mode” required for parallel loading
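    A high-level C++ sketch of the per-sheet split pictured above, assuming each sheet stream can be unzipped, parsed and tokenized independently before its events are replayed into the document core; all types and function names here are illustrative stand-ins, and the real code additionally pipelines the two stages per sheet rather than running them back-to-back:

      #include <cstddef>
      #include <thread>
      #include <vector>

      struct SheetStream  { /* compressed sheetN.xml taken from the zip container */ };
      struct SheetEvents  { /* tokenized SAX events for one sheet */ };

      // Stand-ins for the producer and consumer stages; in LibreOffice the work
      // is done by FastSaxParser and the Calc import filter.
      SheetEvents unzipParseTokenize(const SheetStream&) { return {}; }
      void populateSheet(const SheetEvents&) {}

      // One worker thread per sheet runs the unzip / parse / tokenize stage;
      // the results are then replayed into the sheet data structures.
      void loadSheetsInParallel(const std::vector<SheetStream>& rSheets)
      {
          std::vector<SheetEvents> aEvents(rSheets.size());
          std::vector<std::thread> aWorkers;
          for (std::size_t i = 0; i < rSheets.size(); ++i)
              aWorkers.emplace_back([&, i] { aEvents[i] = unzipParseTokenize(rSheets[i]); });
          for (std::thread& rWorker : aWorkers)
              rWorker.join();
          for (const SheetEvents& rEvents : aEvents)
              populateSheet(rEvents);
      }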

  • Main thread (consumer side)

    FastSaxParserImpl::parseStream(...) { get event list from maPendingEvents; }
    FastSaxParserImpl::consume(...) { based on the event type, call ..., while the list is not empty; }

    maPendingEvents is protected by a mutex … and is filled by the parser thread (next slide).

    Entity::startElement(...) { create a child ContextHandler, call 'startElement' on it, then push it onto the stack; }
    Entity::characters(...) { retrieve the ContextHandler and call 'characters' on it; }
    Entity::endElement(...) { call 'endElement' on the ContextHandler and pop it from the stack; }

    xml::sax::XFastContextHandler: interface implemented by the caller of FastSaxParser::parseStream(); this is where the actual processing of tokenized elements happens.
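    A condensed sketch of this consumer side; the Event record, queue and names are simplified stand-ins for the real FastSaxParserImpl members, and the dispatch targets are left as comments:

      #include <cstdint>
      #include <deque>
      #include <mutex>
      #include <string>
      #include <utility>
      #include <vector>

      // Simplified stand-ins for the event record and the mutex-protected
      // maPendingEvents queue described above.
      enum class CallbackType { START_ELEMENT, CHARACTERS, END_ELEMENT };
      struct Event { CallbackType maType; int32_t mnElementToken; std::string msChars; };
      using EventList = std::vector<Event>;

      std::mutex maEventMutex;
      std::deque<EventList> maPendingEvents;

      // Main thread: pop one batch of events, then dispatch each event to the
      // current ContextHandler, roughly what consume() does.
      bool consumeOneBatch()
      {
          EventList aList;
          {
              std::lock_guard<std::mutex> aGuard(maEventMutex);
              if (maPendingEvents.empty())
                  return false;
              aList = std::move(maPendingEvents.front());
              maPendingEvents.pop_front();
          }
          for (const Event& rEvent : aList)
          {
              switch (rEvent.maType)
              {
                  case CallbackType::START_ELEMENT: /* Entity::startElement(rEvent); */ break;
                  case CallbackType::CHARACTERS:    /* Entity::characters(rEvent.msChars); */ break;
                  case CallbackType::END_ELEMENT:   /* Entity::endElement(rEvent); */ break;
              }
          }
          return true;
      }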

  • Parser Thread (producer side)

    XML_Parse(...) → FastSaxParserImpl::parse()

    FastSaxParserImpl::callbackStartElement(...) { create a 'startElement' event, containing the list of attributes with computed tokens; push the current namespace onto the stack; }
    FastSaxParserImpl::callbackCharacters(...) { create a 'characters' event, containing the string inside an element; }
    FastSaxParserImpl::callbackEndElement(...) { pop from the namespace stack and create an 'endElement' event; }

    FastSaxParserImpl::produce(...) { if the limit is hit, push the produced events into maPendingEvents; }

    maPendingEvents is protected by a mutex

    xml::sax::XFastTokenHandler: interface implemented by the caller of FastSaxParser::parseStream(), used to compute element tokens.
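    The matching producer side, continuing the consumer sketch above (it reuses the Event / EventList types, maEventMutex and maPendingEvents from that sketch); the batch limit and signatures are illustrative, not the real FastSaxParserImpl ones:

      // Parser thread: the expat callbacks append events to a local batch;
      // produce() only takes the mutex once a whole batch is full.
      constexpr std::size_t nEventListLimit = 1000;
      EventList maCurrentEvents; // touched only by the parser thread

      void produce()
      {
          if (maCurrentEvents.size() < nEventListLimit)
              return;
          std::lock_guard<std::mutex> aGuard(maEventMutex);
          maPendingEvents.push_back(std::move(maCurrentEvents));
          maCurrentEvents = EventList(); // start a fresh batch
      }

      void callbackStartElement(int32_t nElementToken /*, tokenized attribute list */)
      {
          // the real code also pushes the current namespace onto its stack here
          maCurrentEvents.push_back({CallbackType::START_ELEMENT, nElementToken, {}});
          produce();
      }

      void callbackCharacters(const std::string& rChars)
      {
          maCurrentEvents.push_back({CallbackType::CHARACTERS, 0, rChars});
          produce();
      }

      void callbackEndElement(int32_t nElementToken)
      {
          // the real code also pops the current namespace from its stack here
          maCurrentEvents.push_back({CallbackType::END_ELEMENT, nElementToken, {}});
          produce();
      }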

  • Code improvements

    ● We try hard to avoid any (de)allocations
      ● FastSaxParserImpl::consume(EventList *pList) just processes the list of events without freeing any memory. pList is then pushed into a mutex-protected buffer and reused in the FastSaxParserImpl::callbackFoo(...) functions.
    ● commit “don't allocate uno::Sequences when we don't need to.”
      – Makes small strings reuse one sequence buffer
    ● commit “FastAttributeList: avoid OStrings in attribute list; just use char buffer”
      – behaves something like a vector wrt. allocations (see the sketch below)
      – easy to get attribute strings
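    A sketch of what such a flat, char-buffer attribute list can look like; this is an illustrative data structure in the spirit of the commit, not the real FastAttributeList:

      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // All attribute values live back-to-back in one growable char buffer, so
      // adding an attribute costs at most one amortized reallocation instead of
      // one heap string per attribute.
      class FlatAttributeList
      {
          std::vector<char>        maValueBuffer; // all values, NUL-separated
          std::vector<int32_t>     maTokens;      // attribute name tokens
          std::vector<std::size_t> maOffsets;     // start of each value in the buffer
      public:
          void add(int32_t nToken, const char* pValue, std::size_t nLen)
          {
              maTokens.push_back(nToken);
              maOffsets.push_back(maValueBuffer.size());
              maValueBuffer.insert(maValueBuffer.end(), pValue, pValue + nLen);
              maValueBuffer.push_back('\0');
          }
          const char* getValue(std::size_t nIndex) const
          {
              return maValueBuffer.data() + maOffsets[nIndex];
          }
          void clear() // reuse the same buffers for the next element
          {
              maValueBuffer.clear();
              maTokens.clear();
              maOffsets.clear();
          }
      };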

  • Code improvements

    ● Cache values:
    ● commit “fastparser: cache default namespace token for ooxml.”
    ● commit “oox: special-case single-character a-z token mapping.” (see the sketch below)
      – around 50% of OOXML tokens are single lower-case characters, a-z
    ● commit “fastparser: avoid std::stack::top() - cache its results.”
      – amazingly, std::stack::top() takes 146 pseudo-cycles to do not much, so instead cache the result in a single pointer in lieu of burning those cycles.
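    An illustrative version of the single-character fast path; the table, map and function name are assumptions, not the real oox token lookup:

      #include <cstdint>
      #include <string>
      #include <string_view>
      #include <unordered_map>

      // One-letter lowercase names (very common in OOXML, e.g. <c>, <v>, <r>)
      // are resolved through a 26-entry table, skipping the general lookup.
      int32_t getToken(std::string_view aName,
                       const int32_t (&aSingleCharTokens)[26],
                       const std::unordered_map<std::string, int32_t>& rGeneralTokenMap)
      {
          if (aName.size() == 1 && aName[0] >= 'a' && aName[0] <= 'z')
              return aSingleCharTokens[aName[0] - 'a'];
          auto it = rGeneralTokenMap.find(std::string(aName));
          return it != rGeneralTokenMap.end() ? it->second : -1; // -1: unknown token
      }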

  • Does it work ? (with GPU enabled)

    [Chart: wall-clock time to load a set of large XLSX spreadsheets on an 8-thread Intel machine; log time axis, seconds; Calc 4.1.3 vs. Calc Reference. Documents: dates-worked.xlsx, groundwater-daily.xlsm, mandy-no-macro.xlsx, mandy.xlsm, matrix-inverse.xlsx, stock-history.xlsm, sumifs-testsheet.xlsx, numbers-100k.xlsx, numbers-formula-100k.xlsx, numbers-formula-8-sheets-100k.xlsx, num-formula-2-sheets-1m.xlsx]

    Apologies for another log scale: average 5X vs. 4.1.3

  • dates-worked.xlsx

    ● 1 sheet with half plain data and half formulas
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 6.2X vs. 4.1.3

  • groundwater-daily.xlsm

    ● 4 sheets with both data and formulas
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 5.3X vs. 4.1.3

  • mandy.xlsm

    ● 4 sheets of formulas
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 8X vs. 4.1.3

  • matrix-inverse.xlsx

    ● Mostly just 1 sheet with numbers
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 4.2X vs. 4.1.3

  • stock-history.xlsm

    ● Mostly 2 sheets with both data and formulas
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 3.8X vs. 4.1.3

  • sumifs-testsheet.xlsx

    ● 1 sheet with a lot of numbers + 1 sheet with some formulas and diagrams
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 3.7X vs. 4.1.3

  • numbers-100k.xlsx

    ● 1 sheet with 100k numbers
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 4.4X vs. 4.1.3

  • numbers-formula-100k.xlsx

    ● 1 sheet with 100k numbers and 100k formulas
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 4.6X vs. 4.1.3

  • numbers-formula-8-sheets-100k.xlsx

    ● 8 sheets with 100k numbers and 100k formulas
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 10.2X vs. 4.1.3

  • num-formula-2-sheets-1m.xlsx

    ● 2 sheets with 1m numbers and 1m formulas each
    ● [Chart: load time in seconds, Calc 4.1.3 vs. Calc Reference] 5.4X vs. 4.1.3

  • Quick demo & questions on Calc / threaded loading bits ?

  • Threaded XML Parsing

    ● LibreOffice is innovating:
      ● Threaded parsing is just one example
        – a new, elegant, efficient means to parse XML with SAX
    ● Plenty more to do
      ● Next steps are porting more code to use XFastParser
        ● AutoCorrection: huge dictionaries & slow parsing
        ● ODF filters
        ● XFastSerializer needs love etc.
    ● Thanks for all of your help and testing !
    ● Questions ?
