DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

DESCRIPTION

In this presentation we discuss the problems that occur when splitting wide tables across multiple pages. We focus on finding solutions that minimize the impact on the meaning of the data when the objective is to reorder the columns so that the number of pages used is minimal. Reordering the columns of a table raises a number of complex optimization problems that we study in this paper: minimizing the page count and, at the same time, the number of column position changes or the number of column groups split across pages. We show that by using integer programming solutions the number of pages used when splitting wide tables can be reduced by up to 25%, and that this can be achieved in short computational time. http://doi.acm.org/10.1145/2494266.2494317

TRANSCRIPT

Page 1: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Splitting Wide Tables Optimally

Mihai Bilauca Patrick Healy

DocEng2013, September 10–13, 2013, Florence, Italy

Department of Computer Science and Information Systems University of Limerick, Ireland

Supported by Science Foundation Ireland under the research programme 01/P1.2/C009, Mathematical Foundations, Practical Notations, and Tools for Reliable Flexible Software.

Page 2: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Why this paper?

• Tables are widely used for presenting logical relationships between data items;

• Widespread WYSIWYG tools have poor support for wide tables;

• Authoring tables is hard, time-consuming and error-prone;

• Style manual recommendations are not always supported;

• Very little research exists in this area.

Page 3: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

A wide table split across multiple pages

Page 4: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Grouping of data items increases readability

Page 5: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Style recommendations from Chicago Manual of Style

“For a two-page broadside table – which should be presented on facing pages if at all possible – column heads need not be repeated; for broadside tables that run beyond two pages, column heads are repeated only on each new verso.

Where column heads are repeated, the table number and “continued” should also appear.

For any table that is likely to run to more than one page, the editor should specify whether continued lines and repeated column heads will be needed and where footnotes should appear (usually at the end of the table as a whole).”

Page 6: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Overview

We present MIP solutions using OPL for three problems that occur when splitting wide tables, with the aim of minimizing the effect on the meaning of the data:

1. Minimize Page Count

2. Minimize Page Count and Column Positioning Changes

3. Minimize Page Count and Group Splitting

Report experimental results with IBM CPLEX 12.3

Conclusions

MIP – Mixed Integer Programming

OPL – Optimization Programming Language

Page 7: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

1. Minimum Page Count

Page 8: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

1. Minimum Page Count – OPL Model

dvar int+ pageSel[Pages] in 0..1;
dvar int+ X[Pages][Cols] in 0..1;

dexpr int pageCount = sum(p in Pages) pageSel[p];

minimize pageCount;

subject to {
  ct1: // select only one page for each column
    forall(j in Cols)
      sum(p in Pages) X[p][j] == 1;

  ct2: // only columns that fit in the page
    forall(p in Pages)
      sum(j in Cols) colW[j] / pageW * X[p][j] <= pageSel[p];
}
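
The model above assigns every column to exactly one page and keeps each page within the page width. As a rough illustration of the same problem without a MIP solver, the following Python sketch searches exhaustively for the smallest feasible page count. It is not the authors' method: the function names are hypothetical, and the data is the ten-column instance (page width 490 points) used as the worked example elsewhere in this talk.

```python
from math import ceil


def fits(widths, page_w, n_pages):
    """True if every column can be assigned to one of n_pages pages."""
    loads = [0] * n_pages                  # current fill of each page
    cols = sorted(widths, reverse=True)    # widest first, so dead ends show early

    def place(i):
        if i == len(cols):
            return True
        tried = set()                      # page loads already tried for this column
        for p in range(n_pages):
            if loads[p] in tried or loads[p] + cols[i] > page_w:
                continue
            tried.add(loads[p])
            loads[p] += cols[i]
            if place(i + 1):
                return True
            loads[p] -= cols[i]
        return False

    return place(0)


def min_page_count(widths, page_w):
    """Smallest page count for which a feasible assignment exists."""
    if any(w > page_w for w in widths):
        raise ValueError("a column is wider than the page")
    n_pages = ceil(sum(widths) / page_w)   # no solution can use fewer pages
    while not fits(widths, page_w, n_pages):
        n_pages += 1
    return n_pages


col_w = [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]
print(min_page_count(col_w, 490))  # 5
```

The search starts from the area lower bound (total width divided by page width, rounded up), so for this instance the first candidate page count is already the answer.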

Page 9: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

1. Minimum Page Count – Results

Building Table Formatting Tools

● Page count can be reduced by 14% to 25%

● The difficulty of the problem is not directly linked to the problem size but to the data itself

Columns   10       20       30       40       50       60
PC        7        16       19       29       34       48
OPC       6        12       15       23       26       39
%Imp      14.28%   25.00%   21.05%   20.68%   23.52%   18.75%
Time      2.25     0.13     0.17     1.18     4.30     1.52
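
The %Imp row can be recomputed from the two page-count rows. The sketch below assumes that PC is the page count before optimization and OPC the page count after it (the slide does not expand the abbreviations) and that the percentage is the relative saving, truncated to two decimals. Under those assumptions the printed values agree with the table.

```python
columns = [10, 20, 30, 40, 50, 60]
pc = [7, 16, 19, 29, 34, 48]    # PC row: assumed to be pages before optimization
opc = [6, 12, 15, 23, 26, 39]   # OPC row: assumed to be pages after optimization


def improvement_hundredths(before, after):
    """Relative saving in hundredths of a percent, truncated (integer arithmetic)."""
    return (before - after) * 10000 // before


for n, before, after in zip(columns, pc, opc):
    h = improvement_hundredths(before, after)
    print(f"{n} columns: {before} -> {after} pages, {h // 100}.{h % 100:02d}% fewer")
```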

Page 10: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

2. Minimum Page Count & Column Positioning Changes

Page 11: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

2. Minimum Page Count & Column Positioning Changes

PageW: 490 points
colW: [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]
7 pages: {210,140} {210} {420} {280} {350,70} {140,140} {350}

Minimum 5 pages:
colIdx: [1, 7, 8, 5, 2, 9, 6, 10, 3, 4]
Pages: {210,280} {140,350} {420,70} {140,210} {350,140}

Minimum 5 pages and column position changes (posDiff):
colIdx: [1, 2, 3, 5, 4, 7, 6, 8, 9, 10]
Pages: {210,140} {210,280} {420,70} {350,140} {140,350}
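
The three splits on this slide can be reproduced with a few lines of Python. The sketch assumes that colIdx[j] gives the new position of original column j (the reading under which the listed pages come out right) and that pages are filled left to right, opening a new page whenever the next column does not fit. The function names are hypothetical.

```python
def split_in_order(widths, page_w):
    """Fill pages left to right, starting a new page when the next column does not fit."""
    pages, current = [], []
    for w in widths:
        if current and sum(current) + w > page_w:
            pages.append(current)
            current = []
        current.append(w)
    if current:
        pages.append(current)
    return pages


def reorder(widths, col_idx):
    """col_idx[j] is the new 1-based position of original column j+1 (assumed reading)."""
    out = [None] * len(widths)
    for w, pos in zip(widths, col_idx):
        out[pos - 1] = w
    return out


page_w = 490
col_w = [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]

original = split_in_order(col_w, page_w)
fewest = split_in_order(reorder(col_w, [1, 7, 8, 5, 2, 9, 6, 10, 3, 4]), page_w)
gentle = split_in_order(reorder(col_w, [1, 2, 3, 5, 4, 7, 6, 8, 9, 10]), page_w)
print(len(original), len(fewest), len(gentle))  # 7 5 5
```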

Page 12: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

2. Minimum Page Count & Column Positioning Changes

dvar int+ pageSel[Pages] in 0..1;
dvar int+ pageIdx[Cols] in 0..1;
dvar int+ colIdx[Cols] in 0..1;

// check if j1 is placed on a page before j2
dexpr int posO[j1,j2 in Cols] = j1 <= j2-1;
dexpr int posN[j1,j2 in Cols] = (colIdx[j1] <= colIdx[j2]-1);

dexpr float posDiff = sum(j1,j2 in Cols : j2 < j1) abs(posO[j1,j2] - posN[j1,j2]);

dexpr int pageCount = sum(p in Pages) pageSel[p];

// a, b, obj1Val variables are used for OPL flow control
minimize a * pageCount + b * posDiff;
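
Read literally, posDiff counts the column pairs whose relative order changes, that is, the inversions of colIdx. The Python sketch below mirrors the posO / posN expressions term by term and applies them to the two colIdx vectors of the ten-column example from this talk. The function name is hypothetical.

```python
def pos_diff(col_idx):
    """Count column pairs whose left-to-right order differs from the original."""
    n = len(col_idx)
    total = 0
    for j1 in range(n):
        for j2 in range(j1):                         # pairs with j2 < j1, as in the model
            pos_o = int(j1 < j2)                     # always 0 here: j1 came after j2
            pos_n = int(col_idx[j1] < col_idx[j2])   # 1 if j1 now comes before j2
            total += abs(pos_o - pos_n)
    return total


print(pos_diff([1, 7, 8, 5, 2, 9, 6, 10, 3, 4]))   # 20
print(pos_diff([1, 2, 3, 5, 4, 7, 6, 8, 9, 10]))   # 2
```

Both orders use five pages, but the second disturbs only two column pairs instead of twenty.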

Page 13: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

2. Minimum Page Count & Column Positioning Changes

subject to {
  ct1: // do not exceed page width
    forall(p in Pages)
      sum(j in Cols) colW[j] * (p == pageIdx[j]) / pageW <= pageSel[p];

  ct2: // page and column indexes relationship
    forall(ordered j1,j2 in Cols)
      (pageIdx[j1] <= pageIdx[j2]-1) - (colIdx[j1] <= colIdx[j2]-1) == 0;

  ct3: // unique column index values
    forall(ordered j1,j2 in Cols)
      colIdx[j1] != colIdx[j2];

  // if the minimum page count obj1Val is set
  // maintain this value for subsequent searches
  ct4: if (obj1Val >= 0) pageCount == obj1Val;
}
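
Constraint ct4 suggests a two-stage use of the model: find the minimum page count first, then fix it and minimize posDiff. The Python sketch below illustrates only the second stage, by plain search rather than MIP: given a page count, it looks for a column order with the fewest order changes that still fits on that many pages. It assumes pages are filled left to right in column order; the function name and the search strategy are illustrative and not taken from the talk.

```python
def min_pos_diff(widths, page_w, page_count):
    """Fewest pairwise order changes among column orders that fit on page_count pages.

    Returns (pos_diff, order), where order lists 1-based column numbers left to
    right, or None if no order fits. Assumes no column is wider than the page.
    """
    n = len(widths)

    def search(order, placed, load, pages, inv, limit):
        if len(order) == n:
            return list(order)
        for c in range(n):
            if c in placed:
                continue
            # placing c now moves it ahead of every unplaced lower-numbered column
            extra = sum(1 for k in range(c) if k not in placed)
            if inv + extra > limit:
                break                       # extra only grows for later candidates
            fits_here = load + widths[c] <= page_w
            new_load = load + widths[c] if fits_here else widths[c]
            new_pages = pages if fits_here else pages + 1
            if new_pages > page_count:
                continue
            order.append(c)
            placed.add(c)
            found = search(order, placed, new_load, new_pages, inv + extra, limit)
            order.pop()
            placed.remove(c)
            if found is not None:
                return found
        return None

    # try 0 order changes, then 1, then 2, ... and stop at the first success
    for limit in range(n * (n - 1) // 2 + 1):
        found = search([], set(), 0, 1, 0, limit)
        if found is not None:
            return limit, [c + 1 for c in found]
    return None


col_w = [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]
print(min_pos_diff(col_w, 490, 5)[0])  # 2
```

For the ten-column example with the page count fixed at 5, the search stops at two order changes, the same number of changes as in the reordered solution shown for that example.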

Page 14: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

2. Minimum Page Count & Column Positioning Changes

Results

● Promising performance:

– 2.25s for a 10-column table, reducing posDiff from 33 to 4 and the page count from 9 to 8;

– 89s for a 20-column table, reducing posDiff from 194 to 4 and the page count from 13 to 11;

● Computational time increases with the number of columns

● A data instance may have no better solution

Page 15: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

3. Minimum Page Count & Group Splitting

Page 16: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

3. Minimum Page Count & Group Splitting

User specifies which columns should preferably be kept together

PageW: 490 points
colW: [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]
7 pages: {210,140} {210} {420} {280} {350,70} {140,140} {350}

Minimum 5 pages:
colIdx: [3, 5, 4, 7, 10, 6, 8, 1, 2, 9]
Pages: {210,280} {420} {70,350} {350,140} {210,140,140}

Group columns 2, 3 and 7:
colIdx: [2, 3, 7, 4, 9, 10, 6, 8, 1, 5]
Pages: {140,210,70} {420} {140,350} {350,140} {210,280}
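
The page contents on this slide can be checked against the column widths. The Python sketch below assumes that the colIdx rows here list the original column shown at each position (the reading under which the listed pages come out right) and takes the number of columns per page from the Pages rows. The helper names are hypothetical.

```python
page_w = 490
col_w = [210, 140, 210, 420, 280, 350, 70, 140, 140, 350]


def page_widths(pages):
    """Map pages given as lists of 1-based column numbers to lists of widths."""
    return [[col_w[c - 1] for c in page] for page in pages]


def chunk(listing, sizes):
    """Cut a flat column listing into pages of the given sizes."""
    pages, start = [], 0
    for size in sizes:
        pages.append(listing[start:start + size])
        start += size
    return pages


# colIdx rows read as "column shown at each position"; page sizes from the Pages rows
fewest = chunk([3, 5, 4, 7, 10, 6, 8, 1, 2, 9], [2, 1, 2, 2, 3])
grouped = chunk([2, 3, 7, 4, 9, 10, 6, 8, 1, 5], [3, 1, 2, 2, 2])

for pages in (fewest, grouped):
    widths = page_widths(pages)
    assert all(sum(p) <= page_w for p in widths)   # every page respects the page width
    print(widths)
```

In the second split, columns 2, 3 and 7 share the first page, so the group is kept together without using more than five pages.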

Page 17: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

3. Minimum Page Count & Group Splitting

int colG[Cols] = ...; // column groups
dvar int+ pageSel[Pages] in 0..1;
dvar int+ pageIdx[Cols] in 0..1;

// find the first column of the group
int gFirstCol[g in groups] = first({j | j in Cols : colG[j] == g});

// counts how many columns of a group are on a
// different page than the group's first column
dexpr int gSplit[g in groups] =
  sum(j in Cols : colG[j] == g) (pageIdx[j] != pageIdx[gFirstCol[g]]);

dexpr int gSplitCount = sum(g in groups) (gSplit[g] >= 1);

dexpr int pageCount = sum(p in Pages) pageSel[p];
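
The gSplit and gSplitCount expressions can be traced on the ten-column example with columns 2, 3 and 7 grouped. In the Python sketch below, the encoding of ungrouped columns as group 0 and the two pageIdx vectors (derived from the page listings of that example) are assumptions made for illustration. The function name is hypothetical.

```python
def g_split_count(col_g, page_idx):
    """Number of groups with at least one column away from the group's first column.

    col_g[j]    group id of column j+1 (0 means "no group", an assumption of this sketch)
    page_idx[j] page that column j+1 is placed on
    """
    split_groups = 0
    for g in sorted(set(col_g) - {0}):
        members = [j for j, gid in enumerate(col_g) if gid == g]
        first = members[0]                                   # gFirstCol[g]
        g_split = sum(page_idx[j] != page_idx[first] for j in members)
        split_groups += g_split >= 1
    return split_groups


col_g = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]             # columns 2, 3 and 7 form group 1
ungrouped_pages = [5, 5, 1, 2, 1, 4, 3, 4, 5, 3]   # the plain 5-page split
grouped_pages = [5, 1, 1, 2, 5, 4, 1, 4, 3, 3]     # the split that keeps the group together
print(g_split_count(col_g, ungrouped_pages), g_split_count(col_g, grouped_pages))  # 1 0
```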

Page 18: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

3. Minimum Page Count & Group Splitting

// a, b, obj1Val variables are used for OPL flow control
minimize a * pageCount + b * gSplitCount;

subject to {
  ct1: // do not exceed page width
    forall(p in Pages)
      sum(j in Cols) colW[j] * (p == pageIdx[j]) / pageW <= pageSel[p];

  // if the minimum page count obj1Val is set
  // maintain this value for subsequent searches
  ct2: if (obj1Val >= 0) pageCount == obj1Val;
}

Page 19: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

3. Minimum Page Count & Group Splitting Model

Results

● Promising performance:

– 1 min for a 20-column table with 3 groups, none split, page count from 12 down to 9;

– 2 min for 30-40 column tables, but time increased up to 12 min when the number of groups increased;

● Computational time increases with the number of columns and groups

● Some relaxed solutions can be preferred

Page 20: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Conclusions

Page 21: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Conclusions

• An optimal arrangement of columns that minimizes the page count when splitting wide tables can be found in relatively short running time; for tables with 60 columns a solution was found in less than 2s;

• If additional criteria are added, for example minimizing the number of relative column position changes, the problems become harder as the number of columns increases;

• The difficulty of the problems depends not only on the problem size but also on the complexity of the data.

Page 22: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Ongoing work

Minimizing the overall page count when a large table containing text is displayed on fixed size pages and neither column widths nor row heights are known in advance.

Page 23: DocEng2013 Bilauca Healy - Splitting Wide Tables Optimally

Thank you!

www.tabularlayout.org
