DR INVESTIGATIVE DATA TEAM
Contribution from the DR Data Team to the Data Journalism Award 2014 / Data journalism portfolio (team/newsroom)
In this portfolio you will find a selection of the stories and illustrations we have published since we first went “on air” on November 1st 2013.
For a full overview of our production, please visit this webpage: http://www.pinterest.com/katrinefrich/drs-unders%C3%B8gende-databaseredaktion/
All articles are in Danish.
The DR Investigative Data Team was launched on October 1st 2013. We are one editor, two journalists, one graphics designer and one programmer. We do every part of data journalism ourselves: from scraping data off the web and using the Freedom of Information Act to obtain data and documents from public administration, through selecting, sorting, refining, filtering and analyzing the data, to the final visual and editorial presentation.
We live by two mottos when we select data for stories:
1. We want data to have relevance to our readers. If we use lots of resources on interactive graphics, we want our readers to find something useful when they click on the device.
2. We only do the story if we find news. No news, we reject the data.
Katrine Birkedal Frich, Editor
Mads Rafte Hein, Graphics designer
Kresten Morten Munksgaard, Data journalist
Bo Elkjær (Skipper), Data journalist
Jens Lykke Brandt, Programmer
Tax authorities misvalued property, benefiting the rich and disadvantaging the poor
One of our very first stories was based on complicated calculations that had gone wrong at the Danish tax authorities. For some time, many Danish homeowners had been complaining that the valuations made by the tax authorities (which were the basis for the property tax paid by homeowners) were out of touch with market values. When houses were sold, the price was often far from the valuation made by the tax authorities. As a consequence, homeowners paid too much (or too little) property tax.
This had been well reported by Danish media. But the Database Team wondered: did the authorities' misjudgment affect all homeowners equally? We decided to investigate that question by filing a freedom of information request with the National Audit Office, which was investigating the tax scandal. We obtained a data file with 12,000 rows of information on each of the houses sold in the second half of 2011.
With this data we analyzed the house prices by sorting the houses into five categories (from low price to high price). Within each category we could analyze how many houses were sold below or above the valuation set by the tax authorities. Our analysis showed that most of the lower-priced houses (where people with low incomes live) were valued far too high, with the potential consequence that these people pay too much tax, while houses that sold high on the market were valued too low, with the potential consequence that rich people paid too little property tax. An inverted “Robin Hood”, as one of our sources called it.
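The banding analysis can be illustrated with a minimal Python sketch. It is not the team's actual workflow: it assumes each sale is represented by its sale price and its public valuation (the field names are ours), splits the sales into five equally sized price bands (an assumption; the text does not say how the band boundaries were set), and reports the share of houses in each band that were valued above their sale price. All figures are invented.

```python
from dataclasses import dataclass

@dataclass
class Sale:
    sale_price: float  # price the house actually sold for
    valuation: float   # public valuation set by the tax authorities

def price_bands(sales, n_bands=5):
    """Sort sales by market price and split them into n_bands groups of (nearly) equal size."""
    ordered = sorted(sales, key=lambda s: s.sale_price)
    size, extra = divmod(len(ordered), n_bands)
    bands, start = [], 0
    for i in range(n_bands):
        # The first `extra` bands take one additional sale so nothing is dropped.
        end = start + size + (1 if i < extra else 0)
        bands.append(ordered[start:end])
        start = end
    return bands

def share_overvalued(bands):
    """Per band: fraction of houses whose valuation was above the sale price."""
    return [
        sum(s.valuation > s.sale_price for s in band) / len(band) if band else 0.0
        for band in bands
    ]

# Invented toy figures (DKK), NOT the National Audit Office data.
example_sales = [
    Sale(500_000, 700_000), Sale(600_000, 750_000),
    Sale(900_000, 1_000_000), Sale(1_000_000, 950_000),
    Sale(1_500_000, 1_400_000), Sale(1_600_000, 1_700_000),
    Sale(2_500_000, 2_000_000), Sale(2_600_000, 2_400_000),
    Sale(4_000_000, 3_000_000), Sale(4_200_000, 3_500_000),
]

if __name__ == "__main__":
    print(share_overvalued(price_bands(example_sales)))  # [1.0, 0.5, 0.5, 0.0, 0.0]
```

With the toy data, every house in the cheapest band is overvalued and none in the two most expensive bands, which is the inverted pattern in miniature.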
Besides telling this story in words, we told it in an interactive graphic where readers could select their own municipality and see how the tax authorities valued houses in their area.
http://www.dr.dk/Nyheder/Politik/2013/10/16/202745.htm
http://www.dr.dk/Nyheder/Politik/2013/10/16/212744.htm
http://www.dr.dk/Nyheder/Politik/2013/10/16/213712.htm
http://www.dr.dk/Nyheder/Politik/2013/10/16/204230.htm
What data did we analyze?
1. Number of citizens, split by municipality
2. 12,000 rows of data on houses sold and valued by the tax authorities
Data in Excel            | Rows   | Columns | Total number of cells
Raw data from database   | 16,249 | 14      | 227,486
Selected and refined     | 12,202 | 14      | 170,828
Graphics: National average / Municipality average (exemplified by Brøndby)
Published October 2013. Time spent on this story: 6 days. Readers since publishing: 61,166.
All of these data were sorted, refined, combined, calculated and analyzed by the team.
How many leaders does your municipality need?
Why does one municipality need eight bosses to lead 100 employees, when another municipality can do it with four? That is the question raised by our investigation into the heads of public administration.
We chose to look into the number of leaders in the local administration of the municipalities. At a time when money is scarce and all public services are subject to cuts, we wanted to investigate, through data, whether public leaders hold themselves as accountable for cuts as they hold the employees at public schools, public daycare etc.
We therefore chose different sets of data to compare the municipalities and made an interactive tool (for mobile as well as web) that made it easy for readers to choose their own municipality and study not only the number of leaders, but also compare it with the service level, the leaders' salaries, and the number of leaders relative to citizens and to employees (please visit the web page to see the full extent of the interactive graphics).
What data did we analyze?
1. Number of citizens split by municipalities
2. Number of people employed in municipalities
3. Level of service in the municipality (an indicator calculated by a government-appointed commission)
4. Number of leaders
All of these data were sorted, refined, combined, calculated and analyzed by the team.
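The comparisons above come down to simple ratios. A minimal sketch, assuming one record per municipality with hypothetical field names and invented figures, expresses the leader count per 100 employees (matching the "eight versus four bosses per 100 employees" framing) and per 1,000 citizens, then ranks the municipalities. The service-level indicator and leader salaries are left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class Municipality:
    name: str
    citizens: int
    employees: int
    leaders: int

    @property
    def leaders_per_100_employees(self) -> float:
        return 100 * self.leaders / self.employees

    @property
    def leaders_per_1000_citizens(self) -> float:
        return 1000 * self.leaders / self.citizens

def rank_by_leader_density(municipalities):
    """Municipalities with the most leaders per 100 employees first."""
    return sorted(municipalities, key=lambda m: m.leaders_per_100_employees, reverse=True)

# Invented example figures, not real municipalities.
example = [
    Municipality("Municipality B", citizens=60_000, employees=4_000, leaders=160),
    Municipality("Municipality A", citizens=40_000, employees=2_500, leaders=200),
]

if __name__ == "__main__":
    for m in rank_by_leader_density(example):
        print(f"{m.name}: {m.leaders_per_100_employees:.1f} per 100 employees, "
              f"{m.leaders_per_1000_citizens:.1f} per 1,000 citizens")
```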
Data in Excel            | Rows | Columns | Total number of cells
Raw data from database   | 556  | 98      | 54,488
Selected and refined     | 99   | 98      | 9,702
Links to article:
http://www.dr.dk/Nyheder/Politik/KV13/Artikler/Hele_landet/2013/11/06/200141.htm
http://www.dr.dk/Nyheder/Politik/KV13/Artikler/Hele_landet/2013/11/06/195829.htm
http://www.dr.dk/Nyheder/Politik/KV13/Artikler/Hele_landet/2013/11/06/175643.htm
Published November 2013. Time spent on this story: 5 days. Readers since publishing: 79,790.
The soil is toxic
One of our first projects was to map and illustrate poisoned soil in Denmark. To Danish citizens it is not news that some of our soil is contaminated with waste and chemicals from, among other sources, old industry, dry cleaners and gas stations. But few of us really know the extent of the pollution, and few know exactly where the poisoned soil is located.
For the Data Team the challenge was to show readers exactly where the poisoned soil is. Is it near your house? At the kindergarten playground? In the forest where you walk your dog every day? Those were some of the questions we wanted to answer, not only by writing stories but also by showing the answers on a map where readers could click and study the polluted areas in Denmark.
Even though a large part of Denmark is classified as contaminated, the subject has never received much attention in the media or in politics. As a result, public spending on cleaning up polluted soil is very small (55 million euro). The responsible administration has in fact stated that with the current budget it would take more than 50 years just to clean the soil that is currently considered necessary to clean in order to keep groundwater drinkable and keep people living near toxic areas from getting ill.
What did we do?
We decided to create a map of the poisoned soil. But instead of overloading our readers with all the information at once, we released the different layers of information on the map days apart. We ran three different layers of the map, with a range of stories to accompany them.
First map:
The first iteration of the map included all registered areas that the authorities classify as either contaminated or ‘likely contaminated’. The areas are published as shapefiles, a largely open file format for storing geographic information.
The files were converted to KML, another file format used by Google products, and imported both into a database for further data analysis and into Google Fusion Tables for visualization. We had to write our own program for the import into the database.
Published November 2013. Time spent on this story: 20 days. Readers since publishing: 258,878.
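The team's import program is not described in detail. As a hedged sketch of what such a step can look like, the Python below uses only the standard library: `xml.etree.ElementTree` reads polygon outlines from a KML document, and SQLite stands in for the database (the text does not name the database that was used). It keeps only each polygon's outer boundary and ignores holes and multi-part geometries. The sample KML is invented. For the shapefile-to-KML conversion itself, GDAL's `ogr2ogr` is one common tool; the text does not say which converter was used.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def parse_coordinates(text):
    """KML coordinates are whitespace-separated 'lon,lat[,alt]' tuples."""
    points = []
    for token in text.split():
        lon, lat, *_ = token.split(",")
        points.append((float(lon), float(lat)))
    return points

def read_polygons(kml_text):
    """Yield (placemark name, outer ring) for every polygon in a KML document."""
    root = ET.fromstring(kml_text)
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        path = ".//kml:outerBoundaryIs/kml:LinearRing/kml:coordinates"
        for coords in placemark.iterfind(path, KML_NS):
            yield name, parse_coordinates(coords.text or "")

def load_areas(conn, polygons):
    """Store each ring as JSON text; a spatial database would use a geometry type instead."""
    conn.execute("CREATE TABLE IF NOT EXISTS areas (name TEXT, ring TEXT)")
    conn.executemany(
        "INSERT INTO areas (name, ring) VALUES (?, ?)",
        ((name, json.dumps(ring)) for name, ring in polygons),
    )
    conn.commit()

# Invented one-polygon sample.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>Area 1</name><Polygon><outerBoundaryIs><LinearRing>
<coordinates>12.0,55.0,0 12.1,55.0,0 12.1,55.1,0 12.0,55.0,0</coordinates>
</LinearRing></outerBoundaryIs></Polygon></Placemark>
</Document></kml>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_areas(conn, read_polygons(SAMPLE_KML))
    print(conn.execute("SELECT name, ring FROM areas").fetchall())
```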
In the database we could run queries against other areas and points, such as the positions of schools and daycare centers.
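A question such as "which schools lie inside a contaminated area?" is a point-in-polygon test. The minimal sketch below uses the even-odd ray-casting rule and treats longitude/latitude as plane coordinates, which is acceptable for small areas. The school positions and the area are invented. In a spatial database the same question would typically be answered with a built-in predicate such as PostGIS's `ST_Contains`, but the text does not say which database or query the team used.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: inside if a ray from (x, y) to the right crosses the boundary an odd number of times."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def schools_on_contaminated_soil(schools, areas):
    """schools: {name: (x, y)}, areas: {area_id: [(x, y), ...]}. Returns sorted (school, area_id) pairs."""
    return sorted(
        (school, area_id)
        for school, (x, y) in schools.items()
        for area_id, ring in areas.items()
        if point_in_polygon(x, y, ring)
    )

# Invented example: one square area and two schools.
example_areas = {"A1": [(0, 0), (2, 0), (2, 2), (0, 2)]}
example_schools = {"School X": (1, 1), "School Y": (3, 1)}

if __name__ == "__main__":
    print(schools_on_contaminated_soil(example_schools, example_areas))  # [('School X', 'A1')]
```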
In Google Fusion Tables we merged the areas with additional information for each area, which we had obtained through several requests to the authorities under the Freedom of Information Act.
Finally, we put all the information into an interactive Google map, where users could zoom, pan and click on areas to get the extra information. We made a big effort to make the map work on mobile devices and altered several UI elements to accomplish this.
The first map showed data on 29,000 areas in Denmark where the soil is polluted or suspected of being polluted.
Second map:
The second iteration focused on the most contaminated and most expensive areas. Data was gathered from multiple sources and enriched through several more requests to the authorities.
The extra data was put on the map as icons using the Google Maps API. To find the center of each area, we had to construct a database query that returned the exact point.
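Finding "the center" of an area is usually a centroid calculation. The sketch below computes the area-weighted centroid of a polygon with the shoelace formula; it illustrates the geometry and is not the team's query.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon, via the shoelace formula.

    Works for open or closed rings: a repeated closing vertex only adds a zero-length edge.
    """
    twice_area = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1  # twice the signed area of the triangle (origin, p1, p2)
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * twice_area), cy / (3 * twice_area)

if __name__ == "__main__":
    print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))  # (2.0, 1.0)
```

The centroid of a strongly concave area can fall outside the area itself. Spatial databases offer an alternative for that case, for example PostGIS's `ST_PointOnSurface`, which always returns a point inside the polygon.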
Each of these icons reveals detailed information about past and future plans and costs.
Third map:
The last iteration added the Natura 2000 areas. Natura 2000 is an EU network of areas with special nature types that are designated as needing special conservation and protection.
A shapefile from the European Environment Agency holding all Natura 2000 areas in Europe was converted to KML and parsed by another program we created to keep only the Danish areas; this shrank the KML file from 1,800 MB to 6 MB. The Danish areas were imported into both the database and Google Fusion Tables.
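Shrinking a 1,800 MB KML file is best done without loading it all into memory. A hedged sketch of such a filter streams the file with `xml.etree.ElementTree.iterparse` and keeps only matching placemarks. It assumes, hypothetically, that each placemark's `<name>` holds the Natura 2000 site code and that Danish site codes begin with "DK"; the actual program's selection rule is not described in the text. The site codes in the sample are made up.

```python
import io
import xml.etree.ElementTree as ET

KML = "http://www.opengis.net/kml/2.2"
PLACEMARK = f"{{{KML}}}Placemark"
NAME = f"{{{KML}}}name"

def matching_placemarks(source, prefix):
    """Stream placemarks from a KML file and yield those whose <name> starts with prefix."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != PLACEMARK:
            continue
        if elem.findtext(NAME, default="").startswith(prefix):
            yield elem
        else:
            elem.clear()  # drop the discarded placemark's children to save memory

def filter_kml(source, out, prefix="DK"):
    """Write a smaller KML document with only the matching placemarks; return how many were kept."""
    ET.register_namespace("", KML)  # serialize KML tags without an 'ns0:' prefix
    out.write(f'<kml xmlns="{KML}"><Document>\n')
    kept = 0
    for placemark in matching_placemarks(source, prefix):
        out.write(ET.tostring(placemark, encoding="unicode"))
        placemark.clear()
        kept += 1
    out.write("</Document></kml>\n")
    return kept

# Invented sample with made-up site codes.
SAMPLE = b"""<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>DK00AA001</name></Placemark>
<Placemark><name>SE0000001</name></Placemark>
</Document></kml>"""

if __name__ == "__main__":
    out = io.StringIO()
    print(filter_kml(io.BytesIO(SAMPLE), out))  # 1
    print(out.getvalue())
```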
Google Fusion Tables could then display the Natura 2000 areas on our map, but we wanted to do more: in the database we constructed a query that returned all contaminated and “likely contaminated” areas that overlapped the Natura 2000 areas.
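The overlap query can be sketched without a spatial database. The Python below treats two polygons as overlapping if any of their edges intersect or one lies entirely inside the other, and uses a cheap bounding-box check first to skip obviously distant pairs. It assumes simple polygons given as lists of (x, y) vertices and counts polygons that merely touch as overlapping; the identifiers and coordinates are invented. A spatial database would instead use an indexed predicate such as PostGIS's `ST_Intersects` rather than this brute-force loop.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def _within_box(p, q, r):
    """For r collinear with p and q: does r lie between them?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(a, b, c, d):
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and _within_box(a, b, c)) or (o2 == 0 and _within_box(a, b, d))
            or (o3 == 0 and _within_box(c, d, a)) or (o4 == 0 and _within_box(c, d, b)))

def point_in_polygon(x, y, ring):
    """Even-odd ray casting."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def _edges(ring):
    return list(zip(ring, ring[1:] + ring[:1]))

def polygons_overlap(p, q):
    """True if two simple polygons share any area or boundary point."""
    if any(segments_intersect(a, b, c, d) for a, b in _edges(p) for c, d in _edges(q)):
        return True
    # No edges meet: the polygons are disjoint unless one lies completely inside the other.
    return point_in_polygon(*p[0], q) or point_in_polygon(*q[0], p)

def _bbox(ring):
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)

def _boxes_overlap(b1, b2):
    return b1[0] <= b2[2] and b2[0] <= b1[2] and b1[1] <= b2[3] and b2[1] <= b1[3]

def overlapping_areas(contaminated, protected):
    """Ids of contaminated areas that overlap at least one protected area (both: {id: ring})."""
    protected_boxes = {pid: _bbox(ring) for pid, ring in protected.items()}
    hits = []
    for cid, ring in contaminated.items():
        box = _bbox(ring)
        if any(_boxes_overlap(box, protected_boxes[pid]) and polygons_overlap(ring, pring)
               for pid, pring in protected.items()):
            hits.append(cid)
    return sorted(hits)

# Invented example areas.
example_protected = {"N1": [(0, 0), (10, 0), (10, 10), (0, 10)]}
example_contaminated = {
    "C1": [(9, 9), (12, 9), (12, 12), (9, 12)],      # straddles a corner of N1
    "C2": [(2, 2), (3, 2), (3, 3), (2, 3)],          # entirely inside N1
    "C3": [(20, 20), (21, 20), (21, 21), (20, 21)],  # far away
}

if __name__ == "__main__":
    print(overlapping_areas(example_contaminated, example_protected))  # ['C1', 'C2']
```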
Graphics: Expensive poison grounds / Nature plots V1 & V2 / Google map
We marked the 1,309 overlapping areas with an icon on the map so users could easily see the scope of the problem.
Links to articles:
http://www.dr.dk/nyheder/tema/jordforurening/forside.htm
Gambling without winning is not a puzzle – it’s a train ride
For once we broke with our motto that if there is no news, we drop the data. In autumn 2013 the Danish Broadcasting Corporation ran a theme on gambling and lotto. We decided to make a small web story that visualizes exactly how small the chance of winning the lotto in Denmark really is.
Our graphics designer, Mads Rafte Hein, is also an artist. We put both of his skills to use in a beautiful animated cartoon for the web. The cartoon shows that your chance of winning the lotto is just as small as your chance of hitting a bucket standing beside the track with a coin thrown from the window of a speeding train. The story was based on calculations by two mathematics experts, and that was the data of the story ☺
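The underlying arithmetic is a count of combinations. The text does not give the game's rules or the experts' calculation, so the sketch below assumes a jackpot requires matching all drawn numbers on a single row, and uses a draw of 7 numbers from 36 purely as an assumed example format.

```python
from math import comb

def jackpot_odds(numbers_drawn: int, pool_size: int) -> int:
    """How many equally likely draws exist; one row wins the jackpot with probability 1 / this number."""
    return comb(pool_size, numbers_drawn)

if __name__ == "__main__":
    # Assumed example format: 7 numbers drawn from 36.
    n = jackpot_odds(7, 36)
    print(f"1 in {n:,}")  # 1 in 8,347,680
```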
Published November 2013. Time spent on this story: 20 days. Readers since publishing: 23,665.
http://www.dr.dk/Nyheder/Indland/2013/11/27/145110.htm
Corporate tax
The project was to map and illustrate the corporate tax paid by Danish companies in 2012. The project
was based on data released by the Danish tax authority SKAT. It is only the second time since 2012 that
SKAT has released data in full on corporate taxes.
The Data Team faced several challenges in collecting, analyzing and presenting the data. Some 250,000 companies are listed as taxable, but only 57,000 companies actually paid tax. One percent of these companies paid more than two thirds of the total corporate tax paid to the Danish exchequer, and within this one percent just seven companies paid one third of the total corporate tax.
Finally, corporate tax made up 5.6 percent of the total tax paid to the Danish exchequer by all taxpayers.
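Figures such as "one percent paid two thirds" and "seven companies paid one third" are concentration measures. A hedged sketch of how they can be computed from a list of tax payments follows. The payments are invented, and the text does not say how "one percent" of 57,000 companies was rounded, so rounding to the nearest whole company is an assumption.

```python
def top_share(payments, fraction):
    """Share of the total paid by the largest `fraction` of payers (0.01 = top one percent)."""
    ordered = sorted(payments, reverse=True)
    k = max(1, round(len(ordered) * fraction))  # assumption: round to a whole number of payers
    return sum(ordered[:k]) / sum(ordered)

def payers_needed(payments, target_share):
    """Fewest of the largest payers whose combined payments reach target_share of the total."""
    ordered = sorted(payments, reverse=True)
    threshold = target_share * sum(ordered)
    running = 0
    for count, amount in enumerate(ordered, start=1):
        running += amount
        if running >= threshold:
            return count
    return len(ordered)

# Invented payments from four companies.
example = [50, 30, 10, 10]

if __name__ == "__main__":
    print(top_share(example, 0.25))       # 0.5
    print(payers_needed(example, 2 / 3))  # 2
```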
What did we do?
We decided to create a slideshow describing the corporate tax paid in 2012. This main story was to be accompanied by other stories describing the largest taxpayers, the companies that lost money in 2012, and the distribution of corporate tax to the Danish municipalities. SKAT released the data on the 5th of December 2013, and the stories were to be published on December 27 and 28.
Collection of data
The primary obstacle was collecting the data. Even though SKAT released data on all Danish companies’ tax payments in 2012, the release was, for political reasons, severely curtailed: the way the data was released meant you could only access information on the companies one at a time. SKAT opened access to a database where you could search for companies by name or by registration number. This gave you access to a page showing the company name, registration number, type of corporation, applicable tax code, corporate tax, taxable income and deductible deficit. If applicable, the page would also contain information on taxed income from oil extraction for companies operating in the North Sea, and on companies under joint taxation.
Published December 2013. Time spent on this story: 20 days. Readers since publishing: 279,878.
Since the data was released this way, we needed to set up an automatic scraper that would search SKAT’s database by company number and copy the data one company at a time.
To do this we downloaded a full list of all company registration numbers from the Danish company registry cvr.dk. This list was fed into a program written by the Data Team’s programmer, which was set up to collect between 20 and 30 individual company records per second from SKAT’s database. It took a few days to collect tax records on 243,000 taxable companies.
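A rate-limited scraper of this kind can be sketched as below. SKAT's search page, its URL and its HTML are not described in the text, so the actual lookup is left as a hypothetical `fetch` function supplied by the caller. The sketch only shows the loop over registration numbers and the throttling that keeps requests at or below a chosen rate (25 per second here, within the 20 to 30 mentioned above).

```python
import time
from typing import Callable, Dict, Iterable, Optional

def scrape(
    cvr_numbers: Iterable[str],
    fetch: Callable[[str], Optional[dict]],
    max_per_second: float = 25.0,
) -> Dict[str, dict]:
    """Look up companies one at a time, never faster than max_per_second requests."""
    interval = 1.0 / max_per_second
    results: Dict[str, dict] = {}
    next_slot = time.monotonic()
    for cvr in cvr_numbers:
        # Wait until this request's time slot to stay under the rate limit.
        delay = next_slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        next_slot = max(next_slot, time.monotonic()) + interval
        record = fetch(cvr)  # hypothetical lookup: HTTP request plus page parsing
        if record is not None:  # None means no tax record for this number
            results[cvr] = record
    return results

if __name__ == "__main__":
    # Dummy lookup standing in for the real (unknown) request to SKAT's search page.
    def dummy_fetch(cvr):
        return {"cvr": cvr, "corporate_tax": 0}

    print(len(scrape(["10000000", "10000001"], dummy_fetch)))  # 2
```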
Since the data on each company was very sparse, we needed to combine the information collected from SKAT with data from the Danish company registry cvr.dk to get each company's address and the corresponding municipality.
Both the spreadsheet with the data from SKAT and the full list of company records from cvr.dk were, each on its own, too large to handle in Microsoft Excel. So, to combine the data, we imported the scraped data into OpenRefine and combined it there with the full list of company records from cvr.dk.
After combining and cleaning up the data, we exported it to files that could be imported into Microsoft Excel and analyzed there.
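The team did this join in OpenRefine; the same step can be sketched in Python as follows: normalize the registration numbers, look each scraped record up in the registry, keep track of records with no registry match, and write the result as CSV that Excel can open. The field names, registration numbers and addresses are hypothetical.

```python
import csv
import io

def normalize_cvr(value):
    """Registration numbers may arrive as numbers or padded strings; compare them as trimmed strings."""
    return str(value).strip()

def join_by_cvr(tax_records, registry):
    """Attach registry fields (address, municipality, ...) to each scraped tax record.

    Returns (joined rows, registration numbers not found in the registry).
    """
    lookup = {normalize_cvr(cvr): info for cvr, info in registry.items()}
    joined, missing = [], []
    for record in tax_records:
        cvr = normalize_cvr(record["cvr"])
        info = lookup.get(cvr)
        if info is None:
            missing.append(cvr)
        else:
            joined.append({**record, "cvr": cvr, **info})
    return joined, missing

def to_csv(rows, fieldnames):
    """Serialize rows as CSV text; fields not listed in fieldnames are skipped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical records.
tax_records = [{"cvr": " 12345678", "corporate_tax": 1000}, {"cvr": "87654321", "corporate_tax": 50}]
registry = {"12345678": {"address": "Example Street 1", "municipality": "Municipality A"}}

if __name__ == "__main__":
    rows, missing = join_by_cvr(tax_records, registry)
    print(to_csv(rows, ["cvr", "municipality", "corporate_tax"]))
    print("not in registry:", missing)
```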
Analyzing the data
According to the Danish tax code, the municipalities receive 13.41 percent of the proceeds of the corporate tax paid. But part of these proceeds is divided between the municipalities according to a set of distribution keys that depend on the distribution of employees, subsidiaries, etc. The proceeds are also distributed with a three-year delay, so the proceeds each municipality received in 2012 were actually paid in tax in 2009. Altogether this means that, even though we knew which municipality each company was located in, we could not directly compare the municipalities' proceeds using the 2012 information from SKAT.
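The timing problem can be made concrete with a small sketch. Using the 13.41 percent municipal share and the three-year delay from the text, the proceeds received in a given year depend on tax paid three years earlier, so 2012 payment data alone cannot explain 2012 proceeds. The distribution keys are not modeled, and the tax figures are invented.

```python
MUNICIPAL_SHARE = 0.1341  # municipalities' share of corporate tax, from the text
DELAY_YEARS = 3           # proceeds received in year Y come from tax paid in year Y - 3

def municipal_proceeds(tax_paid_by_year, receiving_year):
    """Total municipal proceeds for receiving_year, before any distribution keys are applied."""
    paid_year = receiving_year - DELAY_YEARS
    if paid_year not in tax_paid_by_year:
        raise KeyError(f"proceeds received in {receiving_year} need tax data for {paid_year}")
    return MUNICIPAL_SHARE * tax_paid_by_year[paid_year]

# Invented national totals (DKK) for illustration only.
tax_paid = {2009: 1_000_000_000, 2012: 1_200_000_000}

if __name__ == "__main__":
    print(round(municipal_proceeds(tax_paid, 2012)))  # 134100000
```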
To get accurate data on municipal tax revenue, we collected the actual reported revenue from Statistics Denmark and divided these figures by the municipal population figures from public records.
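The per-citizen figure behind the map is a division followed by a ranking. A minimal sketch with hypothetical municipalities and invented figures:

```python
def revenue_per_citizen(revenue, population):
    """Revenue divided by population for each municipality, richest first."""
    per_citizen = {m: revenue[m] / population[m] for m in revenue if population.get(m)}
    return sorted(per_citizen.items(), key=lambda item: item[1], reverse=True)

# Hypothetical municipalities; invented revenue (DKK) and population figures.
revenue = {"Municipality A": 6_000_000, "Municipality B": 50_000_000}
population = {"Municipality A": 60_000, "Municipality B": 50_000}

if __name__ == "__main__":
    for name, amount in revenue_per_citizen(revenue, population):
        print(f"{name}: {amount:,.0f} DKK per citizen")
```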
Presenting the data
The first batch of stories, published on December 27th, centered on the municipalities' revenues from corporate tax. The main story was carried by a map showing revenue per citizen nationwide. This let us show the rich and the poor municipalities in terms of corporate tax revenue, and the very large differences across the country. http://www.dr.dk/Nyheder/Penge/2013/12/23/195455.htm
We also published stories with lists of the wealthiest and poorest municipalities, with accompanying interviews, also describing how the wealthiest municipalities lose revenue in the national redistribution that takes place each year according to the distribution keys mentioned above.
The second batch of stories, published on December 28th, centered on the interactive graphics we developed to show the distribution of corporate tax payments.
http://www.dr.dk/Nyheder/Penge/2013/12/27/151440.htm
Again, the main story was accompanied by other articles covering different aspects of the issue. Company directors were interviewed, as were experts, who nuanced the information presented and put it in a national financial context.
A motion story about tax
The presentation uses the D3 framework for illustration and animation. The large dataset was compiled into a dense format for transport to the client, where the information was extracted again to fit the animation model developed for this presentation.
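The text does not describe the dense format. One common approach, shown here as an assumption, is a column-oriented layout: instead of repeating field names in every record, send one array per field and rebuild the records on the client. The sketch is in Python for consistency with the other examples, although the client side of the presentation was built with D3 in JavaScript; the field names are hypothetical.

```python
import json

def to_columns(records, fields):
    """One array per field: each field name is sent once instead of once per record."""
    return {field: [record[field] for record in records] for field in fields}

def from_columns(columns):
    """Rebuild the list of records, as the client would before animating."""
    fields = list(columns)
    count = len(columns[fields[0]]) if fields else 0
    return [{field: columns[field][i] for field in fields} for i in range(count)]

# Hypothetical records: one per company.
records = [{"cvr": str(10_000_000 + i), "tax": i * 1000} for i in range(100)]

if __name__ == "__main__":
    compact = dict(separators=(",", ":"))
    row_json = json.dumps(records, **compact)
    column_json = json.dumps(to_columns(records, ["cvr", "tax"]), **compact)
    print(len(row_json), len(column_json))
```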
The animation itself consists of roughly 1,000 tiny boxes that are animated using randomized values for delay and duration, so each session shows a unique animation. http://www.dr.dk/Nyheder/Penge/2013/12/27/151440.htm
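The randomized timing can be sketched as one random delay and one random duration per box. The ranges below are invented, and this Python version only generates the values; in D3 such values would typically be passed to a transition's `delay()` and `duration()` methods. A fixed seed makes a run reproducible, while leaving it unset gives a different animation each session, as described above.

```python
import random

def box_timings(n_boxes=1000, max_delay_ms=2000.0, min_duration_ms=300.0,
                max_duration_ms=1200.0, seed=None):
    """Return one (delay, duration) pair in milliseconds per box, drawn uniformly at random."""
    rng = random.Random(seed)  # seed=None: different values every session
    return [(rng.uniform(0, max_delay_ms), rng.uniform(min_duration_ms, max_duration_ms))
            for _ in range(n_boxes)]

if __name__ == "__main__":
    for delay, duration in box_timings(n_boxes=3):
        print(f"delay {delay:.0f} ms, duration {duration:.0f} ms")
```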
Heritage calculator
When the Danish Broadcasting Corporation (DR) launched its TV drama “The Heirs”, which aired on a Sunday evening to more than 1,723,000 viewers (about one third of Denmark's total population), the news department followed up with a set of stories about the difficulties an inheritance can cause in a family.
Published January 2014. Time spent on this story: 2 days. Readers since publishing: 64,432.
See stories here: http://www.dr.dk/Nyheder/Tema/arv/forside.htm
The Database Team contributed to the theme by creating a calculator where people could find out for themselves how much money they are entitled to.
CO2 calculator
When you travel by airplane, it has a price in CO2. Companies try to sell a ticket to a green conscience by giving customers the choice of paying an extra fee for the flight, but almost nobody does. Yet travel across the world is still heavy on CO2. Our desk tried to show the “price” for the environment by creating a calculator that illustrates how much CO2 household appliances could emit before it matches a given flight.
Published February 2014. Time spent on this story: 5 days. Readers since publishing: 22,000.
Links to articles:
http://www.dr.dk/Nyheder/Indland/2014/01/31/170158.htm
http://www.dr.dk/Nyheder/Indland/2014/01/31/164547.htm
http://www.dr.dk/Nyheder/Indland/2014/01/31/163233.htm
http://www.dr.dk/Nyheder/Indland/2014/01/31/161618.htm
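The comparison behind the CO2 calculator is a unit conversion. A minimal sketch, with invented figures for the flight's emissions, the appliance's power draw and the CO2 intensity of electricity (the text does not say which emission factors the calculator used), computes how many hours an appliance must run to match a flight:

```python
def hours_to_match_flight(flight_co2_kg, appliance_watts, grid_kg_co2_per_kwh):
    """Hours an appliance must run before its electricity use emits as much CO2 as the flight."""
    kwh_per_hour = appliance_watts / 1000  # W -> kWh used per hour of running
    co2_kg_per_hour = kwh_per_hour * grid_kg_co2_per_kwh
    return flight_co2_kg / co2_kg_per_hour

if __name__ == "__main__":
    # Invented inputs: 500 kg CO2 for the flight, a 2,000 W appliance, 0.25 kg CO2 per kWh.
    print(hours_to_match_flight(500, 2000, 0.25))  # 1000.0
```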
EU citizens and the social benefits
In Denmark politicians have for a while fought about whether or not people from other EU-member
states who live and work in Denmark should have same access to Danish Social Benefits (such as unem-
ployment-pay, child benefits etc. which are for free if you pay your tax and live in Denmark). The debate
rely on the premise that people from member states such as Poland, Rumania and Lithuania come to
Denmark to exploit the social benefits rather than to live and work as “the rest of us”. So the Database
Team decided to investigate whether or not people from other EU states, living in Denmark, are exploit-
ing the welfare system or not.
We started by asking the tax authorities for data on how many citizens from other EU states receive child benefits (approximately 170 euro per child per month). The result was that yes, more people are receiving this benefit, but the total amount the Danish state spends on it is less than one percent of the total amount spent on child benefits for all parents living in Denmark.
http://www.dr.dk/Nyheder/Politik/2014/02/27/095702.htm
But the debate didn’t stop there. It moved on to whether some people are exploiting social benefits such as unemployment pay and pensions. At the Database Team we again asked the data whether there were any facts to support or dismiss the accusations.
We got data from Statistics Denmark, which had to process a special request for us. After receiving the data, we sorted it and created an interactive map of European countries where readers could click to see the share of citizens from each state who use social benefits in Denmark. The result of our analysis: no other state has citizens living in Denmark who are exploiting social benefits.
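The share shown on the map can be sketched as the number of benefit recipients from each country divided by the number of residents from that country. The country names and counts below are invented; the data from Statistics Denmark is not reproduced here.

```python
def benefit_share_by_country(residents, recipients):
    """Percent of each country's residents in Denmark who receive benefits."""
    return {
        country: 100 * recipients.get(country, 0) / count
        for country, count in residents.items()
        if count > 0
    }

# Invented counts for two unnamed countries.
residents = {"Country A": 20_000, "Country B": 5_000}
recipients = {"Country A": 1_000, "Country B": 500}

if __name__ == "__main__":
    print(benefit_share_by_country(residents, recipients))  # {'Country A': 5.0, 'Country B': 10.0}
```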
http://www.dr.dk/Nyheder/Indland/2014/03/05/170709.htm
Published March 2014. Time spent on this story: 3 days. Readers since publishing: 50,261.