

2013 Joint International Conference on Rural Information & Communication Technology and Electric-Vehicle Technology (rICT & ICeV-T)

November 26-28, 2013, Bandung-Bali, Indonesia

978-1-4799-3365-5/13/$31.00 ©2013 IEEE

Targeted Advertising Optimization using Vector Space Model for Online Behavior on News Portal

Computational Advertising Case Study : harianjogja.com

Samuel Andi Kristyan1, Suhardi2, Albarda3, Budiman Dabarsyah4

School of Electrical Engineering and Informatics Bandung Institute of Technology (ITB) [email protected], [email protected], [email protected], [email protected]

Abstract – Targeted advertising is the placement of advertisements so that they attract as many users as possible. Its effectiveness can be raised by optimizing on demography, geography, and online behavior. Online behavioral advertising gathers information about the online activities of web users (such as the pages they visit) and uses it to display ads or content relevant to their habits. Every day, thousands of online users are shown many kinds of advertisements, but not all of them are eye-catching, and this strongly affects whether users will click on an advertisement. Computational advertising in Indonesia has grown quickly, especially on online news portals. However, many problems remain, such as the same display ads being shown across all news topics and advertisements that are irrelevant to the news topic (for example, a stationery advertisement on a sports news page). Consequently, the advertisement fails to attract web users because it does not fit visitors' habits and intentions. It is therefore necessary to optimize the ad display system so that more web users visit the advertisements. This research focuses on optimizing behavior-targeted online advertising, which aims to adjust display ads to the habits of web users.

Keywords - Advertising, Targeted Advertising, Online Behavior, Computational Advertising, Information Retrieval

I. BACKGROUND

Computational advertising is a new sub-discipline at the intersection of large-scale text analysis, information retrieval, statistical modeling, machine learning, classification, optimization, and microeconomics. Its main problem is finding the best match between a user and an advertisement in a given context. The context can be a search engine (sponsored search), a web page (content match and display ads), a video, a portable device (mobile ads), and so on. The information available about a user varies from detailed to none at all, and the number of candidate advertisements runs into the millions. Advertisements are therefore ranked by how well they match the context and the observed user information.

Solving this matching problem is expected to form the basis for earning $30 million for the advertising industry. Today, the internet has become one of the most important advertising media, alongside television, newspapers, and billboards.

In Indonesia, there are many news portals, but most of them simply carry advertisements adapted from elsewhere, showing the same content on every page. This reduces the portals' income, since their advertising capacity is limited. It is also a problem for advertisers, who cannot reach many users, so product sales are not optimal. Harianjogja.com is an online news portal based in Jogja. It provides local news as well as national and international coverage.

Harianjogja.com offers an advertising service and uses display advertising, as can be seen from the banners shown on the harianjogja.com news portal. With good management of the potential advertising on harianjogja.com, the portal and the advertisers not only obtain optimal income but also increase the number of users. This management covers advertisement data and user data (habits, web user location, IP address, etc.). Good data management can increase the number of users who see the advertisements, and therefore the number of visitors who follow the attached ads. The solution designed in this research is the optimization of targeted advertising for computational advertising on a news portal. Targeted advertising is the placement of advertisements so as to attract many users. Its effectiveness can be increased by optimizing on demography, geography, and online behavior.

To optimize for online behavior, users are divided into categories according to the news category of the articles they read. First, users are grouped by topic, such as tourism, sport, politics, economics, and technology, based on their browsing history. Then the vector space algorithm is used to show the advertisements that are relevant to the web page. This research discusses the optimization of targeted advertising based on topic segments and online behavior in computational advertising, that is, the proper matching of advertisements to users based on their habits.
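The grouping of users by topic can be illustrated with a minimal sketch. It assumes that each entry in a user's browsing history has already been labeled with a news category; the function name `segment_counts` and the sample history are hypothetical and are not taken from the harianjogja.com system.

```python
from collections import Counter

def segment_counts(history):
    """Count how many articles a user read in each news category.

    `history` is a list of category labels in reading order (assumed input).
    """
    return Counter(history)

# Hypothetical browsing history for one user, oldest entry first.
history = ["sport", "tourism", "sport", "economy", "sport", "tourism"]
counts = segment_counts(history)

most_read_segment = counts.most_common(1)[0][0]  # category with the most reads
latest_segment = history[-1]                     # category of the last article read
```

The per-category counts are the kind of input a vector space comparison between users and advertisements would need.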

II. METHODS

Targeted advertising is the placement of advertisements to attract users with certain characteristics, such as demographic, geographic, and behavioral variables (visit history). Most targeted advertising relies on tracking users' online or mobile activity, which is related to the demography and visit history of the users who access a web page. The advertisement is expected to attract as many users as possible, as efficiently as possible, which requires knowledge of how users think and behave. This research uses the contextual relevance algorithm (RK) for new web pages (pages that have no ad click history) and the behavior relevance algorithm (RB) for old pages (pages that have been visited before and have an ad click history).
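The choice between the two algorithms can be sketched as a simple rule. This is only an illustration, assuming the system stores the number of ad clicks recorded for each page; the function name `select_algorithm` and its parameter are hypothetical.

```python
def select_algorithm(ad_clicks_on_page):
    """Return "RB" for a page with ad click history, otherwise "RK".

    `ad_clicks_on_page` is the number of ad clicks recorded for the page
    (an assumed field, not named in the paper).
    """
    return "RB" if ad_clicks_on_page > 0 else "RK"

new_page_choice = select_algorithm(0)    # page without click history
old_page_choice = select_algorithm(12)   # page with click history
```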

Figure 1 Algorithm layout

A. TERM WEIGHTING

Information retrieval is the task of finding, in a collection of data, the information that matches a user's query. A collection contains many different terms. A term can be a word, a phrase, or another indexing unit in a document that can be used to determine its context. Term weighting assigns each term a weight in a document. It combines two concepts: term frequency and inverse document frequency.

$w = tf \cdot idf$

Formula 1 Term weighting

The two factors in term weighting are as follows:

a) Term frequency (tf) determines the weight of a term in a document from the number of times the term appears in it. The more often the term appears, the larger its weight.

b) Inverse document frequency (idf) reduces the dominance of terms that appear frequently across many documents. This is needed because a term that appears in most documents is a common term and carries little value. Conversely, the scarcity of a term in the document collection must be taken into account in term weighting: words that appear in only a few documents (uncommon terms) deserve more attention than words that appear in many documents. The weighting therefore uses the inverse of the number of documents that contain the word (inverse document frequency).

$idf = \frac{N}{df}$ or $idf = \log\left(\frac{N}{df}\right)$

Formula 2 Inverse Document Frequency

N is the total number of documents and df is the number of documents in which the term appears.
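As an illustration of term weighting, the following minimal sketch computes w = tf · idf with the logarithmic form of idf. It assumes whitespace tokenization and lowercase matching, which the paper does not specify; the function name `tf_idf_weights` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_weights(documents):
    """Return one {term: weight} dict per document, with w = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # df: number of documents in which each term appears at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical three-document collection.
docs = ["football match result", "football transfer news", "stationery sale"]
w = tf_idf_weights(docs)
```

In this collection "football" appears in two of the three documents, so it receives a lower weight than "match", which appears in only one.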

B. CONTEXTUAL RELEVANCE

Contextual advertising is the strategy of matching advertisements to a web page according to their contextual relevance [9]. This research uses the context-advertisement matching model proposed by Ting Li et al. [2] and by Ribeiro-Neto et al. [32], based on the information retrieval formulation introduced by Salton and Buckley [31]. Contextual relevance here uses the vector space algorithm. The web page is denoted p and the advertisement a, and the relevance of p and a is calculated with the cosine similarity function:

$$RK = \mathrm{Sim}(p, a) = \frac{\sum_i \left(tf_{ip} \cdot \log\frac{D}{D_i}\right)\left(tf_{ia} \cdot \log\frac{D}{D_i}\right)}{\sqrt{\sum_i \left(tf_{ip} \cdot \log\frac{D}{D_i}\right)^2}\,\sqrt{\sum_i \left(tf_{ia} \cdot \log\frac{D}{D_i}\right)^2}}$$

Formula 3 Contextual Relevance [2]

$tf_{ip}$ is the number of times the word $t_i$ appears on page p, $tf_{ia}$ is the number of times the word $t_i$ appears in advertisement a, $D$ is the total number of advertisements, and $D_i$ is the number of advertisements in which the term $t_i$ appears. The algorithm is applied to new web pages, or to older pages that have no click history for the advertisement.
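A minimal sketch of contextual relevance as a cosine similarity over tf · log(D / D_i) weights follows. It assumes whitespace tokenization, and it skips page terms that appear in no advertisement, since D_i would be zero for them; both choices are assumptions, and the names `contextual_relevance` and `cosine` and the sample texts are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * \
           math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def contextual_relevance(page_text, ad_texts):
    """Return RK = Sim(p, a) for every advertisement a."""
    ads = [Counter(text.lower().split()) for text in ad_texts]
    page = Counter(page_text.lower().split())
    d_total = len(ads)                           # D: total number of ads
    d_i = Counter(t for ad in ads for t in ad)   # D_i: ads containing term t_i

    def weights(tf):
        # Terms that occur in no ad are skipped (assumption: D_i = 0 is undefined).
        return {t: f * math.log(d_total / d_i[t]) for t, f in tf.items() if t in d_i}

    page_vec = weights(page)
    return [cosine(page_vec, weights(ad)) for ad in ads]

# Hypothetical ads and news page.
ad_texts = ["football shoes sale", "office stationery sale", "hotel tourism package"]
scores = contextual_relevance("football match football league", ad_texts)
best_ad = max(range(len(scores)), key=scores.__getitem__)
```

Here only the first advertisement shares a term with the sports page, so it is the only one with a non-zero score.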

Figure 2 Contextual relevance

Harianjogja.com has two main advertisement slots, the header and the footer, and each uses a different method. The header advertisement is chosen by contextual relevance, while the footer advertisement is chosen at random. If the contextual relevance algorithm finds no match, random advertisements are used for both the header and the footer.
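The header and footer rule can be sketched as follows. This is an illustration only: it assumes that "no match" means every advertisement has a contextual relevance score of zero, and the function name `choose_ads` and the sample scores are hypothetical.

```python
import random

def choose_ads(scores, rng=random):
    """Pick (header_ad, footer_ad) from a {ad_id: RK score} mapping.

    Header: the ad with the highest contextual relevance, or a random ad
    when no ad has a positive score (assumed meaning of "no match").
    Footer: always a random ad.
    """
    ad_ids = list(scores)
    best = max(ad_ids, key=scores.get)
    header = best if scores[best] > 0 else rng.choice(ad_ids)
    footer = rng.choice(ad_ids)
    return header, footer

# Hypothetical scores; a seeded generator keeps the example repeatable.
rng = random.Random(0)
header, footer = choose_ads({"shoes": 0.68, "stationery": 0.0, "hotel": 0.0}, rng)
fallback_header, _ = choose_ads({"shoes": 0.0, "stationery": 0.0}, rng)
```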

C. BEHAVIOR RELEVANCE

Online behavior is the online habit of users. It can be seen from the history of the pages they read. Contextual relevance does not always attract users to click on an advertisement [2]. Consequently, many researchers focus on attracting users' attention based on their behavior, which is expected to increase the likelihood that users click on advertisements. This research therefore proposes using the vector space algorithm for behavioral advertising.

In the behavior relevance algorithm, the page log j and the advertisement a are each represented as a vector in n dimensions. The relevance between j and a is calculated with the cosine similarity function:

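$$RB = \mathrm{Sim}(p, a) = \frac{\sum_j \left(tf_{ji} \cdot \log\frac{D}{D_j}\right)\left(tf_{ja} \cdot \log\frac{D}{D_j}\right)}{\sqrt{\sum_j \left(tf_{ji} \cdot \log\frac{D}{D_j}\right)^2}\,\sqrt{\sum_j \left(tf_{ja} \cdot \log\frac{D}{D_j}\right)^2}}$$

Formula 4 Behavior Relevance

Here $tf_{ji}$ stands for either $tf_{ji1}$ or $tf_{ji2}$. $tf_{ji1}$ is the number of articles read by user i in segment j most recently; it is used for the header advertisement. $tf_{ji2}$ is the highest number of articles read by user i in segment j; it is used for the footer advertisement. $tf_{ja}$ is the number of clicks on advertisement a from segment j. $D$ is the total number of advertisements and $D_j$ is the number of advertisements that appeared in segment j.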

This algorithm is used on web pages that have a click history.

Figure 3 Behaviour Relevance

For the header of harianjogja.com, behavior relevance uses the number of articles the user read in the most recent segment, whereas for the footer it uses the number of articles read in the most visited segment.
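Behavior relevance can be sketched in the same vector space form. This is a hedged illustration, not the authors' implementation: it assumes each segment is weighted by log(D / D_j), where D is the total number of advertisements and D_j is the number of advertisements shown in segment j, and that segments with no advertisements get weight zero. The function name `behavior_relevance` and all sample data are hypothetical.

```python
import math

def behavior_relevance(user_reads, ad_clicks, ads_in_segment, total_ads):
    """Cosine similarity between a user's reading log and one ad's click log.

    user_reads:     {segment: articles read by the user}
    ad_clicks:      {segment: clicks on this ad from that segment}
    ads_in_segment: {segment: number of ads shown in that segment} (assumed D_j)
    total_ads:      total number of ads (assumed D)
    """
    def weight(tf, segment):
        d_j = ads_in_segment.get(segment, 0)
        return tf * math.log(total_ads / d_j) if d_j else 0.0

    segments = set(user_reads) | set(ad_clicks)
    u = {j: weight(user_reads.get(j, 0), j) for j in segments}
    v = {j: weight(ad_clicks.get(j, 0), j) for j in segments}
    dot = sum(u[j] * v[j] for j in segments)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical logs: a user who mostly reads sport, and two ads.
user_reads = {"sport": 8, "tourism": 2}
ads_in_segment = {"sport": 2, "tourism": 3, "economy": 5}
rb_shoes = behavior_relevance(user_reads, {"sport": 5, "tourism": 1}, ads_in_segment, 10)
rb_bank = behavior_relevance(user_reads, {"economy": 4}, ads_in_segment, 10)
```

In this example the shoe advertisement, which is clicked mainly from the sport segment, scores higher for the sport reader than the bank advertisement, which is clicked only from a segment the user never reads.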

D. PROBLEM IDENTIFICATION

In this phase, the existing problems are identified. The authors carried out interviews and a survey at Harianjogja.com and identified the following problems:

1. The harianjogja.com website still uses global advertisements, which make little impression on users.

2. Advertising capacity is limited, so income is not optimal.

3. No method for matching advertisements to news topics has been applied yet.

4. Users show little interest in clicking advertisements, so sales of goods and services are not optimal.

The purpose of this research is to address the problems identified above by developing a computational advertising system with optimized targeted advertising that is worthwhile for all parties (advertiser, publisher, and web user).

III. EXAMINATION

A matching flowchart was made to show how the vector space algorithm determines the relevance between web pages and ads, covering both contextual relevance and behavior relevance.

Figure 4 Flow Chart matching Algorithm

The vector space algorithm is used in both contextual relevance and behavior relevance to calculate the similarity weight between the two vectors being compared. The more terms they share, the higher the similarity value.

The trial will be carried out in two stages. The first is internal testing, where the system is piloted in a limited environment with the development team itself as testers. The data used in this phase is obtained from users of the news web portal. The next step is the integration of data from multiple elements. The second stage is external testing, where the system is tested by the agency's parties. After testing, the analysis is carried out again to check whether the system meets the requirements, and any necessary repairs are then made.

IV. CONCLUSION

From the research, the following conclusions can be drawn:

1. The vector space algorithm can be used not only to match ads with news content but also to match advertisements with the habits of visitors.

2. Using the vector space algorithm for contextual relevance and behavior relevance can make ads more attractive for visitors to click, because the ads fit the habits of web users.

3. The use of this algorithm can increase the revenue of the news portal, because the advertisements it accommodates are no longer limited.

REFERENCES

[1] https://www.emarketer.com/coverage/advertising-marketing (accessed June 2013)

[2] Li, Ting., Ning Liu., Yan, Jun., Wang, Gang., Bai, Fengshan., Chen, Zheng. (2009) : “A Markov chain model for integrating behavioral targeting into contextual advertising”. ACM Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising.

[3] Karimzadehgan, Maryam. (2011): “A Stochastic Learning-To-Rank Algorithm and its Application to Contextual Advertising” Department of Computer Science University of Illinois.

[4] Kumar, S.; Dawande, M.; Mookerjee, V.S.. (2007): “Optimal Scheduling and Placement of Internet Banner Advertisements” Knowledge and Data Engineering, IEEE Transactions.

[5] Richardson, Mathew; Dominowska, Ewa, Ragno, Robert (2007): “Predicting Clicks: Estimating the Click-Through Rate for New Ads” Microsoft Research.

[6] Khan, Suleman., Ilyas, Q Mudassar, Anwar, Waqas (2009): “Contextual advertising using keyword extraction through collocation” FIT '09 ACM Proceedings of the 7th International Conference on Frontiers of Information Technology.

[7] Wang, Xuerui., Broder, Andrei., Fontoura, Marcus., Josifovski, Vanja. (2009): “A Search-based Method for Forecasting Ad Impression in Contextual Advertising” International World Wide Web Conference Committee(IW3C2).

[8] Kevin Lang, Joaquin Delgado, Dongming Jiang, Bhaskar Ghosh, Shirshanka Das, Amita Gajewar, Swaroop Jagadish, Arathi Seshan, Chavdar Botev, Michael Binderberger-Ortegay , Sunil Nagaraj, Raymie Stata (2009): “Efficient Online Ad Serving in a Display Advertising Exchange”, ACM Yahoo Research

[9] Aksakalli, Vural. (2012): “Optimizing direct response in Internet display advertising” ACM Journal Electronic Commerce Research and Applications.

[10] Richardson, Matthew., Dominowska, Ewa., Ragno, Robert. (2007): “Predicting Clicks: Estimating the Click-Through Rate for New Ads” International World Wide Web Conference Committee(IW3C2).

[11] GM Kumar, Pavan., P. Leela, Krishna., Parsana, Mehul., Garg, Sachin. (2010) “Relevance-index size tradeoff in contextual advertising” CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management Pages 1721-1724., ACM New York

[12] Bendersky, M.; Gabrilovich, E.; Josifovski, V.; Metzler, D. (2010) “The Anatomy of an Ad: Structured Indexing and Retrieval for Sponsored Search” Proceedings of the 19th international conference on World Wide Web (www 2010).

[13] Phuong, Do Viet., Phuong, Tu Minh (2012) “A Keyword-Topic Model for Contextual Advertising” SoICT '12 Proceedings of the Third Symposium on Information and Communication Technology., ACM New York.

[14] Zhao, Lan., Nagurney, Anna (2012) “A network equilibrium framework for Internet advertising- Models, qualitative analysis, and algorithms” European Journal of Operational Research Volume 187, Issue 2, 1 June 2008, Pages 456–472., Elsevier

[15] Ghosh, Arpita., Rubinstein, Benjamin I. P. (2009) “Adaptive Bidding for Display Advertising” International World Wide Web Conference Committee(IW3C2).

[16] Zuiderveen Borgesius, Frederick (2013) “Behavioral Targeting: A European Legal Perspective” Building Confidence in a Networked World, Volume 11, Issue 1, IEEE Security and Privacy.

[17] Richardson, Matthew., Prakash, Amit., Brill, Eric (2006) “Beyond PageRank: machine learning for static ranking” WWW '06 Proceedings of the 15th international conference on World Wide Web., ACM New York.

[18] Dave, Kushal S. (2012) “Computational advertising: leveraging user interaction & contextual factors for improved ad relevance & targeting” WSDM '12 Proceedings of the fifth ACM international conference on Web search and data mining Pages 757-758., ACM New York.

[19] Dave, Kushal S. (2012) “Computational Advertising: Leveraging User Interaction & Contextual Factors for Improved Ad Retrieval & Ranking” WWW '11 Proceedings of the 20th international conference companion on World wide web Pages 349-354., ACM New York.

[20] Broder, Andrei (2012) “Computational Advertising” Yahoo Research

[21] Bhatt, Meghana A. (2012) “Evaluation and associations: A neural-network model of advertising” Journal of Economic Behavior & Organization Volume 82, Issue 1, April 2012, Pages 236–255, Elsevier.

[22] Lewis, Randall A., Rao, Justin M., Reiley, David H. (2011) “Here, there, and everywhere: correlated online behaviors can lead to overestimates of the effects of advertising” WWW '11 Proceedings of the 20th international conference on World wide web Pages 157-166., ACM New York.

[23] Hsieh, Yu-Chen., Chen, Kuo-Hsiang (2011) “How different information types affect viewer’s attention on internet advertising” Journal Computers in Human Behavior, Volume 27 Issue 2, Pages 935-945., ACM New York.

[24] Mei, Tao., Li, Lusong., Hua, Xian-Sheng., Li, Shipeng (2012) “ImageSense: Towards contextual image advertising” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), Volume 8 Issue 1., ACM New York.

[25] Gabrilovich, Evgeniy., Josifovski, Vanja., Pang, Bo “Introduction to Computational Advertising”., Yahoo Research

[26] Pandey, Sandeep., Aly, Mohamed., Bagherjeiran, Abraham ., Hatch, Andrew ., Ciccolo, Peter ., Ratnaparkhi, Adwait ., Zinkevich, Martin., “Learning to Target: What Works for Behavioral Targeting” Yahoo Research.

[27] Broder, Andrei Z. (2011) “Computational Advertising and Recommender Systems” RecSys '08 Proceedings of the 2008 ACM conference on Recommender systems., ACM New York.

[28] Valitutti, Alessandro., Strapparava, Carlo .,Stock, Oliviero (2008) “Textual Affect Sensing for Computational Advertising” Association for the Advancement of Artificial Intelligence., www.aaai.org.

[29] McAfee, R. Preston., Papineni, Kishore., Vassilvitskii, Sergei., (2010) “Maximally Representative Allocations for Guaranteed Delivery Advertising Campaigns” Yahoo Research.

[30] Rosales, Rómer., Cheng, Haibin., Manavoglu, Eren (2012) “Post-Click Conversion Modeling and Analysis for Non-Guaranteed Delivery Display Advertising” WSDM '12 Proceedings of the fifth ACM international conference on Web search and data mining, Pages 293-302., ACM New York.

[31] Salton, Gerard., Buckley, Christopher (1988) “Term-Weighting Approaches in Automatic Text Retrieval” Information Processing and Management : an International journal., Volume 24., pages 513 – 523., Pergamon Press.

[32] Ribeiro-Neto, Berthier., Cristo, Marco., Golgher, Paulo B., de Moura, Edleno Silva (2005) “Impedance Coupling in Content – Targeted Advertising” SIGIR ’05 ACM SIGIR conference on Research and development in Information Retrieval., pages 496 – 503., ACM New York.