Conserving Linguistic Heritage the FOSS way...

Upload: omshivaprakash-h-l

Post on 12-Jul-2015


TRANSCRIPT

Page 1: Conserving Linguistic Heritage the FOSS way

Conserving Linguistic Heritage the FOSS way...

Page 2: Conserving Linguistic Heritage the FOSS way

Hello!

I am Omshivaprakash

I’m a Bengaluru-based Wikimedian and a FOSS contributor.

I’m here to share my experience of helping reuse and conserve the linguistic heritage of Kannada the FOSS way!

Page 3: Conserving Linguistic Heritage the FOSS way

2013-14

Vachana Sanchaya

11th and 12th century literature & the need of the hour...

Page 4: Conserving Linguistic Heritage the FOSS way

“We need to be able to do research on Vachana Sahitya. We should be able to search Vachanas on the net. We need data to understand Sahitya much better.”

- Sri OL Nagabhushana Swamy
- Sri Vasudendra

Page 5: Conserving Linguistic Heritage the FOSS way

Challenges

▣ ANSI data available on the GoK website
▣ GoK website not intuitive
▣ 15 large printed volumes + others
▣ No real tool to analyze the data at one's fingertips
▣ Heated discussions on public forums needed concordance & numerical data to debate the literature

Researchers wanted authentic data so they could reach consensus through research… but how?

Page 6: Conserving Linguistic Heritage the FOSS way

Digitize in Unicode

The idea was to get our hands on the digitized data in a reusable format & in Unicode.

Page 7: Conserving Linguistic Heritage the FOSS way

Scrape

We found that the data was available in digital format on the GoK website http://vachanasahitya.gov.in, but in ANSI format.

We pulled the data with wget and wrote a Python script to systematically extract the data and convert the text to Unicode.

ALL IN FLAT FILES
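The conversion step could look roughly like the sketch below. It is not the script we actually used. It assumes the "ANSI" text is a font-based legacy encoding in which short Latin-1 character sequences stand for Kannada glyphs, and it replaces them using a greedy longest-match lookup. The table `LEGACY_TO_UNICODE` and the function `convert_legacy` are hypothetical names, and the four table entries are illustrative placeholders rather than a verified mapping for any real font. A real converter would also need a full table and reordering rules for vowel signs, which this sketch ignores.

```python
# Hypothetical glyph table: legacy font sequences -> Unicode Kannada.
# These entries are illustrative placeholders, not a verified mapping.
LEGACY_TO_UNICODE = {
    "PÀ": "\u0c95",       # KANNADA LETTER KA
    "£À": "\u0ca8",       # KANNADA LETTER NA
    "ß": "\u0ccd\u0ca8",  # virama + NA (a subscript consonant form)
    "qÀ": "\u0ca1",       # KANNADA LETTER DDA
}


def convert_legacy(text, table=LEGACY_TO_UNICODE):
    """Replace legacy sequences with Unicode text, longest match first."""
    # Try longer keys first so a multi-character sequence wins over
    # any shorter key that happens to be its prefix.
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matches here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Characters with no table entry pass through untouched, so digits, punctuation and line breaks in the flat files survive the conversion.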

Getting to work on data

But...

It was not really enough. How does anyone take all the text in files and do research?

We proposed to push this to a database and provide simple GUI tools to search the text and look at the results.
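As a rough illustration of the database idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name `vachanas`, its columns, and the helpers `build_db` and `search` are assumptions made for this example and do not reflect the schema of the real project. It uses a plain substring match, which is the simplest thing a search box could sit on top of.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory database and load (author, body) pairs.

    The schema is a hypothetical one chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vachanas ("
        "id INTEGER PRIMARY KEY, author TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO vachanas (author, body) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def search(conn, term):
    """Return (id, author, body) rows whose body contains the term."""
    # The term is passed as a bound parameter, never pasted into the SQL.
    pattern = "%" + term + "%"
    cur = conn.execute(
        "SELECT id, author, body FROM vachanas "
        "WHERE body LIKE ? ORDER BY id",
        (pattern,),
    )
    return cur.fetchall()
```

With something like this behind a simple form, a researcher types a word and gets matching texts back without ever writing SQL.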

Page 8: Conserving Linguistic Heritage the FOSS way

more challenges...

Technical difficulties

Providing the end results to a large number of people.

Helping them understand how to use tools such as MySQL Workbench, SQLite Manager, etc.

Awareness

Text input methods

SQL syntax

OS compatibility

Expanding scope

What about other research requirements?

How many queries can we write and keep sharing with linguists who are not computer-savvy?

Page 9: Conserving Linguistic Heritage the FOSS way

An opportunity to build something

For a language that is close to our hearts, with a few like-minded people around, over a cup of coffee, during weekends, whenever we had some time to scribble through the needs of our people…

IT WAS FUN...

Page 10: Conserving Linguistic Heritage the FOSS way

We built

Vachana Sanchaya

http://vachana.sanchaya.net

Page 11: Conserving Linguistic Heritage the FOSS way

Portal for linguistic research

Page 12: Conserving Linguistic Heritage the FOSS way

Visualization, Discussion board, Concordance & more...

Page 13: Conserving Linguistic Heritage the FOSS way

Enable everyone

Students, Researchers, Common Man

Page 14: Conserving Linguistic Heritage the FOSS way

To unearth the wealth of literature

▣ by reading and searching through 21 thousand Vachanas
▣ written by 250 Vachanakaaras
▣ Research at your fingertips via concordance & quick visualizations
▣ Building a corpus of 2 lakh+ (200,000+) unique words
▣ Building biodata of all male & female vachanakaaras
▣ Enabling a crowd-sourced review solution
▣ Opening up new possibilities for linguistic research across other literary works of Kannada
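For readers unfamiliar with the terms, the sketch below shows what a word corpus and a concordance amount to in code. It is a simplified illustration, not the portal's implementation: it assumes whitespace tokenization, which is cruder than real Kannada text processing needs, and the function names `tokenize`, `unique_words` and `concordance` are made up for this example.

```python
from collections import Counter


def tokenize(text):
    """Split on whitespace; an assumption that real tooling would refine."""
    return text.split()


def unique_words(texts):
    """Count how often each distinct word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, keyword, width=2):
    """List every occurrence of keyword with `width` words of context.

    Returns tuples of (text index, left context, keyword, right context).
    """
    hits = []
    for doc_id, text in enumerate(texts):
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((doc_id, left, token, right))
    return hits
```

The size of the corpus is then `len(unique_words(texts))`, and the concordance rows are what a researcher scans to compare how a word is used across texts and authors.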

Page 15: Conserving Linguistic Heritage the FOSS way

We reached masses across the world...

Page 16: Conserving Linguistic Heritage the FOSS way

FOSS

All because of the FOSS tools around us and the philosophy that we believed in...

Page 17: Conserving Linguistic Heritage the FOSS way

Rails, Nginx, Passenger, Memcached, MySQL, Python, GitLab, WordPress & more...

Only the server cost to keep it running

Localized & being adopted by other projects too...

It is being reviewed for contribution to Wikisource & Wikipedia

Page 18: Conserving Linguistic Heritage the FOSS way

Moving forward

Bring more literary works online

Standardize Research platform for language

Create timeline for Centuries of Heritage

Page 19: Conserving Linguistic Heritage the FOSS way

How are we planning to do this?

Collaboration

Enable community collaboration to build research documents around our literary heritage

Engage

Engage students and others to work together on our code to build robust and futuristic tools for all types of literary works (text, poems, Old Kannada, etc.)

Evolve

Evolve over time, learning from mistakes, reviews and feedback

Consult with communities

We would like to consult and learn from multiple language communities, because Vachana Sahitya has been translated into more than 15 languages

Keep tweaking

We keep tweaking the tool to make it robust enough to serve as a platform for our upcoming projects

Reaching goals

We are determined to reach our goal of building a unified search tool with a timeline for centuries of Kannada literature the FOSS way...

Page 20: Conserving Linguistic Heritage the FOSS way

We are on Social Media - FB/Twitter/Google+

Embed us on WordPress via plugin

We will be on Mobile Soon…

We are opening up APIs to reuse data or build tools around Kannada literature

Adding English and other translated works too....

There is a lot more to share

So, keep in touch!!!

Page 21: Conserving Linguistic Heritage the FOSS way

Our Team

Pavithra, Myself, OLN, Vasudendra, Devaraj

Page 22: Conserving Linguistic Heritage the FOSS way

Thanks!

Any questions?

You can find me at:

Kn/En Wiki: User:Omshivaprakash

Project Page: http://vachana.sanchaya.net

Main Project: http://kannada.sanchaya.net

@omshivaprakash | @vachanasanchaya

Page 23: Conserving Linguistic Heritage the FOSS way

Credits

Special thanks to all the people who made and released these awesome resources for free:

▣ Team photo by Amit Mrugvadhe
▣ To my team for having made this possible
▣ Minicons by Webalys
▣ Presentation template by SlidesCarnival
▣ Photographs by Unsplash