
Python

Ricardo Campos

Instituto Politécnico de Tomar

Lic ITM Abrantes, Portugal, 2019

Introduction

What is Information Retrieval?

This presentation was developed by Ricardo Campos, Professor of ICT at the Polytechnic Institute of Tomar and researcher at LIAAD - INESC TEC. Some of the slides were adapted from presentations found on the internet and from the following references:

• Dipanjan Sarkar (2016). Text Analytics with Python

• http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-0-Scientific-Computing-with-Python.ipynb

• https://www.tutorialspoint.com/python/python_overview.htm


AGENDA

What is this talk about?

1. Why

2. History

3. Features

4. Advantages

5. Anaconda

6. PyCharm

7. Resources

8. Q&A


Python is widely used for scientific computing and is often the first choice of scientists, largely because of its large community of users, which makes help and documentation easy to find.

Extensive ecosystem of scientific libraries and environments:

• numpy: http://numpy.scipy.org - Numerical Python

• scipy: http://www.scipy.org - Scientific Python

• matplotlib: http://www.matplotlib.org - graphics library

No license costs, no unnecessary use of research budget.

Python is especially good for our purposes in that it does not have a lot of “overhead” before getting started. It is easy to jump in and experiment with Python in an interactive fashion.


Popular Domains

Python is widely used in several domains, including artificial intelligence (AI), game development, robotics, Internet of Things (IoT), computer vision, media processing, and network and system monitoring, just to name a few. Although Python can be used for solving a lot of problems, here are some of the most popular domains:

• Scripting: Python is known as a scripting language. It can be used to perform many tasks, such as interfacing with networks and hardware, handling and processing files and databases, performing OS operations, and receiving and sending email. Python is also used extensively for server-side scripting and even for developing entire web servers for serving web pages.


• Web development: There are a lot of robust and stable Python frameworks out there that are used extensively for web development, including Django, Flask, Web2Py, and Pyramid.

• Graphical user interfaces (GUIs): A lot of desktop-based applications with GUIs can be easily built with Python. Libraries and APIs like tkinter, PyQt, PyGTK, and wxPython allow developers to develop GUI-based apps with simple as well as complex interfaces.

• Systems programming: We can use Python to perform OS operations including creating, handling, searching, deleting, and managing files and directories. The Python standard library (PSL) has OS and POSIX bindings that can be used for handling files, multi-threading, multi-processing, environment variables, controlling sockets, pipes, and processes.
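The file, directory, and environment-variable operations mentioned under systems programming can be sketched with the standard library alone. This is only an illustration: the helper name demo_file_operations, the file names, and the environment variable DEMO_GREETING are made up for the example.

```python
import os
import tempfile
from pathlib import Path


def demo_file_operations():
    """Create, search, and delete files inside a temporary directory."""
    # TemporaryDirectory removes the folder and its contents on exit.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)

        # Create a sub-directory and a few files (hypothetical names).
        (base / "reports").mkdir()
        (base / "reports" / "jan.txt").write_text("january")
        (base / "reports" / "feb.txt").write_text("february")
        (base / "notes.md").write_text("notes")

        # Search recursively for all .txt files.
        found = sorted(p.name for p in base.rglob("*.txt"))

        # Delete one file and confirm that it is gone.
        (base / "notes.md").unlink()
        assert not (base / "notes.md").exists()
    return found


# Environment variables are exposed through os.environ.
os.environ["DEMO_GREETING"] = "hello"  # hypothetical variable
greeting = os.environ.get("DEMO_GREETING")

txt_files = demo_file_operations()
print(txt_files)  # ['feb.txt', 'jan.txt']
print(greeting)   # hello
```

Using a temporary directory keeps the sketch from touching any real files on the machine.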


• Database programming: Python is used a lot for connecting to and accessing data from different types of databases, be they SQL or NoSQL. APIs and connectors exist for databases such as MySQL, MSSQL, MongoDB, Oracle, PostgreSQL, and SQLite. In fact, SQLite, a lightweight relational database, now comes as part of the Python standard distribution itself.

• Scientific computing: Python really shows its flair for being multipurpose in areas like numeric and scientific computing. You can perform simple as well as complex mathematical operations with Python, including algebra and calculus. Libraries like SciPy and NumPy help researchers, scientists, and developers leverage highly optimized functions and interfaces for numeric and scientific programming. These libraries are also used as the base for developing complex algorithms in various domains like machine learning.
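Because SQLite ships with Python, as noted under database programming, a database can be tried out with no installation at all. The following is a minimal sketch using the standard sqlite3 module; the table name and the sample rows are invented for illustration.

```python
import sqlite3

# ":memory:" creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")

# The ? placeholders let sqlite3 bind values safely instead of
# building the SQL text by hand. The rows are made-up sample data.
conn.executemany(
    "INSERT INTO books (title, year) VALUES (?, ?)",
    [("Book A", 2016), ("Book B", 2010)],
)
conn.commit()

rows = conn.execute(
    "SELECT title FROM books WHERE year >= ? ORDER BY title", (2016,)
).fetchall()
conn.close()

print(rows)  # [('Book A',)]
```

Connectors for other databases are third-party packages, but many of them follow the same connect / execute / fetch pattern.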


• Machine learning: Python is regarded as one of the most popular languages today for machine learning. There is a wide suite of libraries and frameworks, like scikit-learn, h2o, tensorflow, theano, and even core libraries like numpy and scipy, for not only implementing machine learning algorithms but also using them to solve real-world advanced analytics problems.

• Text analytics: Python can handle text data very well, and this has led to several popular libraries like nltk, gensim, and pattern for NLP, information retrieval, and text analytics. You can also apply standard machine learning algorithms to solve problems related to text analytics.

This ecosystem of readily available packages in Python reduces the time and effort needed for development.
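To give a flavour of text handling in Python, here is a small word-frequency sketch. It deliberately uses only the standard library (re and collections.Counter) rather than nltk or gensim, and the function name word_frequencies and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lower-case the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = "Python handles text well. Text analytics with Python is popular."
counts = word_frequencies(sample)

print(counts["python"])  # 2
print(counts["text"])    # 2
print(len(counts))       # 8 distinct words
```

Libraries such as nltk add proper tokenization, stemming, and stop-word handling on top of this basic idea.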


Python was developed by Guido van Rossum in the late eighties and early nineties at the National Research Institute for Mathematics and Computer Science (CWI) in the Netherlands.

Guido van Rossum published the first version of the Python code (version 0.9.0) on alt.sources in February 1991.

In 2008, Python 3 was released on an almost unthinkable premise: a major overhaul of the language that broke backwards compatibility with Python 2. The decision was controversial, and was born in part of the desire to clean house in Python.


• clean and simple language: Easy-to-read and intuitive code, easy-to-learn minimalistic syntax. Reading a good Python program feels almost like reading English.

Hello.java

public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}

hello.py

print("Hello world!")


• expressive language: Fewer lines of code, fewer bugs, easier to maintain.

• dynamically typed: No need to declare the types of variables, function arguments or return values.

• Free and Open Source software.

• Portable: due to its open-source nature, Python has been ported to many platforms. Most Python programs can run on several platforms without requiring any changes, as long as they avoid platform-specific features.
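The "dynamically typed" point can be seen in a few lines. In this sketch (the function name describe is just an illustrative choice), the same variable name is bound to values of different types without any declarations.

```python
def describe(value):
    """Return the runtime type name of any value; no declarations needed."""
    return type(value).__name__


x = 42             # x currently refers to an int
first = describe(x)
x = "forty-two"    # the same name can later refer to a str
second = describe(x)
x = [4, 2]         # ... or to a list
third = describe(x)

print(first, second, third)  # int str list
```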


• interpreted: You just run the program directly from the source code. Internally, Python converts the source code into an intermediate form called bytecode, which is then executed by the Python virtual machine.

This also makes your Python programs much more portable, since you can just copy your Python program onto another computer that has a compatible Python interpreter and run it there.
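The intermediate bytecode form mentioned above can be inspected from Python itself. The sketch below assumes the standard CPython interpreter and uses the standard dis module; the function add is a made-up example, and the exact instructions printed differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# Every function carries its compiled bytecode in a code object.
raw_bytecode = add.__code__.co_code
print(type(raw_bytecode).__name__, len(raw_bytecode) > 0)  # bytes True

# dis prints a human-readable listing of the bytecode instructions.
dis.dis(add)

# compile() exposes the same compilation step for a source string.
code_obj = compile("1 + 2", "<example>", "eval")
result = eval(code_obj)
print(result)  # 3
```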


• Object Oriented: Python supports procedure-oriented programming as well as object-oriented programming. Python has a very powerful but simple way of doing OOP, especially when compared to big languages like C++, C# or Java.

• Interactive: You can actually sit at a Python prompt and interact with the interpreter directly to write your programs.

• Beginner’s Language: Python is a great language for beginner-level programmers and supports the development of a wide range of applications, from simple text processing to WWW browsers to games.

• Databases: Python provides interfaces to all major commercial databases.
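As a rough illustration of the object-oriented point above, the sketch below mixes a small class hierarchy with a plain function, showing that both styles coexist. The class and function names (Shape, Rectangle, total_area) are invented for the example.

```python
class Shape:
    """Base class with a method that subclasses override."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


# Procedural style: a plain function works alongside the classes.
def total_area(shapes):
    return sum(shape.area() for shape in shapes)


shapes = [Rectangle(2, 3), Rectangle(1, 4)]
description = shapes[0].describe()
total = total_area(shapes)

print(description)  # rectangle with area 6.00
print(total)        # 10
```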


• The main advantage is ease of programming, which minimizes the time required to develop, debug and maintain the code.

• Besides the standard library, thousands of third-party libraries are readily available on the Internet, encouraging open source and active development. The official repository for hosting third-party libraries and utilities for enhancing development in Python is the Python Package Index (PyPI). Access it at https://pypi.python.org and check out the various packages. At the time of writing, there are over 118,000 packages you can install and start using.

• The pseudo-code nature of Python is one of its greatest strengths. It allows you to concentrate on the solution to the problem rather than the language itself.


Setup

Anaconda, from Continuum Analytics, is a complete Python distribution with over 700 packages, built specially for data science and analytics. It is available at https://www.anaconda.com/download/.

This distribution provides a lot of advantages, especially for Windows users, where installing some packages like numpy and scipy can sometimes cause issues.

Anaconda comes with conda, an open source package and environment management system, and Spyder (Scientific Python Development Environment), an IDE for writing and executing your code.

IMPORTANT NOTE: Anaconda cannot be installed under usernames with spaces or accents (e.g., c:\simão). Thus, if your username has a space or an accent, you should install Anaconda in a different folder. To do so, run the Anaconda installer with administrative privileges.
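A candidate folder can be checked for spaces or accents with a couple of lines of Python. This is only a sketch: the helper name has_unsafe_chars is hypothetical, and it treats any non-ASCII character as an "accent", which is a simplifying assumption rather than the installer's actual rule.

```python
def has_unsafe_chars(path):
    """Return True if the path contains a space or any non-ASCII character."""
    return " " in path or not path.isascii()


# Raw strings (r"...") keep the backslashes of Windows paths as they are.
accented = has_unsafe_chars(r"c:\simão")             # accented character
spaced = has_unsafe_chars(r"C:\Users\Ana Maria")     # space in the name
plain = has_unsafe_chars(r"H:\JupyterNotebooks")     # neither

print(accented, spaced, plain)  # True True False
```

str.isascii() requires Python 3.7 or later.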


Jupyter notebook

Once the installation is complete, start the Jupyter notebook.

Jupyter notebook is an HTML-based notebook environment for Python, similar to Mathematica or Maple.

By default, the working directory of Jupyter is the user's home folder:

C:\Users\UserName


Jupyter notebook - Change Default Directory

If you want to change the working directory, do the following:

(1) On the Anaconda command line, type: jupyter notebook --generate-config

This will generate the file .jupyter/jupyter_notebook_config.py in the folder indicated during the execution of the command.

Open that file and search for: c.NotebookApp.notebook_dir

Set it to your new working directory (e.g., 'H:\\JupyterNotebooks') and remove the leading # to uncomment the line.

IMPORTANT NOTE: the folder where you intend to keep your files (e.g., 'H:\\JupyterNotebooks') cannot have spaces or accents.


Jupyter notebook - Change Default Directory

(2) Search for Jupyter Notebook using the Windows search feature, right-click it, and choose "Open file location".

Right-click the shortcut and choose Properties, then open the Shortcut tab.

In the "Start in" field, enter the folder defined previously (e.g., 'H:\\JupyterNotebooks').

In the "Target" field, remove %USERPROFILE% from the end.


Jupyter notebook - Change Default Browser

If you want to change the browser that Jupyter opens in:

• Open (once again) the file jupyter_notebook_config.py

• Replace the line "#c.NotebookApp.browser" with the following code

• For Chrome:

import webbrowser

webbrowser.register('chrome', None, webbrowser.GenericBrowser('C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe'))

c.NotebookApp.browser = 'chrome'


Jupyter notebook - Change Default Browser

If you want to change the browser that Jupyter opens in:

• Open (once again) the file jupyter_notebook_config.py

• Replace the line "#c.NotebookApp.browser" with the following code

• For Firefox:

import webbrowser

webbrowser.register('firefox', None, webbrowser.GenericBrowser('C:\\Program Files (x86)\\Mozilla Firefox\\firefox.exe'))

c.NotebookApp.browser = 'firefox'


Jupyter extensions

https://github.com/ipython-contrib/jupyter_contrib_nbextensions

In the Anaconda Prompt, run:

pip install jupyter_contrib_nbextensions

jupyter nbextensions_configurator enable --user

jupyter contrib nbextension install --user

Then, in Jupyter, enable the options CodeFolding, Collapsible Headings and Table of Contents(2).


Jupyter extensions - Table of Contents (2)

Under the nbextensions tab in Jupyter notebook, you will also need to configure “Table of Contents(2)”:

• Increase the maximum level of nested sections to 5

• Skip H1 headings

• Add a Table of Contents cell at the top of the notebooks

• Display the ToC window/sidebar at startup


PyCharm Community (https://www.jetbrains.com/pycharm/download/#section=windows)

After installing, you might want to have a look at a series of videos available on YouTube (search for “Getting Started with PyCharm”).

I also suggest having a look at the following link, which covers:

- Choosing an interpreter

- Creating a virtual environment

- Creating a Python file

https://www.jetbrains.com/help/pycharm/creating-and-running-your-first-python-project.html


Visualize the execution of your code (http://pythontutor.com/).

Execute your own code online on Jupyter / Docker (https://mybinder.org/).

• http://www2.ic.uff.br/~vanessa/

• https://www.programiz.com/python-programming

• https://pypi.python.org/pypi

• http://www.openbookproject.net/thinkcs/python/english2e/

• http://mcsp.wartburg.edu/zelle/python/

• https://www.codecademy.com/en/tracks/python

• https://automatetheboringstuff.com/

• https://www.coursera.org/learn/python

• https://developers.google.com/edu/python/?csw=1

• https://docs.python.org/3.3/library/index.html

• http://www.nltk.org/book/


• How to Think Like a Computer Scientist: Interactive Edition

http://interactivepython.org/runestone/static/thinkcspy/index.html

• Problem Solving with Algorithms and Data Structures using Python: Interactive Edition

http://interactivepython.org/runestone/static/pythonds/index.html

• Programs, Information and People: Interactive Edition

http://interactivepython.org/runestone/static/pip2/toc.html#


• Introduction to Programming Using Python, by Y. Daniel Liang

• Python for Informatics – Exploring Information, by Charles Severance

• Python for Everybody, by Charles Severance (http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf)

• Programação em Python - Fundamentos e Resolução de Problemas by Ernesto Costa

• Think Python (free book: http://greenteapress.com/wp/think-python/)
