automated transcription and learn... · 2020. 8. 11. · lunch & learn jake surman research...

Automated Transcription Lunch & Learn Jake Surman Research Data Specialist, Research Technology Services

Post on 26-Sep-2020


Page 1: Automated Transcription and Learn... · 2020. 8. 11. · Lunch & Learn Jake Surman Research Data Specialist, Research Technology Services . Automated Transcription • Considerations

Automated Transcription

Lunch & Learn

Jake Surman

Research Data Specialist, Research Technology Services

Automated Transcription

• Considerations

• Transcription options

• Automated Transcription Services

– Microsoft Stream/Teams

– Zoom

– AWS

– Google Cloud

• Quality comparison

UNSW RESEARCH INFRASTRUCTURE

Research Technology Services

Compute · Data · Community

About Us

The ResTech Compute team provides support for those whose problems are too big for the computer at their desk.

UNSW provides a number of platforms for storing, capturing and sharing your research data.

The ResTech Community team aims to build a strong and connected research network within UNSW.

[email protected]

www.restech.unsw.edu.au

Level 3, Chemical Science (F10)

Sign up to our mailing list

Compute Data Community

High Performance Computing (HPC)

• Free for researchers and HDR candidates

• As a service: NCI – Gadi (100 million compute hours)

• Katana – local HPC cluster (24 million compute hours)

Cloud Computing

• Cloud services: Amazon AWS, Microsoft Azure, NECTAR

• Seed money for exploring research in the cloud

Research Data

• Help with Data Management training, issues, information

• Assistance with data moves, storage, planning

[email protected]

Research Technology Services

Research technology training

• 40+ courses per year on campus and online

• Free to researchers and HDR candidates

Consulting

• Help with code and using HPC

• Data Classification, Management, and tools help

• Advising on, purchasing and configuring HPC equipment

Hacky Hour

• Casual meetup 3pm every Thursday in Penny Lane (currently on Teams)

• Bring your problems with code, HPC, data

• Presentations about research technologies

[email protected]

RDM Initiative

Division of Research

• Researcher Development (Training + Engagement)

• Research Technology Services (Data Team)

• PVC-RI

• IT

• Library

• Data Governance

• Research Integrity

PVC – Research Infrastructure (Initiative Owner)

People

Tools

Policy

What data do you have?

In the Chat:

Do you have Audio? Video?

How many files, and how long?

Is your data sensitive? (Medical, Children, Identifying)

CONSIDERATIONS

Security

• What is the classification of your data?

• Where is your data being held?

• Who has access to your video/audio upload?

• Who has access to your transcript?

• How long do you need to keep these files?

• Is the data encrypted on disk and in transit?

• Will they use your data or share it with others?

Functionality

• How good is the transcription quality?

• How expensive is it?

• How easy is it to use?

• What format is the transcription in?

• What options do you have?

• When will you get the transcription?

TRANSCRIPTION OPTIONS

Human transcription

Positives:

• Generally good quality transcription

• Fast enough? Hours to weeks to get results

Negatives:

• Can be expensive ($60+/hour)

• Need to be careful about who you give your files to: where do they store your data, and what tools do they use?

• Confidentiality agreement needed - https://research.unsw.edu.au/forms-and-templates

• At least one human sees your data

Machine transcription

Positives:

• Very fast, results in minutes to hours

• No humans involved

• Cheap or free

Negatives:

• Quality highly variable

• Can be fiddly to use

• Need to be careful about security, locality, fine print
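To put the cost difference in perspective, here is a rough sketch of the arithmetic. The rates are the figures quoted in these slides (human transcription from $60/hour, cloud services around $1.50/hour); the 40-hour workload and the function name are made up for illustration, and actual pricing should be checked with each provider.

```python
def transcription_cost(audio_hours, rate_per_hour, free_hours=0.0):
    """Estimate the cost of transcribing `audio_hours` of recordings.

    `free_hours` is any free allowance that applies to this batch
    (for example, one month of a free tier).
    """
    billable_hours = max(audio_hours - free_hours, 0.0)
    return billable_hours * rate_per_hour


# Hypothetical workload: 40 hours of interview recordings.
hours = 40
human = transcription_cost(hours, 60.0)                   # $60/hour, per the slides
machine = transcription_cost(hours, 1.5, free_hours=1.0)  # $1.50/hour, 1 hour free

print(f"Human:   ${human:,.2f}")
print(f"Machine: ${machine:,.2f}")
```

The estimate ignores the time you will spend correcting a machine transcript, which can be significant when quality is poor.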

AUTOMATED TRANSCRIPTION SERVICES

Microsoft Stream/Teams

Positives:

• Easy to get the transcript

• Free

• Fast

• Video and transcript are in a secure location

• Can upload your own files as well as recorded Teams meetings

Negatives:

• Transcript is in a strange format

• Quality OK
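The "strange format" can usually be cleaned up with a few lines of code. The sketch below assumes the transcript was downloaded as a WebVTT caption file (timing lines, cue identifiers and NOTE blocks around the spoken text); the slides do not name the format, so check your own file first. The sample text and the function name are illustrative only.

```python
import re

# A WebVTT timing line looks like "00:00:03.500 --> 00:00:06.000"
# (the hours part is optional).
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_text(vtt):
    """Collapse a WebVTT caption file into one plain-text paragraph."""
    kept = []
    in_cue = False
    for line in vtt.splitlines():
        line = line.strip()
        if not line:
            in_cue = False      # a blank line ends the current block
        elif TIMING.match(line):
            in_cue = True       # caption text follows a timing line
        elif in_cue:
            kept.append(line)   # spoken text; headers, notes and ids are skipped
    return " ".join(kept)


# Made-up sample in the assumed format.
sample = """WEBVTT

NOTE confidence: 0.87

1
00:00:00.000 --> 00:00:03.500
Use UNSW supported data platforms
for research data.

2
00:00:03.500 --> 00:00:06.000
Good data management is fundamental.
"""

print(vtt_to_text(sample))
```

Joining lines with a space matters: caption files wrap text across lines, and gluing them together without one produces fused words.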

Zoom

Positives:

• Free

Negatives:

• Slow (3 days)

• Stored in the USA

• Quality worst of these three options

• Only for meetings in Zoom that you record (can’t upload files)

Amazon Web Services Transcribe

Positives:

• Cheap (1 hour free/month for 12 months, then $1.5/hour)

• Can upload lots of files at once

• Web interface

• Can upload custom dictionary and use “medical” version

Negatives:

• Quality OK

• Takes a bit of work to set up, needs a credit card.

• Need to send a request to opt out of re-using your data for training
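For those who would rather script it than use the web interface, the following is a minimal sketch of submitting jobs with the `boto3` library. It assumes your audio is already in an S3 bucket and that AWS credentials are configured; the bucket, file, job and vocabulary names are hypothetical, and the Sydney region is chosen only as an example of keeping data local.

```python
def build_job_request(job_name, media_uri, media_format="mp3",
                      language_code="en-AU", vocabulary_name=None):
    """Build the keyword arguments for Transcribe's start_transcription_job."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }
    if vocabulary_name:
        # Optional custom vocabulary ("custom dictionary") created beforehand.
        request["Settings"] = {"VocabularyName": vocabulary_name}
    return request


def submit(request, region="ap-southeast-2"):
    """Send one job to Amazon Transcribe (needs boto3 and AWS credentials)."""
    import boto3  # third-party package, imported here so the sketch loads without it
    client = boto3.client("transcribe", region_name=region)
    return client.start_transcription_job(**request)


# Hypothetical batch: one job per uploaded file.
files = ["interview-01.mp3", "interview-02.mp3"]
requests = [
    build_job_request(f"job-{i}", f"s3://my-research-bucket/{name}")
    for i, name in enumerate(files, start=1)
]
for req in requests:
    print(req["TranscriptionJobName"], req["Media"]["MediaFileUri"])
    # submit(req)  # uncomment once credentials and the bucket exist
```

Building the request separately from sending it makes it easy to review exactly what will be uploaded, and where, before anything leaves your machine.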

Google Cloud Speech to Text

Positives:

• Cheap ($1.5/hour)

• Lots of options

Negatives:

• Quality OK

• Takes a lot of work to set up, needs a credit card.

• No web interface, only command-line and programmer API

• Only for audio, mainly FLAC and WAV
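To give a feel for the programmer API, here is a rough sketch that builds the JSON body for the Speech-to-Text REST method `speech:longrunningrecognize`. It assumes the audio has already been uploaded to a Cloud Storage bucket and that WAV files contain 16-bit PCM audio; the bucket and file names are hypothetical, and authentication is left out.

```python
import json
import posixpath

# Map file extensions to the API's encoding names.
# Assumption: .wav files hold 16-bit PCM audio (LINEAR16).
ENCODINGS = {".flac": "FLAC", ".wav": "LINEAR16"}

ENDPOINT = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_recognize_body(gcs_uri, language_code="en-AU"):
    """Build the request body for a long-running recognition job."""
    suffix = posixpath.splitext(gcs_uri)[1].lower()
    if suffix not in ENCODINGS:
        raise ValueError(f"Unsupported audio type {suffix!r}; convert to FLAC or WAV first")
    return {
        "config": {
            "encoding": ENCODINGS[suffix],
            "languageCode": language_code,
            "enableAutomaticPunctuation": True,
        },
        "audio": {"uri": gcs_uri},
    }


# Hypothetical file in a hypothetical bucket.
body = build_recognize_body("gs://my-research-bucket/interview-01.flac")
print("POST", ENDPOINT)
print(json.dumps(body, indent=2))
```

Video files and formats such as MP3 would need to be converted to audio-only FLAC or WAV before upload.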

Quality Comparison

Amazon Transcribe:

U. N s W supported data platforms for research data. Boot data management is fundamental. When conducting research you NSW provides a number of approved data storage platforms for your research data. Different platforms are suitable for different classifications of data. Choosing a storage platform should depend on the classifications of your data to ensure your research data is secured. Rdm at u W is here to assist contact your friendly rdm at U. N s W team.

Stream auto-transcription:

Use Unsw supported dataplatforms for research data who data management is fundamentalwhen conducting research. Unsw provides a number of approveddata storage platforms for your research data. Differentplatforms are suitable for different classifications ofdata. Choosing a storage platform should depend on theclassification of your data. To ensure your research data issecured. RDM at Unsw is here to assist. Contact your friendlyRDM at Unsw team.

Zoom auto-transcription:

He's UFW supported data platforms for research data. Data Management is fundamental. When conducting research us who provides a number of approves data storage platforms for your research data. different platforms are suitable for different classifications of data choosing a storage platform should depend on the classification of your data to ensure your research data is secure our DM at us, who is he to assist contact your friendly our DM at the UN FW team.

Quality Comparison

Google speech to text:

Use unsw supported data platforms for research data good data management is fundamental when conducting research unsw provides a number of approved data storage platforms for your research data different platforms are suitable for different classifications of data choosing a storage platform should depend on the classification of your data to ensure your research data is secure.

Third RDM at unsw is here to assist contact your friendly RDM at unsw team.

Azure Transcription:

Use Unsw supported data platforms for research data. Good data management is fundamental when conducting research. Unsw provides a number of approved data storage platforms for your research data. Different platforms are suitable for different classifications of data. Choosing a storage platform should depend on the classification of your data to ensure your research data is secured. Rdm at Unsw. Is here to assist. Contact your friendly rdm at Unsw team.
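One way to put a number on a comparison like this is the word error rate (WER): the word-level edit distance between a transcript and a trusted reference, divided by the length of the reference. The sketch below is a plain implementation, not something used in the talk; the reference sentence is an assumption inferred from the outputs above, since the original script is not included in the slides.

```python
import re


def words(text):
    """Lower-case the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # word missing from the transcript
                curr[j - 1] + 1,                       # extra word in the transcript
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Assumed reference sentence versus a fragment of one service's output.
reference = "Good data management is fundamental when conducting research"
hypothesis = "Boot data management is fundamental. When conducting research"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Because punctuation and capitalisation are ignored, this measures only whether the right words were recognised; a transcript can score well and still need work on sentence breaks.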

Conclusion, Q&A

In the chat: Would you try automated transcription for your research? (Have you already?)

Question Time.

Contact us: [email protected]