Audio Programming in Java
A presentation for the Vancouver Island Java Users’ Group
Kevin Matz
2010.09.30

Topics
• Sound basics
• Java Media Framework (avoid)
• Java Sound API
  – Playback
  – Real-time capture and processing
• MP3 playback with JLayer
• What is a .MOD player…
• …and how do you build one?
Photo credit: Louise Docker, sxc.hu

What is sound?
• Vibrations propagated through the molecules of the air are detected by your eardrums and interpreted and perceived by your brain as sound
• Sound waves, as plotted on a Cartesian plane, are graphical representations of the patterns of high and low air pressure in a travelling sound wave

Properties of sound waves
Mixing waveforms
• To play multiple sounds simultaneously, you can mix the waveforms by simply adding them together
• Scale the output back to the standard volume to avoid clipping/distortion
• Basically: take the average of the samples from all channels at each point in time

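The averaging rule above can be sketched in a few lines of Java. This is only an illustration: the class name `WaveformMixer` is made up for this example, and it assumes all channels hold signed 16-bit samples and have the same length.

```java
import java.util.Arrays;

/** Hypothetical helper: mixes equal-length channels of signed 16-bit samples. */
final class WaveformMixer {

    /** Adds the channels sample by sample, then divides by the channel count. */
    static short[] mix(short[][] channels) {
        if (channels.length == 0) {
            return new short[0];
        }
        int length = channels[0].length;
        short[] out = new short[length];
        for (int i = 0; i < length; i++) {
            int sum = 0; // int accumulator, so the sum cannot overflow a short
            for (short[] channel : channels) {
                sum += channel[i];
            }
            // Dividing by the channel count scales the sum back into the 16-bit range.
            out[i] = (short) (sum / channels.length);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] a = {1000, -2000, 32767};
        short[] b = {3000, 2000, 32767};
        System.out.println(Arrays.toString(mix(new short[][] {a, b})));
    }
}
```

Averaging never clips, but it also makes each individual sound quieter as more channels are added; that is the trade-off behind "scale the output back".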
Recording and reproducing sound
• A microphone contains a diaphragm that vibrates when struck by sound waves; the diaphragm in turn vibrates a magnet within a coil, electromagnetically converting the mechanical vibrations into an electrical signal of varying amplitude
• “Voice-shaped currents” (Alexander Graham Bell)
• In a loudspeaker, the electrical signal causes an electromagnet to vibrate a diaphragm or speaker cone, which pushes the surrounding air out in the same pattern, reproducing the sound waves

Representing sound recordings digitally
• The level of the electrical signal is sampled and quantized using an ADC (analog-to-digital converter): “Pulse-Code Modulation” (PCM)
• Properties of raw PCM recordings:
  – Sampling rate: how frequently the analog signal is sampled (e.g., 22000 Hz)
  – Bits per sample (e.g., 8 or 16)
    • Little-endian vs. big-endian storage for more than 8 bits
    • Signed vs. unsigned
  – Channels: mono vs. stereo
  – Encoding: uncompressed linear amplitudes (Linear PCM) vs. logarithmic dynamic range compression (μ-Law, A-Law)
• e.g., CD audio: 44100 Hz, 16 bit, stereo, linear

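In the Java Sound API these PCM properties map onto the constructor arguments of `javax.sound.sampled.AudioFormat`. The sketch below describes the CD-audio example and shows what "little-endian" means for a 16-bit sample. The class name `PcmFormatSketch` is made up for illustration, and choosing little-endian byte order for the CD format is an assumption of this example.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper showing how PCM properties appear in Java Sound. */
final class PcmFormatSketch {

    /** Linear PCM: 44100 Hz, 16 bits per sample, stereo, signed, little-endian. */
    static AudioFormat cdFormat() {
        return new AudioFormat(44100f, 16, 2, true, false);
    }

    /** Packs signed 16-bit samples into bytes, low-order byte first. */
    static byte[] toLittleEndian(short[] samples) {
        byte[] bytes = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            bytes[2 * i] = (byte) (samples[i] & 0xFF);            // low byte
            bytes[2 * i + 1] = (byte) ((samples[i] >> 8) & 0xFF); // high byte
        }
        return bytes;
    }

    public static void main(String[] args) {
        AudioFormat cd = cdFormat();
        // One frame holds one sample per channel: 2 bytes * 2 channels.
        System.out.println("Bytes per frame:  " + cd.getFrameSize());
        System.out.println("Bytes per second: " + (int) (cd.getFrameSize() * cd.getFrameRate()));
    }
}
```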
Audio in Java: Java Media Framework (JMF)
• Pretty much dead
  – API not updated since 1999
  – Sporadic maintenance (periods of years without updates)
  – JMF website full of broken links
• Supports few modern media formats
• Add-on to the JRE; separate download and install needed
  – Windows, Linux, Solaris supported, but not Mac
• MP3 decoder/encoder removed in 2002; decoder only made available as additional plug-in in 2004

Java Sound API (javax.sound)
• Part of the Java SE runtime environment since 1.3 (May 2000)
• javax.sound.sampled
  – Playback, capture, mixing of sampled audio
  – Natively supports .WAV, .AU, and .AIFF formats
  – Service provider interface (SPI) allows extensibility for new audio devices and sound file formats
• javax.sound.midi
  – MIDI music synthesis and control of MIDI devices
  – Not covered in this presentation

javax.sound: Interfaces
• Line
  – An element of the “digital audio pipeline” that can carry audio data
  – open(), close()
  – LineListeners can be registered to monitor open/start/stop/close events
• Port extends Line
  – Representations of jacks for output to or input from audio devices
  – e.g., microphone, CD player, line in, speaker, headphone, line out
• Mixer extends Line
  – A representation of any audio device with one or more input and/or output lines
  – Can be a software implementation of a mixer, i.e., a device that combines input from multiple input lines onto a single output line

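To see how mixers and lines relate on a particular machine, the installed mixers can be enumerated through `AudioSystem`. This is a small sketch; the class name `MixerLister` and the output wording are made up for illustration, and the list may well be empty on a machine with no audio hardware.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;

/** Hypothetical helper: lists the mixers that Java Sound reports on this system. */
final class MixerLister {

    /** Returns one description line per mixer. */
    static List<String> describeMixers() {
        List<String> lines = new ArrayList<>();
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(info);
            Line.Info[] sources = mixer.getSourceLineInfo(); // line types that feed the mixer
            Line.Info[] targets = mixer.getTargetLineInfo(); // line types that read from the mixer
            lines.add(info.getName() + ": " + sources.length + " source line type(s), "
                    + targets.length + " target line type(s)");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeMixers()) {
            System.out.println(line);
        }
    }
}
```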
javax.sound: Interfaces
• DataLine extends Line
  – Provides start(), stop(), available(), drain(), etc. to control audio data playback/capture
• SourceDataLine extends DataLine
  – To output sound data, you write to a SourceDataLine via write()
  – Called “source” as it is intended to be an input to a mixer
• TargetDataLine extends DataLine
  – To capture incoming sound data, you read it from a TargetDataLine via read()
  – Called “target” as it is intended to be an output of a mixer
• Clip extends DataLine
  – A data line that can have data pre-loaded prior to playback
  – Supports looping

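As an illustration of the SourceDataLine workflow (open, start, write, drain, close), the following sketch generates a one-second sine tone and writes it to the default output line. The class name `TonePlayer`, the 44100 Hz mono format, and the half-volume scaling are choices made for this example, not part of the presentation's demos.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

/** Hypothetical example: plays a generated tone through a SourceDataLine. */
final class TonePlayer {

    static final float SAMPLE_RATE = 44100f;

    /** Generates a sine tone as 16-bit signed little-endian mono PCM. */
    static byte[] sineTone(double frequencyHz, double seconds) {
        int frames = (int) (SAMPLE_RATE * seconds);
        byte[] pcm = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            double angle = 2.0 * Math.PI * frequencyHz * i / SAMPLE_RATE;
            short sample = (short) (Math.sin(angle) * 0.5 * Short.MAX_VALUE); // half volume
            pcm[2 * i] = (byte) (sample & 0xFF);
            pcm[2 * i + 1] = (byte) ((sample >> 8) & 0xFF);
        }
        return pcm;
    }

    /** Writes the PCM data to the default output line and waits until it has played. */
    static void play(byte[] pcm) throws LineUnavailableException {
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        try {
            line.start();
            line.write(pcm, 0, pcm.length); // blocks while the line's buffer is full
            line.drain();                   // wait for queued data to finish playing
            line.stop();
        } finally {
            line.close();
        }
    }

    public static void main(String[] args) {
        try {
            play(sineTone(440.0, 1.0));
        } catch (LineUnavailableException | IllegalArgumentException e) {
            System.err.println("No suitable output line available: " + e.getMessage());
        }
    }
}
```

Capturing works the same way in the other direction: obtain a TargetDataLine, then call read() in a loop instead of write().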
javax.sound: Classes
• AudioFormat
  – Describes a format in terms of sample rate, bits per sample, sample encoding, channels (mono/stereo), etc.
• AudioFileFormat
  – Describes format of an audio file (e.g., .WAV, .AU, .MP3) + an AudioFormat
• AudioInputStream extends InputStream
  – An InputStream with a specific audio format (suitable for reading from audio files)
  – Note: no AudioOutputStream!
• AudioSystem
  – Main entry point to audio resources
  – Query and get mixers (audio devices) available on system
  – Or, get lines directly without dealing with mixers
    • Default mixer (audio device) for various line types determined by system properties, or can be specified in file lib/sound.properties in JRE directory
  – Open audio files (returns AudioInputStream)

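Because there is no AudioOutputStream, writing audio files goes through `AudioSystem.write()`, while reading goes through `AudioSystem.getAudioInputStream()`. The sketch below round-trips one second of silence through an in-memory .WAV image so that it runs without any files or audio hardware. The class name `WavRoundTrip` and the 22050 Hz mono format are assumptions of this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical example: encodes raw PCM as .WAV in memory and reads it back. */
final class WavRoundTrip {

    /** Wraps raw PCM in an AudioInputStream and lets AudioSystem encode it as .WAV. */
    static byte[] toWav(byte[] pcm, AudioFormat format) throws IOException {
        long frames = pcm.length / format.getFrameSize();
        AudioInputStream in = new AudioInputStream(new ByteArrayInputStream(pcm), format, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    /** Opens a .WAV image as an AudioInputStream and reports the AudioFormat found. */
    static AudioFormat readFormat(byte[] wav) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new ByteArrayInputStream(wav))) {
            return in.getFormat();
        }
    }

    public static void main(String[] args) throws Exception {
        AudioFormat format = new AudioFormat(22050f, 16, 1, true, false);
        byte[] wav = toWav(new byte[22050 * 2], format); // one second of silence
        System.out.println(wav.length + " bytes, format: " + readFormat(wav));
    }
}
```

For a file on disk, `AudioSystem.getAudioInputStream(File)` and `AudioSystem.write(AudioInputStream, AudioFileFormat.Type, File)` are the corresponding overloads.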
javax.sound: Security
• Playback is generally permitted
• Recording
  – Always prohibited for applets
  – Prohibited for applications running under a security manager (e.g., WebStart apps), but can be overridden by the user or an admin by editing the policy file
  – Permitted for applications with no security manager

Demo 1: Playing a clip
• SimpleClipDemo.java
  – Simple playback of a .WAV file
• SoundBoardDemo.java
  – Simple Swing app using threads for simultaneous playback of multiple clips

Demo 2: Capturing and processing audio in real time
• MicrophoneEchoDemo.java
  – Demo that captures microphone input and plays it back, adding an echo effect with a one-second delay

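The source of MicrophoneEchoDemo.java is not reproduced in this transcript, so the following is only a sketch of the signal-processing step such a demo needs: adding a delayed, quieter copy of the signal to itself. The class name `EchoEffect`, the `decay` parameter, and the clamping are assumptions of this example. It works on an array of 16-bit samples rather than on a live TargetDataLine.

```java
import java.util.Arrays;

/** Hypothetical helper: adds a single echo to a block of signed 16-bit samples. */
final class EchoEffect {

    /**
     * Each output sample is the input plus a quieter copy of the input from
     * delaySamples earlier. decay is the echo's relative volume (0..1).
     */
    static short[] addEcho(short[] input, int delaySamples, double decay) {
        short[] out = new short[input.length];
        for (int i = 0; i < input.length; i++) {
            double mixed = input[i];
            if (i >= delaySamples) {
                mixed += decay * input[i - delaySamples];
            }
            // Clamp so that loud passages saturate instead of wrapping around.
            long clamped = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, Math.round(mixed)));
            out[i] = (short) clamped;
        }
        return out;
    }

    public static void main(String[] args) {
        // A single impulse followed by silence; the echo appears 4 samples later.
        short[] impulse = {10000, 0, 0, 0, 0, 0};
        System.out.println(Arrays.toString(addEcho(impulse, 4, 0.5)));
    }
}
```

For a one-second delay, `delaySamples` equals the sampling rate (e.g., 44100 at 44100 Hz mono). A real-time version would keep the most recent second of input in a circular buffer between successive read() calls.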
JavaZoom open-source projects
• JLayer
  – Library for playing MP3 files
  – GNU LGPL license
  – Has its own API separate from Java Sound
  – jl1.0.1.jar is 106k
• MP3SPI
  – A Java Sound SPI plug-in so that Java Sound API treats .MP3 files like any other already-supported format
• VorbisSPI
  – A Java Sound SPI for Ogg Vorbis files
• Note: See mp3licensing.com

Demo 3: Playing an MP3 song in the background with JLayer
• BackgroundMusicDemo.java
  – Demo using JLayer’s Player class in a separate thread to play back an MP3 song in the background

Tracked music and the .MOD format
• “Tracker” programs allow composition of music by entering notes in a spreadsheet-like grid
• .MOD format originated on the Amiga with Karsten Obarski’s Ultimate Soundtracker (1987) and derivatives such as Protracker (shown below)
Image credit: Wikipedia

.MOD format
• Main features:
  – 15 or 31 eight-bit samples (instruments)
  – 4 channels
  – Song consists of patterns (64 rows) arranged in an order
  – Effect commands
    • Arpeggio, portamento (slide up/down), vibrato, …
    • Change speed, jump to pattern, …
• Variations on the .MOD format, and later formats (ScreamTracker .S3M, FastTracker .XM, Impulse Tracker .IT) expanded the number of channels and samples, added effects, and added more control over instruments

Writing a .MOD player
• We need to solve two major issues:
  1. How do we play multiple sounds simultaneously?
     ➔ Easy: Just add the waveforms together
     ➔ Or even easier: Use Java’s mixer functionality
  2. If we have a single recording of an instrument at a certain pitch (Middle C), then how do we reproduce the same instrument sound at a different pitch (e.g., an A in octave 5)?
     ➔ Thankfully, this is also easy!

How do you play a sample at a different pitch?
• The .MOD format assumes samples are recorded such that playing the sample at 8287 samples/sec will render the sample as a middle C
• To play a sample at a different pitch, i.e., at a different frequency… we simply play the sample at a different frequency!
  – Play the sample faster to get a higher pitch
  – Play the sample slower to get a lower pitch
• But by what factor should we scale a sample to get a particular note?

Table of frequencies for notes
Octave 3     Octave 4            Octave 5
C   262 Hz   Middle C   523 Hz   C   1047 Hz
C#  277      C#         554      C#  1109
D   294      D          587      D   1175
D#  311      D#         622      D#  1245
E   330      E          659      E   1319
F   349      F          698      F   1397
F#  370      F#         740      F#  1480
G   392      G          784      G   1568
G#  415      G#         831      G#  1661
A   440      A          880      A   1760
A#  466      A#         932      A#  1865
B   494      B          988      B   1976
• The frequency ratio (interval) between two consecutive semitones is the 12th root of 2: $\sqrt[12]{2} \approx 1.05946$
• e.g., 880 Hz * 1.05946 ≈ 932 Hz

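The semitone ratio answers the "by what factor" question: moving n semitones multiplies the frequency, and therefore the playback rate of a sample, by 2^(n/12). A short sketch follows. The class and method names are made up for illustration, and 8287 samples/sec is the middle-C rate quoted earlier in the presentation.

```java
/** Hypothetical helper: equal-tempered note frequencies and sample playback rates. */
final class NoteFrequencies {

    static final double SEMITONE_RATIO = Math.pow(2.0, 1.0 / 12.0); // about 1.05946

    /** Frequency of the note n semitones above (negative: below) a reference frequency. */
    static double shift(double referenceHz, int semitones) {
        return referenceHz * Math.pow(SEMITONE_RATIO, semitones);
    }

    /**
     * Rate, in samples/sec, at which to play a .MOD sample so that it sounds
     * the given number of semitones away from middle C.
     */
    static double playbackRate(int semitonesFromMiddleC) {
        return shift(8287.0, semitonesFromMiddleC);
    }

    public static void main(String[] args) {
        System.out.printf("A -> A#: %.0f Hz%n", shift(880.0, 1));
        System.out.printf("One octave up: %.0f samples/sec%n", playbackRate(12));
    }
}
```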
Playing individual samples at different frequencies while maintaining a constant output frequency
• Re-sampling:
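The slide's illustration is not part of this transcript. One simple way to re-sample, shown here as an assumption rather than as the presenter's method, is to step through the source sample by (desired sample rate / output rate) positions per output sample and take the nearest earlier source value. The class name `Resampler` is made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-sample re-sampling to a fixed output rate. */
final class Resampler {

    /**
     * Renders count output samples. The source is meant to be heard at sampleRate
     * samples/sec but the output device runs at outputRate, so each output sample
     * advances the read position by sampleRate / outputRate source samples.
     */
    static short[] resample(short[] source, double sampleRate, double outputRate, int count) {
        double step = sampleRate / outputRate;
        double position = 0.0;
        short[] out = new short[count];
        for (int i = 0; i < count; i++) {
            int index = (int) position; // truncate to the nearest earlier source sample
            out[i] = index < source.length ? source[index] : 0; // silence once the sample ends
            position += step;
        }
        return out;
    }

    public static void main(String[] args) {
        short[] source = {0, 10, 20, 30};
        // Playing at half the output rate stretches the sample (lower pitch).
        System.out.println(Arrays.toString(resample(source, 1.0, 2.0, 8)));
        // Playing at twice the output rate skips samples (higher pitch).
        System.out.println(Arrays.toString(resample(source, 2.0, 1.0, 3)));
    }
}
```

Nearest-sample lookup is cheap but adds audible artifacts; interpolating between neighbouring source samples is a common refinement.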
Playing a .MOD song (1 of 2)
• Keep track of position in “order” list and look up pattern number
• Keep track of current row position in the current pattern
• Keep track of song speed (tempo), which is controlled by two settings:
  – “Tick speed” setting determines how many “ticks” each row is divided into
  – Beats per minute (BPM) determines how much time is spent per tick
  – Time per tick = 2.5 sec / BPM setting; e.g., 2.5 sec / 125 = 0.02 sec per tick
  – So BPM = 125 and tick speed = 6 means each row is played for 0.02 * 6 = 0.12 sec; a pattern with 64 rows will take 0.12 * 64 = 7.68 sec to play
• For each channel, keep track of:
  – Instrument/sample
  – Note and its corresponding frequency
    • Current position in sample
    • Skip/stutter parameters
  – Volume
  – Effect (if any)

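The timing arithmetic above translates directly into code. In this sketch the class and method names are made up for illustration; `samplesPerTick` additionally assumes a fixed output sampling rate, which the slide does not specify.

```java
/** Hypothetical helper: .MOD tempo arithmetic as described above. */
final class ModTiming {

    static final int ROWS_PER_PATTERN = 64;

    /** Time per tick = 2.5 sec / BPM setting. */
    static double secondsPerTick(int bpm) {
        return 2.5 / bpm;
    }

    /** Each row lasts "tick speed" ticks. */
    static double secondsPerRow(int bpm, int tickSpeed) {
        return secondsPerTick(bpm) * tickSpeed;
    }

    static double secondsPerPattern(int bpm, int tickSpeed) {
        return secondsPerRow(bpm, tickSpeed) * ROWS_PER_PATTERN;
    }

    /** Number of output samples to render for one tick at the given output rate. */
    static int samplesPerTick(int bpm, int outputRate) {
        return (int) Math.round(outputRate * 2.5 / bpm);
    }

    public static void main(String[] args) {
        System.out.printf("tick: %.2f s, row: %.2f s, pattern: %.2f s%n",
                secondsPerTick(125), secondsPerRow(125, 6), secondsPerPattern(125, 6));
        System.out.println("samples per tick at 44100 Hz: " + samplesPerTick(125, 44100));
    }
}
```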
Playing a .MOD song (2 of 2)
• For each row in a pattern:
  – For the number of ticks according to the “tick speed”:
    • If this is the first tick:
      – For each channel:
        » Update each channel with new instrument, note, and/or effect
        » Execute “one-time” effect commands
        » Set up parameters for “continuous” effects
    • Else:
      – For each channel:
        » Update continuous effects
    • Generate audio data for this tick by rendering the sample for each channel and mixing the channels into a single output channel
    • Send the audio data to the playback buffer

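The loop structure above can be sketched as follows. Everything here is a heavily simplified stand-in, not the presenter's player: the hypothetical `Channel` class renders a constant level instead of a re-sampled instrument, the only "continuous effect" is a made-up volume halving, and the "playback buffer" is just the returned array.

```java
import java.util.Arrays;

/** Hypothetical skeleton of the row/tick loop; rendering and effects are placeholders. */
final class ModPlaybackSketch {

    /** Placeholder channel state; a real player tracks instrument, sample position, effects. */
    static final class Channel {
        double volume = 1.0;
        short level = 0; // stand-in for the instrument sample

        void applyRow(short newLevel) {          // first tick: new note/instrument
            level = newLevel;
            volume = 1.0;
        }

        void updateContinuousEffects() {         // later ticks: e.g., a volume slide
            volume *= 0.5;
        }

        short render() {
            return (short) (level * volume);
        }
    }

    /** rows[r][c] is the placeholder note for channel c on row r; returns mixed mono output. */
    static short[] play(short[][] rows, int tickSpeed, int samplesPerTick) {
        int channelCount = rows.length == 0 ? 0 : rows[0].length;
        Channel[] channels = new Channel[channelCount];
        for (int c = 0; c < channelCount; c++) {
            channels[c] = new Channel();
        }
        short[] out = new short[rows.length * tickSpeed * samplesPerTick];
        int pos = 0;
        for (short[] row : rows) {
            for (int tick = 0; tick < tickSpeed; tick++) {
                for (int c = 0; c < channelCount; c++) {
                    if (tick == 0) {
                        channels[c].applyRow(row[c]);
                    } else {
                        channels[c].updateContinuousEffects();
                    }
                }
                // Render this tick: mix the channels by averaging, then "send" to the buffer.
                for (int s = 0; s < samplesPerTick; s++) {
                    int sum = 0;
                    for (Channel channel : channels) {
                        sum += channel.render();
                    }
                    out[pos++] = channelCount == 0 ? 0 : (short) (sum / channelCount);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        short[][] rows = {{1000, 3000}};
        System.out.println(Arrays.toString(play(rows, 2, 2)));
    }
}
```

In a real player, the inner rendering loop would re-sample each channel's instrument at the note's playback rate, and the output would be written to a SourceDataLine tick by tick.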
Any questions?
Thanks!
• By the way…I’m looking for volunteers to take a survey about adding a new type of commenting construct to the Java language!
http://www.kevinmatz.com/survey