![Page 1: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/1.jpg)
NLP in Scala with Breeze and Epic
David HallUC Berkeley
![Page 2: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/2.jpg)
ScalaNLP Ecosystem
Breeze Epic Puck
• Linear Algebra• Scientific Computing• Optimization
• Natural Language Processing
• Structured Prediction
• Super-fast GPUparser for English≈ ≈
Numpy/Scipy PyStruct/NLTK
≈{ }
![Page 3: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/3.jpg)
![Page 4: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/4.jpg)
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
PP
NP
VP
VPNP
V S
VP
S
It certainly won’t get there on looks.
Natural Language Processing
![Page 5: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/5.jpg)
Epic for NLP
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
PP
NP
VP
VPNP
V S
VP
S
![Page 6: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/6.jpg)
Named Entity Recognition
Person Organization Location Misc
![Page 7: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/7.jpg)
NER with Epic> import epic.models.NerSelector> val nerModel = NerSelector.loadNer("en").get> val tokens = epic.preprocess.tokenize("Almost 20
years ago, Bill Watterson walked away from \"Calvin & Hobbes.\"")
> println(nerModel.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [LOC:Calvin & Hobbes] . ''
Not a location!
![Page 8: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/8.jpg)
Annotate a bunch of data?
![Page 9: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/9.jpg)
Building an NER system> val data: IndexedSeq[Segmentation[Label, String]]
= ???> val system = SemiCRF.buildSimple(data,
startLabel,
outsideLabel)> println(system.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [MISC:Calvin & Hobbes] . ''
![Page 10: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/10.jpg)
Gazetteers
http://en.wikipedia.org/wiki/List_of_newspaper_comic_strips_A%E2%80%93F
![Page 11: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/11.jpg)
Gazetteers
![Page 12: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/12.jpg)
Using your own gazetteer> val data: IndexedSeq[Segmentation[Label, String]]
= ???> val myGazetteer = ???> val system = SemiCRF.buildSimple(data,
startLabel, outsideLabel,
gaz = myGazetteer)
![Page 13: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/13.jpg)
Gazetteer• Careful with gazetteers!• If built from training data, system will use it
and only it to make predictions!• So, only known forms will be detected.• Still, can be very useful…
![Page 14: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/14.jpg)
Semi-CR-What?• Semi-Markov Conditional Random Field• Don’t worry about the name.
![Page 15: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/15.jpg)
Semi-CRFs
![Page 16: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/16.jpg)
Semi-CRFs
+ score(Berkeley, CA)+ score(- A bowl of )+ score(Churchill-Brenneis Orchards)+ score(Page mandarins and medjool dates)
score(Chez Panisse)
![Page 17: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/17.jpg)
Features
= w(starts-with-Chez)+w(starts-with-C…)+w(ends-with-P…)+w(starts-sentence)+w(shape:Xxx Xxx) +w(two-words)+w(in-gazetteer)
score(Chez Panisse)
![Page 18: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/18.jpg)
Building your own featuresval dsl = new WordFeaturizer.DSL[L](counts) with SurfaceFeaturizer.DSL import dsl._
word(begin) // word at the beginning of the span + word(end – 1) // end of the span + word(begin – 1) // before (gets things like Mr.) + word (end) // one past the end + prefixes(begin) // prefixes up to some length + suffixes(begin) + length(begin, end) // names tend to be 1-3 words + gazetteer(begin, end)
![Page 19: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/19.jpg)
Using your own featurizer> val data: IndexedSeq[Segmentation[Label, String]]
= ???> val myFeaturizer = ???> val system = SemiCRF.buildSimple(data,
startLabel, outsideLabel,
featurizer = myFeaturizer)
![Page 20: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/20.jpg)
Features• So far, we’ve been able to do everything with
(nearly) no math.• To understand more, need to do some math.
![Page 21: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/21.jpg)
Machine Learning Primer• Training example (x, y)– x: sentence of some sort– y: labeled version
• Goal:– want score(x, y) > score(x, y’), forall y’.
![Page 22: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/22.jpg)
Machine Learning Primer
score(x, y) = wTf(x, y)
![Page 23: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/23.jpg)
Machine Learning Primer
score(x, y) = w.t * f(x, y)
![Page 24: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/24.jpg)
Machine Learning Primer
score(x, y) = w dot f(x, y)
![Page 25: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/25.jpg)
Machine Learning Primer
score(x, y) >= score(x, y’)
![Page 26: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/26.jpg)
Machine Learning Primer
w dot f(x, y) >= w dot f(x, y’)
![Page 27: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/27.jpg)
Machine Learning Primer
w dot f(x, y) >= w dot f(x, y’)
![Page 28: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/28.jpg)
Machine Learning Primer
f(x,y)
f(x,y’)
w
w + f(x,y)
(w + f(x,y)) dot f(x,y) >= w dot f(x,y)
![Page 29: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/29.jpg)
The Perceptron> val featureIndex : Index[Feature] = ???> val labelIndex: Index[Label] = ???> val weights = DenseVector.rand[Double]
(featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val labelScores = DV.tabulate(labelIndex.size) { yy =>
val features = featuresFor(x, yy)weights.t * new FeatureVector(indexed)// or weights dot new FeatureVector(indexed)
} …}
![Page 30: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/30.jpg)
The Perceptron (cont’d)> val featureIndex : Index[Feature] = ???> val labelIndex: Index[Label] = ???> val weights = DenseVector.rand[Double](featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val labelScores = ... val y_best = argmax(labelScores)
if (y != y_best) {weights += new FeatureVector(featuresFor(x,
y))weights -= new FeaureVector(featuresFor(x,
y_best)) }}
![Page 31: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/31.jpg)
Structured Perceptron• Can’t enumerate all segmentations! (L * 2n)• But dynamic programs exist to max or sum• … if the feature function has a nice form.
![Page 32: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/32.jpg)
Structured Perceptron> val featureIndex : Index[Feature] = ???> val labelIndex: Index[Label] = ???> val weights = DenseVector.rand[Double]
(featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val y_best = bestStructure(weights, x) if (y != y_best) { weights += featuresFor(x, y) weights -= featuresFor(x, y_best) } }
![Page 33: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/33.jpg)
Constituency Parsing
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
PP
NP
VP
VPNP
V S
VP
S
![Page 34: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/34.jpg)
Multilingual Parser
Avg Ar Ba Fr Ge HeHun Ko Po
Swe
70
75
80
85
90
95"Berkeley"Epic
“Berkeley:” [Petrov & Klein, 2007]; Epic [Hall, Durrett, and Klein, 2014]
![Page 35: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/35.jpg)
Epic Pre-built Models• Parsing
– English, Basque, French, German, Swedish, Polish, Korean
– (working on Arabic, Chinese, Spanish)
• Part-of-Speech Tagging– English, Basque, French, German, Swedish,
Polish
• Named Entity Recognition– English
• Sentence segmentation– English – (ok support for others)
• Tokenization– Above languages
![Page 36: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/36.jpg)
![Page 37: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/37.jpg)
What Is Breeze?
Dense Vectors, Matrices, Sparse Vectors,Counters, Matrix Decompositions
![Page 38: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/38.jpg)
What Is Breeze?
Nonlinear Optimization,Probability Distributions
![Page 39: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/39.jpg)
Getting startedlibraryDependencies ++= Seq( "org.scalanlp" %% "breeze" % "0.8.1",
// optional: native linear algebra libraries // bulky, but faster "org.scalanlp" %% "breeze-natives" % "0.8.1")
scalaVersion := "2.11.1"
![Page 40: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/40.jpg)
Linear Algebra> import breeze.linalg._> val x = DenseVector.zeros[Int](5)
// DenseVector(0, 0, 0, 0, 0)
> val m = DenseMatrix.zeros[Int](5,5)
> val r = DenseMatrix.rand(5,5)
> m.t // transpose> x + x // addition> m * x // multiplication by vector> m * 3 // by scalar> m * m // by matrix> m :* m // element wise mult, Matlab .*
![Page 41: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/41.jpg)
Return Type Selection> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)
> dv + sv// res: DenseVector[Double] = DenseVector(1.0, 2.0)
> sv + sv// res: SparseVector[Double] = SparseVector(1.0, 2.0)
![Page 42: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/42.jpg)
Return Type Selection> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)
> (dv: Vector[Double]) + (sv: Vector[Double])// res: Vector[Double] = DenseVector(1.0, 2.0)
> (sv: Vector[Double]) + (sv: Vector[Double])// res: Vector[Double] = SparseVector()
Static: VectorDynamic: Dense
Static: VectorDynamic: Sparse
![Page 43: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/43.jpg)
Linear Algebra: Slices> val m = DenseMatrix.zeros[Int](5, 5)
> m(::,1) // slice a column// DV(0, 0, 0, 0, 0)
> m(4,::) // slice a row
> m(4,::) := DV(1,2,3,4,5).t
> m.toString0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5
![Page 44: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/44.jpg)
Linear Algebra: Slices> m(0 to 1, 3 to 4).toString
0 0 2 3
> m(IndexedSeq(3,1,4,2),IndexedSeq(4,4,3,1))
0 0 0 0 0 0 0 0 5 5 4 2 0 0 0 0
![Page 45: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/45.jpg)
Universal Functions> import breeze.numerics._
> log(DenseVector(1.0, 2.0, 3.0, 4.0))// DenseVector(0.0, 0.6931471805599453, // 1.0986122886681098, 1.3862943611198906)
> exp(DenseMatrix( (1.0, 2.0), (3.0, 4.0)))
> sin(Array(2.0, 3.0, 4.0, 42.0))
> // also sin, cos, sqrt, asin, floor, round, digamma, trigamma, ...
![Page 46: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/46.jpg)
UFuncs: In-Place> import breeze.numerics._
> val v = DV(1.0, 2.0, 3.0, 4.0)
> log.inPlace(v)// v == DenseVector(0.0, 0.6931471805599453, // 1.0986122886681098, 1.3862943611198906)
![Page 47: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/47.jpg)
UFuncs: Reductions> sum(DenseVector(1.0, 2.0, 3.0))
// 6.0> sum(DenseVector(1, 2, 3))
// 6> mean(DenseVector(1.0, 2.0, 3.0))
// 2.0> mean(DenseMatrix( (1.0, 2.0, 3.0)
(4.0, 5.0, 6.0)))// 3.5
![Page 48: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/48.jpg)
UFuncs: Reductions> val m = DenseMatrix((1.0, 3.0), (4.0, 4.0))
> sum(m) == 12.0
> mean(m) == 3.0
> sum(m(*, ::)) == DenseVector(4.0, 8.0)> sum(m(::, *)) == DenseMatrix((5.0, 7.0))
> mean(m(*, ::)) == DenseVector(2.0, 4.0)> mean(m(::, *)) == DenseMatrix((2.5, 3.5))
![Page 49: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/49.jpg)
Broadcasting> val dm = DenseMatrix((1.0,2.0,3.0), (4.0,5.0,6.0))
> sum(dm(::, *)) == DenseMatrix((5.0, 7.0, 9.0))
> dm(::, *) + DenseVector(3.0, 4.0)// DenseMatrix((4.0, 5.0, 6.0), (8.0, 9.0, 10.0)))
> dm(::, *) := DenseVector(3.0, 4.0)// DenseMatrix((3.0, 3.0, 3.0), (4.0, 4.0, 4.0))
![Page 50: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/50.jpg)
UFuncs: Implementation> object log extends UFunc
> implicit object logDouble extends log.Impl[Double, Double] { def apply(x: Double) = scala.math.log(x)
}
> log(1.0) // == 0.0
> // elsewhere: magic to automatically make this work:> log(DenseVector(1.0)) // == DenseVector(0.0)> log(DenseMatrix(1.0)) // == DenseMatrix(0.0)> log(SparseVector(1.0)) // == SparseVector()
![Page 51: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/51.jpg)
UFuncs: Implementation> object add1 extends UFunc
> implicit object add1Double extends add1.Impl[Double, Double] { def apply(x: Double) = x + 1.0
}
> add1(1.0) // == 2.0
> // elsewhere: magic to automatically make this work:> add1(DenseVector(1.0)) // == DenseVector(2.0)> add1(DenseMatrix(1.0)) // == DenseMatrix(2.0)> add1(SparseVector(1.0)) // == SparseVector(2.0)
![Page 52: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/52.jpg)
Breeze Benchmarks
Breeze jblas Mahout Apache Commons0
500
1000
1500
2000
2500
3000
135.04
685.5
2626
2151
Singular Value Decomposition
Lower is better
![Page 53: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/53.jpg)
Breeze Benchmarks
Breeze jblas Mahout Apache Commons0
500
1000
1500
2000
2500
3000
26
2562
56
Sparse/Dense Multiply
Lower is better
N/A
![Page 54: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/54.jpg)
Breeze Benchmarks
Breeze jblas Mahout Apache Commons0
5
10
15
20
25
30
0.037 0.075
25.376
Sparse/Dense Addition
Lower is better
N/A
![Page 55: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/55.jpg)
Nonnegative Matrix Factorization
V W H≈ Vij, Wij, Hij >= 0
![Page 56: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/56.jpg)
Nonnegative Matrix Factorization
• Input: matrix V• Output: W * H ≈ V, W and H ≥ 0
[Lee and Seung, 2011]
![Page 57: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/57.jpg)
Breeze NMF val n = V.rows val m = V.cols val r = dims val W = DenseMatrix.rand(n, r) val H = DenseMatrix.rand(r, m)
for(i <- 0 until iters) { H :*= ((W.t * V) :/= (W.t * W * H)) W :*= ((V * H.t) :/= (W * H * H.t)) }
(W, H)
![Page 58: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/58.jpg)
From Breeze to Gust
+
![Page 59: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/59.jpg)
Breeze NMF val n = X.rows val m = X.cols val r = dims val W = DenseMatrix.rand(n, r) val H = DenseMatrix.rand(r, m)
for(i <- 0 until iters) { H :*= ((W.t * X) :/= (W.t * W * H)) W :*= ((X * H.t) :/= (W * H * H.t)) }
(W, H)
![Page 60: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/60.jpg)
Gust NMF val n = X.rows val m = X.cols val r = dims val W = CuMatrix.rand(n, r) val H = CuMatrix.rand(r, m)
for(i <- 0 until iters) { H :*= ((W.t * X) :/= (W.t * W * H)) W :*= ((X * H.t) :/= (W * H * H.t)) }
(W, H)
≥6x speedup on my laptop!
![Page 61: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/61.jpg)
Optimization
(x2 + y – 11)2 + (x + y2 – 7)2
![Page 62: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/62.jpg)
Optimization
trait DiffFunction[T] extends (T=>Double) { /** Calculates both the value
and the gradient at a point */ def calculate(x:T):(Double,T)}
![Page 63: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/63.jpg)
Optimization
val df = new DiffFunction[DV[Double]] { def calculate(values: DV[Double]) = { val gradient = DV.zeros[Double](2) val (x,y) = (values(0),values(1)) val value = pow(x * x + y - 11, 2) + pow(x + y * y - 7, 2) gradient(0) = 4 * x * (x * x + y - 11) + 2 * (x + y * y - 7) gradient(1) = 2 * (x * x + y - 11) + 4 * y * (x + y * y - 7)
(value, gradient)
}}
![Page 64: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/64.jpg)
Optimizationbreeze.optimize.minimize(df, DV(0.0, 0.0))
// DenseVector(3.0000000000012905, 1.999999999997128)
![Page 65: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/65.jpg)
Optimization• Fully configurable• Support for:– LBFGS (+ L2/L1 Regularization)– Stochastic Gradient (+ L2/L1 Regularization)– Truncated Newton– Linear Programs– Conjugate Gradient Descent
![Page 66: NLP in Scala with Breeze and Epic David Hall UC Berkeley](https://reader035.vdocuments.us/reader035/viewer/2022062313/56649d365503460f94a0da91/html5/thumbnails/66.jpg)
Thanks!
Breeze Epic Puck
www.github.com/dlwh/{breeze, epic, puck}