scala collections wizardry - scalapeño
Scala collections
Sagie Davidovich
@mesagie
singularityworld.com
linkedin.com/in/sagied
Warm up example:
Fibonacci sequence
val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)
Key concepts:
• Recursive values
• Streams
• Scan
• Binary place-holder notation
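A minimal sketch of how these four concepts combine in the one-liner above. The object name FibDemo and the extra values are illustrative and not from the talk. On Scala 2.13 and later, Stream still works but is deprecated in favor of LazyList.

```scala
object FibDemo {
  // Scan: scanLeft emits every intermediate accumulator, starting with the seed.
  val runningSums: List[Int] = List(1, 2, 3, 4).scanLeft(0)(_ + _)

  // Recursive value: the tail of fibs is defined in terms of fibs itself.
  // Stream: #:: evaluates its right-hand side lazily, so the self-reference is safe.
  // Binary placeholder notation: (_ + _) is shorthand for (a, b) => a + b.
  val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)

  // Only the elements that are requested get computed.
  val firstTen: List[Int] = fibs.take(10).toList
}
```

Here runningSums is List(0, 1, 3, 6, 10) and firstTen is List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34).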
Immutable collections
You’ll learn about:
• Avoid memory allocation for empty collections
• Optimize for small collections
• Equals-hashCode contract
• Asymptotic behavior of common operations
Nil
List.empty and Nil are singletons.
No new memory is allocated
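The singleton claim can be checked with reference equality (eq). This is a small sketch; the object name EmptyListDemo is illustrative.

```scala
object EmptyListDemo {
  // eq compares references: both expressions yield the very same Nil object.
  val sameAsNil: Boolean = List.empty[Int] eq Nil

  // The element type makes no difference; one empty list serves all types.
  val sameAcrossTypes: Boolean = List.empty[Int] eq List.empty[String]
}
```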
Option[A]
Immutable Sets – EmptySet
EmptySet is a singleton too
Immutable Sets – Set1
Optimized for sets of size 1
Immutable Sets – Set2
Optimized for sets of size 2
Immutable Sets – Set4
A HashSet is (finally) instantiated when an element is added to a Set4
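One way to observe these small-set specializations is to inspect the runtime class of sets of different sizes. This is a sketch under the assumption that the standard library keeps its current class names; SmallSetDemo and implName are illustrative names.

```scala
object SmallSetDemo {
  // Runtime class name of an immutable Set holding the elements 1..n.
  def implName(n: Int): String = (1 to n).toSet[Int].getClass.getName

  // The empty set is shared regardless of element type.
  val emptyIsShared: Boolean = Set.empty[Int] eq Set.empty[String]
}
```

With the standard library as I know it, implName(1) ends in Set1, implName(4) ends in Set4, and implName(5) names a HashSet implementation.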
Immutable Collections
Mutable Collections
One liners
Computing a derivative
def derivative(nums: Iterable[Double]) =
  nums.sliding(2)
    .map(pair => pair._2 - pair._1)
What can be improved in this solution?
Bonus question: change a few characters to find the max slope
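One possible answer, offered as a sketch rather than the speaker's intended solution. sliding(2) yields windows that are themselves collections, not tuples, so the _1 and _2 accessors do not apply to them, and a one-element input produces a single window of size 1. Zipping the sequence with itself shifted by one gives real pairs and handles short inputs. The names DerivativeDemo and maxSlope are illustrative.

```scala
object DerivativeDemo {
  // Pair each element with its successor, then take the differences.
  // drop(1) (rather than tail) keeps this safe for empty input.
  def derivative(nums: Seq[Double]): Seq[Double] =
    nums.zip(nums.drop(1)).map { case (a, b) => b - a }

  // Bonus: the largest slope, or None when there are fewer than two points.
  def maxSlope(nums: Seq[Double]): Option[Double] = {
    val slopes = derivative(nums)
    if (slopes.isEmpty) None else Some(slopes.max)
  }
}
```

For example, derivative(Seq(1.0, 4.0, 9.0, 16.0)) gives Seq(3.0, 5.0, 7.0) and maxSlope of the same input gives Some(7.0).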
Counting occurrences (histogram)
"encyclopedia" groupBy identity mapValues (_.size)
Map (
e -> 2, n -> 1, y -> 1, a -> 1, i -> 1,
l -> 1, p -> 1, c -> 2, o -> 1, d -> 1
)
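The one-liner above relies on mapValues, which is deprecated on Map as of Scala 2.13 (it now lives on views). A version-independent sketch of the same histogram follows; the names HistogramDemo and histogram are illustrative.

```scala
object HistogramDemo {
  // Group equal characters together, then replace each group by its size.
  def histogram(text: String): Map[Char, Int] =
    text.groupBy(identity).map { case (ch, group) => ch -> group.length }
}
```

histogram("encyclopedia") maps 'e' and 'c' to 2 and every other letter to 1.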
Word n-grams
val range = 1 to 3
val text = "hello sweet world"
val tokenize = (s: String) => s.split(" ")
range flatMap (size => tokenize(text) sliding size)
Result:
Vector(Array(hello), Array(sweet), Array(world), Array(hello, sweet), Array(sweet, world), Array(hello, sweet, world))
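A variation on the same idea, sketched with Lists instead of Arrays so that the resulting n-grams compare by content (arrays compare by reference). The names NGramDemo and ngrams are illustrative. Note that sliding(n) on fewer than n tokens still yields one shorter window.

```scala
object NGramDemo {
  // All word n-grams of the given sizes, smallest size first.
  def ngrams(text: String, sizes: Range): List[List[String]] = {
    val tokens = text.split(" ").toList
    sizes.flatMap(n => tokens.sliding(n)).toList
  }
}
```

ngrams("hello sweet world", 1 to 3) returns the same six n-grams as the result above, each as a List[String].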
Are all members of a greater than the corresponding members of b?
val a = List(2,3,4)
val b = List(1,2,3)
// O(n^2) and not very elegant.
(0 until a.size) forall (i => a(i) > b(i))
// O(n) but creates tuples and a temporary list. Yet, more elegant.
a zip b forall (x=> x._1 > x._2)
// same as above but doesn't create a temporary list (lazy)
a.view zip b forall (x=> x._1 > x._2)
// O(n), without tuple or temporary list creation, and even more elegant.
(a corresponds b)(_ > _)
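The variants above are not fully interchangeable when the two lists differ in length: zip silently truncates to the shorter list, while corresponds also requires equal lengths. A small sketch (the object and value names are illustrative):

```scala
object PairwiseDemo {
  val a = List(2, 3, 4)
  val b = List(1, 2, 3)
  val shorter = List(1, 2)

  // Same lengths: both approaches agree.
  val viaZip: Boolean = a.zip(b).forall { case (x, y) => x > y }
  val viaCorresponds: Boolean = a.corresponds(b)(_ > _)

  // Different lengths: zip drops the unmatched 4, corresponds reports false.
  val zipTruncates: Boolean = a.zip(shorter).forall { case (x, y) => x > y }
  val correspondsChecksLength: Boolean = a.corresponds(shorter)(_ > _)
}
```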
Strings are collections. How come?
"abc".max
@inline implicit def augmentString(x: String) = new StringOps(x)
String <% StringOps <: StringLike <: IndexedSeqOptimized …
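Because of that implicit conversion, ordinary collection methods are available on any String. A few illustrative calls (the object and value names are not from the talk):

```scala
object StringOpsDemo {
  // max comes from the collection API and compares characters.
  val largest: Char = "abc".max

  // map over characters; mapping Char to Char gives back a String.
  val upper: String = "abc".map(_.toUpper)

  // count and reverse are sequence operations as well.
  val howManyCs: Int = "encyclopedia".count(_ == 'c')
  val reversed: String = "abc".reverse
}
```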
Complexity of collection operations
• Linear:
– Unary: O(n):
• Mappers: map, collect
• Reducers: reduce, foldLeft, foldRight
• Others: foreach, filter, indexOf, reverse, find, mkString
– Binary: O(n+ m):
• union, diff, and intersect
Immutable Collections time complexity
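C = constant, eC = effectively constant, aC = amortized constant, L = linear, - = not supported

         head  tail  apply  update  prepend  append
List     C     C     L      L       C        L
Stream   C     C     L      L       C        L
Vector   eC    eC    eC     eC      eC       eC
Stack    C     C     L      L       C        L
Queue    aC    aC    L      L       L        C
Range    C     C     C      -       -        -
String   C     L     C      L       L        L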
Mutable Collections time complexity
               head  tail  apply  update  prepend  append  insert
ArrayBuffer    C     L     C      C       L        aC      L
ListBuffer     C     L     L      L       C        C       L
StringBuilder  C     L     C      C       L        aC      L
MutableList    C     L     L      L       C        C       L
Queue          C     L     L      L       C        C       L
ArraySeq       C     L     C      C       -        -       -
Stack          C     L     L      L       C        L       L
ArrayStack     C     L     C      C       aC       L       L
Array          C     L     C      C       -        -       -
Bonus question
What’s the complexity of Range.sum?
Range
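A hint for the bonus question, as a sketch: a Range is an arithmetic series, so its sum can be computed from its length, first, and last element without visiting the elements. As far as I know, the standard library's Range.sum takes a similar shortcut for plain Int addition, but treat that as an assumption to verify against your Scala version. The names RangeSumDemo and closedFormSum are illustrative.

```scala
object RangeSumDemo {
  // Arithmetic series: n * (first + last) / 2, computed in Long to avoid overflow.
  def closedFormSum(r: Range): Long =
    if (r.isEmpty) 0L
    else r.length.toLong * (r.head.toLong + r.last.toLong) / 2
}
```

For example, closedFormSum(1 to 100) is 5050 and closedFormSum(10 to 1 by -3) is 22.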
Equals-hashCode contract
(a equals b) ⇒ (a.hashCode == b.hashCode)
All Scala collections implement the contract
Bad idea: Set[Array[Int]]
Good idea: Set[Vector[Int]]
Bad Idea: Set[ArrayBuffer[Int]]
Bad Idea: Set[collection.mutable._]
Good Idea: Set[collection.immutable._]
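A small sketch of why arrays make poor set elements: they compare by reference, so an array with the same contents is not recognized as a member. Mutable collections carry a related risk, since their hash code changes when they are mutated after insertion. The object and value names are illustrative.

```scala
object SetElementDemo {
  // Arrays use reference equality, so an equal-looking array is not found.
  val arrayFound: Boolean = Set(Array(1, 2)).contains(Array(1, 2))

  // Vectors compare and hash by content.
  val vectorFound: Boolean = Set(Vector(1, 2)).contains(Vector(1, 2))

  // Content comparison for arrays has to be requested explicitly.
  val arraysSameContent: Boolean = Array(1, 2).toSeq == Array(1, 2).toSeq
}
```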
More on collections equality
val (a, b) = (1 to 3, List(1, 2, 3))
a == b // true
Q: Wait, how efficient is Range.hashCode?
A: O(n)
override def hashCode = util.hashing.MurmurHash3.seqHash(seq)
Challenge yourself:
Is there a closed-form (O(1)) formula for a range's hashCode?
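The equality behavior described on this slide can be checked directly. Sequences with the same elements in the same order are equal regardless of implementation, and the contract then forces their hash codes to match. The object and value names below are illustrative.

```scala
object SeqEqualityDemo {
  val range = 1 to 3
  val list = List(1, 2, 3)

  // A Range and a List with the same elements are equal...
  val rangeEqualsList: Boolean = range == list
  // ...so their hash codes must agree as well.
  val hashesAgree: Boolean = range.hashCode == list.hashCode

  // Equality does not cross collection kinds: a Set never equals a Seq.
  val setEqualsList: Boolean = (Set(1, 2, 3): Iterable[Int]) == list
}
```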
Java interoperability
Implicit (less boilerplate):
import collection.JavaConversions._
javaCollection.filter(…)
Explicit (better control):
import collection.JavaConverters._
javaCollection.asScala.filter(…)
scalaCollection.asJava
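A runnable sketch of the explicit style. Note that JavaConversions was removed in Scala 2.13, and JavaConverters is deprecated there in favor of scala.jdk.CollectionConverters; the import below is chosen because it compiles on both older and newer 2.x versions. The object and value names are illustrative.

```scala
object JavaInteropDemo {
  import scala.collection.JavaConverters._

  // A plain Java collection, as it might arrive from a Java API.
  val javaList = new java.util.ArrayList[Int]()
  (1 to 4).foreach(i => javaList.add(i))

  // asScala wraps the Java list (no copy), so Scala operations apply.
  val evens: List[Int] = javaList.asScala.filter(_ % 2 == 0).toList

  // asJava goes the other way for handing results back to Java code.
  val backToJava: java.util.List[Int] = evens.asJava
}
```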
The power of type-level programming
Graph path-finding at compile time
import scala.language.implicitConversions
// Vertices
case class A(l: List[Char])
case class B(l: List[Char])
case class C(l: List[Char])
case class D(l: List[Char])
case class E(l: List[Char])
// Edges
implicit def ad[A1 <% A](x: A1) = D(x.l :+ 'A')
implicit def bc[B1 <% B](x: B1) = C(x.l :+ 'B')
implicit def ce[C1 <% C](x: C1) = E(x.l :+ 'C')
implicit def ea[E1 <% E](x: E1) = A(x.l :+ 'E')
def pathFrom(end: D) = end
pathFrom(B(Nil)) // res0: D = D(List(B, C, E, A))
Want to go Pro?
• Shapeless (Miles Sabin)
– Polytypic programming & Heterogeneous lists
– github.com/milessabin/shapeless
• Scalaxy (Olivier Chafik)
– Macros for boosting performance of collections
– github.com/ochafik/Scalaxy