Mutable Ideas


#StrangeLoop : Simple Made Easy

Ref: https://thestrangeloop.com/sessions/simple-made-easy
Rich Hickey

Simplicity is prerequisite for reliability

Applying Simple (one fold/braid):

  • One role
  • One task
  • One concept
  • One dimension

But not

  • One instance
  • One operation

About lack of interleaving not cardinality

Objective

Applying Easy (lie near)

  • At hand: on our HD, in our toolset, IDE, apt, gem install …
  • Near to understanding/skill set (familiar)
  • Near our capabilities
  • Easy is relative!!! (for whom?)

“I’m going to use it because it’s simple” ==> really means easy ==> easy for me, because it’s familiar. So the objective conversation is lost.

Construct vs. Artifact

We focus on the experience of using the construct

  • programmer convenience
  • programmer replaceability

Rather than the long-term results of its use

  • software quality, correctness
  • maintenance, change

We must assess constructs by their artifacts

Limits

We can only hope to make reliable those things we can understand

Consider a few things at a time

Intertwined things must be considered together

Complexity undermines our understanding

Change

Change requires analysis & decisions

What will be impacted?

Where does it need to be made?

The ability to reason about your program is critical to changing it w/o failing

  • I can make changes because I have tests.
  • vs.
  • Who drives around banging their car against every guard rail?

Development Speed

Emphasizing ease gives early speed

Ignoring complexity will slow you down.

Many complicating constructs are:

  • succinctly described
  • familiar
  • available

Any complexity they yield is taken as accidental

Making things Easy

  • Bring to hand by installing
  • Become familiar by learning, trying
  • But mental capability?
  • > you can’t bring your brain much nearer to the complexity
  • > so you need to simplify things instead

“Parens are Hard!”

  • Not at hand for most (IDE)
  • Nor familiar
  • But are they simple?
  • Not in CL/Scheme
  • > overloaded for calls
  • > for those who bothered trying, this is a valid complexity complaint

What’s in your TOOLKIT? [photo]

Complect (to braid together) <> Compose (to place together)

Partitioning and stratification don’t imply simplicity, but are enabled by it

State is never simple

  • Complects value and time
  • It is easy, in the at-hand and familiar sense

Clojure & Haskell refs compose value and time

Environmental complexity: Individual policies don’t compose.

Abstraction => draw away from the physical (Interfaces)

Who, What, When, Where, Why and How

  • What: Operations, Small sets. (Don’t complect with: How)
  • Who: build many subcomponents (Don’t complect with: details/dependencies)
  • How: Implement logic. Connect to abstractions and entities via polymorphisms.

Information IS simple

  • Don’t ruin it
  • By hiding it behind a micro-language
  • Represent data as data!

Simplicity is the ultimate sophistication - Leonardo da Vinci

#StrangeLoop : Product Engineering

Ref: https://thestrangeloop.com/sessions/product-engineering

Product Idea ===> ????? ===> Profit

3 Traits of product engineering

  • Over-arching
  • Top-Down
  • Empathetic

3 Laws

  • You can’t force ppl
  • New must be better than old

Product is found at the intersection of Liberal Arts x Tech (going back to the root of the word technology).

Product engineering is about ppl: LAZY, STUPID, IMPATIENT, SELFISH

Sharpen Focus: 

Great products: 80% boring + 20% revolutionary

Path from iPod 1 to iPad … simplistic and boring

Originality is not as important as quality

Ppl are in love w/ their own ideas; you have to let go of your ideas!!!

  1. “What problem are you trying to solve?”
  2. How does it solve the prob?
  3. Is it actually better?
  4. Why?

Consider your customers: systems are not designed for computers, but for ppl.

  • Start at the end by making a commercial, so you can explain what problem it solves and show it to customers, ppl you are hiring, investors, etc.
  • Get it right 1st, don’t spread to other platforms (e.g. iOS, Android)
  • You cannot adapt the culture, you must adapt to the platform
  • Building a team is also multi-platform

n platforms = n+1 teams

Your best product tester is not your mother, but your archenemy

Build the product your test user expects.

Real artists ship: Plan, Design, Ship on time

Cut features, not quality

Shipping the rough draft is an amateur mistake (parallel with novel writers)

Fear social debt as much as or more than tech debt

When is it ready? When is it good enough? Show something to someone and get an emotional response (the response / hook).

The hook is the thing that gets ppl to buy your product!

Join the community: we’re more alike than different.

http://appsterdam.rs (get families together to understand the behavior of app makers …)

#StrangeLoop : Have Your Cake and Eat It Too: Meta-Programming Java

Ref: https://thestrangeloop.com/sessions/have-your-cake-and-eat-it-too-meta-programming-java

Howard Lewis Ship is the creator of http://tapestry.apache.org/
http://howardlewisship.com/

Meta-programming is:

  • Code-reuse
  • but without inheritance
  • remove boilerplate

memoize: is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs
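A minimal sketch of memoization, in Python rather than the Java/Plastic setup used in the talk. The `memoize` decorator and the `fib` example are assumptions made here for illustration; they are not taken from the session.

```python
import functools

def memoize(fn):
    """Cache results by argument tuple so repeated inputs skip the computation."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)  # computed only the first time
        return cache[args]

    return wrapper

calls = []  # records every input for which the function body actually ran

@memoize
def fib(n):
    calls.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(30)
```

Without the cache the recursive calls repeat the same work many times over; with it, the body runs once for each distinct input from 0 to 30.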

AspectJ was the promise in 2003.

Classic Java: the solution to an overwhelming problem is to add complexity to it.

ASM 3.3.1 = to create bytecode on the fly.

Leaky abstractions can be a problem; there is a load manager for that.

Meeting Plastic: http://java.dzone.com/articles/meeting-plastic-simple

Plastic Demos: https://github.com/hlship/plastic-demos

Testing your PlasticClasses with Spock: http://code.google.com/p/spock/ 

Flexible API design: NameConvention (ceremony vs. essence)

Common usage: Annotate fields and change values


#StrangeLoop : Distributed Data Analysis with Hadoop and R

Ref: https://thestrangeloop.com/sessions/distributed-data-analysis-with-hadoop-and-r
Source code: https://github.com/jseidman/hadoop-R
Slides-deck: http://www.slideshare.net/jseidman

Visualization Poster Competition: http://stat-computer.org/dataexpo/2009

Using Hadoop to make the download quicker!

Histogram Distribution (lattice library)

Hadoop Streaming

Access Hadoop interactively (Hive) from the R environment.

  • Still seems to have some rough edges

RHIPE

  • Active community
  • Based on Hadoop streaming source
  • Can be somewhat confusing and intimidating (take the time to understand it, it’s worth it!)
  • Uses Google Protocol Buffers
  • Must be installed on all your nodes
  • Warning: differs from the Java API: reduce keeps being called until there are no more values for that key

rmr (introduced by: http://www.revolutionanalytics.com/)

  • “stay true to MR and true to R”
  • need more samples, but have interesting things like K-means

Segue 

  • Running multiple simulations in parallel
  • Runs on EMR (not internal clusters)

Apache Mahout

Ricardo (research project at IBM), paper is good, but no source available

Avoid using R for general purpose MR use.

How to test outside of Hadoop

cat DATAFILE | ./map.R | sort | ./reduce.R
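The same map | sort | reduce shape can be tried in memory before writing the R scripts. This is only a sketch in Python, not the talk's code: the record layout (carrier, delay) and the sample values are made up for illustration.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'key<TAB>value' line per input record, like a streaming mapper."""
    for line in lines:
        carrier, delay = line.split(",")
        yield f"{carrier}\t{delay}"

def reducer(sorted_lines):
    """Sum the values of each run of identical keys; the input must be sorted."""
    pairs = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)

# Hypothetical records standing in for DATAFILE.
records = ["AA,10", "UA,5", "AA,20", "UA,15", "DL,0"]

# sorted() plays the role of `sort` between the two scripts.
result = dict(reducer(sorted(mapper(records))))
```

The sort step matters: the reducer only groups adjacent lines, which is the same guarantee Hadoop streaming gives between the map and reduce phases.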

Book: R meets Big Data - a basket of strategies to help you use R for large-scale analysis and computation.

#StrangeLoop : Distributed Systems: The Stuff Nobody Told (greplin)

Ref: https://thestrangeloop.com/sessions/distributed-systems-the-stuff-nobody-told-you

Adds 5,000 docs/sec: greplin > twitter
50 servers! 

How to deal with Distributed Systems:

  • We admitted we were powerless
  • A power greater than ourselves could solve it

- Zookeeper
Life saver! Create a cluster and use it as a single whole.
http://research.yahoo.com/pub/3274

- Nagios
Handles the complexity of configuring and maintaining

Make an inventory of our system!
Yammer Metrics: https://github.com/codahale/metrics
Graphite: Performs charting

- Greplin exception catcher
Captures exceptions across all your servers, unifying the backtraces.

Add tracks for everything! It should been

Made a list of all we had harmed and made amends by fixing the underlying problems.

  • fixit Friday
  • 3-day on-call (emergency responsible)
  • internal libraries => make them easy to use and ppl will use them.
  • Greplin Exception Catcher: monitors all user-facing 500s

Avoid Introducing Accidental Complexity

  • DNS takes forever to update (DNS == EVIL). Moved everything to Zookeeper.
  • Amazon Elastic Block Store == EVIL.

Own Lucene implementation: each user has their own Lucene index, allowing them to “warm up” a specific user.

RabbitMQ couldn’t keep up with our throughput

Redis master + 2 slaves became a better solution; we monitor how far the SLAVES are behind the MASTER, and if it’s more than 100ms someone gets paged!

Processing pipeline:

Web crawlers ==> 
Redis Queue ==>
Analyzers (PDF extraction, add social info to email) ==>
Sharded Lucene Indexes (00’s servers).

Production Blog to notify about what’s going on in production!!!

  • yammer metrics
  • riak
  • redis

Zookeeper Utils: https://github.com/Greplin/greplin-zookeeper-utils

Nagios-Utils: https://github.com/Greplin/greplin-nagios-utils

@smanek
 shaneal@greplin.com

#StrangeLoop : We Really Don’t Know How to Compute

Ref: https://thestrangeloop.com/sessions/we-really-dont-know-how-to-compute

The genome is compact (1GB) and flexible (from a human to a cow w/ few changes)

The world of computation is expensive, and software is not being optimized: “memory is free”.

Reasonable answer!!!

Since ASM, you could make your program create programs.

A monad allows you to carry extra information alongside, as parallel plumbing.

#StrangeLoop : Transactions without Transactions

Ref: https://thestrangeloop.com/sessions/transactions-without-transactions

Transactions in real life don’t work like the simplistic example of a transfer between bank accounts.

Transactions Procedure

  • preserves invariants at the beginning and the end (but not necessarily during) of the process
  • decompose problems into a set of atomic work units

Paper: Gregor Hohpe, Your Coffee Shop Doesn’t Use Two-Phase Commit.
(pages 64-66)

Jim Gray. The transaction concept: Virtues and Limitations, June 1981.
Compensating transaction as an app-level reaction to a failed invariant.
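A minimal sketch of the compensation idea, assuming each step is paired with an undo action supplied by the application. The step names (seat reservation, card charge) are hypothetical and not taken from the talk's slides.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        return False
    return True

log = []

def reserve_seat():
    log.append("reserve seat")

def release_seat():
    log.append("release seat")

def charge_card():
    raise RuntimeError("card declined")  # simulated failure

def refund_card():
    log.append("refund card")

ok = run_with_compensation([(reserve_seat, release_seat),
                            (charge_card, refund_card)])
```

Nothing is locked across the two steps. The invariant is restored after the fact by releasing the seat, and the refund never runs because the charge never completed.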

Requirements:

  • Atomic R, W, M semantics across a single unit.
  • Durable W
  • Consistent R
  • App: provides soft. isolation

Paper: Building transactions with Google Bigtable (Percolator)

[Selling tickets slides]

Soft. isolation frameworks or patterns???

Event Sourcing
Capture all changes to an application state as a sequence of events.

http://martinfowler.com/eaaDev/EventSourcing.html 
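A small sketch of the idea, assuming a toy account whose state is just a balance. The `Event` type and the event kinds are illustrative names, not something from the session or from Fowler's article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrew" (hypothetical event kinds)
    amount: int

def apply_event(balance, event):
    """Pure transition: previous state + event -> next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrew":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")

def replay(events, initial=0):
    """Rebuild the state by folding over the event log."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance

log = [Event("deposited", 100), Event("withdrew", 30), Event("deposited", 5)]

current = replay(log)          # state after every event
as_of_second = replay(log[:2]) # state as it was after the second event
```

Because the log is the source of truth, any past state can be rebuilt by replaying a prefix of it.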

#StrangeLoop : Distributed STM: A new programming model for the

Ref: https://thestrangeloop.com/sessions/distributed-stm-a-new-programming-model-for-the-cloud

Transactional Memory: like a DB but in memory, CAS.

Strengths: avoid locks, composable, rollback
Weakness: Performance, Interactions IO, Debugging

ObjectFabric Solution: As a SCM, using snapshot and merge

DTM: Developers only declare intents
“replicate this with server”
“make this durable”

Get Started: https://github.com/ObjectFabric/ObjectFabric

Two ways to model the world:

  • Shared state mutated by transactions
  • Immutable messages between stateless processors 

Instead of using messages, use a graph of objects replicating your business domain and synchronize it through STM; listen to callbacks and updates.

Conflicts mgmt: currently only supports ABORT

#StrangeLoop : Scalaz

Ref: https://thestrangeloop.com/sessions/scalaz-purely-functional-programming-in-scala
http://scalaz.org

Compositionality: if you understand the parts, you can understand the whole

ScalaZ roadshow:

=== type-safe equality

Monoids compositions are interesting!

Useful to sum maps and trees:
Monoid[V] => Monoid[Map[K,V]]
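A rough Python analogue of that derivation, not Scalaz code: given a way to combine values, maps can be combined key by key, with the empty map as identity. The helper name `merge_maps` is an assumption made for this sketch.

```python
import operator

def merge_maps(left, right, combine):
    """Combine two dicts; values under the same key are merged with `combine`."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged

a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}
c = {"x": 10}

summed = merge_maps(a, b, operator.add)

# Monoid laws for this instance: the empty dict is the identity,
# and grouping does not matter (associativity).
identity_ok = merge_maps(a, {}, operator.add) == a == merge_maps({}, a, operator.add)
assoc_ok = (merge_maps(merge_maps(a, b, operator.add), c, operator.add)
            == merge_maps(a, merge_maps(b, c, operator.add), operator.add))
```

Because the value monoid is a parameter, the same helper sums counters, concatenates lists, or merges nested maps by passing a different `combine`.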

Validation, you can use them composed and validate all info at once.

ValidationNEL: return a list w/ all failures or the result
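The accumulate-all-failures behaviour can be sketched in plain Python. This only mimics the idea; it is not the Scalaz API, and the field rules below are invented for the example.

```python
def validate_name(name):
    """Return a list of failures; an empty list means the field is valid."""
    return [] if name.strip() else ["name is empty"]

def validate_age(age):
    return [] if 0 <= age <= 150 else ["age out of range"]

def validate_person(name, age):
    """Run every check and collect all failures instead of stopping at the first."""
    errors = validate_name(name) + validate_age(age)
    if errors:
        return ("failure", errors)
    return ("success", {"name": name, "age": age})

good = validate_person("Ada", 36)
bad = validate_person("  ", 200)
```

The failing call reports both problems at once, which is the difference from short-circuiting on the first error.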

|@| ==> Oink operator (looks like a pig ;) 

Applicative Functors: any M[A] can be composed with |@| if there is an Applicative[M] in implicit scope. You can create your own compositions

State: Can be used for IO

#StrangeLoop : Storm: Twitter’s scalable realtime computation system

Ref: https://thestrangeloop.com/sessions/storm-twitters-scalable-realtime-computation-system

Nathan Marz, Twitter

Messages locality makes sense for batch updates

[twitter diagram 1]

Consistent hashing => take the hash MOD the # of queues, so everything for the same key ends up on the same Q. Doesn’t scale: increasing the number of workers is a problem, since all the workers that push to the Qs must be reconfigured.

RT == constant availability, Hadoop can’t guarantee this!
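A sketch of the hash-MOD routing described in that note, which is plain modulo partitioning rather than a consistent-hash ring. The key names and the use of CRC32 as the hash are assumptions for illustration.

```python
import zlib

def queue_for(key, num_queues):
    """Route a key to a queue index: stable hash of the key, modulo the queue count."""
    return zlib.crc32(key.encode("utf-8")) % num_queues

keys = [f"user{i}" for i in range(1000)]  # hypothetical message keys

# The same key always lands on the same queue while the count is fixed.
stable = all(queue_for(k, 4) == queue_for(k, 4) for k in keys)

# Going from 4 to 5 queues re-routes every key whose two modulo results differ.
moved = sum(1 for k in keys if queue_for(k, 4) != queue_for(k, 5))
```

With a well-spread hash, a key stays put only when its hash gives the same remainder for both counts, so typically most keys move. That is why every producer has to be reconfigured when the number of queues changes.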

Storm Use Cases

  • Stream processing
  • Distributed RPC
  • Continuous computations

Storm Concepts

  • Streams (unbounded sequence of tuples)
  • Spouts (Source of Streams, e.g. Kestrel, Stream API)
  • Bolts (Process in-streams and generate out-streams)
  • Functions
  • Filtering
  • Topologies (How they subscribe)
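How these concepts fit together can be imitated with Python generators. This is only an in-process analogy and not the Storm API: the function names are made up, and real Storm distributes spouts and bolts across workers.

```python
from collections import Counter

def sentence_spout():
    """Source of a stream; a real spout would be unbounded."""
    yield from ["the cow jumped", "the moon"]

def split_bolt(stream):
    """Consume a stream of sentences, emit a stream of words."""
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    """Consume words, emit (word, running count) tuples."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]

# The "topology" is just how the pieces subscribe to each other.
topology = count_bolt(split_bolt(sentence_spout()))
emitted = list(topology)
```

Each stage only sees tuples from the stage it subscribes to, which is the wiring a topology describes.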

Compute Potential Reach of a URL => Computation intensive

No Q on Storm, everything goes directly through 0MQ.

Storm tracks tuple trees for you.

https://github.com/nathanmarz/storm
https://github.com/nathanmarz/storm-starter

How to install ZeroMQ/JZMQ in MacOSX: 
http://blog.pmorelli.com/getting-zeromq-and-jzmq-running-on-mac-os-x

#StrangeLoop : Functional Thinking

Ref:
https://thestrangeloop.com/sessions/functional-thinking
http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html

Objects took more than a decade to become mainstream, because ppl looked at them and saw weirdness.

FP makes code understandable by minimizing moving parts.

FP thinks about results, not steps.
SQL is much like this: you don’t tell the server how to find the data, just what you want as the result.

FP Concepts

  • 1st class/higher-order functions
  • pure functions
  • strict evaluation
  • recursion

Functional Java

p => means predicate (returns T/F)

With recursion in FP you don’t have to worry about moving parts, variables and state.

How to begin FP:

Immutability over state transitions
http://www.ibm.com/developerworks/java/library/j-jtp02183/index.html

Immutability score: Try to change all your vars to final and recompile.

declarative over imperative

paradigm over tool => you can start now
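A small before/after in Python to make “declarative over imperative” concrete. The talk used Functional Java, so this is an assumed equivalent rather than its code.

```python
def is_even(n):
    """A predicate: takes a value, returns True/False."""
    return n % 2 == 0

def sum_even_squares_imperative(numbers):
    """Spells out the steps: loop, test, mutate an accumulator."""
    total = 0
    for n in numbers:
        if is_even(n):
            total += n * n
    return total

def sum_even_squares_declarative(numbers):
    """States the result: the sum of squares of the even numbers."""
    return sum(n * n for n in numbers if is_even(n))

data = (1, 2, 3, 4, 5, 6)  # a tuple, so the input itself is immutable
imperative = sum_even_squares_imperative(data)
declarative = sum_even_squares_declarative(data)
```

Both give the same answer. The second version has no mutable accumulator to reason about, which is the “fewer moving parts” point.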

[book ref: http://www.amazon.com/Productive-Programmer-Theory-Practice-OReilly/dp/0596519788]

#StrangeLoop : Category Theory, Monads, and Duality in (Big) Data

Ref: https://thestrangeloop.com/sessions/category-theory-monads-and-duality-in-big-data

Bigdata is not about the data size.

NoSQL and SQL are not on opposite sides => coSQL. They are each other’s complement:

they co-exist in harmony and can transmute into each other (ORM mapper)

open => NoSQL, closed => SQL

“I do consider assignment statements and pointer variables to be among computer science’s most valuable treasures.” Donald Knuth

We have been using K/V forever: pointers in C, for example.

Problems with SQL:

Query denormalized

Query can only return a single base

NULL isn’t arbitrary:

1+NULL = NULL
SUM(1, NULL) = 1 
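The two behaviours can be checked with SQLite from Python's standard library. This is a sketch: the note's `SUM(1, NULL)` is read here as the aggregate `SUM` over a column holding 1 and NULL, and other databases may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic with NULL propagates NULL (None in Python).
plus = conn.execute("SELECT 1 + NULL").fetchone()[0]

# The SUM aggregate skips NULL rows instead of propagating them.
total = conn.execute(
    "SELECT SUM(x) FROM (SELECT 1 AS x UNION ALL SELECT NULL)"
).fetchone()[0]

conn.close()
```

So the same “missing” value is contagious in an expression but silently ignored in an aggregate.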

LINQ/ORM: You want to deal with your relational data as if it were an Object

Putting your data into databases is like rolling a rock up a mountain and letting it roll back down all day long … [Naranath the Lunatic]

Mathematics Category

Category Theory for amateurs

Interface for function, now functions become more abstract.

Consequences of duality
