Ref: https://thestrangeloop.com/sessions/simple-made-easy
Rich Hickey
Simplicity is prerequisite for reliability
Applying Simple (one fold/braid):
- One role
- One task
- One concept
- One dimension
But not
- One instance
- One operation
About lack of interleaving not cardinality
Objective
Applying Easy (lie near)
- At hand: on our HD, in our toolset, IDE, apt, gem install …
- Near to understanding/skill set (familiar)
- Near our capabilities
- Easy is relative!!! (for whom?)
“I’m going to use it because it is simple” ==> really means easy ==> easy for me, because it is familiar. So the objective conversation is lost.
Construct vs. Artifact
We focus on the experience of using the construct
- programmer convenience
- programmer replaceability
Rather than the long term results of use
- software quality, correctness
- maintenance, change
We must assess constructs by their artifacts
Limits
We can only hope to make reliable those things we can understand
Consider a few things at a time
Intertwined things must be considered together
Complexity undermines our understanding
Change
Changes require analysis & decisions
What will be impacted?
Where does it need to be made?
Ability to reason about your program is critical to changing it without failure
- “I can make changes because I have tests.”
- vs.
- Who drives around banging their car against every guard rail?
Development Speed
Emphasizing ease gives early speed
Ignoring complexity will slow you down.
Many complicating constructs are:
- succinctly described
- familiar
- available
Any complexity they yield is taken as accidental
Making things Easy
- Bring to hand by installing
- Become familiar by learning, trying
- But mental capability?
- > you’re not going to move your brain nearer to the complexity
- > you need to simplify things instead
“Parens are Hard!”
- Not at hand for most (IDE)
- Nor familiar
- But are they simple?
- Not in CL/Scheme
- > overloaded for calls
- > for those who bothered trying, this is a valid complexity complaint
What’s in your TOOLKIT? [photo]
Complect (to braid together) <> Compose (to place together)
Partitioning and stratification don’t imply simplicity, but are enabled by it
State is never simple
- Complects value and time
- It is easy, in the at-hand and familiar sense
Clojure & Haskell refs compose value and time
Environmental complexity: Individual policies don’t compose.
Abstraction => draw away from the physical (Interfaces)
Who, What, When, Where, Why and How
- What: Operations, Small sets. (Don’t complect with: How)
- Who: build many subcomponents (Don’t complect with: details/dependencies)
- How: Implement logic. Connect to abstractions and entities via polymorphisms.
Information IS simple
- Don’t ruin it
- By hiding it behind a micro-language
- Represent data as data!
Simplicity is the ultimate sophistication - Leonardo da Vinci
Ref: https://thestrangeloop.com/sessions/product-engineering
Product Idea ===> ????? ===> Profit
3 Traits of product engineering
- Over-arching
- Top-Down
- Empathetic
3 Laws
- You can force ppl
- New must be better than old
Product is found at the intersection of Liberal Arts x Tech (going back to the root of the word technology).
Product engineering is about ppl: LAZY, STUPID, IMPATIENT, SELFISH
Sharpen Focus:
Great products: 80% boring + 20% revolutionary
Path from iPod 1 to iPad … simplistic and boring
Originality is not as important as quality
Ppl are in love w/ their own ideas; you have to let go of your ideas!!!
- “What problem are you trying to solve?”
- How does it solve the problem?
- Is it actually better?
- Why?
Consider your customers: systems are not designed for computers, but for ppl.
- Start at the end by making a commercial, so you can explain what problem it solves and show it to customers, ppl you want to hire, investors, etc
- Get it right 1st, don’t spread to other platforms (ie: iOS, Android)
- You cannot adapt culture, you must adapt to the platform
- Building a team is also multi-platform
n platforms = n+1 teams
Your best product tester is not your mother, but your archenemy
Build the product your test user expects.
Real artists ship: plan, design, ship on time
Cut features, not quality
Shipping the rough draft is an amateur mistake (parallel with novel writers)
Fear social debt as much as or more than tech debt
When is it ready? When is it good enough? Show something to someone and get an emotional response (the response / hook).
The hook is the thing that makes ppl buy your product!
Join the community: we’re more alike than different.
http://appsterdam.rs (get families together to understand the behavior of app makers …)
Ref: https://thestrangeloop.com/sessions/have-your-cake-and-eat-it-too-meta-programming-java
Howard Lewis Ship is the creator of http://tapestry.apache.org/
http://howardlewisship.com/
Meta-programming is:
- Code-reuse
- but without inheritance
- remove boilerplate
Memoize: an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs
AspectJ was the promise in 2003.
Classic Java: the solution to an overwhelming problem is to add complexity to it.
ASM 3.3.1 = creates bytecode on the fly.
Leaky abstractions can be a problem; there is a load manager for it.
Meeting Plastic: http://java.dzone.com/articles/meeting-plastic-simple
Plastic Demos: https://github.com/hlship/plastic-demos
Testing your PlasticClasses with Spock: http://code.google.com/p/spock/
Flexible API design: NameConvention (ceremony vs. essence)
Common usage: Annotate fields and change values
Ref: https://thestrangeloop.com/sessions/distributed-data-analysis-with-hadoop-and-r
Source code: https://github.com/jseidman/hadoop-R
Slides-deck: http://www.slideshare.net/jseidman
Visualization Poster Competition: http://stat-computer.org/dataexpo/2009
Using Hadoop to make the download quicker!
Histogram Distribution (lattice library)
Hadoop Streaming
Access Hadoop interactively (Hive) from the R environment.
- Still seems to have some rough edges
RHipe
- Active community
- Based on Hadoop streaming source
- Can be somewhat confusing and intimidating (take the time to understand it, it’s worth it!)
- Uses Google Protocol Buffers
- must be installed on all your nodes
- Warning: differs from the Java API: reduce keeps being called until there are no more values for that key
rmr (introduced by: http://www.revolutionanalytics.com/)
- “stay true to MR and true to R”
- need more samples, but have interesting things like K-means
Segue
- Running multiple simulations in parallel
- Runs on EMR (not internal clusters)
Apache Mahout
Ricardo (research project at IBM), paper is good, but no source available
Avoid using R for general purpose MR use.
How to test outside of Hadoop
cat DATAFILE | ./map.R | sort | ./reduce.R
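The same map | sort | reduce shape can be sketched in a single process to see why the pipeline above works as a local test. This is a Python illustration, not the R scripts from the talk; the word-count job and the names `mapper` and `reducer` are assumptions chosen for the example.

```python
from itertools import groupby


def mapper(lines):
    """Emit tab-separated key/value pairs, one per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_pairs):
    """Sum the values of consecutive pairs sharing a key; input must be sorted by key."""
    split = (pair.split("\t") for pair in sorted_pairs)
    for key, group in groupby(split, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


data = ["a b a", "b a"]                            # stands in for `cat DATAFILE`
result = dict(reducer(sorted(mapper(data))))       # sorted() plays the role of `sort`
```

The `sort` step matters: the reducer only sees all values for a key together because equal keys arrive next to each other.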
Book: R meets Big Data - a basket of strategies to help you use R for large-scale analysis and computation.
Ref: https://thestrangeloop.com/sessions/distributed-systems-the-stuff-nobody-told-you
Adds 5,000 docs/sec: Greplin > Twitter
50 servers!
How to deal with Distributed Systems:
- We admitted we’re powerless
- A power greater than ourselves could solve it
- Zookeeper
Life saver! Create a cluster and use it as a single whole.
- http://research.yahoo.com/pub/3274
- Nagios
Handles the complexity of configuring and maintaining
Make an inventory of our system!
Yammer Metrics: https://github.com/codahale/metrics
Graphite: Performs charting
- Greplin exception catcher
Captures exceptions across all your servers, unifying backtraces.
Add tracking for everything! It should have been …
Made a list of all we had harmed and made amends by fixing the underlying problems.
- fixit Friday
- 3-day on-call (person responsible for emergencies)
- internal libraries => make them easy to use and ppl will use them.
- Greplin Exception Catcher: monitors all user-facing 500s
Avoid Introducing Accidental Complexity
- DNS takes forever to update (DNS == EVIL). Moved everything to Zookeeper.
- Amazon Elastic Block Store == EVIL.
Own Lucene implementation: each user has their own Lucene index, which allows “warming up” a specific user.
RabbitMQ couldn’t keep up with our throughput
Redis master + 2 slaves turned out to be a better solution. We monitor how far the slaves are behind the master; if it’s more than 100ms someone gets paged!
Processing pipeline:
Web crawlers ==>
Redis Queue ==>
Analyzers (PDF extraction, add social info to email) ==>
Sharded Lucene Indexes (00’s servers).
Production Blog to notify about what’s going on in production!!!
- yammer metrics
- riak
- redis
Zookeeper Utils: https://github.com/Greplin/greplin-zookeeper-utils
Nagios-Utils: https://github.com/Greplin/greplin-nagios-utils
@smanek
shaneal@greplin.com
Ref: https://thestrangeloop.com/sessions/we-really-dont-know-how-to-compute
The genome is compact (1GB) and flexible (from a human to a cow w/ few changes)
The world of computation is expensive, and software is not being optimized: “memory is free”.
Reasonable answer!!!
Since ASM you could make your program create programs.
Monads allow you to carry extra information alongside your values, as parallel plumbing.
Ref: https://thestrangeloop.com/sessions/transactions-without-transactions
Transactions in real life don’t work like the simplistic example of a transfer between bank accounts.
Transactions Procedure
- preserves invariants at the beginning and the end (but not necessarily during) of the process
- decompose problems into a set of atomic work units
Paper: Gregor Hohpe, Your Coffee Shop Doesn’t Use Two-Phase Commit.
(pages 64-66)
Jim Gray. The transaction concept: Virtues and Limitations, June 1981.
Compensating transaction as an app-level reaction to a failed invariant.
Requirements:
- Atomic R, W, M semantics across a single unit.
- Durable W
- Consistent R
- App: provides soft. isolation
Paper: Building transactions with Google Bigtable (Percolator)
[Selling tickets slides]
Soft. isolation frameworks or patterns???
Event Sourcing
Capture all changes to an application state as a sequence of events.
http://martinfowler.com/eaaDev/EventSourcing.html
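A minimal sketch of event sourcing as described above, assuming a toy account whose state is only ever derived from its event log. The `Event` class, the event names, and the functions `apply_event` and `replay` are hypothetical and not taken from the talk or from the linked article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (made-up event names)
    amount: int


def apply_event(balance, event):
    """Return the new balance after one event; events themselves are never modified."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, initial=0):
    """Rebuild state by folding over the sequence of events."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
current = replay(log)            # state now
as_of_second = replay(log[:2])   # state as it was after the first two events
```

Because the log is the source of truth, a past state can be recovered by replaying a prefix, and a correction can be expressed as one more event instead of an in-place update.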
Ref: https://thestrangeloop.com/sessions/distributed-stm-a-new-programming-model-for-the-cloud
Transactional memory: like a db but in memory, CAS.
Strengths: avoids locks, composable, rollback
Weaknesses: performance, interactions with IO, debugging
ObjectFabric solution: like an SCM, using snapshot and merge
DTM: Developers only declare intents
“replicate this with server”
“make this durable”
Get Started: https://github.com/ObjectFabric/ObjectFabric
Two ways to model the world:
- Shared state mutated by transactions
- Immutable messages between stateless processors
Instead of using messages, use a graph of objects replicating your business domain and synchronize it through STM, listening to callbacks and updates.
Conflicts mgmt: currently only supports ABORT
Ref: https://thestrangeloop.com/sessions/scalaz-purely-functional-programming-in-scala
http://scalaz.org
Compositionality: understand the parts and you can understand the whole
ScalaZ roadshow:
=== type-safe equality
Monoids compositions are interesting!
Useful to sum maps and trees:
Monoid[V] => Monoid[Map[K,V]]
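The lifting from a monoid on values to a monoid on maps can be sketched without Scalaz. This is a Python illustration under stated assumptions: `merge_maps` is a hypothetical helper, and integer addition with identity 0 is chosen as the value monoid.

```python
def merge_maps(left, right, combine, empty_value):
    """Combine two dicts key by key, using the value monoid's combine and identity."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = combine(merged.get(key, empty_value), value)
    return merged


def add(a, b):
    return a + b


hits_day1 = {"home": 3, "about": 1}
hits_day2 = {"home": 2, "contact": 4}
total = merge_maps(hits_day1, hits_day2, add, 0)
```

The empty dict acts as the identity of the map monoid, so any number of maps can be summed by folding `merge_maps` over them.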
Validation, you can use them composed and validate all info at once.
ValidationNEL: return a list w/ all failures or the result
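The idea of collecting every failure instead of stopping at the first can be sketched in plain Python. This is not the Scalaz API; the function names, the error messages, and the `(errors, result)` tuple convention are assumptions for illustration.

```python
def validate_name(name):
    """Return a list of error messages; an empty list means the field is valid."""
    return [] if name.strip() else ["name must not be empty"]


def validate_age(age):
    return [] if 0 <= age <= 150 else ["age must be between 0 and 150"]


def validate_person(name, age):
    """Run every check and return (errors, person): all failures, or the result."""
    errors = validate_name(name) + validate_age(age)
    if errors:
        return errors, None
    return [], {"name": name, "age": age}


bad_errors, bad_person = validate_person("", 200)
ok_errors, ok_person = validate_person("Ada", 36)
```

Concatenating the error lists is what lets independent checks compose: each validator knows nothing about the others, yet the caller sees all problems at once.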
|@| ==> Oink operator (looks like a pig ;)
Applicative Functors: any M[A] can be composed with |@| if there is an Applicative[M] in implicit scope. You can create your own compositions
State: can be used for IO
Ref: https://thestrangeloop.com/sessions/storm-twitters-scalable-realtime-computation-system
Nathan Marz, Twitter
Message locality makes sense for batch updates
[twitter diagram 1]
Hashing for consistency => take the hash of the key MOD the # of queues, so the same key always ends up in the same Q. - Doesn’t scale: adding workers is a problem, because all workers that push to the Qs must be reconfigured.
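A small sketch of why hash-MOD routing is painful to scale, assuming MD5 as the hash and made-up keys of the form `url-N`; `queue_for` is a hypothetical helper, not part of Storm or of the system described in the talk.

```python
import hashlib


def queue_for(key, num_queues):
    """Pick a queue index from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_queues


keys = [f"url-{i}" for i in range(1000)]

# The same key always maps to the same queue while the queue count is fixed.
same_queue = queue_for("url-1", 4) == queue_for("url-1", 4)

# Going from 4 to 5 queues changes the target queue for most keys.
moved = sum(1 for k in keys if queue_for(k, 4) != queue_for(k, 5))
```

With a uniform hash, a key keeps its queue only when its hash gives the same remainder for both queue counts, so most keys move when a queue is added and every producer has to switch to the new mapping at the same time.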
RT == constant availability, Hadoop can’t guarantee this!
Storm Use Cases
- Stream processing
- Distributed RPC
- Continuous computations
Storm Concepts
- Streams (unbounded sequences of tuples)
- Spouts (Source of Streams, ie: Kestrel, Stream API)
- Bolts (Process in-streams, and generate out-streams)
- Functions
- Filtering
- Topologies (How they subscribe)
Compute Potential Reach of a URL => Computation intensive
No Qs in Storm, everything goes directly through 0MQ.
Storm tracks tuple trees for you.
https://github.com/nathanmarz/storm
https://github.com/nathanmarz/storm-starter
How to install ZeroMQ/JZMQ in MacOSX:
http://blog.pmorelli.com/getting-zeromq-and-jzmq-running-on-mac-os-x
Ref:
https://thestrangeloop.com/sessions/functional-thinking
http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html
Objects took more than a decade to become mainstream, because ppl looked at them and saw weirdness.
FP makes code understandable by minimizing moving parts.
FP thinks about results not steps.
SQL is much like this: you don’t tell the server where to find the data, just what you want as a result.
FP Concepts
- 1st class/higher-order functions
- pure functions
- strict evaluation
- recursion
p => means predicate (returns T/F)
With recursion in FP you don’t have to worry about moving parts, variables and state.
How to begin FP:
Immutability over state transitions
http://www.ibm.com/developerworks/java/library/j-jtp02183/index.html
Immutability score: Try to change all your vars to final and recompile.
declarative over imperative
paradigm over tool => you can start now
[book ref: http://www.amazon.com/Productive-Programmer-Theory-Practice-OReilly/dp/0596519788]
Ref: https://thestrangeloop.com/sessions/category-theory-monads-and-duality-in-big-data
Big data is not about the data size.
NoSQL and SQL are not on opposite sides => coSQL. They complement each other,
co-exist in harmony and can transmute into each other (ORM mapper)
open => NoSQL, closed => SQL
“I do consider assignment statements and pointer variables to be among CS’s most valuable treasures.” Donald Knuth
We have been using K/V forever, pointers in C for example.
Problems with SQL:
Queries are denormalized
Query can only return a single base
NULL isn’t arbitrary:
1+NULL = NULL
SUM(1, NULL) = 1
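The two NULL behaviours above can be checked with SQLite through Python's standard `sqlite3` module. The table `t` and column `x` are made up for the example, and `SUM(1, NULL)` from the notes is read here as SUM over a column holding the values 1 and NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,)])

# Arithmetic with NULL propagates NULL (Python sees it as None).
plus = conn.execute("SELECT 1 + NULL").fetchone()[0]

# The SUM aggregate skips NULL rows instead of propagating them.
total = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]

conn.close()
```

So the same value behaves as "unknown" inside an expression and as "absent" inside an aggregate, which is the inconsistency the notes point at.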
LINQ/ORM: You want to deal with your relational data as if it were objects
Putting your data into databases is like rolling a rock up a mountain and letting it roll down again, all day long … [Naranath the Lunatic]
Interface for function, now functions become more abstract.
Consequences of duality