The Need for Emerging Technologies
A successful software engineer knows and uses design patterns, actively refactors code, writes unit tests, and religiously seeks simplicity. Beyond these basic methods, there are concepts that good software engineers know about. They transcend programming languages and projects; they are not design patterns, but rather broad areas that you need to be familiar with. The top 10 concepts are interfaces, conventions and templates, layering, algorithmic complexity, hashing, caching, concurrency, cloud computing, security, and relational databases.
The most important concept in software is the interface. Any good software is a model of a real (or imaginary) system, and understanding how to model the problem in terms of correct and simple interfaces is crucial. Lots of systems suffer from one of two extremes: clumped, lengthy code with few abstractions, or an over-designed system with unnecessary complexity and unused code.
Among the many books on the subject, Agile Software Development: Principles, Patterns, and Practices by Dr Robert Martin stands out because of its focus on modeling correct interfaces.
In modeling, there are ways you can iterate towards the right solution. Firstly, never add methods that might be useful in the future: be minimalist and get away with as little as possible. Secondly, don’t be afraid to recognize today that what you did yesterday wasn’t right; be willing to change things. Thirdly, be patient and enjoy the process. Eventually you will arrive at a system that feels right; until then, keep iterating and don’t settle.
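To make the minimalist advice concrete, here is a small Python sketch. The Notifier interface, its single send method, and the remind_overdue function are hypothetical names chosen for illustration. The interface exposes only what a caller needs today, and the business logic depends on the interface rather than on any concrete implementation.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class Notifier(ABC):
    """Deliberately small interface: only the one operation callers need today."""

    @abstractmethod
    def send(self, recipient: str, message: str) -> None:
        """Deliver `message` to `recipient`."""


class RecordingNotifier(Notifier):
    """In-memory implementation that records messages, handy in unit tests."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


def remind_overdue(users: List[str], notifier: Notifier) -> int:
    """Business logic that depends only on the interface, not on a concrete transport."""
    for user in users:
        notifier.send(user, "Your book is overdue.")
    return len(users)
```

Because remind_overdue only sees Notifier, an email- or SMS-backed implementation can be added later without touching it, and a new method joins the interface only once some caller actually needs it.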
Naming conventions and basic templates are the most overlooked software patterns, yet probably the most powerful.
Naming conventions enable software automation. For example, the JavaBeans framework is based on a simple naming convention for getters and setters: a property called name is read with getName() and written with setName().
Canonical URLs in del.icio.us work the same way: http://del.icio.us/tag/software takes the user to the page that lists all items tagged software. Many social software applications use naming conventions in a similar way. For example, if your user name is johnsmith, then your avatar is likely johnsmith.jpg and your RSS feed likely follows the same pattern.
Naming conventions are also used in testing. For example, JUnit (up to version 3; version 4 switched to the @Test annotation) automatically recognizes all methods in a test class whose names start with the prefix test.
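The sketch below shows how a naming convention alone can drive automation, in the style of JUnit 3 or Python’s own unittest (whose loader also looks for a test prefix by default). discover_tests, run_tests, and StringTests are hypothetical names for this illustration, not part of any framework.

```python
from typing import List


def discover_tests(cls: type) -> List[str]:
    """Collect the names of methods that follow the 'test' prefix convention."""
    return sorted(
        name for name in dir(cls)
        if name.startswith("test") and callable(getattr(cls, name))
    )


def run_tests(cls: type) -> List[str]:
    """Run every discovered test; an AssertionError propagates as a failure."""
    instance = cls()
    passed = []
    for name in discover_tests(cls):
        getattr(instance, name)()
        passed.append(name)
    return passed


class StringTests:
    def test_upper(self) -> None:
        assert "abc".upper() == "ABC"

    def test_strip(self) -> None:
        assert "  x ".strip() == "x"

    def helper(self) -> None:
        # No "test" prefix, so the runner never calls this.
        raise RuntimeError("should not be called")
```

No registration list is needed: adding a method named test_something is enough for the runner to pick it up.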
The templates here are not C++ or Java language constructs. We’re talking about template files that contain variables and allow binding objects to those variables, resolving them, and rendering the result for the client.
ColdFusion was one of the first to popularize templates for web applications. Java followed with JSPs, and more recently Apache developed a handy general-purpose templating engine for Java called Velocity. PHP can be used as its own templating engine because it supports the eval function (be careful with security). For XML programming, the standard way to do templates is the XSL language (XSLT).
From generating HTML pages to sending standardized support emails, templates are an essential helper in any modern software system.
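As a minimal illustration of binding and rendering, here is a support-email template built with Python’s standard string.Template. The template text and the render_support_email helper are hypothetical; full template engines such as Velocity or JSP add loops, conditionals, and includes on top of this basic idea.

```python
from string import Template

# Hypothetical support-email template; in practice this text would live in its own file.
SUPPORT_EMAIL = Template(
    "Hi $name,\n"
    "Thanks for contacting support. Your ticket #$ticket_id is now $status.\n"
)


def render_support_email(name: str, ticket_id: int, status: str) -> str:
    """Bind values to the template variables and render the final text."""
    # substitute() raises KeyError if any placeholder is left unbound.
    return SUPPORT_EMAIL.substitute(name=name, ticket_id=ticket_id, status=status)
```

If a lenient render is preferred, safe_substitute() leaves unbound placeholders in place instead of raising.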
Layering is probably the simplest way to discuss software architecture. It first got serious attention when John Lakos published his book Large-Scale C++ Software Design. Lakos argued that software consists of layers, and his book introduced a concrete method for measuring layering.
The method is this: for each software component, count the number of other components it relies on. That count is the metric of how complex the component is.
Lakos contended that good software follows the shape of a pyramid; i.e., there’s a progressive increase in the cumulative complexity of each component (everything it depends on, directly or indirectly), but not in its immediate complexity (its direct dependencies). Put differently, a good software system consists of small, reusable building blocks, each carrying its own responsibility. In a good system, there are no cyclic dependencies between components, and the whole system is a stack of layers of functionality, forming a pyramid.
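The metric can be sketched in a few lines of Python. The component graph below is hypothetical. For each component, the code reports its immediate complexity (direct dependencies) and its cumulative complexity (everything reachable through them), and it flags cyclic dependencies. This is a simplification in the spirit of Lakos’s cumulative component dependency (CCD), whose exact definition also counts each component itself.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical component graph: each component lists the components it uses directly.
Graph = Dict[str, List[str]]


def transitive_deps(graph: Graph, component: str) -> Set[str]:
    """Every component reachable from `component`, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(graph.get(component, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def has_cycle(graph: Graph) -> bool:
    """A component that can reach itself sits on a dependency cycle."""
    return any(c in transitive_deps(graph, c) for c in graph)


def layer_report(graph: Graph) -> Dict[str, Tuple[int, int]]:
    """(immediate, cumulative) dependency counts for each component."""
    return {c: (len(set(deps)), len(transitive_deps(graph, c))) for c, deps in graph.items()}


graph: Graph = {
    "app": ["orders", "billing"],
    "orders": ["storage", "logging"],
    "billing": ["storage", "logging"],
    "storage": ["logging"],
    "logging": [],
}
```

For this graph the immediate counts stay at two or fewer while the cumulative counts grow from logging (0) up to app (4), which is the pyramid shape Lakos describes.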
Lakos’s work was a precursor to many developments in software engineering, most notably refactoring. The idea behind refactoring is continuously sculpting the software to ensure it is structurally sound and flexible. Another major contribution was by Dr Robert Martin from Object Mentor, who wrote about dependencies and acyclic architectures.
Among the tools that help engineers deal with system architecture are Structure101, developed by Headway Software, and SA4J, developed by my former company, Information Laboratory, and now available from IBM.
There are just a handful of things engineers must know about algorithmic complexity. First is big O notation. If something takes O(n), it’s linear in the size of the data; O(n^2) is quadratic. Using this notation, you should know that searching through a list is O(n), binary search (through a sorted list) is O(log n), and sorting n items takes O(n log n) time.
Your code should (almost) never have multiple nested loops (a loop inside a loop inside a loop). Most of the code written today should use hash tables, simple lists, and singly nested loops.
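The following Python sketch contrasts the complexities mentioned above. The function names are illustrative. binary_search relies on the standard bisect module, and common_hashed shows the usual cure for a nested loop: build a hash-based set once, then probe it in constant expected time.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def linear_search(items: Sequence[int], target: int) -> Optional[int]:
    """O(n): may have to inspect every element."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return None


def binary_search(sorted_items: Sequence[int], target: int) -> Optional[int]:
    """O(log n): halves the search range at each step; input must be sorted."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return None


def common_nested(a: List[int], b: List[int]) -> List[int]:
    """O(len(a) * len(b)): an explicit loop inside a loop."""
    return [x for x in a if any(x == y for y in b)]


def common_hashed(a: List[int], b: List[int]) -> List[int]:
    """O(len(a) + len(b)) on average: build a set once, then probe it."""
    seen = set(b)
    return [x for x in a if x in seen]
```

Both pairs return the same answers; only the amount of work grows differently as the inputs get larger.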
Due to the abundance of excellent libraries, we are not as focused on efficiency these days. That’s fine, as tuning can happen later, after you get the design right.
Still, elegant algorithms and performance are something you shouldn’t ignore. Writing compact and readable code helps ensure your algorithms are clean and simple.
The idea behind hashing is fast access to data. If the data is stored sequentially, the time to find an item is proportional to the size of the list. With hashing, a hash function calculates a number for each element, and that number is used as an index into a table. Given a good hash function that spreads data uniformly across the table, the expected look-up time is constant. Perfect hashing is difficult, so hash table implementations support collision resolution.
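Here is a minimal sketch of a hash table with separate chaining, one common collision-resolution strategy (open addressing is another). ChainedHashTable is a hypothetical teaching class that uses Python’s built-in hash(). It omits resizing, which real implementations perform to keep chains short and look-ups close to constant time.

```python
from typing import Any, Hashable, List, Optional, Tuple


class ChainedHashTable:
    """Minimal hash table that resolves collisions by separate chaining."""

    def __init__(self, capacity: int = 8) -> None:
        # Each bucket holds the (key, value) pairs whose keys hash to that slot.
        self._buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(capacity)]

    def _index(self, key: Hashable) -> int:
        # The hash function turns the key into a number; modulo maps it onto a slot.
        return hash(key) % len(self._buckets)

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: overwrite in place
                return
        bucket.append((key, value))  # new key, possibly a collision: extend the chain

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the stored value, or None if the key is absent."""
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return None
```

With a single bucket every key collides, which makes the chaining easy to observe; with a good hash and enough buckets, most chains hold zero or one entry.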
Beyond basic data storage, hashes are also important in distributed systems. A so-called uniform hash is used to evenly allocate data among computers in a cloud database. A flavor of this technique is part of Google’s indexing service, where each URL is hashed to a particular computer. Memcached clients similarly use a hash function to decide which server holds each key.
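Here is a sketch of hashing keys to machines, as a cache or database client might do. The server names are hypothetical. MD5 is used only because it gives a hash that is stable across processes (Python’s built-in hash() of a string changes from run to run). This simple modulo scheme remaps most keys whenever a server is added or removed; consistent hashing is the usual refinement when that matters.

```python
import hashlib
from collections import Counter
from typing import List


def pick_server(key: str, servers: List[str]) -> str:
    """Map a key to one server; the same key always lands on the same server."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[bucket]


SERVERS = ["cache-1:11211", "cache-2:11211", "cache-3:11211"]  # hypothetical hosts

# A good hash spreads keys roughly evenly across the servers.
spread = Counter(pick_server(f"user:{i}", SERVERS) for i in range(3000))
```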
Hash functions can be complex and sophisticated, but modern libraries have good defaults. The important thing is to understand how hashes work and how to tune them for maximum performance benefit.
No modern web system runs without a cache, which is an in-memory store that holds a subset of the information typically stored in the database. The need for a cache comes from the fact that generating results from the database is costly. For example, if you have a website that lists books that were popular last week, you’d want to compute this information once and place it into the cache. User requests then fetch data from the cache instead of hitting the database and regenerating the same information.
Caching comes with a cost: only a subset of the information can be stored in memory. The most common data pruning strategy is to evict the items that were least recently used (LRU). The pruning needs to be efficient so that it doesn’t slow down the application.
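The LRU policy can be sketched with Python’s OrderedDict, which keeps keys in insertion order and can move a key to the end or pop the oldest one in constant time. LRUCache is a hypothetical class for illustration; for memoizing function calls, the standard functools.lru_cache decorator applies the same policy.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None  # cache miss: the caller falls back to the database
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

Both get and put run in constant time, so pruning never turns into a scan over the whole cache.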
A lot of modern web applications, including Facebook, rely on a distributed caching system called Memcached, developed by Brad Fitzpatrick while working on LiveJournal. The idea was to create a caching system that utilises spare memory capacity on the network. Today, there are Memcached libraries for many popular languages, including Java and PHP.
Concurrency is one topic engineers notoriously get wrong, and understandably so, because the brain does not juggle many things at a time and schools emphasize linear thinking. Yet concurrency is important in any modern system.
Concurrency is about parallelism, but inside the application. Most modern languages have a built-in notion of concurrency; in Java, it’s implemented using threads.
A classic concurrency example is the producer/consumer problem, where the producer generates data or tasks and places them in a queue for worker threads to consume and execute. The complexity in concurrent programming stems from the fact that threads often need to operate on common data: each thread has its own sequence of execution, but they all access the same data.
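Below is a producer/consumer sketch using Python’s thread-safe queue.Queue and a lock around the shared total. The function name, the squaring workload, and the sentinel object are hypothetical choices for illustration. In CPython the global interpreter lock means these threads interleave rather than run Python code truly in parallel, but the coordination pattern is the same; in Java, the java.util.concurrent package provides BlockingQueue and ExecutorService for these roles.

```python
import queue
import threading
from typing import List

_STOP = object()  # sentinel telling a worker to exit


def run_producer_consumer(tasks: List[int], num_workers: int = 4) -> int:
    """The producer enqueues tasks; worker threads square them into a shared total."""
    work: queue.Queue = queue.Queue()
    total = 0
    lock = threading.Lock()  # guards the shared `total`

    def worker() -> None:
        nonlocal total
        while True:
            item = work.get()
            if item is _STOP:
                break
            result = item * item  # the "work", done without holding the lock
            with lock:  # only the update of shared data is serialized
                total += result

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:  # producer side
        work.put(task)
    for _ in threads:  # one sentinel per worker, queued after every task
        work.put(_STOP)
    for t in threads:
        t.join()
    return total
```

Without the lock, two workers could read the same old value of total and one update would be lost; keeping the critical section to a single line keeps contention low.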
One of the most sophisticated concurrency libraries was developed by Doug Lea and is now part of core Java as the java.util.concurrent package.
In our recent post, Reaching For The Sky Through Compute Clouds, we talked about how commodity cloud computing is changing the way we deliver large-scale web applications. Massively parallel, cheap cloud computing reduces both costs and time to market.
Cloud computing grew out of parallel computing, the idea that many problems can be solved faster by running the computations in parallel.
After parallel algorithms came grid computing, which ran parallel computations on idle desktops. One of the first examples was the SETI@home project out of Berkeley, which used spare CPU cycles to crunch data coming from space. Grid computing is widely adopted by financial companies, which run massive risk calculations. The concept of under-utilized resources, together with the rise of the J2EE platform, led to the precursor of cloud computing: application server virtualization. The idea was to run applications on demand and change what is available depending on the time of day and user activity.
Today’s most vivid example of cloud computing is Amazon Web Services, a package of services available via API. Amazon’s offering includes a compute service (EC2), a storage service for storing and serving large media files (S3), a structured data store with automatic indexing (SimpleDB), and a queue service (SQS). These first building blocks already enable an unprecedented way of doing large-scale computing, and surely the best is yet to come.
With the rise of hacking and the sensitivity of data, security is paramount. Security is a broad topic that includes authentication, authorization, and information transmission.
Authentication is about verifying user identity. A typical website prompts for a password. Authentication typically happens over SSL (Secure Sockets Layer, now superseded by TLS), which encrypts the information sent over HTTP.
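On the server side, a password check usually compares salted, deliberately slow hashes rather than stored plaintext. This sketch uses the standard library’s hashlib.pbkdf2_hmac and hmac.compare_digest. The iteration count is an assumed value to tune for your hardware, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumption: raise this as far as your login latency allows


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The random salt makes identical passwords produce different stored digests, and compare_digest avoids leaking information through timing differences.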
Authorization is about permissions and is important in corporate systems, particularly those that define workflows. The recently developed OAuth protocol helps web services let users grant other applications access to their private information without handing over their passwords. This is how Flickr permits access to individual photos or data sets.
Another security area is network protection. This concerns operating systems, configuration, and monitoring to thwart hackers. Not only the network is vulnerable; any piece of software is. The Firefox browser, marketed as the most secure, has to patch its code continuously. Writing secure code for your system requires understanding its specifics and potential problems.
Relational databases have recently been getting a bad name because they cannot scale well to support massive web services. Yet they were one of the most fundamental achievements in computing, one that has carried us for two decades and will remain for a long time. Relational databases are excellent for order management systems, corporate databases, and P&L data.
At the core of the relational database is the concept of representing information in records. Each record is added to a table, which defines the type of information. The database offers a way to search the records using a query language, nowadays SQL, and a way to correlate information from multiple tables.
The technique of data normalization is about correct ways of partitioning the data among tables to minimize data redundancy, so that each fact is stored once and updates cannot leave copies out of sync.
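The sketch below uses Python’s built-in sqlite3 module with an in-memory database. The customers/orders schema and its rows are hypothetical. It shows records stored in tables, a normalized design in which each customer’s name is stored once and orders refer to it by id, and an SQL join that correlates the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders (customer_id, amount) VALUES (1, 30.0), (1, 12.5), (2, 8.0);
""")

# Correlate the two tables with a join, then aggregate per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY total DESC
""").fetchall()
```

Renaming a customer touches one row in customers, and every order sees the change through the join.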
Beyond these fundamentals, much of the further development of technology lies in the work of emerging technologies, and in particular emerging programming languages.
At heart, R is a programming language, but it’s more of a standard bearer for the world’s current obsession with using statistics to unlock patterns in large blocks of data. R was designed by statisticians and scientists to make their work easier. It comes with most standard functions used in data analysis, and many of the most useful statistical algorithms are already implemented as freely distributed libraries. It’s got most of what data scientists need to do data-driven science.
Java isn’t a new language. It’s often everyone’s first language, thanks to its role as the lingua franca for AP Computer Science. There are billions of JAR files floating around running the world.
But Java 8 is a bit different. It comes with new features aimed at offering functional techniques that can unlock the parallelism in your code. You don’t have to use them. You could stick with all the old Java because it still works. But if you don’t use them, you’ll be missing the chance to offer the Java virtual machine (JVM) even more structure for optimizing the execution. You’ll miss the chance to think functionally and write cleaner, faster, and less buggy code.
Apple saw an opportunity when programming newbies complained about the endless mess of writing in Objective-C. So they introduced Swift and strongly implied that it would replace Objective-C for writing for the Mac or the iPhone. They recognized that creating header files and juggling pointers was antiquated. Swift hides this information, making it much more like writing in a modern language like Java or Python. Finally, the language is doing all the scut work, just as modern languages do.
The language specification is broad. It’s not just a syntactic cleanup of Objective-C. There are plenty of new features, so many that they’re hard to list. Some coders might even complain that there’s too much to learn, and Swift will make life more complicated for teams who need to read each other’s code. But let’s not focus too much on that. iPhone coders can now spin out code as quickly as others. They can work with a cleaner syntax and let the language do the busy work.
When Google set out to build a new language to power its server farms, it decided to build something simple by throwing out many of the more clever ideas often found in other languages. They wanted to keep everything, as one creator said, “simple enough to hold in one programmer’s head.” There are no complex abstractions or clever metaprogramming in Go, just basic features specified in a straightforward syntax.
This can make things easier for everyone on a team because no one has to fret when someone else digs up a neat idea from the nether reaches of the language specification.
Jokers may claim that CoffeeScript is little more than a way to rest your right hand’s pinkie, but they’re missing the point. Cleaner code is easier to read, and we all benefit when we can parse the code quickly in our brain. CoffeeScript, which compiles to JavaScript, gives the code that cleaner form, and that benefits everyone.
For many programmers, there’s nothing like the very clean, simple world of C. The syntax is minimal and the structure maps cleanly to the CPU. Some call it portable assembly. Yet for all these advantages, some C programmers feel like they’re missing out on the features built into newer languages.
That’s why D is being built. It’s meant to keep all the logical purity of C and C++ while adding modern conveniences such as memory management, type inference, and bounds checking.
Just like CoffeeScript, Less.js is really just a preprocessor for your stylesheets, one that makes it easier to create elaborate CSS files. Anyone who has tried to build a list of layout rules for even the simplest website knows that creating basic CSS requires plenty of repetition; Less.js handles all this repetition with loops, variables, and other basic programming constructs. You can, for instance, create a variable to hold that shade of green used as both a background and a highlight color. If the boss wants to change it, you only need to update one spot.
There are more elaborate constructs such as mixins and nested rules that effectively create blocks of standard layout commands that can be included in any number of CSS classes. If someone decides that the bold typeface needs to go, you only need to fix it at the root and Less.js will push the new rule into all the other definitions.
Once upon a time, MATLAB was a hardcore language for hardcore mathematicians and scientists who needed to juggle complex systems of equations and find solutions. It’s still that, and more of today’s projects need those complex skills. So MATLAB is finding its way into more applications as developers start pushing deeper into complex mathematical and statistical analysis. The core has been tested over the decades by mathematicians and now it’s able to help mere mortals.
The Internet of Things is coming. More and more devices have embedded chips just waiting to be told what to do. Arduino isn’t so much a new language as a set of C or C++ functions that you string together. The compiler does the rest of the work.
Many of these functions will be a real novelty for programmers, especially programmers used to creating user interfaces for general computers. You can read voltages, check the status of pins on the board, and of course, control just how those LEDs flash to send inscrutable messages to the people staring at the device.
Most people take the power of their video cards for granted. They don’t even think about how many triangles the video card is juggling, as long as their world is a complex, first-person shooter game. But if they would only look under the hood, they would find a great deal of power ready to be unlocked by the right programmer. The CUDA language is a way for Nvidia to open up the power of their graphics processing units (GPUs) to work in ways other than killing zombies or robots.
The key challenge to using CUDA is learning to identify the parallel parts of your algorithm. Once you find them, you can set up the CUDA code to blast through these sections using all the inherent parallel power of the video card. Some jobs, like mining Bitcoins, are pretty simple, but other challenges, like sorting and molecular dynamics, may take a bit more thinking. Scientists love using CUDA code for their large, multidimensional simulations.
Everyone who’s taken an advanced course in programming languages knows the academic world loves the idea of functional programming, which insists that each function have well-defined inputs and outputs but no way of messing with other variables. There are dozens of good functional languages, and it would be impossible to list all of them here. Scala is one of the best-known, with one of the larger user bases. It was engineered to run on the JVM, so anything you write in Scala can run anywhere that Java runs, which is almost everywhere.
There are good reasons to believe that functional programming precepts, when followed, can build stronger code that’s easier to optimize and often free of some of the most maddening bugs. Scala is one way to dip your toe into these waters.
Scala isn’t the only functional language with a serious fan base. One of the most popular functional languages, Haskell, is another good place for programmers to begin. It’s already being used for major projects at companies like Facebook. It’s delivering real performance on real projects, something that often isn’t the case for academic code.