Meaning of Leaky Abstraction?

2019-03-08 07:35发布

站内文章 / 后端开发

76 0

迷人小祖宗

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

What does the term "Leaky Abstraction" mean? (Please explain with examples. I often have a hard time grokking a mere theory.)

回答1:

Here's a meatspace example:

Automobiles have abstractions for drivers. In its purest form, there's a steering wheel, accelerator and brake. This abstraction hides a lot of detail about what's under the hood: engine, cams, timing belt, spark plugs, radiator, etc.

The neat thing about this abstraction is that we can replace parts of the implementation with improved parts without retraining the user. Let's say we replace the distributor cap with electronic ignition, and we replace the fixed cam with a variable cam. These changes improve performance but the user still steers with the wheel and uses the pedals to start and stop.

It's actually quite remarkable... a 16 year old or an 80 year old can operate this complicated piece of machinery without really knowing much about how it works inside!

But there are leaks. The transmission is a small leak. In an automatic transmission you can feel the car lose power for a moment as it switches gears, whereas in CVT you feel smooth torque all the way up.

There are bigger leaks, too. If you rev the engine too fast, you may do damage to it. If the engine block is too cold, the car may not start or it may have poor performance. And if you crank the radio, headlights, and AC all at the same time, you'll see your gas mileage go down.

回答2:

It simply means that your abstraction exposes some of the implementation details, or that you need to be aware of the implementation details when using the abstraction. The term is attributed to Joel Spolsky, circa 2002. See the wikipedia article for more information.

A classic example are network libraries that allow you to treat remote files as local. The developer using this abstraction must be aware that network problems may cause this to fail in ways that local files do not. You then need to develop code to handle specifically errors outside the abstraction that the network library provides.

回答3:

Wikipedia has a pretty good definition for this

A leaky abstraction refers to any implemented abstraction, intended to reduce (or hide) complexity, where the underlying details are not completely hidden

Or in other words for software it's when you can observe implementation details of a feature via limitations or side effects in the program.

A quick example would be C# / VB.Net closures and their inability to capture ref / out parameters. The reason they cannot be captured is due to an implementation detail of how the lifting process occurs. This is not to say though that there is a better way of doing this.

回答4:

Here's an example familiar to .NET developers: ASP.NET's Page class attempts to hide the details of HTTP operations, particularly the management of form data, so that developers don't have to deal with posted values (because it automatically maps form values to server controls).

But if you wander beyond the most basic usage scenarios the Page abstraction begins to leak and it becomes hard to work with pages unless you understand the class' implementation details.

One common example is dynamically adding controls to a page - the value of dynamically-added controls won't be mapped for you unless you add them at just the right time: before the underlying engine maps the incoming form values to the appropriate controls. When you have to learn that, the abstraction has leaked.

回答5:

Well, in a way it is a purely theoretical thing, though not unimportant.

We use abstractions to make things easier to comprehend. I may operate on a string class in some language to hide the fact that I'm dealing with an ordered set of characters that are individual items. I deal with an ordered set of characters to hide the fact that I'm dealing with numbers. I deal with numbers to hide the fact that I'm dealing with 1s and 0s.

A leaky abstraction is one that doesn't hide the details its meant to hide. If call string.Length on a 5-character string in Java or .NET I could get any answer from 5 to 10, because of implementation details where what those languages call characters are really UTF-16 data-points which can represent either 1 or .5 of a character. The abstraction has leaked. Not leaking it though means that finding the length would either require more storage space (to store the real length) or change from being O(1) to O(n) (to work out what the real length is). If I care about the real answer (often you don't really) you need to work on the knowledge of what is really going on.

More debatable cases happen with cases like where a method or property lets you get in at the inner workings, whether they are abstraction leaks, or well-defined ways to move to a lower level of abstraction, can sometimes be a matter people disagree on.

回答6:

I'll continue in the vein of giving examples by using RPC.

In the ideal world of RPC, a remote procedure call should look like a local procedure call (or so the story goes). It should be completely transparent to the programmer such that when they call SomeObject.someFunction() they have no idea if SomeObject (or just someFunction for that matter) are locally stored and executed or remotely stored and executed. The theory goes that this makes programming simpler.

The reality is different because there's a HUGE difference between making a local function call (even if you're using the world's slowest interpreted language) and:

calling through a proxy object
serializing your parameters
making a network connection (if not already established)
transmitting the data to the remote proxy
having the remote proxy restore the data and call the remote function on your behalf
serializing the return value(s)
transmitting the return values to the local proxy
reassembling the serialized data
returning the response from the remote function

In time alone that's about three orders (or more!) of magnitude difference. Those three+ orders of magnitude are going to make a huge difference in performance that will make your abstraction of a procedure call leak rather obviously the first time you mistakenly treat an RPC as a real function call. Further a real function call, barring serious problems in your code, will have very few failure points outside of implementation bugs. An RPC call has all of the following possible problems that will get slathered on as failure cases over and above what you'd expect from a regular local call:

you might not be able to instantiate your local proxy
you might not be able to instantiate your remote proxy
the proxies may not be able to connect
the parameters you send may not make it intact or at all
the return value the remote sends may not make it intact or at all

So now your RPC call which is "just like a local function call" has a whole buttload of extra failure conditions you don't have to contend with when doing local function calls. The abstraction has leaked again, even harder.

In the end RPC is a bad abstraction because it leaks like a sieve at every level -- when successful and when failing both.

回答7:

An example in the django ORM many-to-many example:

Notice in the Sample API Usage that you need to .save() the base Article object a1 before you can add Publication objects to the many-to-many attribute. And notice that updating the many-to-many attribute saves to the underlying database immediately, whereas updating a singular attribute is not reflected in the db until the .save() is called.

The abstraction is that we are working with an object graph, where single-value attributes and mult-value attributes are just attributes. But the implementation as a relational database backed data store leaks... as the integrity system of the RDBS appears through the thin veneer of an object interface.

回答8:

The fact that at some point, which will guided by your scale and execution, you will be needed to get familiar with the implementation details of your abstraction framework in order to understand why it behave that way it behave.

For example, consider this SQL query:

SELECT id, first_name, last_name, age, subject FROM student_details;

And it's alternative:

SELECT * FROM student_details;

Now, they do look like a logically equivalent solutions, but the performance of the first one is better due the individual column names specification.

It's a trivial example but eventually it comes back to Joel Spolsky quote:

All non-trivial abstractions, to some degree, are leaky.

At some point, when you will reach a certain scale in your operation, you will want to optimize the way your DB (SQL) works. To do it, you will need to know the way relational databases works. It was abstracted to you in the beginning, but it's leaky. You need to learn it at some point.

回答9:

Assume, we have the following code in a library:

Object[] fetchDeviceColorAndModel(String serialNumberOfDevice)
{
    //fetch Device Color and Device Model from DB.
    //create new Object[] and set 0th field with color and 1st field with model value. 
}

When the consumer calls the API, they get an Object[]. The consumer has to understand that the first field of the object array has color value and second field is the model value. Here the abstraction has leaked from library to the consumer code.

One of the solutions is to return an object which encapsulates Model and Color of the Device. The consumer can call that object to get the model and color value.

DeviceColorAndModel fetchDeviceColorAndModel(String serialNumberOfTheDevice)
{
    //fetch Device Color and Device Model from DB.
    return new DeviceColorAndModel(color, model);
}

回答10:

Leaky abstraction is all about encapsulating state. very simple example of leaky abstraction:

$currentTime = new DateTime();

$bankAccount1->setLastRefresh($currentTime);
$bankAccount2->setLastRefresh($currentTime);
$currentTime->setTimestamp($aTimestamp);

class BankAccount {
    // ...

    public function setLastRefresh(DateTimeImmutable $lastRefresh)
    {
        $this->lastRefresh = $lastRefresh;
    } }

and the right way(not leaky abstraction):

class BankAccount
{
    // ...

    public function setLastRefresh(DateTime $lastRefresh)
    {
        $this->lastRefresh = clone $lastRefresh;
    }
}

more description here.