Would one have to know the machine architecture to write code?

Posted 2019-04-28 18:16

Question:

Let's say I'm programming in Java, Python, or C++ to solve a simple problem, such as building a TCP/UDP echo server or computing a factorial. Do I have to bother about the architecture details, i.e., whether it is 32- or 64-bit?

IMHO, unless I'm programming something fairly low-level, I don't have to bother whether it's 32- or 64-bit. Where am I going wrong? Or am I correct?

Answer 1:

Correct, for most circumstances.

The runtime/language/compiler will abstract those details unless you are dealing directly with word sizes or binary at a low level.

Even byte order is abstracted by the NIC/network stack in the kernel; it is translated for you. When programming sockets in C, you do sometimes have to deal with byte ordering for the network when sending data... but that doesn't concern 32- or 64-bit differences.
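
For instance, here is a minimal sketch of where byte order actually shows up, assuming POSIX sockets (UDP to the classic echo port; error handling omitted):

#include <arpa/inet.h>   // htons, htonl
#include <netinet/in.h>  // sockaddr_in, INADDR_LOOPBACK
#include <sys/socket.h>  // socket, sendto

int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);          // error handling omitted

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(7);                  // port in network (big-endian) order
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);    // 127.0.0.1 in network order

    const char msg[] = "hello";
    // The text itself needs no conversion; only multi-byte integers do.
    sendto(fd, msg, sizeof msg, 0,
           reinterpret_cast<const sockaddr*>(&addr), sizeof addr);
}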

When dealing with blobs of binary data, mapping them from one architecture to another (as an overlay onto a C struct, for example) can cause problems as others have mentioned, but this is why we develop architecture-independent protocols based on characters and so on.

In fact, things like Java run in a virtual machine that abstracts the machine another step!

Knowing a bit about the architecture's instruction set, and how your source is compiled down to it, can help you understand the platform and write cleaner, tighter code. I know I grimace at some old C code after studying compilers!



Answer 2:

Knowing how things work, be it how the virtual machine works and how it behaves on your platform, or how certain C++ constructs are transformed into assembly, will always make you a better programmer, because you will understand why things should be done the way they are.

You need to understand things like memory to know what cache misses are and why they might affect your program. You should know how certain things are implemented; even though you might only reach them through an interface or some other high-level way, knowing how they work will make sure you're using them in the best way.
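
As a small illustration (the matrix size and layout here are arbitrary), the two loops below do the same arithmetic, but the first walks memory in order while the second strides across it, which typically costs far more cache misses:

#include <cstddef>
#include <vector>

// Both functions compute the same sum over an n-by-n matrix stored
// in one contiguous vector, but their cache behaviour differs a lot.
double sum_row_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];   // consecutive addresses: cache-friendly
    return s;
}

double sum_col_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];   // jumps n doubles each step: many cache misses
    return s;
}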

For packet work, you need to understand how data is stored on different platforms and how sending it across the network to a different platform might change how the data is read (endianness).

Your compiler will make best use of the platform you're compiling on, so as long as you stick to standards and code well, you can ignore most things and assume the compiler will whip out what's best.

So in short, no. You don't need to know the low level stuff, but it never hurts to know.



Answer 3:

The last time I looked at the Java language spec, it contained a ridiculous gotcha in the section on integer boxing.

Integer a = 100;
Integer b = 100;

System.out.println(a == b);

That is guaranteed to print true.

Integer a = 300;
Integer b = 300;

System.out.println(a == b);

That is not guaranteed to print true. It depends on the runtime. The spec left it completely open. It's because boxing an int between -128 and 127 returns "interned" objects (analogous to the way string literals are interned), but the implementer of the language runtime is encouraged to raise that limit if they wish.

I personally regard that as an insane decision, and I hope they've fixed it since (write once, run anywhere?)



Answer 4:

You sometimes must bother.

You can be surprised when these low-level details suddenly jump out and bite you. For example, Java standardized double to be 64-bit. However, the Linux JVM uses the "extended precision" mode, in which a double is 80 bits wide as long as it stays in a CPU register. This means that the following code may fail:

double x = fun1();
double y = x;

System.out.println(fun2(x));

assert( y == x );

Simply because y is forced out of the register into memory and truncated from 80 to 64 bits.



Answer 5:

In Java and Python, architecture details are abstracted away, so that it is in fact more or less impossible to write architecture-dependent code.

With C++, this is an entirely different matter - you can certainly write code that does not depend on architecture details, but you have to be careful to avoid pitfalls, specifically concerning basic data types that are architecture-dependent, such as int.
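
A quick way to see this for yourself is to print a few sizes; the exact output depends on the platform and compiler (the values in the comments are just typical examples):

#include <cstddef>
#include <iostream>

int main() {
    // Typical 32-bit x86 Linux prints: 4 4 4 4
    // Typical x86-64 Linux prints:     4 8 8 8 (long stays 4 bytes on 64-bit Windows)
    std::cout << sizeof(int)         << ' '
              << sizeof(long)        << ' '
              << sizeof(void*)       << ' '
              << sizeof(std::size_t) << '\n';
}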



Answer 6:

As long as you do things correctly, you almost never need to know for most languages. On many, you never need to know, as the language behavior doesn't vary (Java, for example, specifies the runtime behavior precisely).

In C++ and C, doing things correctly includes not making assumptions about int. Don't put pointers in int, and when you're doing anything with memory sizes or addresses use size_t and ptrdiff_t. Don't count on the size of data types: int must be at least 16 bits, almost always is 32, and may be 64 on some architectures. Don't assume that floating-point arithmetic will be done in exactly the same way on different machines (the IEEE standards have some leeway in them).
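
For example, a small sketch of the portable way to handle sizes and pointer differences (the function names here are made up):

#include <cstddef>   // std::size_t, std::ptrdiff_t

// std::size_t is the right type for object sizes and indices,
// whatever the pointer width of the platform happens to be.
void zero_fill(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        buf[i] = 0;
}

// Pointer differences belong in std::ptrdiff_t, not in int.
std::ptrdiff_t distance(const char* first, const char* last) {
    return last - first;
}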

Pretty much all OSes that support networking will give you ways to deal with possible endianness problems. Use them. Use language facilities like isalpha() to classify characters, rather than arithmetic operations on characters (which might be something weird like EBCDIC). (Of course, it's now more usual to use wchar_t as the character type, and use Unicode internally.)
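
For example (the cast to unsigned char matters, because the C classification functions take an int that must be representable as unsigned char or be EOF):

#include <cctype>

// Portable: asks the library, so it works on EBCDIC as well as ASCII.
bool is_letter(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Not portable: assumes the letters are contiguous, which EBCDIC breaks.
bool is_letter_ascii_only(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}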



Answer 7:

If you are programming in Python or in Java, the interpreter and the virtual machine, respectively, abstract away this layer of the architecture. You then need not worry whether it's running on a 32-bit or 64-bit architecture.

The same cannot be said for C++, in which you'll sometimes have to ask yourself whether you are running on a 32-bit or 64-bit machine.



Answer 8:

You will need to care about endianness only if you send and receive raw C structs over the wire, like:

ret = send(socket, &myStruct, sizeof(myStruct), 0);

However, this is not a recommended practice.

It's recommended that you define a protocol between the parties such that the parties' machine architectures don't matter.
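
A minimal sketch of what such a protocol can look like (the Message struct and its fields are made up): write each field explicitly, in network byte order, instead of shipping the raw struct.

#include <arpa/inet.h>  // htonl, htons
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical application message; the fields are made up.
struct Message {
    std::uint32_t id;
    std::uint16_t flags;
};

// Pack into a fixed, architecture-independent wire format:
// 4 bytes of id followed by 2 bytes of flags, both big-endian.
std::size_t pack(const Message& m, unsigned char* out) {
    const std::uint32_t id    = htonl(m.id);
    const std::uint16_t flags = htons(m.flags);
    std::memcpy(out,     &id,    sizeof id);
    std::memcpy(out + 4, &flags, sizeof flags);
    return 6;   // always 6 bytes, regardless of struct padding on either side
}

The receiver does the reverse with ntohl/ntohs, so it never matters what the structs look like in memory on either machine.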



Answer 9:

In C++, you have to be very careful if you want to write code that works the same on 32-bit and 64-bit systems. Many people wrongly assume that int can store a pointer, for example.
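
If you really do need to carry a pointer in an integer, there is a dedicated type for it; a short sketch:

#include <cstdint>
#include <iostream>

int main() {
    int value = 42;
    // On a typical 64-bit platform int is 32 bits but a pointer is 64,
    // so stuffing &value into an int would lose half the address.
    // std::uintptr_t is wide enough wherever it is provided.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&value);
    std::cout << std::hex << addr << '\n';
}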



Answer 10:

With Java and .NET you don't really have to bother with it unless you are doing very low-level stuff like twiddling bits. If you are using C, C++, or Fortran you might get by, but I would actually recommend using <stdint.h>, with explicit fixed-width types such as uint64_t and uint32_t. Also, you will need to build with particular libraries depending on how you are linking; for example, a 64-bit system might use gcc in a default 64-bit compile mode.
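
For example, the C++ spelling of the same idea (<cstdint> instead of <stdint.h>):

#include <cstdint>

int main() {
    // Fixed-width types have the same size everywhere they are provided,
    // unlike int and long, whose widths differ between 32- and 64-bit targets.
    std::uint32_t checksum  = 0;   // always exactly 32 bits
    std::uint64_t byteCount = 0;   // always exactly 64 bits
    static_assert(sizeof(std::uint32_t) == 4, "guaranteed by <cstdint>");
    static_assert(sizeof(std::uint64_t) == 8, "guaranteed by <cstdint>");
    return static_cast<int>(checksum + byteCount);
}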



Answer 11:

A 32-bit machine will allow you a maximum of 4 GB of addressable virtual memory. (In practice, it's even less than that, usually 2 GB or 3 GB, depending on the OS and various linker options.) On a 64-bit machine, you can have a HUGE virtual address space (in any practical sense, limited only by disk) and a pretty damn big RAM.

So if you are expecting 6 GB data sets for some computation (say, something that needs scattered, random access and can't just be streamed a bit at a time), on a 64-bit architecture you could just read it all into RAM and do your stuff, whereas on a 32-bit architecture you need a fundamentally different approach, since you simply do not have the option of keeping the entire data set resident.
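
A rough sketch of that decision (the 6 GB figure is just the example above):

#include <cstdint>
#include <iostream>

int main() {
    // Hypothetical data-set size, matching the example in the text.
    constexpr std::uint64_t dataSetBytes = 6ull * 1024 * 1024 * 1024;

    // With 32-bit pointers the address space tops out at 2^32 bytes
    // (and the usable part is smaller still), so the whole data set
    // can never be resident at once.
    const bool fits = sizeof(void*) >= 8 || dataSetBytes < (1ull << 32);

    std::cout << (fits ? "load it all into RAM\n"
                       : "stream/chunk through the data in passes\n");
}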