I Googled for a long time but I still don't understand how it works, as most of the explanations are very technical and there are no illustrations to make it clearer. My primary confusion is: what is its difference from virtual memory?
I hope this question will have a very good explanation here so that other people who ask the same question can find it here when they Google it.
I have to admit, those two concepts can seem quite complicated and similar at the beginning. Sometimes they are also taught confusingly. A good reference, in my opinion, can be found on osdev.org in its Segmentation and Paging articles.
For the sake of completeness, I'll try to explain it here too, but I cannot guarantee correctness, as I have not done any OS development for some months.
Segmentation in old 16bit days
Segmentation is the older of the two concepts and, in my opinion, the more confusing one. Segmentation works on - as the name says - segments. A segment is a contiguous block of memory of a specific size. To access memory within a segment we need an offset. This makes a total of two address components, which are in fact stored in two registers. One idea of segmentation was to address more memory than a single 16-bit register can reach. The other was some sort of protection, but not as elaborate as that of paging.
Because we now use two registers to access memory, we can split memory into chunks - the so-called segments mentioned above. Consider a memory of 1 MB (2^20 bytes). With a 16-bit segment register there are 65536 (2^16) possible segment start addresses, spaced 16 bytes apart. Of course, we also have a 16-bit register for the offset. Using 16 bits to address only 16 bytes would be quite useless, so each segment spans the full 64 KB that an offset can reach, and segments overlap (which I think also had performance and programming reasons back then).
The following formula is used to access 1 MB of memory with segmentation, where A is the segment and B is the offset:
Physical address = (A * 0x10) + B
This means the segment value is multiplied by 16 (0x10) before the offset is added. It also means that the address 0x0100 can be reached in many ways, e.g. by A=0x0010 and B=0x0, but also by A=0x0 and B=0x0100.
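To make the formula concrete, here is a minimal sketch in Python. The function name `real_mode_address` is my own, and the sketch does not model what happens when the result goes above 1 MB.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment (A) and a 16-bit offset (B) into one address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Multiplying by 0x10 is the same as shifting left by 4 bits.
    return (segment << 4) + offset


# Two different segment:offset pairs that name the same byte.
print(hex(real_mode_address(0x0010, 0x0000)))  # 0x100
print(hex(real_mode_address(0x0000, 0x0100)))  # 0x100
```

Both calls give the same result, which is exactly the overlap between segments described above.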
This was segmentation in the old 16bit days.
If you look at assembly programs or try something yourself, you'll see that the CPU even has dedicated registers for this: CS and DS (code segment and data segment).
Segmentation in 32bit days
Later, the so-called Global Descriptor Table (GDT) was introduced. This is a global table (at a specific position in your RAM) that holds, for each segment, its start address, its size and several other options. This brings us nearer to the concept of paging, but it's still not the same.
So now the programmer can decide where segments should start. Another new concept was that the GDT also defines how long a segment should be. So a segment no longer had to be 64 KB long (2^16, because of the 16-bit registers); the limit could be defined by the programmer. You could have overlapping segments or completely separate segments.
When accessing A:B now (two registers are still used to access memory), A selects an entry in the GDT. So we look up that entry and see at which memory location the segment starts and how large it is. We then check whether B (the offset) is within the allowed memory area.
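The lookup just described can be sketched roughly like this. Everything here is simplified and hypothetical: the table contents are made up, `SegmentDescriptor` and `ProtectionFault` are names I invented, and real descriptors also carry access flags, while a real selector encodes more than a plain index.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when an offset lies outside its segment (illustrative only)."""


@dataclass
class SegmentDescriptor:
    base: int   # where the segment starts in memory
    limit: int  # highest valid offset inside the segment


# Hypothetical table contents, chosen only for illustration.
GDT = [
    SegmentDescriptor(base=0x00000000, limit=0x0000FFFF),
    SegmentDescriptor(base=0x00100000, limit=0x00000FFF),
]


def translate(index: int, offset: int) -> int:
    """Look up entry `index`, check the offset, and return base + offset."""
    if not 0 <= index < len(GDT):
        raise ProtectionFault("no such descriptor")
    descriptor = GDT[index]
    if offset > descriptor.limit:
        raise ProtectionFault("offset is outside the segment limit")
    return descriptor.base + offset


print(hex(translate(1, 0x10)))  # 0x100010
```

Calling `translate(1, 0x1000)` raises `ProtectionFault`, because the second segment is only 0x1000 bytes long.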
Paging
Paging is not so different from the newer segmentation approach, but with paging each page has a fixed size. So the limit is no longer programmable; each page is (currently) 4 KB. Furthermore, unlike with segmentation, the logical address space can be contiguous without the physical addresses being contiguous.
Paging also uses tables to look things up, and you still split the logical address into parts. The first part is the number of the entry in the page table, the second part is the offset. However, the offset now has a fixed length of 12 bits, enough to address 4 KB. You can also have more than two parts; then multiple levels of page tables are used. Two-level page tables are common on 32-bit systems, and 64-bit x86 systems use four levels.
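As an illustration of splitting the address, here is a small sketch of a two-level lookup. I'm assuming the classic 32-bit layout of 10 + 10 + 12 bits, and the nested dictionaries with their frame numbers are made up; real page tables are arrays in memory with flag bits in every entry.

```python
PAGE_SHIFT = 12                     # 4 KB pages -> 12 offset bits
INDEX_BITS = 10                     # 10 bits per table level (assumed layout)
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def split(virtual: int):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    directory_index = (virtual >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    table_index = (virtual >> PAGE_SHIFT) & INDEX_MASK
    offset = virtual & OFFSET_MASK
    return directory_index, table_index, offset


# Hypothetical tables: directory index -> table index -> physical frame number.
page_directory = {
    0: {0: 0x00042, 1: 0x00007},
}


def translate(virtual: int) -> int:
    """Walk both levels and return the physical address."""
    directory_index, table_index, offset = split(virtual)
    try:
        frame = page_directory[directory_index][table_index]
    except KeyError:
        raise LookupError("page fault: no mapping for this address") from None
    return (frame << PAGE_SHIFT) | offset


print(hex(translate(0x00000010)))  # 0x42010
print(hex(translate(0x00001ABC)))  # 0x7abc
```

Note that the two virtual pages are neighbours (page 0 and page 1), while their physical frames (0x42 and 0x7) are nowhere near each other.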
Ending
I hope I was able to explain it at least a bit, but I think my explanation was also confusing. The best thing is to dive into kernel programming and try to implement the most basic stuff when booting an OS. Then you'll find out everything, because due to backwards compatibility everything is still there on our modern PCs.
I direct you to
http://en.wikipedia.org/wiki/Virtual_memory
and
http://en.wikipedia.org/wiki/Segmented_memory
Segmentation is starting to die out. I suspect paging will as well in the future.
Edit: Let me add a clarification
Segmentation and paging are two different means of memory management, and they typically do two different things. At the risk of oversimplification:
Segmentation allows a process to access more memory than the natural pointer size would allow.
Paging allows a process to access more memory than the system physically supports.
Segments:
The PDP-11 was a 16 bit system. That allows addressing 64K of memory. Late PDP-11 systems had much more memory than that. A process could map different segments of physical memory into that 64K. A process could only access 64K of memory but it could change which memory it could access within that 64K.
The 8086 and successors brought segmenting to a high art. Using an even more complex system of base registers, a process could access larger areas of memory.
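The idea of a small address range that can be pointed at different parts of a larger memory can be sketched like this. It is a toy model with made-up sizes and a hypothetical `Window` class, not a description of the actual PDP-11 or 8086 hardware.

```python
PHYSICAL_SIZE = 256 * 1024   # assumed: 256 KB of physical memory
WINDOW_SIZE = 64 * 1024      # what a 16-bit pointer can reach

physical = bytearray(PHYSICAL_SIZE)


class Window:
    """A 64 KB view that can be moved around in physical memory."""

    def __init__(self) -> None:
        self.base = 0

    def map_to(self, base: int) -> None:
        if base < 0 or base + WINDOW_SIZE > PHYSICAL_SIZE:
            raise ValueError("window would fall outside physical memory")
        self.base = base

    def read(self, addr16: int) -> int:
        return physical[self.base + (addr16 & 0xFFFF)]

    def write(self, addr16: int, value: int) -> None:
        physical[self.base + (addr16 & 0xFFFF)] = value


window = Window()
window.write(0x1234, 0xAA)   # lands at physical address 0x01234
window.map_to(0x20000)
window.write(0x1234, 0xBB)   # same 16-bit pointer, now physical 0x21234
```

The process never uses a pointer wider than 16 bits, yet it has touched two locations that are 128 KB apart.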
Paging:
Paging is a system where a process sees a contiguous (or relatively so) range of memory addresses that is divided into pages. For example, the VAX processors had 32-bit addresses (theoretically allowing access to 4 GB of memory) while the computer typically had 8, 16 or 32 MB of memory. A process could access much more memory than the system physically had (and so could multiple processes).
These systems presented a contiguous range of memory to the process (virtual memory) divided into pages (around 512-2048 bytes), defined by a set of tables and mapped to disk storage. If a process accessed a page that was not in memory, it triggered a hardware exception. The operating system would intercept that exception, allocate a new page of physical memory, load the page's contents from disk, and then restart the instruction.
If the operating system needed more memory to handle these requests, it would page out memory it had already loaded. If the data was read-only, it would usually have been loaded from the executable image and would not have to be paged out. The page could just be marked invalid. If it was read/write memory, the page would be written to a page file for storage until needed again.
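The fault-and-page-out cycle can be simulated in a few lines. This is only a sketch: `DemandPager` is a hypothetical class, the "oldest page first" eviction is my own assumption (the text above does not say how a victim is chosen), and no real data is moved.

```python
from collections import OrderedDict


class DemandPager:
    """Toy model of demand paging with a fixed number of physical frames."""

    def __init__(self, frame_count: int) -> None:
        self.frame_count = frame_count
        self.resident = OrderedDict()  # page number -> dirty flag, oldest first
        self.page_file = set()         # pages that were written out
        self.faults = 0

    def access(self, page: int, write: bool = False) -> None:
        if page not in self.resident:
            self.faults += 1               # stands in for the hardware exception
            if len(self.resident) >= self.frame_count:
                self._evict()
            self.resident[page] = False    # "load" the page into a free frame
        if write:
            self.resident[page] = True     # the page now differs from disk

    def _evict(self) -> None:
        victim, dirty = self.resident.popitem(last=False)
        if dirty:
            self.page_file.add(victim)     # modified pages must be saved first
        # Clean pages are simply dropped; they can be reloaded when needed.


pager = DemandPager(frame_count=2)
pager.access(0)
pager.access(1, write=True)
pager.access(2)   # evicts page 0, which is clean, so nothing is written
pager.access(3)   # evicts page 1, which is dirty, so it goes to the page file
print(pager.faults, sorted(pager.page_file), list(pager.resident))
```

With two frames, the four accesses cause four faults, and only the modified page 1 ends up in the page file.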
The 32-bit Intel chips introduced a bizarre system that combined segments and paging. Segments were used for data protection. The 64-bit processor modes do away with that.