When to use symbols instead of strings in Ruby?

2019-01-04 06:19发布

If there are at least two instances of the same string in my script, should I instead use a symbol?

标签: ruby symbols
5条回答
来,给爷笑一个
2楼-- · 2019-01-04 06:24

Put simply, a symbol is a name, composed of characters, but immutable. A string, on the contrary, is an ordered container for characters, whose contents are allowed to change.

查看更多
爷、活的狠高调
3楼-- · 2019-01-04 06:26

Here is a nice strings vs symbols benchmark I found at codecademy:

require 'benchmark'

string_AZ = Hash[("a".."z").to_a.zip((1..26).to_a)]
symbol_AZ = Hash[(:a..:z).to_a.zip((1..26).to_a)]

string_time = Benchmark.realtime do
  1000_000.times { string_AZ["r"] }
end

symbol_time = Benchmark.realtime do
  1000_000.times { symbol_AZ[:r] }
end

puts "String time: #{string_time} seconds."
puts "Symbol time: #{symbol_time} seconds."

The output is:

String time: 0.21983 seconds.
Symbol time: 0.087873 seconds.
查看更多
兄弟一词,经得起流年.
4楼-- · 2019-01-04 06:31

TL;DR

A simple rule of thumb is to use symbols every time you need internal identifiers. For Ruby < 2.2 only use symbols when they aren't generated dynamically, to avoid memory leaks.

Full answer

The only reason not to use them for identifiers that are generated dynamically is because of memory concerns.

This question is very common because many programming languages don't have symbols, only strings, and thus strings are also used as identifiers in your code. You should be worrying about what symbols are meant to be, not only when you should use symbols. Symbols are meant to be identifiers. If you follow this philosophy, chances are that you will do things right.

There are several differences between the implementation of symbols and strings. The most important thing about symbols is that they are immutable. This means that they will never have their value changed. Because of this, symbols are instantiated faster than strings and some operations like comparing two symbols is also faster.

The fact that a symbol is immutable allows Ruby to use the same object every time you reference the symbol, saving memory. So every time the interpreter reads :my_key it can take it from memory instead of instantiate it again. This is less expensive than initializing a new string every time.

You can get a list all symbols that are already instantiated with the command Symbol.all_symbols:

symbols_count = Symbol.all_symbols.count # all_symbols is an array with all 
                                         # instantiated symbols. 
a = :one
puts a.object_id
# prints 167778 

a = :two
puts a.object_id
# prints 167858

a = :one
puts a.object_id
# prints 167778 again - the same object_id from the first time!

puts Symbol.all_symbols.count - symbols_count
# prints 2, the two objects we created.

For Ruby versions before 2.2, once a symbol is instantiated, this memory will never be free again. The only way to free the memory is restarting the application. So symbols are also a major cause of memory leaks when used incorrectly. The simplest way to generate a memory leak is using the method to_sym on user input data, since this data will always change, a new portion of the memory will be used forever in the software instance. Ruby 2.2 introduced the symbol garbage collector, which frees symbols generated dynamically, so the memory leaks generated by creating symbols dynamically it is not a concern any longer.

Answering your question:

Is it true I have to use a symbol instead of a string if there is at least two the same strings in my application or script?

If what you are looking for is an identifier to be used internally at your code, you should be using symbols. If you are printing output, you should go with strings, even if it appears more than once, even allocating two different objects in memory.

Here's the reasoning:

  1. Printing the symbols will be slower than printing strings because they are cast to strings.
  2. Having lots of different symbols will increase the overall memory usage of your application since they are never deallocated. And you are never using all strings from your code at the same time.

Use case by @AlanDert

@AlanDert: if I use many times something like %input{type: :checkbox} in haml code, what should I use as checkbox?

Me: Yes.

@AlanDert: But to print out a symbol on html page, it should be converted to string, shouldn't it? what's the point of using it then?

What is the type of an input? An identifier of the type of input you want to use or something you want to show to the user?

It is true that it will become HTML code at some point, but at the moment you are writing that line of your code, it is mean to be an identifier - it identifies what kind of input field you need. Thus, it is used over and over again in your code, and have always the same "string" of characters as the identifier and won't generate a memory leak.

That said, why don't we evaluate the data to see if strings are faster?

This is a simple benchmark I created for this:

require 'benchmark'
require 'haml'

str = Benchmark.measure do
  10_000.times do
    Haml::Engine.new('%input{type: "checkbox"}').render
  end
end.total

sym = Benchmark.measure do
  10_000.times do
    Haml::Engine.new('%input{type: :checkbox}').render
  end
end.total

puts "String: " + str.to_s
puts "Symbol: " + sym.to_s

Three outputs:

# first time
String: 5.14
Symbol: 5.07
#second
String: 5.29
Symbol: 5.050000000000001
#third
String: 4.7700000000000005
Symbol: 4.68

So using smbols is actually a bit faster than using strings. Why is that? It depends on the way HAML is implemented. I would need to hack a bit on HAML code to see, but if you keep using symbols in the concept of an identifier, your application will be faster and reliable. When questions strike, benchmark it and get your answers.

查看更多
时光不老,我们不散
5楼-- · 2019-01-04 06:40
  1. A Ruby symbol is an object with O(1) comparison

To compare two strings, we potentially need to look at every character. For two strings of length N, this will require N+1 comparisons (which computer scientists refer to as "O(N) time").

def string_comp str1, str2
  return false if str1.length != str2.length
  for i in 0...str1.length
    return false if str1[i] != str2[i]
  end
  return true
end
string_comp "foo", "foo"

But since every appearance of :foo refers to the same object, we can compare symbols by looking at object IDs. We can do this with a single comparison (which computer scientists refer to as "O(1) time").

def symbol_comp sym1, sym2
  sym1.object_id == sym2.object_id
end
symbol_comp :foo, :foo
  1. A Ruby symbol is a label in a free-form enumeration

In C++, we can use "enumerations" to represent families of related constants:

enum BugStatus { OPEN, CLOSED };
BugStatus original_status = OPEN;
BugStatus current_status  = CLOSED;

But because Ruby is a dynamic language, we don't worry about declaring a BugStatus type, or keeping track of the legal values. Instead, we represent the enumeration values as symbols:

original_status = :open
current_status  = :closed

3.A Ruby symbol is a constant, unique name

In Ruby, we can change the contents of a string:

"foo"[0] = ?b # "boo"

But we can't change the contents of a symbol:

:foo[0]  = ?b # Raises an error
  1. A Ruby symbol is the keyword for a keyword argument

When passing keyword arguments to a Ruby function, we specify the keywords using symbols:

# Build a URL for 'bug' using Rails.
url_for :controller => 'bug',
        :action => 'show',
        :id => bug.id
  1. A Ruby symbol is an excellent choice for a hash key

Typically, we'll use symbols to represent the keys of a hash table:

options = {}
options[:auto_save]     = true
options[:show_comments] = false
查看更多
戒情不戒烟
6楼-- · 2019-01-04 06:41
  • use symbols as hash key identifiers

    {key: "value"}

  • symbols allow you to call the method in a different order

     def write(file:, data:, mode: "ascii")
          # removed for brevity
     end
     write(data: 123, file: "test.txt")
  • freeze to keep as a string and save memory

    label = 'My Label'.freeze

查看更多
登录 后发表回答