Using jsonb in PostgreSQL 9.4

Date: 2019-12-07

Translated and reposted from http://nandovieira.com/using-postgresql-and-jsonb-with-ruby-on-rails

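PostgreSQL 9.4 introduced jsonb, a new column type for storing documents in your relational database. At a high level jsonb and json look almost identical, but their storage implementations differ.

The advantage of jsonb is that you can easily combine relational and non-relational data, with performance that can be better than most non-relational databases such as MongoDB.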

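Understanding the differences between json and jsonb

So what is the difference between the two column types? When comparing write speed, jsonb is slightly slower than json because of the way it stores data.

json stores an exact copy of the input text, which must be reparsed every time you call a function on it. It does not support indexing the column as a whole, but you can create expression indexes for the paths you query.
jsonb stores data in a decomposed binary format, which avoids reparsing the structure. It supports indexing, which means you can query any path without a dedicated index for that path.

Another difference is that a json column keeps the original text and reparses it on every access, so key order is the same as in the input. jsonb does not behave this way: it is stored in a binary format that does not guarantee key order. If your software relies on key order, jsonb may not be the best choice for your application.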
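The key-order difference can be illustrated without a database. The sketch below is plain Ruby, not PostgreSQL code. It assumes the ordering PostgreSQL currently shows when printing jsonb (shorter keys first, then bytewise), which is an implementation detail rather than a guarantee, and the helper name `jsonb_like_key_order` is made up for this illustration.

```ruby
require 'json'

# Hypothetical helper: returns the top-level keys of a JSON object in the
# order a jsonb column would typically print them back (assumption: keys are
# sorted by byte length first, then bytewise).
def jsonb_like_key_order(json_text)
  JSON.parse(json_text).keys.sort_by { |key| [key.bytesize, key] }
end

input = '{"twitter": "johndoe1", "github": "johndoe1", ' \
        '"blog": "http://johndoe1.example.com", "age": 42}'

input_order = JSON.parse(input).keys      # the order a json column preserves
jsonb_order = jsonb_like_key_order(input) # the order jsonb would likely return

puts input_order.inspect
puts jsonb_order.inspect
```

If application code iterates over the keys and expects them in insertion order, the second list is the kind of surprise to watch for.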

Let's run a simple benchmark. In this example I use a JSON structure like the following:

json
{
  "twitter": "johndoe1",
  "github": "johndoe1",
  "bio": "Lorem ipsum dolor sit amet, consectetur adipisicing elit. Labore impedit aliquam sapiente dolore magni aliquid ipsa ad, enim, esse ut reprehenderit quaerat deleniti fugit eaque. Vero eligendi voluptatibus atque, asperiores.",
  "blog": "http://johndoe1.example.com",
  "interests": [
    "music",
    "movies",
    "programming"
  ],
  "age": 42,
  "newsletter": true
}

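I inserted 30,000 records with this same structure. The timings below are practically identical; I believe jsonb would be slower when inserting more complex structures.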

Rehearsal ------------------------------------------------
insert jsonb   2.690000   0.590000   3.280000 ( 12.572343)
insert json    2.690000   0.590000   3.280000 ( 12.766534)
--------------------------------------- total: 6.560000sec

                   user     system      total        real
insert jsonb   2.680000   0.590000   3.270000 ( 13.206602)
insert json    2.650000   0.580000   3.230000 ( 12.577138)

The real difference shows up when querying json/jsonb columns. First, let's look at the table and its indexes.

sql
CREATE TABLE users (
  id serial not null,
  settings jsonb not null default '{}',
  preferences json not null default '{}'
);

CREATE INDEX settings_index ON users USING gin (settings);
CREATE INDEX twitter_settings_index ON users ((settings->>'github'));
CREATE INDEX preferences_index ON users ((preferences->>'github'));

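Note that we have a GIN index on the settings column and two expression indexes on a given path (github). Searching the 30,000 rows for the record whose GitHub username is john30000 (the last one inserted) gives the following numbers: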

Rehearsal -----------------------------------------------------------------
read jsonb (index column)       0.030000   0.030000   0.060000 (  3.673465)
read jsonb (expression index)   0.010000   0.010000   0.020000 (  0.087105)
read json (expression index)    0.010000   0.020000   0.030000 (  0.080121)
read json (no index)            0.060000   0.030000   0.090000 (113.206747)
-------------------------------------------------------- total: 0.200000sec

                                    user     system      total        real
read jsonb (index column)       0.010000   0.020000   0.030000 (  0.092476)
read jsonb (expression index)   0.010000   0.010000   0.020000 (  0.078916)
read json (expression index)    0.010000   0.010000   0.020000 (  0.081908)
read json (no index)            0.050000   0.040000   0.090000 (110.761944)

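As you can see, expression indexes perform almost identically for both data types, so they are not what makes the difference here. The other two rows differ in whether the queried column has an index at all: jsonb lets you build a GIN index over the whole column, while json does not. That is why the unindexed json query is so slow.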
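For intuition on why a whole-column index helps: GIN is an inverted index, mapping items found inside each document to the rows that contain them. The toy below sketches that idea in plain Ruby for top-level key/value pairs only. The class `ToyInvertedIndex` and its methods are invented for this illustration, and PostgreSQL's real GIN operator classes are considerably more sophisticated.

```ruby
require 'json'
require 'set'

# Toy inverted index: maps each top-level [key, value] pair to the ids of the
# rows whose document contains that pair.
class ToyInvertedIndex
  def initialize
    @postings = Hash.new { |hash, pair| hash[pair] = Set.new }
    @rows = {}
  end

  def insert(id, json_text)
    doc = JSON.parse(json_text)
    @rows[id] = doc
    doc.each { |key, value| @postings[[key, value]] << id }
  end

  # Rough analogue of `settings @> '{"github": "john4"}'` for flat queries:
  # intersect the posting sets, then recheck the candidate rows.
  def containing(query)
    posting_sets = query.map { |key, value| @postings.fetch([key, value], Set.new) }
    candidates = posting_sets.reduce { |a, b| a & b } || Set.new
    candidates.to_a.select { |id| query.all? { |key, value| @rows[id][key] == value } }.sort
  end
end

index = ToyInvertedIndex.new
1.upto(5) do |n|
  index.insert(n, { github: "john#{n}", newsletter: n.even? }.to_json)
end

matches = index.containing('github' => 'john4', 'newsletter' => true)
puts matches.inspect
```

Only rows listed under a matching pair are ever inspected, which is the reason an indexed containment query does not have to visit every row.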

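Let's check what the query planner does for two of these queries: the containment query on the GIN-indexed jsonb column, and a path lookup on the json column that has no matching index.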

sql
EXPLAIN
SELECT *
FROM users
WHERE settings @> '{"twitter": "john30000"}' LIMIT 1;

--                                      QUERY PLAN
-- -------------------------------------------------------------------------------------
--  Limit  (cost=28.23..31.96 rows=1 width=468)
--    ->  Bitmap Heap Scan on users  (cost=28.23..140.07 rows=30 width=468)
--          Recheck Cond: (settings @> '{"twitter": "john30000"}'::jsonb)
--          ->  Bitmap Index Scan on settings_index  (cost=0.00..28.23 rows=30 width=0)
--                Index Cond: (settings @> '{"twitter": "john30000"}'::jsonb)

EXPLAIN
SELECT *
FROM users
WHERE preferences->>'twitter' = 'john30000' LIMIT 1;

--                                QUERY PLAN
-- -------------------------------------------------------------------------
--  Limit  (cost=0.00..25.23 rows=1 width=468)
--    ->  Seq Scan on users  (cost=0.00..3784.00 rows=150 width=468)
--          Filter: ((preferences ->> 'twitter'::text) = 'john30000'::text)

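The key point is that json does a sequential scan, which means PostgreSQL walks the rows one by one until it finds one that matches the condition. Keep in mind that the JSON content of every row visited has to be parsed along the way, which makes queries over complex structures slow.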
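The cost described above can be sketched in plain Ruby. This is only an analogy, assuming rows stored as raw JSON text (as in a json column); `seq_scan_first` is a made-up helper that parses each row in order until one matches, mirroring a sequential scan with `LIMIT 1`.

```ruby
require 'json'

# Walks the rows in order, parsing every document until the first match.
# Returns the matching id (or nil) and how many documents had to be parsed.
def seq_scan_first(rows, key, expected)
  parsed = 0
  rows.each do |id, json_text|
    parsed += 1
    return [id, parsed] if JSON.parse(json_text)[key] == expected
  end
  [nil, parsed]
end

rows = (1..1_000).map { |n| [n, { twitter: "john#{n}" }.to_json] }

first_hit = seq_scan_first(rows, 'twitter', 'john1')
last_hit  = seq_scan_first(rows, 'twitter', 'john1000')

puts first_hit.inspect
puts last_hit.inspect
```

Looking up the last inserted record, as the benchmark does, is the worst case: every document is parsed before the match is found.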

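This does not happen with the jsonb column: the lookup uses the index, although it is not as well optimized as a lookup through an expression index.

One gotcha with jsonb is that it will always do a sequential scan if you use the ->> operator on a path that has no expression index.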

sql
EXPLAIN
SELECT *
FROM users
WHERE settings->>'twitter' = 'johndoe30000' LIMIT 1;

--                                QUERY PLAN
-- -------------------------------------------------------------------------
--  Limit  (cost=0.00..25.23 rows=1 width=468)
--    ->  Seq Scan on users  (cost=0.00..3784.00 rows=150 width=468)
--          Filter: ((settings ->> 'twitter'::text) = 'johndoe30000'::text)
-- (3 rows)

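So if you don't know in advance which keys you will query, or you query across all JSON paths, make sure you define a GIN index and use @> (or another operator the index supports).

Converting json to jsonb

If you already store JSON data in json or text columns, you can convert them to jsonb so that you can rely on column indexes.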

sql
BEGIN;
ALTER TABLE users ADD COLUMN preferences_jsonb jsonb DEFAULT '{}';
UPDATE users set preferences_jsonb = preferences::jsonb;
ALTER TABLE users ALTER COLUMN preferences_jsonb SET NOT NULL;
ALTER TABLE users RENAME COLUMN preferences TO preferences_json;
ALTER TABLE users RENAME COLUMN preferences_jsonb TO preferences;

-- Don't remove the column until you're sure everything is working.
-- ALTER TABLE users DROP COLUMN preferences_json;

COMMIT;
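When the source column is `text`, the cast to `jsonb` fails on any value that is not valid JSON, which aborts the whole transaction. A small pre-flight check can locate such rows first. The sketch below is plain Ruby over `[id, text]` pairs; `invalid_json_ids` and `valid_json?` are hypothetical helpers, and Ruby's parser is only an approximation of PostgreSQL's, so treat a clean result as a hint rather than a guarantee.

```ruby
require 'json'

# True when Ruby's JSON parser accepts the text (approximation of the
# database-side check performed by the ::jsonb cast).
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError, TypeError
  false
end

# Returns the ids of rows whose text would likely fail the cast.
def invalid_json_ids(rows)
  rows.reject { |_id, text| valid_json?(text) }.map { |id, _text| id }
end

rows = [
  [1, '{"github": "johndoe"}'],
  [2, "{github: 'johndoe'}"], # unquoted key, single quotes
  [3, '{"age": 42'],          # truncated document
  [4, '[]']
]

bad_ids = invalid_json_ids(rows)
puts bad_ids.inspect
```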

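Now that you know how these JSON column types work, let's see how to use them in Ruby on Rails.

Using jsonb in Ruby on Rails

Rails supports jsonb as of version 4.2. Using it is as simple as using a string or text column. The code below shows how to define a jsonb column when creating a table, and how to add one to an existing table.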

ruby
# db/migrate/*_create_users.rb
class CreateUsers < ActiveRecord::Migration
  def change
    enable_extension 'citext'

    create_table :users do |t|
      t.text :name, null: false
      t.citext :username, null: false
      t.jsonb :preferences, null: false, default: '{}'
    end

    add_index  :users, :preferences, using: :gin
  end
end

# db/migrate/*_add_jsonb_column_to_users.rb
class AddJsonbColumnToUsers < ActiveRecord::Migration
  def change
    add_column :users, :preferences, :jsonb, null: false, default: '{}'
    add_index  :users, :preferences, using: :gin
  end
end

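Notice that we also defined a GIN index. If you want an expression index on a given path, you have to use execute. Rails cannot represent such an index in the Ruby schema dump, so you are better off switching the schema format to SQL.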

ruby
# config/initializers/active_record.rb
Rails.application.config.active_record.schema_format = :sql

# db/migrate/*_add_index_to_preferences_path_on_users.rb
class AddIndexToPreferencesPathOnUsers < ActiveRecord::Migration
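  def up
    execute <<-SQL
      CREATE INDEX user_prefs_newsletter_index ON users ((preferences->>'newsletter'))
    SQL
  end

  # Raw SQL run through execute cannot be reversed automatically,
  # so the rollback is spelled out.
  def down
    execute 'DROP INDEX user_prefs_newsletter_index'
  end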
end
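If several paths need expression indexes, the SQL passed to `execute` can be produced by a small helper. This is a hedged sketch: `expression_index_sql` is not part of Rails, the index naming scheme is an arbitrary choice, and the identifier check is deliberately strict so that nothing unexpected is interpolated into the SQL.

```ruby
# Hypothetical helper (not a Rails API): builds the CREATE INDEX statement for
# an expression index on one top-level key of a json/jsonb column.
def expression_index_sql(table, column, key)
  [table, column, key].each do |name|
    unless name.to_s.match?(/\A[a-z_][a-z0-9_]*\z/)
      raise ArgumentError, "unsafe identifier: #{name.inspect}"
    end
  end

  index_name = "#{table}_#{column}_#{key}_index"
  "CREATE INDEX #{index_name} ON #{table} ((#{column}->>'#{key}'))"
end

sql = expression_index_sql(:users, :preferences, :newsletter)
puts sql
```

Inside a migration, the returned string would simply be handed to `execute`.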

Your model needs no configuration. Just create records, passing an object that can be serialized to JSON.

ruby
user = User.create!({
  name: 'John Doe',
  username: 'johndoe',
  preferences: {
    twitter: 'johndoe',
    github: 'johndoe',
    blog: 'http://example.com'
  }
})

# Reload record from database to enforce serialization.
user.reload

# Show preferences.
user.preferences
#=> {"blog"=>"http://example.com", "github"=>"johndoe", "twitter"=>"johndoe"}

# Get blog.
user.preferences['blog']
#=> "http://example.com"

As you can see, all keys are returned as strings. You can also use a custom serializer so that the JSON object can be accessed with symbols as well.

ruby
# app/models/user.rb
class User < ActiveRecord::Base
  serialize :preferences, HashSerializer
end

# app/serializers/hash_serializer.rb
class HashSerializer
  def self.dump(hash)
    hash.to_json
  end

  def self.load(hash)
    (hash || {}).with_indifferent_access
  end
end
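The reason the keys come back as strings is that JSON object keys are always strings, so symbol keys do not survive a round trip. The plain-Ruby sketch below uses only the standard `json` library (no ActiveRecord) to show this, along with `symbolize_names`, a standard option of `JSON.parse` that can serve as an alternative to indifferent access.

```ruby
require 'json'

preferences = { twitter: 'johndoe', github: 'johndoe' }

# What gets written out: a JSON document with string keys.
stored = JSON.generate(preferences)

default_load = JSON.parse(stored)                        # string keys
symbol_load  = JSON.parse(stored, symbolize_names: true) # symbol keys

puts default_load.inspect
puts symbol_load.inspect
```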

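Another interesting ActiveRecord feature is store_accessor. If you find yourself changing certain attributes often, you can create accessors for them, so that you assign to attributes instead of passing a whole JSON object. This also makes validations and forms easier. So if we build a form for saving a blog URL and GitHub and Twitter accounts, we can use something like this: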

ruby
class User < ActiveRecord::Base
  serialize :preferences, HashSerializer
  store_accessor :preferences, :blog, :github, :twitter
end

Now you can simply assign to these attributes.

ruby
user = User.new(blog: 'http://example.org', github: 'johndoe')

user.preferences
#=> {"blog"=>"http://example.org", "github"=>"johndoe"}

user.blog
#=> "http://example.org"

user.preferences[:github]
#=> "johndoe"

user.preferences['github']
#=> "johndoe"

With store accessors defined, you can add validations and build forms just as you would for any other attribute.
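Conceptually, `store_accessor` defines a reader and a writer per key that delegate to the underlying hash. The following is a simplified plain-Ruby imitation under that assumption, not ActiveRecord's implementation; `ToyStoreAccessor` and `ToyUser` are names invented for the example.

```ruby
# Simplified imitation of store_accessor: one getter and one setter per key,
# both delegating to the hash returned by the store method.
module ToyStoreAccessor
  def toy_store_accessor(store, *keys)
    keys.each do |key|
      define_method(key) { public_send(store)[key.to_s] }
      define_method("#{key}=") { |value| public_send(store)[key.to_s] = value }
    end
  end
end

class ToyUser
  extend ToyStoreAccessor

  attr_reader :preferences
  toy_store_accessor :preferences, :blog, :github, :twitter

  def initialize(attributes = {})
    @preferences = {}
    attributes.each { |name, value| public_send("#{name}=", value) }
  end
end

user = ToyUser.new(blog: 'http://example.org', github: 'johndoe')
puts user.preferences.inspect
puts user.blog
```

Because the accessors are ordinary methods, anything that works with regular attributes (validations, form builders) can call them without knowing the values live inside a JSON document.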

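Querying jsonb columns

Now it's time to use some query operators. PostgreSQL has many more than are shown here; read its documentation for the full list.

Also, remember to EXPLAIN the queries you run; it helps you tune your indexes.

Users who subscribed to the newsletter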

ruby
# preferences->newsletter = true
User.where('preferences @> ?', {newsletter: true}.to_json)

Users interested in Ruby

ruby
# preferences->interests = ['ruby', 'javascript', 'python']
User.where("preferences -> 'interests' ? :language", language: 'ruby')

This query won't use the column index; if you need to query the array, make sure you create an expression index.

sql
CREATE INDEX preferences_interests_on_users ON users USING GIN ((preferences->'interests'));

Users who set both Twitter and GitHub accounts

ruby
# preferences->twitter AND preferences->github
User.where('preferences ?& array[:keys]', keys: ['twitter', 'github'])

Users who set a Twitter or a GitHub account

ruby
# preferences->twitter OR preferences->github
User.where('preferences ?| array[:keys]', keys: ['twitter', 'github'])
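For reference, the semantics of the three existence operators used above (`?`, `?&`, `?|`) can be written down in plain Ruby. The function names are invented for this sketch. It works on already-parsed documents and mirrors the documented behaviour that these operators look only at top-level keys or array elements (or at the value itself when the document is a bare string).

```ruby
# `doc ? key`: the string exists as a top-level key, as an array element,
# or equals the document itself when the document is a bare scalar.
def exists?(doc, key)
  case doc
  when Hash then doc.key?(key)
  when Array then doc.include?(key)
  else doc == key
  end
end

# `doc ?& array[...]`: all of the strings exist.
def exists_all?(doc, keys)
  keys.all? { |key| exists?(doc, key) }
end

# `doc ?| array[...]`: at least one of the strings exists.
def exists_any?(doc, keys)
  keys.any? { |key| exists?(doc, key) }
end

prefs = { 'twitter' => 'johndoe', 'interests' => %w[ruby javascript python] }

puts exists?(prefs['interests'], 'ruby')       # interests ? 'ruby'
puts exists_all?(prefs, %w[twitter github])    # ?& array['twitter', 'github']
puts exists_any?(prefs, %w[twitter github])    # ?| array['twitter', 'github']
```

Note that nested keys are never consulted: a `github` key buried inside another object would not satisfy `?&` on the outer document.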

Users living in San Francisco, California

ruby
# preferences->state = 'CA' AND preferences->city = 'San Francisco'
User.where('preferences @> ?', {city: 'San Francisco', state: 'CA'}.to_json)
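What `@>` checks can be approximated in a few lines of Ruby. This is a simplified sketch under stated assumptions: objects contain objects key by key, arrays contain arrays regardless of order and duplicates, and scalars must be equal. PostgreSQL's special case where an array contains a bare primitive is left out, and `contains?` is a name made up for the illustration.

```ruby
require 'json'

# Simplified model of `doc @> query` on parsed JSON values.
def contains?(doc, query)
  case query
  when Hash
    doc.is_a?(Hash) && query.all? { |key, value| doc.key?(key) && contains?(doc[key], value) }
  when Array
    doc.is_a?(Array) && query.all? { |wanted| doc.any? { |item| contains?(item, wanted) } }
  else
    doc == query
  end
end

user = JSON.parse({ city: 'San Francisco', state: 'CA', interests: %w[ruby python] }.to_json)

puts contains?(user, 'city' => 'San Francisco', 'state' => 'CA')
puts contains?(user, 'interests' => ['python'])
puts contains?(user, 'state' => 'SP')
```

The query document only has to be a subset of the stored one, which is why a single `@>` condition can express the "city AND state" filter above.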

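About hstore

hstore columns do not allow nested structures, and they store every value as a string, so values have to be coerced at either the database or the application level. json/jsonb columns don't have this problem: numbers (integers and floats), booleans, arrays, strings and nulls are all accepted, and you can nest them any way you like.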
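The type difference is easy to see with a plain-Ruby analogy. Below, `hstore_like` is a made-up helper that flattens every value to a string, roughly what storing a flat hash in hstore implies, while a JSON round trip keeps numbers, booleans and arrays distinct.

```ruby
require 'json'

# Hypothetical stand-in for hstore storage: every value becomes a string.
def hstore_like(hash)
  hash.map { |key, value| [key.to_s, value.to_s] }.to_h
end

settings = { age: 42, newsletter: true, interests: %w[music movies] }

as_hstore = hstore_like(settings)
as_json   = JSON.parse(settings.to_json)

puts as_hstore.inspect
puts as_json.inspect
```

With the string-only version, the application has to remember that `age` is a number and `newsletter` a boolean, and convert them back on every read.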

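I therefore recommend dropping hstore in favor of jsonb as soon as you can, but remember that this requires PostgreSQL 9.4 or later.

I have written about hstore before, in case you want to know more about it.

Wrapping up

PostgreSQL is a very powerful database, and fortunately ActiveRecord keeps up with its updates, with built-in support for both jsonb and hstore.

There is still room for improvement, for example in expression index support. Switching the schema dump format to SQL is no big deal, but being able to declare such an index directly in a migration would make things simpler.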

ruby
# This doesn't exist, but it would be nice to have it!
add_index :users, "(settings->>'github')", raw: true

With every new release, using Rails with PostgreSQL gets easier and better than before. So try to stay on the latest Rails version; the effort tends to pay off quickly.