How to use COUNT() to parse individual words in a

2019-08-27 01:23发布

问题:

So my query:

SELECT Tags, COUNT(Tags) AS Listings
FROM Job
WHERE datepart(year, dateposted)=2013
GROUP BY Tags
ORDER BY Listings DESC

Outputs:

+----------------------+----------+
|         Tags         | Listings |
+----------------------+----------+
| java c++             |       41 |
| software development |       41 |
| java c++ c#          |       31 |
|                      |       25 |
| sysadmin             |       25 |
| see jd               |       24 |
| java c++ ood         |       23 |
| java                 |       23 |
+----------------------+----------+

I want it to come out like so:

+----------------------+----------+
|         Tags         | Listings |
+----------------------+----------+
| java                 |       118|
| c++                  |       95 |
| ood                  |       23 |
| see                  |       24 |
| jd                   |       24 |
| software development |       41 |
| sysadmin             |       25 |
| c#                   |       31 |
+----------------------+----------+

How can I count each individual word in the field instead of the entire field? The tags column is nvarchar.

回答1:

First, your table structure is awful. Storing data in a list like that is going to cause you headaches similar to what you are trying to do right now.

The problem with a split function is you have no idea what software development or other multi-word tags are - Is that one word or two?

I think the only way you will solve this is by creating a table with your tags or using a derived table similar to the following:

;with cte (tag) as
(
  select 'java' union all
  select 'c++' union all
  select 'software development' union all
  select 'sysadmin' union all
  select 'ood' union all
  select 'jd' union all
  select 'see' union all
  select 'c#'
)
select c.tag, count(j.tags) listings
from cte c
inner join job j
  on j.tags like '%'+c.tag+'%'
group by c.tag

See SQL Fiddle with Demo. Using this you can get a result:

|                  TAG | LISTINGS |
|                 java |        9 |
|                  c++ |       10 |
| software development |        4 |
|             sysadmin |        2 |
|                  ood |        6 |
|                   jd |        3 |
|                  see |        2 |
|                   c# |        1 |

The issue with the above as was pointed out in the comments is how to decide if you have a tag software and development, those will match with the above query.

The best solution that you would have to this problem would be to store the tags in a separate table similar to:

create table tags
(
  tag_id int,
  tag_name varchar(50)
);

Then you could use a JOIN table to connect your jobs to the tag:

create table tag_job
(
  job_id int,
  tag_id int
);

Once you have a set up similar to this then it becomes much easier to query your data:

select t.tag_name,
  count(tj.tag_id) listings
from tags t
inner join tag_job tj
  on t.tag_id = tj.tag_id
group by t.tag_name

See demo



回答2:

You will probably need to split out the individual words.

Here's a good series on splitters in SQL Server: SqlServerCentral.com

I don't see how you will be able to differentiate "software development" as a single tag though. If you have a list of acceptable tags elsewhere, you could probably use that perform a count.

If you have a list of Available Tags, here is one approach that doesn't require a split.

Sql Fiddle Example

There could be an issue with this approach if you have a tag that is contained in another. I.e. 'software' and 'software development'



回答3:

This is how I solved my issue.

SELECT TOP 50 Tags.s Tag, COUNT(Tags.s) AS Listings
FROM Job
CROSS APPLY [dbo].[SplitString](Tags,' ') Tags
WHERE NOT Job.Tags IS NULL and datepart(year,job.datecreated) = 2013
GROUP BY Tags.s
ORDER BY Listings DESC