So my query:
SELECT Tags, COUNT(Tags) AS Listings
FROM Job
WHERE datepart(year, dateposted)=2013
GROUP BY Tags
ORDER BY Listings DESC
Outputs:
+----------------------+----------+
| Tags | Listings |
+----------------------+----------+
| java c++ | 41 |
| software development | 41 |
| java c++ c# | 31 |
| | 25 |
| sysadmin | 25 |
| see jd | 24 |
| java c++ ood | 23 |
| java | 23 |
+----------------------+----------+
I want it to come out like so:
+----------------------+----------+
| Tags | Listings |
+----------------------+----------+
| java | 118|
| c++ | 95 |
| ood | 23 |
| see | 24 |
| jd | 24 |
| software development | 41 |
| sysadmin | 25 |
| c# | 31 |
+----------------------+----------+
How can I count each individual word in the field instead of the entire field? The tags column is nvarchar.
First, your table structure is awful. Storing data in a list like that is going to cause you headaches similar to what you are trying to do right now.
The problem with a split function is you have no idea what software development
or other multi-word tags are - Is that one word or two?
I think the only way you will solve this is by creating a table with your tags or using a derived table similar to the following:
;with cte (tag) as
(
select 'java' union all
select 'c++' union all
select 'software development' union all
select 'sysadmin' union all
select 'ood' union all
select 'jd' union all
select 'see' union all
select 'c#'
)
select c.tag, count(j.tags) listings
from cte c
inner join job j
on j.tags like '%'+c.tag+'%'
group by c.tag
See SQL Fiddle with Demo. Using this you can get a result:
| TAG | LISTINGS |
| java | 9 |
| c++ | 10 |
| software development | 4 |
| sysadmin | 2 |
| ood | 6 |
| jd | 3 |
| see | 2 |
| c# | 1 |
The issue with the above as was pointed out in the comments is how to decide if you have a tag software
and development
, those will match with the above query.
The best solution that you would have to this problem would be to store the tags in a separate table similar to:
create table tags
(
tag_id int,
tag_name varchar(50)
);
Then you could use a JOIN table to connect your jobs to the tag:
create table tag_job
(
job_id int,
tag_id int
);
Once you have a set up similar to this then it becomes much easier to query your data:
select t.tag_name,
count(tj.tag_id) listings
from tags t
inner join tag_job tj
on t.tag_id = tj.tag_id
group by t.tag_name
See demo
You will probably need to split out the individual words.
Here's a good series on splitters in SQL Server:
SqlServerCentral.com
I don't see how you will be able to differentiate "software development" as a single tag though. If you have a list of acceptable tags elsewhere, you could probably use that perform a count.
If you have a list of Available Tags, here is one approach that doesn't require a split.
Sql Fiddle Example
There could be an issue with this approach if you have a tag that is contained in another. I.e. 'software' and 'software development'
This is how I solved my issue.
SELECT TOP 50 Tags.s Tag, COUNT(Tags.s) AS Listings
FROM Job
CROSS APPLY [dbo].[SplitString](Tags,' ') Tags
WHERE NOT Job.Tags IS NULL and datepart(year,job.datecreated) = 2013
GROUP BY Tags.s
ORDER BY Listings DESC